CN110162704B

CN110162704B - Multi-scale key user extraction method based on multi-factor genetic algorithm

Info

Publication number: CN110162704B
Application number: CN201910421711.9A
Authority: CN
Inventors: 刘静; 任珍妮
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2019-05-21
Filing date: 2019-05-21
Publication date: 2022-06-10
Anticipated expiration: 2039-05-21
Also published as: CN110162704A

Abstract

The invention discloses a multi-scale key user extraction method based on a multi-factor genetic algorithm, which addresses the technical problem of simultaneously extracting key users of different scales from a recommendation system. The method mainly comprises the following steps: generating a candidate sequence set and calculating the average absolute error of each candidate sequence on the multiple key user extraction tasks; calculating the capability factor and scalar fitness value of each parent candidate sequence; performing matching genetic operations (assortative mating) on the parent candidate sequence set; selectively updating the average absolute errors of the offspring candidate sequence set; and outputting multiple key user sets of different scales. The method represents the population optimized by the multi-factor genetic algorithm as the set of key user candidate sequences, calculates the capability factor and scalar fitness value of each parent candidate sequence, performs matching genetic operations on the parent candidate sequence set, and selectively updates the offspring candidate sequence set. The method offers high recommendation accuracy and high extraction efficiency, and can be used to extract key information from networks.

Description

Multi-scale key user extraction method based on multi-factor genetic algorithm

Technical Field

The invention belongs to the technical field of computers and networks, and further relates to the extraction of information of interest, in particular to a multi-scale key user extraction method based on a multi-factor genetic algorithm. The method can be used to extract key users of different scales from a recommendation system simultaneously; the extracted key users and the information they carry are then used to complete the recommendation process, providing accurate and efficient recommendations to the system's target users.

Background

Recommendation systems are tools and data mining techniques that help users quickly discover desired items and useful information, and they provide useful suggestions in a variety of decision-making applications. Every recommendation system contains a group of key users who carry objective, reliable information that benefits the recommendation process. By extracting these key users and completing the recommendation process with the information they carry, the system can recommend items that a target user is likely to prefer, where the target user is the user to whom recommendations are made. At present, existing methods for extracting the key users of a recommendation system are mainly greedy algorithms and evolutionary algorithms that take accuracy as the objective.

Caihong Mu et al., in the published article "Information core optimization using evolution Algorithm with Elite Population in recommendation systems" (Proceedings of the 2017 IEEE Congress on Evolutionary Computation, pp. 441-462, 2017), propose a method for extracting the key users of a recommendation system based on an evolutionary algorithm with an elite preservation strategy. The method is implemented as follows: step 1, construct the system's user-item rating matrix; step 2, initialize a parent population and calculate the fitness of all individuals; step 3, rank all individuals by fitness according to an M-elite strategy and apply an order crossover operation to them to obtain the system key users; step 4, complete the recommendation process with the extracted key users and output the recommendation result. The drawback of this method is that only key users of a single scale can be obtained in one evolutionary run, so key user extraction is inefficient.

The patent document "Bipartite graph recommendation method based on key users and time context" (application number: 201711190064.2, application publication number: CN 108038746A), filed by Hohai University, discloses a personalized recommendation method based on key users and time context. The method is implemented as follows: step 1, collect all users' behavioral feedback data on items; step 2, measure each user's transaction experience and rating accuracy from the rating-count weight and the sample standard deviation of the user's product ratings, determine each user's authority, and extract key users according to user authority; step 3, construct an interest-preference neighbor set for each user; step 4, perform resource diffusion on the pruned bipartite graph, introducing time context in the second diffusion step; step 5, recommend to the target user the top N items that hold the most resources after step 4 and have not been purchased by that user, where N is the number of items recommended to the target user. The drawback of this method is that user authority is prescribed as the criterion for extracting key users; because this criterion is formulated from practical engineering experience, the recommendation results obtained in practical applications are not very accurate.

Existing methods for extracting the key users of recommendation systems thus suffer from low accuracy and low extraction efficiency.

Disclosure of Invention

In view of the above defects of the prior art, the invention aims to provide a multi-scale key user extraction method based on a multi-factor genetic algorithm that has high accuracy and high extraction efficiency.

The invention relates to a multi-scale key user extraction method based on a multi-factor genetic algorithm, which is characterized by comprising the following steps of:

(1) acquiring data and dividing it into a training set, a verification set and a test set: collecting from the internet the rating data of all users, including the key users of different scales, on items as the basic data, and dividing the basic data into a training set, a verification set and a test set in proportions of 60%, 20% and 20%;

(2) generating a candidate sequence set: inputting k tasks, each of which is a key user extraction task of the recommendation system, the key user extraction scales of the tasks being p₁, p₂, …, p_k, where k is both the number of multi-scale key user extraction tasks of the recommendation system and the number of key user extraction scales; based on the encoding method of the multi-factor genetic algorithm, sorting the k key user extraction scales in descending order and taking the first (largest) scale as the length q of a candidate sequence; randomly generating N q-dimensional parent candidate sequences and N q-dimensional child candidate sequences, each dimension taking a random value between 0 and 1, where N is both the number of parent candidate sequences and the number of child candidate sequences; taking the union of the N parent and N child candidate sequences as the 2N intermediate candidate sequences; and mapping the generated parent, child and intermediate candidate sequence sets to obtain key user sequence sets containing multi-scale key users;

(3) calculating the average absolute error of each parent candidate sequence on the k key user extraction tasks: for each parent candidate sequence, mapping its first p_j dimensions, according to a mapping formula, into the key user sequence for the j-th key user extraction task, where p_j is the key user extraction scale of the j-th multi-scale key user extraction task, thereby obtaining the key user sequences of each parent candidate sequence on the k tasks; for each of these key user sequences, completing the recommendation process with a key-user-based collaborative filtering recommendation algorithm to obtain predicted ratings for the users in the verification set of the basic data, and calculating the average absolute error of each parent candidate sequence on each key user extraction task according to the average absolute error formula;

(4) calculating the capability factor and scalar fitness value of each parent candidate sequence: for each key user extraction task, sorting the average absolute errors of all parent candidate sequences on that task in ascending order to obtain the rank of each parent candidate sequence on each of the k tasks, and calculating the capability factor and scalar fitness value of each parent candidate sequence from these ranks;

(5) inputting a termination threshold: the termination threshold is set empirically according to practical engineering requirements;

(6) performing matching genetic operations on the parent candidate sequence set and updating the child candidate sequence set: based on the multi-factor genetic algorithm, repeatedly selecting two parent candidate sequences at random from the parent candidate sequence set and performing matching inheritance according to the random mating probability and the capability factors of the two parents; when the two parents have the same capability factor, or a random number falls below the random mating probability, performing a uniform crossover operation on them to obtain two offspring candidate sequences; otherwise, performing a standard bit mutation operation on each of the two parents separately to obtain two offspring candidate sequences;

(7) selectively updating the average absolute errors of each child candidate sequence on the k key user extraction tasks: for a child candidate sequence obtained by uniform crossover of two parents, randomly selecting the key user extraction task given by the capability factor of either parent and calculating the child's average absolute error on that task; for a child candidate sequence obtained by standard bit mutation of a single parent, calculating its average absolute error on the task given by that parent's capability factor; and setting the average absolute error of each child candidate sequence on every task for which it was not calculated to 100000;

(8) merging the parent candidate sequence set and the child candidate sequence set to obtain an intermediate candidate sequence set;

(9) calculating and updating the capability factor and scalar fitness value of each intermediate candidate sequence: for each key user extraction task, sorting the average absolute errors of all intermediate candidate sequences on that task in ascending order to obtain the rank of every intermediate candidate sequence on each of the k tasks, and calculating and updating the capability factor and scalar fitness value of each intermediate candidate sequence from these ranks;

(10) updating the parent candidate sequence set: arranging scalar fitness values of the 2N intermediate candidate sequences in a descending order, and selecting the first N intermediate candidate sequences as an updated parent candidate sequence set;

(11) calculating the minimum average absolute error sum and judging whether it is smaller than the termination threshold: sorting all parent candidate sequences in descending order of scalar fitness value and taking the sum of the average absolute errors of the first (highest-fitness) parent candidate sequence over the k key user extraction tasks as the minimum average absolute error sum of the multi-scale key user extraction tasks; if this sum is smaller than the termination threshold, executing step (12); otherwise, returning to step (6) and entering the parent candidate sequence update cycle;

(12) outputting the multi-scale key user extraction results: for the j-th key user extraction task, sorting the N parent candidate sequences in ascending order of their average absolute error on that task, and converting the first p_j dimensions of the first-ranked parent candidate sequence, via the mapping formula, into the key user extraction result of the j-th task; obtaining and outputting the key user extraction results of all k tasks in turn by the same method, thereby completing the multi-scale key user extraction process;

(13) testing the performance of the k extracted key user sets of different scales: for the key user extraction results of the k tasks, completing the recommendation process with the key-user-based collaborative filtering recommendation algorithm to obtain predicted ratings for the users in the test set, and calculating the average absolute error obtained on each of the k tasks according to the average absolute error formula.

In the invention, the population optimized by the multi-factor genetic algorithm is represented by the set of key user candidate sequences; the capability factor and scalar fitness value of each parent candidate sequence are calculated, matching genetic operations are performed on the parent candidate sequences, and the offspring candidate sequence set is selectively updated to obtain key user sets of different scales. The method can extract key users of different scales from a recommendation system simultaneously, and the extracted key users and the information they carry are used to complete the recommendation process, providing accurate and efficient recommendations to the system's target users.

Compared with the prior art, the invention has the following advantages:

High extraction efficiency for multi-scale key users: the invention is based on the multi-factor genetic algorithm, represents the optimized population by the set of candidate key user sequences, and calculates the capability factor and scalar fitness value of each parent candidate sequence. This overcomes the problem in the prior art that, with recommendation accuracy as the optimization objective, only key users of a single scale can be extracted in one optimization run, which makes extraction inefficient. The invention therefore obtains key user sets of different scales in the same optimization run, improving key user extraction efficiency; when the key users of different scales are used in the recommendation process, they not only provide accurate recommendations but also speed up the online recommendation of the recommendation system.

High accuracy of the multi-scale key user extraction results: the invention performs matching genetic operations on the parent candidate sequence set and selectively updates the child candidate sequence set to obtain key user sets of different scales. This overcomes the problem in the prior art that the criterion for selecting key users is prescribed as user authority and formulated from practical engineering experience, which yields recommendation results of low accuracy. As a result, the key users of different scales selected by the invention achieve higher accuracy in the recommendation process.

Low computational complexity: because the offspring candidate sequence set is selectively updated, each child is evaluated on only one key user extraction task, which avoids unnecessary calculation and reduces the overall computational complexity.

Drawings

FIG. 1 is a block flow diagram of the present invention;

FIG. 2 shows simulation results of the method of the invention and the comparison method on the MovieLens-100K dataset used as the basic data.

Detailed Description

The invention is described in detail below with reference to the accompanying drawings.

Example 1

With the development of modern science, technology and the internet, people are exposed to ever more information in work and daily life. The information era brings convenience but also the problems of information overload and information explosion. Recommendation systems are an effective means of addressing information overload: they are tools and data mining techniques that help users quickly find needed items and useful information, and they form an important branch of modern industry. A recommendation system helps users quickly and accurately find what they need in large amounts of data or information, and helps merchants give more exposure to long-tail items, improving the conversion rate of goods. Recommendation systems provide useful suggestions in a variety of decision-making applications; when users have no clear goal, they help them make sensible choices from vast and complex data, and are essentially an information filtering method. Recommendation systems are widely used in many fields, including e-commerce, audio and video, books, fashion and catering. In a recommendation system, users rate items, and these ratings reflect each user's interest in the items, that is, the user's preferences.

Every recommendation system contains a group of users who carry the most reliable and objective information in the system and benefit the recommendation results; these users are called the key users of the recommendation system. Using the key users and the information they carry in the recommendation process yields satisfactory recommendation results, greatly reduces recommendation time, and improves the real-time recommendation efficiency of the system. However, existing key user extraction methods suffer from low accuracy and low extraction efficiency. Addressing these shortcomings, the invention provides a multi-scale key user extraction method based on a multi-factor genetic algorithm which, as shown in FIG. 1, comprises the following steps:

(1) Acquiring data and dividing it into a training set, a verification set and a test set: collecting from the network the rating data of all users, including the key users of different scales, on items as the basic data, and dividing the basic data into a training set, a verification set and a test set in proportions of 60%, 20% and 20%.

For ease of understanding, from the perspective of the recommendation system and recommendation method, this step can also be described as inputting user-item rating data: obtaining the ratings of n items by m users and dividing them into a training set, a verification set and a test set in proportions of 60%, 20% and 20%. The users' rating data on items can come from real network information in information engineering and data engineering.
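The 60%/20%/20% division in step (1) can be sketched as below. This is a minimal illustration under the assumption that individual (user, item, rating) triples are shuffled and split at random; the patent does not say whether splitting is global or per user, and the function name `split_ratings` is hypothetical.

```python
import random


def split_ratings(ratings, seed=0):
    """Shuffle (user, item, rating) triples and split them 60% / 20% / 20%.

    ASSUMPTION: a global random split of individual ratings.
    """
    rows = list(ratings)
    random.Random(seed).shuffle(rows)
    n = len(rows)
    n_train = n * 6 // 10          # integer arithmetic avoids float rounding
    n_val = n * 2 // 10
    train = rows[:n_train]
    val = rows[n_train:n_train + n_val]
    test = rows[n_train + n_val:]  # the remainder (about 20%) is the test set
    return train, val, test
```

The verification set drives the fitness evaluation during evolution, while the test set is held back for the final performance check in step (13).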

(2) Generating a candidate sequence set: inputting k tasks, each of which is a key user extraction task of the recommendation system, the key user extraction scales of the tasks being p₁, p₂, …, p_k, where k is both the number of multi-scale key user extraction tasks of the recommendation system and the number of key user extraction scales. Based on the encoding method of the multi-factor genetic algorithm, the k key user extraction scales are sorted in descending order and the first (largest) scale is taken as the length q of a candidate sequence. N q-dimensional parent candidate sequences and N q-dimensional child candidate sequences are generated randomly, each dimension taking a random value between 0 and 1, where N is both the number of parent candidate sequences and the number of child candidate sequences. The union of the N parent and N child candidate sequences is taken as the 2N intermediate candidate sequences, and the generated parent, child and intermediate candidate sequence sets are mapped to obtain key user sequence sets containing the multi-scale key users.
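The encoding in step (2) can be sketched as follows. The mapping formula is not given in this part of the text, so the key-to-user mapping `min(int(x * m), m - 1)` (users indexed 0 to m-1) is a hypothetical choice, as are the function names `init_population` and `decode`.

```python
import random


def init_population(N, scales, rng=random):
    """Randomly generate N candidate sequences of length q with keys in [0, 1)."""
    q = sorted(scales, reverse=True)[0]  # descending order; the first scale is q
    return [[rng.random() for _ in range(q)] for _ in range(N)]


def decode(seq, p_j, m):
    """HYPOTHETICAL mapping formula: first p_j keys -> user indices 0..m-1."""
    # The clamp keeps a key of exactly 1.0 inside the index range.
    return [min(int(x * m), m - 1) for x in seq[:p_j]]


# Usage: N parents and N children; their union forms the 2N intermediate sequences.
rng = random.Random(42)
scales = [50, 20, 10]                 # example scales p_1..p_k with k = 3
parents = init_population(8, scales, rng)
children = init_population(8, scales, rng)
intermediate = parents + children     # 2N = 16 sequences
key_users_task2 = decode(parents[0], scales[1], m=100)  # m = 100 is an example
```

Because every task reads a prefix of the same sequence, one candidate encodes key user sets for all k scales at once. This simple mapping can produce repeated user indices; a real mapping may de-duplicate them.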

(3) Calculating the average absolute error of each parent candidate sequence on the k key user extraction tasks: for each parent candidate sequence, its first p_j dimensions are mapped, according to the mapping formula, into the key user sequence for the j-th key user extraction task, where p_j is the key user extraction scale of the j-th multi-scale key user extraction task; this yields the key user sequences of each parent candidate sequence on the k tasks. For each of these key user sequences, the recommendation process is completed with a key-user-based collaborative filtering recommendation algorithm to obtain predicted ratings for the users in the verification set of the basic data. The average absolute error of each parent candidate sequence on each key user extraction task is then calculated according to the average absolute error formula.
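Step (3) can be illustrated with a minimal sketch. The patent does not specify the key-user-based collaborative filtering algorithm here, so the predictor below is an assumption: a cosine-similarity-weighted mean of the key users' ratings, falling back to the user's own mean (or a given default) when no key user rated the item. The average absolute error is the mean of |r − r̂| over the verification triples. All function names are hypothetical.

```python
import math
from collections import defaultdict


def build_index(train):
    """Map user -> {item: rating} from (user, item, rating) triples."""
    index = defaultdict(dict)
    for u, i, r in train:
        index[u][i] = r
    return index


def cosine(a, b):
    """Cosine similarity of two users over their co-rated items (0.0 if none)."""
    common = set(a) & set(b)
    if not common:
        return 0.0
    dot = sum(a[i] * b[i] for i in common)
    na = math.sqrt(sum(a[i] ** 2 for i in common))
    nb = math.sqrt(sum(b[i] ** 2 for i in common))
    return dot / (na * nb) if na and nb else 0.0


def predict(index, key_users, u, i, default):
    """HYPOTHETICAL key-user CF: similarity-weighted mean of key users' ratings of item i."""
    target = index.get(u, {})
    num = den = 0.0
    for k in set(key_users):          # duplicates in the key user sequence count once
        ratings_k = index.get(k, {})
        if k != u and i in ratings_k:
            s = cosine(target, ratings_k)
            num += s * ratings_k[i]
            den += abs(s)
    if den > 0:
        return num / den
    # Fallback when no key user helps: the user's mean rating, else a global default.
    return sum(target.values()) / len(target) if target else default


def mae(index, key_users, val, default):
    """Average absolute error of the predictions over (user, item, rating) triples."""
    errors = [abs(r - predict(index, key_users, u, i, default)) for u, i, r in val]
    return sum(errors) / len(errors)
```

Evaluating one candidate on task j amounts to calling `mae` with the decoded prefix of length p_j as `key_users`.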

(4) Calculating the capability factor and scalar fitness value of each parent candidate sequence: for each key user extraction task, the average absolute errors of all parent candidate sequences on that task are sorted in ascending order to obtain the rank of each parent candidate sequence on each of the k tasks, and the capability factor and scalar fitness value of each parent candidate sequence are calculated from these ranks.

(5) Inputting the termination threshold: the termination threshold is set empirically according to practical engineering requirements.

(6) Performing matching genetic operations on the parent candidate sequence set and updating the child candidate sequence set: based on the multi-factor genetic algorithm, two parent candidate sequences are repeatedly selected at random from the parent candidate sequence set, and matching inheritance is performed according to the random mating probability and the capability factors of the two parents. When the two parents have the same capability factor, or a random number falls below the random mating probability, a uniform crossover operation is performed on them to obtain two offspring candidate sequences; otherwise, a standard bit mutation operation is performed on each of the two parents separately, yielding two offspring candidate sequences.

(7) Selectively updating the average absolute errors of each child candidate sequence on the k key user extraction tasks: for a child candidate sequence obtained by uniform crossover of two parents, the key user extraction task given by the capability factor of either parent is selected at random and the child's average absolute error on that task is calculated; for a child candidate sequence obtained by standard bit mutation of a single parent, its average absolute error is calculated on the task given by that parent's capability factor. The average absolute error of each child candidate sequence on every task for which it was not calculated is set to 100000.
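The selective evaluation in step (7) can be sketched as below. The `evaluate(child, j)` callback (returning the child's average absolute error on task j) and the function name `evaluate_child` are hypothetical; the constant 100000 is the value the text assigns to uncomputed errors.

```python
import random

PENALTY = 100000.0  # value assigned to average absolute errors that are not computed


def evaluate_child(child, parent_factors, k, evaluate, rng=random):
    """Return the child's list of k average absolute errors.

    parent_factors: capability factors (task indices) of the parent(s) that
    produced the child, two for uniform crossover, one for standard bit mutation.
    evaluate(child, j): HYPOTHETICAL callback giving the child's error on task j.
    """
    j = rng.choice(parent_factors)  # crossover: either parent's task, at random
    maes = [PENALTY] * k
    maes[j] = evaluate(child, j)    # only one task is actually evaluated
    return maes
```

Each child thus costs one recommendation run instead of k, which is where the reduction in computation comes from; the large placeholder pushes the child to the bottom of the ranking on every task it was not evaluated on.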

(8) Merging the parent candidate sequence set and the child candidate sequence set to obtain the intermediate candidate sequence set.

(9) Calculating and updating the capability factor and scalar fitness value of each intermediate candidate sequence: for each key user extraction task, the average absolute errors of all intermediate candidate sequences on that task are sorted in ascending order to obtain the rank of every intermediate candidate sequence on each of the k tasks, and the capability factor and scalar fitness value of each intermediate candidate sequence are calculated and updated from these ranks.

(10) Updating the parent candidate sequence set: the 2N intermediate candidate sequences are sorted in descending order of scalar fitness value, and the first N are selected as the updated parent candidate sequence set.
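Steps (8) and (10) together form an elitist selection, sketched below under the assumption that the scalar fitness values of the 2N intermediate sequences (from step (9)) are supplied as a list aligned with `parents + children`. The function name `select_parents` is hypothetical.

```python
def select_parents(parents, children, fitness, N):
    """Merge parents and children, then keep the N highest-fitness intermediates.

    fitness[i] is the scalar fitness of intermediate[i], computed in step (9).
    """
    intermediate = parents + children  # step (8): the 2N intermediate sequences
    # Step (10): descending scalar fitness; Python's sort keeps ties in original order.
    order = sorted(range(len(intermediate)), key=lambda i: fitness[i], reverse=True)
    return [intermediate[i] for i in order[:N]]
```

For example, with `parents=['a', 'b']`, `children=['c', 'd']` and fitness values `[0.5, 1.0, 0.25, 1/3]`, the call with `N=2` keeps `['b', 'a']`.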

(11) Calculating the minimum average absolute error sum and judging whether it is smaller than the termination threshold: all parent candidate sequences are sorted in descending order of scalar fitness value, the sum of the average absolute errors of the first parent candidate sequence over the k key user extraction tasks is calculated, and this sum is taken as the minimum average absolute error sum of the multi-scale key user extraction tasks. If the minimum average absolute error sum is smaller than the termination threshold, step (12) is executed to output the key user set extraction results and test their performance; otherwise, when it is greater than or equal to the termination threshold, step (6) is executed and the parent candidate sequence set update cycle is entered. That is, the capability factors and scalar fitness values of the parent candidate sequence set are updated cyclically, matching genetic operations are performed on the parent candidate sequence set, and the child candidate sequence set is selectively updated, until the minimum average absolute error sum is smaller than the termination threshold.
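The termination test in step (11) can be sketched as follows, assuming `parent_maes[i]` holds the k average absolute errors of parent i and `parent_fitness[i]` its scalar fitness. Both function names are hypothetical.

```python
def minimum_mae_sum(parent_maes, parent_fitness):
    """Sum of the k errors of the highest-scalar-fitness parent (first after sorting)."""
    best = max(range(len(parent_fitness)), key=lambda i: parent_fitness[i])
    return sum(parent_maes[best])


def should_stop(parent_maes, parent_fitness, threshold):
    """True when the minimum average absolute error sum is below the threshold."""
    return minimum_mae_sum(parent_maes, parent_fitness) < threshold
```

As written, any 100000 placeholder that the best parent carries from step (7) is included in the sum, so the loop can only stop once the top-ranked parent has been evaluated on every task.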

(12) Outputting the multi-scale key user extraction results: for the j-th key user extraction task, the N parent candidate sequences are sorted in ascending order of their average absolute error on that task, and the first p_j dimensions of the first-ranked parent candidate sequence are converted, via the mapping formula, into the key user extraction result of the j-th task. The key user extraction results of all k tasks are obtained and output in turn by the same method, completing the multi-scale key user extraction process.
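Step (12) can be sketched as below. Since the mapping formula is not given in this part of the text, the key-to-user mapping `min(int(x * m), m - 1)` for m users is a hypothetical stand-in, and `extract_key_users` is an illustrative name.

```python
def extract_key_users(parents, parent_maes, scales, m):
    """For each task j, decode the first p_j keys of the parent with the lowest error on j.

    parent_maes[i][j]: average absolute error of parent i on task j.
    HYPOTHETICAL mapping: key x -> user index min(int(x * m), m - 1).
    """
    results = []
    for j, p_j in enumerate(scales):
        best = min(range(len(parents)), key=lambda i: parent_maes[i][j])
        results.append([min(int(x * m), m - 1) for x in parents[best][:p_j]])
    return results


# Usage: two tasks with scales 3 and 2 over m = 10 users.
sets = extract_key_users([[0.1, 0.2, 0.3], [0.9, 0.8, 0.7]],
                         [[0.5, 0.9], [0.6, 0.4]], [3, 2], 10)
# sets == [[1, 2, 3], [9, 8]]
```

Different tasks may draw their key user sets from different parents, since each task picks its own best-ranked sequence.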

The specific idea of the invention is as follows: the population optimized by the multi-factor genetic algorithm is represented by the set of key user candidate sequences; the capability factor and scalar fitness value of each parent candidate sequence are calculated; matching genetic operations are performed on the parent candidate sequence set; and the offspring candidate sequence set is selectively updated, thereby determining key user sets of different scales for the recommendation system. Items are then recommended to the system's target users using the extracted key users of different scales and the rating information they carry.

The invention provides an overall technical solution for extracting the multi-scale key users of a recommendation system. It can extract key users of different scales from large and complex recommendation system data in a single extraction process, filtering out redundant data and extracting the information of interest, which helps to cope with the data explosion of the information era.

Example 2

The multi-scale key user extraction method based on the multi-factor genetic algorithm is the same as in Example 1, and the calculation of the capability factor and scalar fitness value in step (4) comprises the following steps:

(4a) Selecting a task: randomly selecting an unselected task from the k key user extraction tasks.

(4b) Ranking the parent candidate sequence set on the selected task: sorting the average absolute errors of the N parent candidate sequences on the selected task in ascending order to obtain the rank of each of the N parent candidate sequences on that task.

(4c) Judging whether all tasks have been selected: if the number of selected tasks has reached k, executing step (4d) and entering the parent candidate sequence selection cycle; otherwise, returning to step (4a) and continuing the task selection cycle over the k key user extraction tasks.

(4d) Selecting a parent candidate sequence: randomly selecting an unselected parent candidate sequence from the N parent candidate sequences.

(4e) Calculating the capability factor and scalar fitness value of the parent candidate sequence: sorting the ranks of the selected parent candidate sequence on the k tasks in ascending order, taking the task corresponding to the first (smallest) rank as the capability factor of the parent candidate sequence, and taking the reciprocal of that rank as its scalar fitness value.

(4f) Judging whether all parent candidate sequences have been selected: if the number of selected parent candidate sequences has reached N, the calculation of the capability factors and scalar fitness values of the parent candidate sequence set is complete and step (5) is executed to input the termination threshold; otherwise, returning to step (4d).
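Steps (4a) to (4f) reduce to a per-task ranking followed by a per-candidate minimum, as in the sketch below (the function name is hypothetical). The order in which tasks and candidates are visited does not affect the result, so plain loops replace the random selection cycles. Two tie-breaking rules are assumptions, since the text does not specify them: equal errors keep their original order (Python's sort is stable), and equal best ranks pick the lowest task index.

```python
def factors_and_fitness(maes):
    """maes[i][j]: average absolute error of candidate i on task j.

    Returns (capability_factors, scalar_fitness) as two lists.
    """
    n, k = len(maes), len(maes[0])
    ranks = [[0] * k for _ in range(n)]
    for j in range(k):                                     # (4a)-(4c): rank every task
        order = sorted(range(n), key=lambda i: maes[i][j])  # ascending error
        for r, i in enumerate(order, start=1):
            ranks[i][j] = r
    factors, fitness = [], []
    for i in range(n):                                     # (4d)-(4f): every candidate
        best_task = min(range(k), key=lambda j: ranks[i][j])
        factors.append(best_task)                          # capability factor
        fitness.append(1.0 / ranks[i][best_task])          # reciprocal of best rank
    return factors, fitness
```

The same function applies unchanged to step (9) by passing the error lists of the 2N intermediate candidate sequences.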

The calculation and update of the capability factor and scalar fitness value in step (9) follow the same procedure, with the 2N intermediate candidate sequences as the objects of calculation, and comprise the following steps:

(9a) Selecting a task: randomly selecting an unselected task from the k key user extraction tasks.

(9b) Ranking the intermediate candidate sequence set on the selected task: sorting the average absolute errors of the 2N intermediate candidate sequences on the selected task in ascending order to obtain the rank of each of the 2N intermediate candidate sequences on that task.

(9c) Judging whether all tasks have been selected: if the number of selected tasks has reached k, executing step (9d) and entering the intermediate candidate sequence selection cycle; otherwise, returning to step (9a) and continuing the task selection cycle over the k key user extraction tasks.

(9d) Selecting an intermediate candidate sequence: an unselected intermediate candidate sequence is arbitrarily selected from the 2N intermediate candidate sequences.

(9e) Calculating the capability factor and scalar fitness value of the intermediate candidate sequence: sorting the ranks of the selected intermediate candidate sequence on the k tasks in ascending order, taking the task corresponding to the first (smallest) rank as the capability factor of the intermediate candidate sequence, and taking the reciprocal of that rank as its scalar fitness value.

(9f) Check whether all intermediate candidate sequences have been selected: determine whether the number of selected intermediate candidate sequences has reached 2N. If so, the capability factors and scalar fitness values of the intermediate candidate sequence set have been updated; go to step (10) to update the parent candidate sequence set. Otherwise, return to step (9d).
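The ranking procedure of steps (9a)-(9f), which step (4) applies in the same way to the N parent candidate sequences, can be sketched in Python as follows. This is a minimal illustration, not the inventors' implementation: the function name `skill_factors_and_fitness` is hypothetical, tasks are indexed from 0, and ties are broken by candidate order (Python's stable sort) and by the lowest task index, neither of which the source specifies.

```python
from typing import List, Sequence, Tuple


def skill_factors_and_fitness(
    mae: Sequence[Sequence[float]],
) -> Tuple[List[int], List[float]]:
    """Capability factor and scalar fitness for every candidate sequence.

    mae[s][t] is the mean absolute error of candidate s on task t
    (tasks indexed from 0). A lower MAE gives a better (smaller) rank.
    """
    n_cand, k = len(mae), len(mae[0])
    # ranks[s][t]: 1-based position of candidate s when all candidates
    # are sorted by ascending MAE on task t (steps 9a-9c)
    ranks = [[0] * k for _ in range(n_cand)]
    for t in range(k):
        order = sorted(range(n_cand), key=lambda s: mae[s][t])
        for position, s in enumerate(order, start=1):
            ranks[s][t] = position

    skill: List[int] = []
    fitness: List[float] = []
    for s in range(n_cand):  # steps 9d-9f
        best_task = min(range(k), key=lambda t: ranks[s][t])
        skill.append(best_task)                    # capability factor
        fitness.append(1.0 / ranks[s][best_task])  # reciprocal of best rank
    return skill, fitness
```

For example, with `mae = [[0.80, 0.90], [0.70, 0.95], [0.85, 0.85]]` the function returns capability factors `[0, 0, 1]` and scalar fitness values `[0.5, 1.0, 1.0]`: the second candidate ranks first on task 0, the third ranks first on task 1, and the first ranks second on both.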

The method is based on the multi-factor genetic algorithm: the population optimized by the algorithm is represented by the set of key user candidate sequences, and the capability factor and scalar fitness value of each parent candidate sequence are calculated. This addresses a limitation of the prior art, in which recommendation accuracy is the optimization objective and only key users of a single scale can be extracted in one optimization run, so extraction efficiency is low. With this technical scheme, key user sets of different scales are obtained in the same optimization run, which effectively improves the efficiency of key user extraction.

Example 3

The multi-scale key user extraction method based on the multi-factor genetic algorithm is the same as in Examples 1-2. Step (6), in which matching genetic operations are performed on the parent candidate sequence set to update the offspring candidate sequence set, comprises the following steps:

(6a) Input the mating probability: input the random mating probability rmp.

(6b) Select parent candidate sequences: randomly select two parent candidate sequences from the parent candidate sequence set, referred to as the first parent candidate sequence p_a and the second parent candidate sequence p_b.

(6c) Generate a random number: randomly generate a number rand between 0 and 1.

(6d) Decide: if the first parent candidate sequence p_a and the second parent candidate sequence p_b have the same capability factor, or if rand is smaller than rmp, go to step (6e) and perform the uniform crossover operation; otherwise, go to step (6f) and perform the basic bit mutation operation.

(6e) Uniform crossover operation: apply uniform crossover to the first parent candidate sequence p_a and the second parent candidate sequence p_b to obtain the updated first offspring candidate sequence c_a and second offspring candidate sequence c_b; then go to step (6g).

(6f) Basic bit mutation operation: apply basic bit mutation to the first parent candidate sequence p_a to obtain the updated first offspring candidate sequence c_a, and to the second parent candidate sequence p_b to obtain the updated second offspring candidate sequence c_b.

(6g) Decide: determine whether the number of times two parent candidate sequences have been randomly selected from the parent candidate sequence set has reached N/2, that is, whether N offspring candidate sequences have been generated. If so, go to step (7) to selectively update the mean absolute errors of the k key user extraction tasks of each offspring candidate sequence; otherwise, return to step (6b) and continue the parent candidate sequence selection loop.

In the invention, matching genetic operations are applied to the parent candidate sequences, which are paired according to their capability factors. As a result, parent candidate sequences with the same capability factor are more likely to perform uniform crossover and exchange genetic material: by step (6d) they always cross over, whereas a pair with different capability factors crosses over only when rand is smaller than rmp.
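A minimal Python sketch of the matching loop of steps (6a)-(6g) follows. It assumes that the two parents drawn in step (6b) are distinct, that N is even so that N/2 matings fill the offspring set, and that the crossover and mutation operators of steps (6e) and (6f) are supplied as functions; the names `assortative_mating`, `crossover` and `mutate` are illustrative, not taken from the source. Each offspring is returned together with the indices of the parent(s) it came from, because step (7) needs that information.

```python
import random
from typing import Callable, List, Sequence, Tuple

Seq = List[float]


def assortative_mating(
    parents: Sequence[Seq],
    skill: Sequence[int],
    rmp: float,
    crossover: Callable[[Seq, Seq, random.Random], Tuple[Seq, Seq]],
    mutate: Callable[[Seq, random.Random], Seq],
    rng: random.Random,
) -> List[Tuple[Seq, Tuple[int, ...]]]:
    """Steps (6a)-(6g): build an offspring set as large as the parent set.

    Every offspring is paired with the indices of the parent(s) it came
    from, which the selective evaluation of step (7) relies on.
    """
    offspring: List[Tuple[Seq, Tuple[int, ...]]] = []
    for _ in range(len(parents) // 2):            # (6g): N/2 matings, 2 children each
        a, b = rng.sample(range(len(parents)), 2)  # (6b): two distinct parents
        rand = rng.random()                        # (6c): rand in [0, 1)
        if skill[a] == skill[b] or rand < rmp:     # (6d)
            c_a, c_b = crossover(parents[a], parents[b], rng)  # (6e)
            offspring.append((c_a, (a, b)))
            offspring.append((c_b, (a, b)))
        else:                                      # (6f): each parent mutated alone
            offspring.append((mutate(parents[a], rng), (a,)))
            offspring.append((mutate(parents[b], rng), (b,)))
    return offspring
```

Passing the operators in as functions keeps the mating rule of step (6d) separate from the details of crossover and mutation, so either operator can be replaced without touching the loop.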

Example 4

The multi-scale key user extraction method based on the multi-factor genetic algorithm is the same as in Examples 1-3. Step (7), in which the mean absolute errors of the k key user extraction tasks of each offspring candidate sequence are selectively updated, comprises the following steps:

(7a) Randomly select an unselected offspring candidate sequence from the offspring candidate sequence set.

(7b) Map the selected offspring candidate sequence into a key user sequence according to the mapping formula.

(7c) Determine whether the selected offspring candidate sequence was produced by uniform crossover of two parent candidate sequences. If so, go to step (7d); otherwise, go to step (7j).

(7d) Randomly generate a number rand1 between 0 and 1.

(7e) Determine whether rand1 is smaller than 0.5. If so, go to step (7f); otherwise, go to step (7h).

(7f) Take the extraction scale of the key user extraction task corresponding to the capability factor of the first parent candidate sequence p_a of the selected offspring candidate sequence as the boundary value a, and select the first a entries of the mapped key user sequence as the key user set of the offspring candidate sequence. Complete the recommendation process with the key-user-based collaborative filtering recommendation algorithm to obtain the predicted ratings of the validation set users.

(7g) Using the predicted ratings of the validation set users and the mean absolute error formula, update the offspring's mean absolute error on the key user extraction task corresponding to the capability factor of its first parent candidate sequence p_a; then go to step (7l).

(7h) Take the extraction scale of the key user extraction task corresponding to the capability factor of the second parent candidate sequence p_b of the selected offspring candidate sequence as the boundary value b, and select the first b entries of the mapped key user sequence as the key user set of the offspring candidate sequence. Complete the recommendation process with the key-user-based collaborative filtering recommendation algorithm to obtain the predicted ratings of the validation set users.

(7i) Using the predicted ratings of the validation set users and the mean absolute error formula, update the offspring's mean absolute error on the key user extraction task corresponding to the capability factor of its second parent candidate sequence p_b; then go to step (7l).

(7j) Take the extraction scale of the key user extraction task corresponding to the capability factor of the single parent candidate sequence of the selected offspring candidate sequence as the boundary value c, select the first c entries of the mapped key user sequence as the key user set of the offspring candidate sequence, and complete the recommendation process with the key-user-based collaborative filtering recommendation algorithm to obtain the predicted ratings of the validation set users.

(7k) Using the predicted ratings of the validation set users and the mean absolute error formula, update the offspring's mean absolute error on the key user extraction task corresponding to the capability factor of its single parent candidate sequence.

(7l) Set the mean absolute error of every task that was not updated for the selected offspring candidate sequence to 100000, a large penalty value.

(7m) Determine whether the number of selected offspring candidate sequences has reached N. If so, go to step (8) and merge the parent candidate sequence set with the offspring candidate sequence set; otherwise, return to step (7a) and continue the offspring candidate sequence selection loop.

The invention updates the mean absolute errors of the k key user extraction tasks of each offspring candidate sequence selectively. According to the capability factor of the parent candidate sequence from which an offspring candidate sequence is derived, the offspring inherits the task on which that parent performs best, that is, the task on which the parent has the smallest mean absolute error, and only the mean absolute error on that task is calculated and updated for the offspring. This avoids unnecessary computation, makes the extraction process simpler and more efficient, and yields higher efficiency when extracting multi-scale key users.
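The selective update of steps (7c)-(7l) can be sketched in Python as below. This is an illustrative sketch only: `selective_evaluation` and its `evaluate` callback are hypothetical names, and `evaluate(child, task)` stands in for mapping the child to users, keeping the prefix for that task's scale, running key-user-based collaborative filtering and returning the validation MAE. Parent indices are assumed to be recorded when the offspring is created (two indices after crossover, one after mutation).

```python
import random
from typing import Callable, List, Sequence, Tuple

PENALTY = 100000.0  # step (7l): MAE assigned to every task that is not evaluated


def selective_evaluation(
    child: Sequence[float],
    parent_ids: Tuple[int, ...],
    parent_skill: Sequence[int],
    k: int,
    evaluate: Callable[[Sequence[float], int], float],
    rng: random.Random,
) -> Tuple[List[float], int]:
    """Return the k task MAEs of one offspring and the task it was evaluated on."""
    if len(parent_ids) == 2:
        # steps (7d)-(7e): produced by crossover, imitate either parent with p = 0.5
        source = parent_ids[0] if rng.random() < 0.5 else parent_ids[1]
    else:
        # step (7j): produced by mutation, so there is only one parent
        source = parent_ids[0]
    task = parent_skill[source]
    mae = [PENALTY] * k
    mae[task] = evaluate(child, task)  # the only real evaluation (7f-7k)
    return mae, task
```

Only one call to the expensive recommendation step is made per offspring, instead of k calls, which is the saving the paragraph above describes.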

Example 5

The multi-scale key user extraction method based on the multi-factor genetic algorithm is the same as in Examples 1-4. The uniform crossover operation of step (6e) comprises the following steps:

(6e1) Generate a crossover indicator vector: randomly generate a q-dimensional crossover indicator vector whose components are each randomly set to 0 or 1, where q is the length of the candidate sequence, equal to the maximum of the key user extraction scales of the multi-scale key user extraction tasks input to the recommendation system.

(6e2) Select a crossover indicator component: randomly select one unselected component of the crossover indicator vector.

(6e3) Check whether the component is 1: if the selected component of the crossover indicator vector is 1, set the corresponding dimension of the first offspring candidate sequence c_a to the value of that dimension in the first parent candidate sequence p_a, and the corresponding dimension of the second offspring candidate sequence c_b to the value in the second parent candidate sequence p_b. Otherwise, set the corresponding dimension of c_a to the value in p_b, and that of c_b to the value in p_a.

(6e4) Check whether the uniform crossover is complete: if all components of the crossover indicator vector have been used, the uniform crossover of the two selected parent candidate sequences is finished; go to step (6g) and return to the parent candidate sequence selection loop. Otherwise, go to step (6e2) and continue the crossover indicator component selection loop.

In the method, pairs of parent candidate sequences are randomly drawn from the parent candidate sequence set in turn for uniform crossover, so that the genes of the parents are fully exchanged and the generated offspring candidate sequences inherit genes evenly from both parents. This helps the method converge more accurately and quickly to the multi-scale key user extraction results, so that key users of the recommendation system at different scales are obtained accurately and efficiently.
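Steps (6e1)-(6e4) amount to the standard uniform crossover, sketched here in Python under the assumption that candidate sequences are plain lists of floats; `uniform_crossover` is a hypothetical name. The source visits indicator components in random order, but since each dimension is handled independently, visiting them in index order gives the same children.

```python
import random
from typing import List, Sequence, Tuple


def uniform_crossover(
    p_a: Sequence[float], p_b: Sequence[float], rng: random.Random
) -> Tuple[List[float], List[float]]:
    """Uniform crossover of two equal-length parent candidate sequences."""
    q = len(p_a)
    # step (6e1): q-dimensional crossover indicator vector of random 0/1 values
    indicator = [rng.randint(0, 1) for _ in range(q)]
    c_a: List[float] = []
    c_b: List[float] = []
    # steps (6e2)-(6e4): every component is visited exactly once
    for d in range(q):
        if indicator[d] == 1:
            # step (6e3): each child keeps its own parent's value
            c_a.append(p_a[d])
            c_b.append(p_b[d])
        else:
            # otherwise the values of this dimension are swapped
            c_a.append(p_b[d])
            c_b.append(p_a[d])
    return c_a, c_b
```

Because every dimension goes to exactly one child from each parent, the two children together always contain all the genes of both parents.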

Example 6

The multi-scale key user extraction method based on the multi-factor genetic algorithm is the same as in Examples 1-5. The basic bit mutation operation of step (6f) comprises the following steps:

(6f1) Determine the mutation position: randomly generate an integer z in [1, q-1] and take z as the mutation position.

(6f2) Update the mutation position: randomly generate a number w in (0, 1) and set the z-th entry of the selected parent candidate sequence to w.

The method applies basic bit mutation to the parent candidate sequences, which enlarges the search space and counters the tendency of existing evolutionary-algorithm-based key user extraction methods to become trapped in local optima during extraction. This improves search efficiency and allows an accurate multi-scale key user set to be obtained quickly.
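A Python sketch of steps (6f1)-(6f2), assuming list-of-float sequences of length q of at least 2; `basic_bit_mutation` is a hypothetical name. It follows the source literally: the 1-based position z is drawn from [1, q-1], so the last position is never mutated, and the parent is copied so that the mutated copy becomes the offspring as step (6f) describes.

```python
import random
from typing import List, Sequence


def basic_bit_mutation(parent: Sequence[float], rng: random.Random) -> List[float]:
    """Return a copy of the parent with one position replaced by w in (0, 1)."""
    q = len(parent)
    child = list(parent)           # the parent itself is left unchanged
    z = rng.randint(1, q - 1)      # step (6f1): 1-based position, both ends inclusive
    w = rng.random()               # random() lies in [0, 1) ...
    while w == 0.0:                # ... so redraw to keep w in the open interval (0, 1)
        w = rng.random()
    child[z - 1] = w               # step (6f2): convert the 1-based z to a list index
    return child
```
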

A detailed example is given below to further illustrate the present invention.

Example 7

The multi-scale key user extraction method based on the multi-factor genetic algorithm is the same as in Examples 1-6. The specific implementation steps of the invention are further described with reference to Fig. 1.

Step 1, obtain the data and divide it into a training set, a validation set and a test set.

Collect from the network the item rating data of all users, which contain key users of different scales, as the basic data, and divide the basic data into a training set, a validation set and a test set in the proportion 60%, 20% and 20%.

From the perspective of the recommendation system and the recommendation algorithm, the input is user-item rating data: the ratings of n items by m users, divided into a training set, a validation set and a test set in the proportion 60%, 20% and 20%.

Step 2, generate the candidate sequence sets.

Step 2.1, input k key user extraction tasks of the recommendation system, with key user extraction scales p₁, p₂, …, p_k respectively.

Step 2.2, sort the k key user extraction scales in descending order and take the first (largest) one as the length q of the candidate sequences and of the result sequences.

Step 2.3, randomly generate N q-dimensional parent candidate sequences, each dimension taking a random number between 0 and 1, where N is the number of parent candidate sequences.

Step 2.4, randomly generate N q-dimensional offspring candidate sequences, each dimension taking a random number between 0 and 1, where N is the number of offspring candidate sequences.

Step 2.5, take the union of the N parent candidate sequences and the N offspring candidate sequences as the 2N intermediate candidate sequences.
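Step 2 can be sketched in a few lines of Python; `init_candidates` is a hypothetical name, and Python's `random.random()` (which samples [0, 1)) is assumed to be an acceptable reading of "a random number between 0 and 1".

```python
import random
from typing import List, Sequence, Tuple


def init_candidates(
    scales: Sequence[int], n_pop: int, rng: random.Random
) -> Tuple[int, List[List[float]], List[List[float]]]:
    """Steps 2.2-2.4: sequence length q and random parent/offspring sets."""
    q = sorted(scales, reverse=True)[0]  # step 2.2: largest extraction scale
    parents = [[rng.random() for _ in range(q)] for _ in range(n_pop)]    # step 2.3
    offspring = [[rng.random() for _ in range(q)] for _ in range(n_pop)]  # step 2.4
    return q, parents, offspring
```

Step 2.5 is then simply `parents + offspring`, a list of 2N sequences.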

Step 3, calculate the mean absolute errors of the k key user extraction tasks for each parent candidate sequence.

Step 3.1, randomly select an unselected parent candidate sequence from the parent candidate sequence set.

Step 3.2, generate an empty key user sequence.

Step 3.3, map the selected parent candidate sequence into a key user sequence according to the mapping formula.

The mapping formula is as follows:

s_i = 1 + (n - 1) × y_i

where s_i denotes the index of the key user obtained by mapping the i-th dimension of the selected parent candidate sequence, n denotes the total number of users of the recommendation system, and y_i denotes the value of the i-th dimension of the selected parent candidate sequence.

Step 3.4, randomly select an unselected task from the k key user extraction tasks.

Step 3.5, take the key user extraction scale of the selected task as the selection threshold a, and take the first a entries of the key user sequence as the set of key user indices for the selected task.

Step 3.6, complete the recommendation process with the key-user-based collaborative filtering recommendation algorithm to obtain the predicted ratings of the validation set users.

The key-user-based collaborative filtering recommendation algorithm comprises the following steps:

Step 3.6.1, calculate the cosine similarity between every user of the recommendation system and each key user using the following formula:

$$S_{uv}=\frac{\sum_{i=1}^{n} r_{ui}\,r_{vi}}{\sqrt{\sum_{i=1}^{n} r_{ui}^{2}}\;\sqrt{\sum_{i=1}^{n} r_{vi}^{2}}}$$

where S_uv denotes the cosine similarity between the u-th user and the v-th key user, n denotes the number of items in the recommendation system, i denotes the item index, Σ denotes summation, r_ui denotes the rating of the i-th item by the u-th user, r_vi denotes the rating of the i-th item by the v-th key user, and √ denotes the square root.

Step 3.6.2, select the L key users with the highest similarity to each user as that user's neighbor user set.

Step 3.6.3, calculate the predicted rating of each user for each item using the following formula:

$$\hat{r}_{ui}=\frac{\sum_{v\in N_u} S_{uv}\,r_{vi}}{\sum_{v\in N_u} S_{uv}}$$

where r̂_ui denotes the predicted rating of the i-th item by the u-th user, Σ denotes summation, N_u denotes the neighbor user set of the u-th user, S_uv denotes the cosine similarity between the u-th user and the v-th neighbor user, and r_vi denotes the rating of the i-th item by the v-th neighbor user.

Step 3.7, calculate the mean absolute error of the selected parent candidate sequence on the selected key user extraction task using the mean absolute error formula.

The mean absolute error formula is:

$$\mathrm{MAE}=\frac{\sum_{u\in T_u}\sum_{i\in I_u}\left|r_{u,i}-\hat{r}_{u,i}\right|}{\sum_{u\in T_u}\left|I_u\right|}$$

where MAE denotes the mean absolute error between the predicted and true ratings in the validation set, T_u denotes the set of all users of the recommendation system, I_u denotes the set of items rated by the u-th user in the validation set, and r_u,i and r̂_u,i denote, respectively, the true and predicted ratings of the i-th item by the u-th user in the validation set.

Step 3.8, determine whether the number of tasks selected for the selected parent candidate sequence has reached k. If so, go to step 3.9; otherwise, go to step 3.4.

Step 3.9, determine whether the number of selected parent candidate sequences has reached N. If so, go to step 4; otherwise, go to step 3.1.
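The evaluation of one candidate sequence on one task (steps 3.3-3.7) is sketched below in Python. It is a sketch under several assumptions that the source does not settle: mapped values are rounded to the nearest integer user id; duplicate ids in the prefix are dropped; unrated items count as 0 in the cosine similarity; a user is not its own neighbor; the weighted average of step 3.6.3 runs only over neighbors who rated the item, falling back to the global mean training rating when none did. The data layout (`Ratings`, validation triples) and all function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

Ratings = Dict[int, Dict[int, float]]  # user id -> {item id: rating}


def map_to_users(seq: Sequence[float], n_users: int) -> List[int]:
    """Mapping formula s_i = 1 + (n - 1) * y_i, rounded to an integer user id."""
    return [1 + round((n_users - 1) * y) for y in seq]


def cosine(r_u: Dict[int, float], r_v: Dict[int, float]) -> float:
    """Cosine similarity S_uv; unrated items count as 0."""
    num = sum(r * r_v[i] for i, r in r_u.items() if i in r_v)
    den = math.sqrt(sum(r * r for r in r_u.values())) * math.sqrt(
        sum(r * r for r in r_v.values()))
    return num / den if den > 0 else 0.0


def validation_mae(seq: Sequence[float], scale: int, train: Ratings,
                   valid: Sequence[Tuple[int, int, float]],
                   n_users: int, n_neighbors: int) -> float:
    """MAE of one candidate sequence on one task (steps 3.3-3.7)."""
    # steps 3.3 and 3.5: map, keep the first `scale` users, drop duplicates
    key_users = list(dict.fromkeys(map_to_users(seq, n_users)[:scale]))
    all_ratings = [r for items in train.values() for r in items.values()]
    global_mean = sum(all_ratings) / len(all_ratings)
    neighbors: Dict[int, List[Tuple[float, int]]] = {}
    total_error = 0.0
    for u, i, r_true in valid:
        if u not in neighbors:
            # steps 3.6.1-3.6.2: the L most similar key users (excluding u)
            sims = [(cosine(train.get(u, {}), train.get(v, {})), v)
                    for v in key_users if v != u]
            neighbors[u] = sorted(sims, reverse=True)[:n_neighbors]
        # step 3.6.3: similarity-weighted average over neighbors who rated i
        rated = [(s, train[v][i]) for s, v in neighbors[u] if i in train.get(v, {})]
        den = sum(s for s, _ in rated)
        pred = sum(s * r for s, r in rated) / den if den > 0 else global_mean
        total_error += abs(r_true - pred)
    # step 3.7: average over all validation ratings
    return total_error / len(valid)
```

Dividing the summed absolute error by the number of validation ratings is the same as the double sum over users and items divided by the total number of rated items in the MAE formula.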

Step 4, calculate the capability factor and scalar fitness value of each parent candidate sequence.

Step 4.1, randomly select an unselected task from the k key user extraction tasks.

Step 4.2, sort the mean absolute errors of the N parent candidate sequences on the selected task in ascending order to obtain the rank numbers of the N parent candidate sequences on that task.

Step 4.3, determine whether the number of selected tasks has reached k. If so, go to step 4.4; otherwise, go to step 4.1.

Step 4.4, randomly select an unselected parent candidate sequence from the N parent candidate sequences.

Step 4.5, sort the k rank numbers of the selected parent candidate sequence in ascending order, take the task corresponding to the first rank number as its capability factor, and take the reciprocal of that rank number as its scalar fitness value.

Step 4.6, determine whether the number of selected parent candidate sequences has reached N. If so, go to step 5; otherwise, go to step 4.4.

Step 5, input the termination threshold, which is set empirically according to the actual engineering application.

Step 6, perform matching genetic operations on the parent candidate sequence set and update the offspring candidate sequence set.

Step 6.1, input the random mating probability rmp.

Step 6.2, arbitrarily select two parent candidate sequences from the parent candidate sequence set, referred to as the first parent candidate sequence p_a and the second parent candidate sequence p_b.

Step 6.3, randomly generate a number rand between 0 and 1.

Step 6.4, if the first parent candidate sequence p_a and the second parent candidate sequence p_b have the same capability factor, or if rand is smaller than rmp, go to step 6.5; otherwise, go to step 6.6.

Step 6.5, apply uniform crossover to the first parent candidate sequence p_a and the second parent candidate sequence p_b to obtain the updated first offspring candidate sequence c_a and second offspring candidate sequence c_b.

The uniform crossover operation comprises the following steps:

Step 6.5.1, randomly generate a q-dimensional crossover indicator vector whose components are each randomly set to 0 or 1, where q is the length of the candidate sequence.

Step 6.5.2, arbitrarily select one unselected component of the crossover indicator vector.

Step 6.5.3, if the selected component of the crossover indicator vector is 1, set the corresponding dimension of the first offspring candidate sequence c_a to the value of that dimension in the first parent candidate sequence p_a, and the corresponding dimension of the second offspring candidate sequence c_b to the value in the second parent candidate sequence p_b. Otherwise, set the corresponding dimension of c_a to the value in p_b, and that of c_b to the value in p_a.

Step 6.5.4, determine whether all components of the crossover indicator vector have been used. If so, the uniform crossover of the two selected parent candidate sequences is finished; otherwise, go to step 6.5.2.

Step 6.6, apply basic bit mutation to the first parent candidate sequence p_a to obtain the updated first offspring candidate sequence c_a, and to the second parent candidate sequence p_b to obtain the updated second offspring candidate sequence c_b.

The basic bit mutation operation comprises the following steps:

Step 6.6.1, randomly generate an integer z in [1, q-1] and take z as the mutation position.

Step 6.6.2, randomly generate a number w in (0, 1) and set the z-th entry of the selected parent candidate sequence to w.

Step 6.7, determine whether the number of times two parent candidate sequences have been randomly selected from the parent candidate sequence set has reached N/2. If so, go to step 7; otherwise, go to step 6.2.

Step 7, update the mean absolute errors of the k key user extraction tasks of each offspring candidate sequence.

Step 7.1, randomly select an unselected offspring candidate sequence from the offspring candidate sequence set.

Step 7.2, map the selected offspring candidate sequence into a key user sequence according to the mapping formula.

The mapping formula is as follows:

s_i = 1 + (n - 1) × y_i

where s_i denotes the index of the key user obtained by mapping the i-th dimension of the selected offspring candidate sequence, n denotes the total number of users of the recommendation system, and y_i denotes the value of the i-th dimension of the selected offspring candidate sequence.

Step 7.3, determine whether the selected offspring candidate sequence was produced by uniform crossover of two parent candidate sequences. If so, go to step 7.4; otherwise, go to step 7.10.

Step 7.4, randomly generate a number rand1 between 0 and 1.

Step 7.5, if rand1 is smaller than 0.5, go to step 7.6; otherwise, go to step 7.8.

Step 7.6, take the extraction scale of the key user extraction task corresponding to the capability factor of the first parent candidate sequence p_a of the selected offspring candidate sequence as the boundary value a, select the first a entries of the mapped key user sequence as the key user set, and complete the recommendation process with the key-user-based collaborative filtering recommendation algorithm to obtain the predicted ratings of the validation set users.

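The key-user-based collaborative filtering recommendation algorithm comprises the following steps:

Step 7.6.1, calculate the cosine similarity between every user of the recommendation system and each key user using the following formula:

$$S_{uv}=\frac{\sum_{i=1}^{n} r_{ui}\,r_{vi}}{\sqrt{\sum_{i=1}^{n} r_{ui}^{2}}\;\sqrt{\sum_{i=1}^{n} r_{vi}^{2}}}$$

where the symbols have the same meanings as in step 3.6.1, and √ denotes the square root.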

Step 7.6.2, select the L key users with the highest similarity to each user as that user's neighbor user set.

Step 7.6.3, calculate the predicted rating of each user for each item using the following formula:

$$\hat{r}_{ui}=\frac{\sum_{v\in N_u} S_{uv}\,r_{vi}}{\sum_{v\in N_u} S_{uv}}$$

where r̂_ui denotes the predicted rating of the i-th item by the u-th user, Σ denotes summation, N_u denotes the neighbor user set of the u-th user, S_uv denotes the cosine similarity between the u-th user and the v-th neighbor user, and r_vi denotes the rating of the i-th item by the v-th neighbor user.

Step 7.7, using the mean absolute error formula, update the offspring's mean absolute error on the key user extraction task corresponding to the capability factor of its first parent candidate sequence p_a; then go to step 7.12.

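The mean absolute error formula is:

$$\mathrm{MAE}=\frac{\sum_{u\in T_u}\sum_{i\in I_u}\left|r_{u,i}-\hat{r}_{u,i}\right|}{\sum_{u\in T_u}\left|I_u\right|}$$

where r_u,i and r̂_u,i denote, respectively, the true and predicted ratings of the i-th item by the u-th user in the validation set; the other symbols have the same meanings as in step 3.7.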

Step 7.8, take the extraction scale of the key user extraction task corresponding to the capability factor of the second parent candidate sequence p_b of the selected offspring candidate sequence as the boundary value b, select the first b entries of the mapped key user sequence as the key user set, and complete the recommendation process with the key-user-based collaborative filtering recommendation algorithm to obtain the predicted ratings of the validation set users.

Step 7.9, using the mean absolute error formula, update the offspring's mean absolute error on the key user extraction task corresponding to the capability factor of its second parent candidate sequence p_b; then go to step 7.12.

Step 7.10, take the extraction scale of the key user extraction task corresponding to the capability factor of the single parent candidate sequence of the selected offspring candidate sequence as the boundary value c, select the first c entries of the mapped key user sequence as the key user set, and complete the recommendation process with the key-user-based collaborative filtering recommendation algorithm to obtain the predicted ratings of the validation set users.

Step 7.11, using the mean absolute error formula, update the offspring's mean absolute error on the key user extraction task corresponding to the capability factor of its single parent candidate sequence.

Step 7.12, set the mean absolute error of every task that was not updated for the selected offspring candidate sequence to 100000.

Step 7.13, determine whether the number of selected offspring candidate sequences has reached N. If so, go to step 8; otherwise, go to step 7.1.

Step 8, merge the parent candidate sequence set and the offspring candidate sequence set to obtain the intermediate candidate sequence set.

Step 9, update the scalar fitness value and capability factor of each intermediate candidate sequence.

Step 9.1, randomly select an unselected task from the k key user extraction tasks.

Step 9.2, sort the mean absolute errors of the 2N intermediate candidate sequences on the selected task in ascending order to obtain their rank numbers on that task.

Step 9.3, determine whether the number of selected tasks has reached k. If so, go to step 9.4; otherwise, go to step 9.1.

Step 9.4, randomly select an unselected intermediate candidate sequence from the 2N intermediate candidate sequences.

Step 9.5, sort the k rank numbers of the selected intermediate candidate sequence in ascending order, take the task corresponding to the first rank number as its capability factor, and take the reciprocal of that rank number as its scalar fitness value.

Step 9.6, determine whether the number of selected intermediate candidate sequences has reached 2N. If so, go to step 10; otherwise, go to step 9.4.

Step 10, update the parent candidate sequence set.

Sort the 2N intermediate candidate sequences by scalar fitness value in descending order and take the first N as the updated parent candidate sequence set.

Step 11, calculate the sum of the mean absolute errors of the k key user extraction tasks for the first parent candidate sequence.

Step 12, determine whether this sum is smaller than the termination threshold. If so, go to step 13; otherwise, go to step 6.
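Steps 10-12 can be sketched in Python as follows; `select_next_parents` and `should_stop` are hypothetical names. The sketch assumes that the MAE vectors travel with their sequences and that ties in scalar fitness keep the earlier sequence first, which Python's sort guarantees even with `reverse=True`; the source does not say how ties are broken.

```python
from typing import List, Sequence, Tuple


def select_next_parents(
    intermediate: Sequence[List[float]],
    mae: Sequence[List[float]],
    fitness: Sequence[float],
    n_pop: int,
) -> Tuple[List[List[float]], List[List[float]]]:
    """Step 10: keep the n_pop intermediate sequences with the highest scalar fitness."""
    # stable sort: among equal fitness values the earlier sequence stays first
    order = sorted(range(len(intermediate)), key=lambda s: fitness[s], reverse=True)
    chosen = order[:n_pop]
    return [intermediate[s] for s in chosen], [mae[s] for s in chosen]


def should_stop(parent_mae: Sequence[Sequence[float]], threshold: float) -> bool:
    """Steps 11-12: compare the k-task MAE sum of the first parent with the threshold."""
    return sum(parent_mae[0]) < threshold
```
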

Step 13, output the multi-scale key user extraction results.

For the j-th key user extraction task, sort the N parent candidate sequences in ascending order of their mean absolute error on that task, and use the mapping formula to convert the first p_j dimensions of the first parent candidate sequence into the key user extraction result of the j-th task. Obtain and output the key user extraction results of all k tasks in the same way to complete the multi-scale key user extraction process.
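A Python sketch of step 13, with the hypothetical name `extract_key_users`. Taking the minimum MAE on task j is equivalent to taking the first sequence after an ascending sort. The mapping formula is applied literally with rounding to the nearest integer id (an assumption), and duplicate ids are not removed, since the source does not say how they are handled.

```python
from typing import List, Sequence


def extract_key_users(
    parents: Sequence[Sequence[float]],
    parent_mae: Sequence[Sequence[float]],
    scales: Sequence[int],
    n_users: int,
) -> List[List[int]]:
    """Step 13: key user ids for each of the k extraction tasks."""
    results = []
    for j, p_j in enumerate(scales):
        # first parent after sorting by ascending MAE on task j
        best = min(range(len(parents)), key=lambda s: parent_mae[s][j])
        # map its first p_j dimensions with s_i = 1 + (n - 1) * y_i (rounded)
        results.append([1 + round((n_users - 1) * y) for y in parents[best][:p_j]])
    return results
```

For example, with two parents `[[0.0, 0.5, 1.0], [1.0, 1.0, 0.0]]`, MAEs `[[0.8, 0.7], [0.9, 0.6]]`, scales `[2, 3]` and 5 users, the result is `[[1, 3], [5, 5, 1]]`; the repeated id 5 shows why duplicate handling matters.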

Step 14, test the performance of the k extracted key user sets of different scales.

For the key user extraction results of the k tasks, complete the recommendation process with the key-user-based collaborative filtering recommendation algorithm to obtain the predicted ratings of the test set users for each of the k tasks, and calculate the mean absolute error of each of the k tasks with the mean absolute error formula.

In summary, the method represents the population optimized by the multi-factor genetic algorithm with the set of key user candidate sequences, calculates the capability factor and scalar fitness value of each parent candidate sequence, performs matching genetic operations on the parent candidate sequence set, and selectively updates the offspring candidate sequence set to obtain key user sets of different scales.

The effect of the present invention will be further described with reference to simulation experiments.

Example 8

The multi-scale key user extraction method based on the multi-factor genetic algorithm is the same as the embodiment 1-7, and the effect of the invention can be further illustrated by simulation:

simulation conditions

The operation environment of the simulation experiment of the invention is as follows: the processor is Intel Core (TM) i5-3470 CPU @3.2GHz, the memory is 4.00GB, the hard disk is 1T, the operating system is Windows 10, and the programming environment is Visual Studio Enterprise 2015.

The simulation experiment is carried out on a data set MovieLens-100k commonly used in the field of recommendation systems, the recommendation effect of extracting multi-scale key users of the recommendation system and recommending the key users with different scales by using the extracted key users is verified, the basic data obtained by the method is the MovieLens-100k data set, the MovieLens-100k is a film scoring data set comprising 10000 scoring data of 1682 films by 943 users, and the value of each score is an integer between [1 and 5 ]. In the experiment, basic data are randomly divided into a training set, a verification set and a test set according to the proportion of 60%, 20% and 20%, and detailed information of the basic data is shown in table 1. In table 1, the number of users refers to the number of all users included in the basic data acquired by the present invention, the number of items refers to the number of items scored by all users in the basic data acquired by the present invention, and in an actual recommendation system, the items may be music, books, movies, etc., which are 1682 movies in this embodiment. The key user sets of different scales to be extracted by the method are subsets of different sizes of the whole user sets in the basic data.

Table 1 Data set information

In the simulation experiment, the size N of both the parent and the child candidate sequence sets is set to 100, so the intermediate candidate sequence set has size 2N = 200. The random mating probability is 0.2, the number of neighbors in the key-user-based collaborative filtering recommendation algorithm is 90, and the number k of multi-scale key user extraction tasks is 2, with extraction scales of 95 and 159.

Using the training-set ratings of all users in the basic data, the method represents the optimized population of the multi-factor genetic algorithm by the key user candidate sequence set, calculates the capability factor and scalar fitness value of each parent candidate sequence, performs matching genetic operations on the parent candidate sequence set, selectively updates the child candidate sequence set, and iteratively updates the parent candidate sequence set. The mapping formula converts candidate sequences into key user sets of different scales; with the validation-set data of the basic data, the key-user-based collaborative filtering recommendation algorithm predicts the validation users' ratings of items, and the mean absolute error formula gives the mean absolute error corresponding to each key user set. When the minimum sum of mean absolute errors falls below the termination threshold, the final parent candidate sequence set is converted by the mapping formula into key user sets of different scales, completing the multi-scale key user extraction.

The resulting key user sets of different scales are then used in the key-user-based collaborative filtering recommendation algorithm to predict users' ratings of items in the test set of the basic data. The mean absolute error between these predictions and the users' true test-set ratings is calculated with the mean absolute error formula to test the performance of the extracted multi-scale key users.

Content of the simulation experiment

On the acquired basic data (the MovieLens-100k data set), the simulation experiment represents the optimized population of the multi-factor genetic algorithm by the key user candidate sequence set, calculates the capability factor and scalar fitness value of each parent candidate sequence, performs matching genetic operations on the parent candidate sequences, and selectively updates the child candidate sequence set to obtain key user sets of different scales. The recommendation accuracy for all target users in the test set of the basic data is then calculated.

The simulation experiment of the invention comprises the following steps:

Step 1. Acquire the MovieLens-100k data set, containing 100,000 ratings of 1682 items by 943 users, as the basic data, and divide it into a training set, a validation set and a test set in the proportion 60%, 20% and 20%.
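The 60/20/20 division in Step 1 can be sketched as follows. This is a minimal illustration, assuming the split is made at the level of individual rating records (the text only says the data are "randomly divided"); the function name `split_ratings` and the toy records are hypothetical.

```python
import random


def split_ratings(ratings, seed=None):
    """Randomly split rating records into 60% train, 20% validation, 20% test.

    Splitting individual (user, item, rating) records is an assumption of
    this sketch; integer arithmetic keeps the split sizes exact.
    """
    rng = random.Random(seed)
    shuffled = list(ratings)
    rng.shuffle(shuffled)
    n = len(shuffled)
    n_train = n * 6 // 10
    n_valid = n * 2 // 10
    train = shuffled[:n_train]
    valid = shuffled[n_train:n_train + n_valid]
    test = shuffled[n_train + n_valid:]
    return train, valid, test


# Toy records standing in for (user, item, rating) triples.
records = [(u, i, 1 + (u + i) % 5) for u in range(1, 11) for i in range(1, 11)]
train, valid, test = split_ratings(records, seed=0)
print(len(train), len(valid), len(test))  # 60 20 20
```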

Step 2. Input 2 recommendation-system key user extraction tasks with extraction scales 95 and 159. Take the larger scale, 159, as the length of the parent and child candidate sequences, and randomly generate 100 parent and 100 child candidate sequences of dimension 159, each element being a random number between 0 and 1.
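Step 2 amounts to a random-key encoding: one sequence long enough for the largest scale, so that each task reads a prefix of it. The sketch below assumes elements are drawn uniformly from [0, 1); `init_population` is a hypothetical helper name.

```python
import random


def init_population(scales, n, seed=None):
    """Randomly initialise n candidate sequences for k extraction tasks.

    The sequence length q is the largest extraction scale, so the key user
    set of every task can be read as a prefix of one sequence. Elements
    are drawn uniformly from [0, 1).
    """
    rng = random.Random(seed)
    q = max(scales)
    return [[rng.random() for _ in range(q)] for _ in range(n)]


scales = [95, 159]
parents = init_population(scales, 100, seed=1)
children = init_population(scales, 100, seed=2)
print(len(parents), len(parents[0]))  # 100 159
```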

Step 3. Map the 100 parent candidate sequences into key user sequences with the mapping formula:

$$s_i = 1 + (943 - 1)\, y_i$$

where $s_i$ is the serial number of the key user obtained by mapping the $i$-th element of the selected parent candidate sequence, and $y_i$ is the value of that $i$-th element.

Take the first 95 positions of the key user sequence as the key user set of the 1st extraction task and the first 159 positions as the key user set of the 2nd extraction task, feed each set into the key-user-based collaborative filtering recommendation algorithm, and calculate the mean absolute errors of the 100 parent candidate sequences on the 2 tasks with the mean absolute error formula:

$$\mathrm{MAE} = \frac{\sum_{u \in T_u} \sum_{i \in I_u} \left| r_{u,i} - \hat{r}_{u,i} \right|}{\sum_{u \in T_u} \left| I_u \right|}$$

where MAE is the mean absolute error between the predicted and true ratings in the validation set of the basic data, $T_u$ is the set of all users in the validation set, $I_u$ is the set of items rated by the $u$-th user in the validation set, and $r_{u,i}$ and $\hat{r}_{u,i}$ are respectively the true rating and the predicted rating of the $u$-th user on the $i$-th item in the validation set.
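The mapping and MAE computations of Step 3 can be sketched as below. The text gives $s_i$ as a real-valued expression; rounding to the nearest integer is an assumption here, as is keeping any repeated user IDs. The collaborative filtering predictor itself is not specified in enough detail to reproduce, so the sketch takes predicted ratings as given. All function names are hypothetical.

```python
def map_to_users(seq, num_users=943):
    """Map each element y_i in [0, 1] to s_i = 1 + (num_users - 1) * y_i.

    Rounding to the nearest integer is an assumption of this sketch; the
    text does not say how fractional values or repeated IDs are handled.
    """
    return [1 + round((num_users - 1) * y) for y in seq]


def key_user_set(seq, scale, num_users=943):
    """Key users for a task of the given scale: the first `scale` mapped IDs."""
    return map_to_users(seq[:scale], num_users)


def mae(true_ratings, predicted_ratings):
    """MAE over every (user, item) pair in the validation set.

    Both arguments map (user, item) -> rating. Dividing by len(true_ratings)
    equals dividing by the sum of |I_u| over all users u in T_u.
    """
    if not true_ratings:
        raise ValueError("empty rating set")
    total = sum(abs(r - predicted_ratings[key]) for key, r in true_ratings.items())
    return total / len(true_ratings)


seq = [0.0, 0.5, 1.0, 0.1]
print(map_to_users(seq))     # [1, 472, 943, 95]
print(key_user_set(seq, 2))  # [1, 472]
truth = {(1, 10): 4, (1, 11): 3, (2, 10): 5}
pred = {(1, 10): 3.5, (1, 11): 3.0, (2, 10): 4.0}
print(mae(truth, pred))      # 0.5
```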

Step 4. For each of the 2 key user extraction tasks, sort the mean absolute errors of the 100 parent candidate sequences in ascending order to obtain each parent's rank on that task. The task on which a parent candidate sequence attains its smallest rank is taken as its capability factor, and the reciprocal of that smallest rank is taken as its scalar fitness value.
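Step 4 (and claim 2) can be sketched as follows: rank per task by ascending MAE, then take the best-ranked task as the capability factor and the reciprocal rank as scalar fitness. Tie-breaking is not specified in the text; this sketch assumes ties keep index order when ranking and fall to the lower task index when choosing the capability factor. `skill_factor_and_fitness` is a hypothetical name.

```python
def skill_factor_and_fitness(mae_table):
    """Capability factor and scalar fitness for each candidate.

    mae_table[c][t] is the MAE of candidate c on task t.
    """
    n, k = len(mae_table), len(mae_table[0])
    ranks = [[0] * k for _ in range(n)]
    for t in range(k):
        # Ascending MAE: rank 1 is the lowest error on task t (ties keep index order).
        order = sorted(range(n), key=lambda c: mae_table[c][t])
        for position, c in enumerate(order, start=1):
            ranks[c][t] = position
    # Capability factor: the task with the smallest rank (lower task index on ties).
    skill = [min(range(k), key=lambda t: ranks[c][t]) for c in range(n)]
    fitness = [1.0 / ranks[c][skill[c]] for c in range(n)]
    return skill, fitness


mae_table = [[0.80, 0.90], [0.78, 0.95], [0.85, 0.79]]
skill, fitness = skill_factor_and_fitness(mae_table)
print(skill, fitness)  # [0, 0, 1] [0.5, 1.0, 1.0]
```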

Step 5. Input a termination threshold, set empirically according to engineering practice.

Step 6. Perform uniform crossover and basic bit mutation on the parent candidate sequence set to update the child candidate sequences.

Step 7. For each child candidate sequence, selectively update the mean absolute error of only one key user extraction task, chosen according to how many parent candidate sequences produced the child (two for crossover, one for mutation) and their capability factors; set the mean absolute error of the other task to 100000.

Step 8. Merge the 100 parent candidate sequences and the 100 child candidate sequences into 200 intermediate candidate sequences.

Step 9. For each of the 2 tasks, sort the 200 intermediate candidate sequences by mean absolute error in ascending order to obtain their ranks, and update the capability factor and scalar fitness value of every intermediate candidate sequence.

Step 10. Sort the 200 intermediate candidate sequences by scalar fitness value in descending order and keep the first 100 as the updated parent candidate sequence set.
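Steps 8 to 10 form an elitist selection over the merged pool. A minimal sketch, assuming each candidate carries a row of per-task MAEs (100000 for tasks a child was not evaluated on) and that ties in fitness keep pool order; `select_next_parents` is a hypothetical helper.

```python
def select_next_parents(parents, parent_mae, children, child_mae, n):
    """Steps 8-10: merge parents and children, re-rank, keep the best n.

    Scalar fitness is 1 / (best per-task rank), recomputed on the merged pool.
    """
    pool = parents + children  # Step 8: 2N intermediate candidates
    maes = parent_mae + child_mae
    size, k = len(pool), len(maes[0])
    best_rank = [size] * size
    for t in range(k):  # Step 9: rank every candidate on every task
        order = sorted(range(size), key=lambda c: maes[c][t])
        for rank, c in enumerate(order, start=1):
            best_rank[c] = min(best_rank[c], rank)
    fitness = [1.0 / r for r in best_rank]
    # Step 10: descending fitness; Python's sort stays stable with reverse=True.
    keep = sorted(range(size), key=lambda c: fitness[c], reverse=True)[:n]
    return [pool[c] for c in keep], [maes[c] for c in keep], [fitness[c] for c in keep]


new_parents, new_mae, new_fit = select_next_parents(
    ["p0", "p1"], [[0.80, 0.90], [0.85, 0.95]],
    ["c0", "c1"], [[0.78, 100000], [100000, 0.79]],
    n=2,
)
print(new_parents, new_fit)  # ['c0', 'c1'] [1.0, 1.0]
```

Because the penalty value ranks a child last on its unevaluated task, its scalar fitness is driven entirely by the one task it was evaluated on.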

Step 11. Calculate the sum of the mean absolute errors on the 2 tasks for the first (highest-fitness) parent candidate sequence.

Step 12. If this sum is smaller than the termination threshold, go to Step 13; otherwise return to Step 6.
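Steps 11 and 12 reduce to a single comparison. This sketch assumes the "first" parent is the one with the highest scalar fitness (the first after the descending sort of Step 10), taking the earliest one on ties; `should_stop` is a hypothetical name.

```python
def should_stop(parent_mae, fitness, threshold):
    """Steps 11-12: compare the MAE sum of the fittest parent with a threshold.

    parent_mae[c] lists candidate c's MAE on every task; fitness[c] is its
    scalar fitness. Returns True when the loop should end (go to Step 13).
    """
    best = max(range(len(fitness)), key=lambda c: fitness[c])
    return sum(parent_mae[best]) < threshold


maes = [[0.80, 0.81], [0.77, 0.78]]
fit = [0.5, 1.0]
print(should_stop(maes, fit, 1.60))  # True  (0.77 + 0.78 = 1.55)
print(should_stop(maes, fit, 1.50))  # False
```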

Step 13. For each of the 2 tasks, convert the parent candidate sequence ranked first on that task into a key user set with the mapping formula, feed the key users of both scales into the key-user-based collaborative filtering recommendation algorithm to obtain recommendation results on the test set, calculate the mean absolute error of each task, and compare it with the result of extracting single-scale key users independently.

Step 14. Output the key user sets of the 2 extraction tasks, completing the multi-task key user extraction process.

Analysis of simulation experiment results

In the simulation experiment, the invention extracts system key users of 2 different scales from the MovieLens-100k basic data simultaneously, whereas the comparison method extracts the key user set of each scale independently as a single task. Table 2 gives the mean absolute errors obtained when the key user sets extracted by the invention are applied to the recommendation process, and Table 3 gives those of the comparison method; MAE in the tables denotes the mean absolute error.

Table 2 Simulation results of the invention on the MovieLens-100k data set

Task	Data set	Key user extraction scale	MAE on validation set	MAE on test set
Task 1	MovieLens-100k	159	0.7647	0.7861
Task 2	MovieLens-100k	95	0.7753	0.7920

As Table 2 shows, applying the key users of the 2 scales obtained by the invention to the recommendation process yields low mean absolute errors, reflecting the accuracy of the multi-scale key users extracted by the invention when used in the recommendation algorithm.

Table 3 Simulation results of the comparison method on the MovieLens-100k data set

Task	Data set	Key user extraction scale	MAE on validation set	MAE on test set
Task 1	MovieLens-100k	159	0.7634	0.8039
Task 2	MovieLens-100k	95	0.7939	0.8055

Comparing the test-set mean absolute errors in Tables 2 and 3, the key users of both scales extracted by the invention give lower errors in the recommendation process than those of the comparison method (0.7861 vs. 0.8039 at scale 159 and 0.7920 vs. 0.8055 at scale 95), showing that the invention achieves higher accuracy than the comparison method.

Fig. 2 compares the mean absolute errors obtained when the key user sets of the 2 scales extracted by the invention and by the comparison method are applied to the recommendation process. Fig. 2(a) compares the test-set mean absolute error of user ratings when the key users obtained by the invention and by the comparison method are fed into key-user-based collaborative filtering at an extraction scale of 159; Fig. 2(b) shows the same comparison at an extraction scale of 95. In both panels the abscissa is the experiment type and the ordinate is the mean absolute error; the light bars show the invention's multi-scale key user extraction as multi-task optimization, and the dark bars show the independent extraction of single-scale key users as single-task optimization.

As Fig. 2 shows, when the key user sets of the 2 scales obtained by the invention are used in the recommendation process, the mean absolute error is lower than with key users extracted by single-task independent optimization, reflecting the accuracy of the invention's multi-task key user extraction. Moreover, in this embodiment the invention obtains key users of 2 different scales in a single run, whereas single-task independent optimization needs one run per scale, so key user extraction is faster. When more extraction scales are required (a larger k), the invention likewise obtains k key user sets of different scales in a single run.

In short, the invention discloses a multi-scale key user extraction method based on a multi-factor genetic algorithm, which solves the technical problem of simultaneously extracting key users of different scales in a recommendation system. Its main steps are: generating candidate sequence sets and calculating their mean absolute errors on multiple key user extraction tasks; calculating the capability factor and scalar fitness value of each parent candidate sequence; performing matching genetic operations on the parent candidate sequence set; selectively updating the mean absolute errors of the child candidate sequence set; merging the parent and child candidate sequence sets into an intermediate candidate sequence set; updating the capability factors and scalar fitness values of the intermediate candidate sequences; updating the parent candidate sequence set; and outputting multiple key user sets of different scales. By representing the optimized population of the multi-factor genetic algorithm with the key user candidate sequence set, calculating each parent's capability factor and scalar fitness value, performing matching genetic operations, and selectively updating the child candidate sequence set, the method achieves high recommendation accuracy and high extraction efficiency. It can be used to extract key information in networks.

Claims

1. A multi-scale key user extraction method based on a multi-factor genetic algorithm is characterized by comprising the following steps:

(2) generating candidate sequence sets: input k tasks, each of which extracts key users of a recommendation system, with key user extraction scales p_1, p_2, ..., p_k respectively, where k is the number of multi-scale key user extraction tasks and also the number of key user extraction scales; following the encoding method of the multi-factor genetic algorithm, sort the k extraction scales in descending order and take the first (largest) one as the candidate sequence length q; randomly generate N q-dimensional parent candidate sequences and N q-dimensional child candidate sequences, each element being a random number between 0 and 1, where N is the number of parent candidate sequences and also of child candidate sequences; the N parent and N child candidate sequences together form the 2N intermediate candidate sequences, and the generated parent, child and intermediate candidate sequence sets are mapped to key user sequence sets containing the multi-scale key users;

(3) calculating the mean absolute errors of each parent candidate sequence on the k key user extraction tasks: for each parent candidate sequence, use the mapping formula to take its first p_j positions as the key user sequence of the j-th key user extraction task, where p_j is the key user extraction scale of the j-th task, giving the key user sequences of each parent candidate sequence on all k tasks; for each of these key user sequences, complete the recommendation process with the key-user-based collaborative filtering recommendation algorithm to obtain predicted ratings for the users in the validation set of the basic data, and calculate the mean absolute error of each parent candidate sequence on each task with the mean absolute error formula;

(7) selectively updating the mean absolute errors of each child candidate sequence on the k key user extraction tasks: for a child candidate sequence produced by uniform crossover of two parent candidate sequences, randomly choose one of the two parents and calculate the child's mean absolute error on the task corresponding to that parent's capability factor; for a child candidate sequence produced by basic bit mutation of a single parent candidate sequence, calculate its mean absolute error on the task corresponding to that parent's capability factor; set the mean absolute error of every task not calculated for a child candidate sequence to 100000;

(11) calculating the minimum mean absolute error sum and comparing it with the termination threshold: sort all parent candidate sequences by scalar fitness value in descending order, take the sum of the mean absolute errors of the first parent candidate sequence on the k tasks as the minimum mean absolute error sum of the multi-scale key user extraction tasks, and if it is smaller than the termination threshold execute step (12), otherwise execute step (6);

(12) outputting the multi-scale key user extraction results: for the j-th key user extraction task, sort the N parent candidate sequences by their mean absolute error on that task in ascending order, and use the mapping formula to convert the first p_j dimensions of the first parent candidate sequence into the key user extraction result of the j-th task; obtain and output the results of all k tasks in the same way, completing the multi-scale key user extraction process;

(13) testing the performance of the k extracted key user sets of different scales: for the key user extraction result of each of the k tasks, complete the recommendation process with the key-user-based collaborative filtering recommendation algorithm to obtain predicted ratings for the test-set users, and calculate the mean absolute error of each of the k tasks with the mean absolute error formula.
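Step (12) of claim 1 picks, per task, the parent with the lowest MAE on that task and decodes only its first p_j elements. A minimal sketch, reusing the mapping $s_i = 1 + (943 - 1)\, y_i$ with rounding to the nearest integer (an assumption); `extract_results` is a hypothetical helper.

```python
def extract_results(parents, parent_mae, scales, num_users=943):
    """Step (12): for each task j, decode the first p_j elements of its best parent.

    The best parent for task j is the one with the lowest MAE on task j.
    Decoding uses s_i = 1 + (num_users - 1) * y_i, rounded to the nearest
    integer (rounding is an assumption of this sketch).
    """
    results = []
    for j, p_j in enumerate(scales):
        best = min(range(len(parents)), key=lambda c: parent_mae[c][j])
        prefix = parents[best][:p_j]
        results.append([1 + round((num_users - 1) * y) for y in prefix])
    return results


parents = [[0.0, 0.5, 1.0], [1.0, 0.0, 0.5]]
parent_mae = [[0.80, 0.70], [0.75, 0.90]]
print(extract_results(parents, parent_mae, scales=[2, 3]))
# [[943, 1], [1, 472, 943]]
```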

2. The multi-scale key user extraction method based on a multi-factor genetic algorithm according to claim 1, wherein calculating the capability factor and the scalar fitness value in steps (4) and (9) comprises the following steps:

(4a) randomly select an unselected task from the k key user extraction tasks;

(4b) sort the mean absolute errors of the N parent candidate sequences on the selected task in ascending order to obtain the rank of each of the N parent candidate sequences on that task;

(4c) judge whether the number of selected tasks has reached k; if so, execute step (4d), otherwise execute step (4a);

(4d) randomly select an unselected parent candidate sequence from the N parent candidate sequences;

(4e) sort the ranks of the selected parent candidate sequence on the k tasks in ascending order, take the task corresponding to the first (smallest) rank as the capability factor of the parent candidate sequence, and take the reciprocal of that rank as its scalar fitness value;

(4f) judge whether the number of selected parent candidate sequences has reached N; if so, execute step (5), otherwise execute step (4d);

In step (9), the capability factor and scalar fitness value are calculated by the same steps with the 2N intermediate candidate sequences as the objects of calculation, except that when judging whether the number of selected intermediate candidate sequences has reached 2N, the method jumps to step (10) if so, and otherwise selects another unselected intermediate candidate sequence from the 2N intermediate candidate sequences; the capability factor and scalar fitness value of each intermediate candidate sequence are updated with the calculated results.

3. The multi-scale key user extraction method based on a multi-factor genetic algorithm according to claim 1, wherein performing the matching genetic operation on the parent candidate sequence set and updating the child candidate sequence set in step (6) comprises the following steps:

(6a) input the random mating probability rmp;

(6b) parent candidate sequence selection: randomly select two parent candidate sequences from the parent candidate sequence set, called the first parent candidate sequence p_a and the second parent candidate sequence p_b;

(6c) randomly generate a random number rand between 0 and 1;

(6d) judge whether the first parent candidate sequence p_a and the second parent candidate sequence p_b have the same capability factor or rand is smaller than rmp; if so, execute step (6e), otherwise execute step (6f);

(6e) apply uniform crossover to p_a and p_b to obtain the updated first child candidate sequence c_a and second child candidate sequence c_b, then execute step (6g);

(6f) apply basic bit mutation to p_a to obtain the updated first child candidate sequence c_a, and apply basic bit mutation to p_b to obtain the updated second child candidate sequence c_b;

(6g) judge whether the number of times two parent candidate sequences have been randomly selected from the parent candidate sequence set has reached N/2; if so, execute step (7), otherwise execute step (6b).
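The matching genetic operation of claim 3 can be sketched as below. It assumes the pairing loop runs N/2 times so that N children result, and that the two parents in one pairing are distinct; crossover and mutation are passed in as callables so the mating rule stands on its own. `assortative_mating`, `swap` and `clone` are hypothetical names, and the demo operators are placeholders rather than the operators of claims 5 and 6.

```python
import random


def assortative_mating(parents, skill, rmp, crossover, mutate, rng=random):
    """Claim 3: produce len(parents) children from len(parents) // 2 pairings.

    crossover(pa, pb) -> (ca, cb) and mutate(p) -> c are supplied by the
    caller. Each child is stored with the indices of the parent(s) it came
    from, so step (7) can look up their capability factors.
    """
    children = []
    for _ in range(len(parents) // 2):                     # (6g) N/2 pairings
        a, b = rng.sample(range(len(parents)), 2)          # (6b) two distinct parents
        if skill[a] == skill[b] or rng.random() < rmp:     # (6c)-(6d)
            ca, cb = crossover(parents[a], parents[b])     # (6e)
            children.append((ca, (a, b)))
            children.append((cb, (a, b)))
        else:                                              # (6f)
            children.append((mutate(parents[a]), (a,)))
            children.append((mutate(parents[b]), (b,)))
    return children


def swap(pa, pb):
    # Placeholder crossover for the demo only.
    return list(pb), list(pa)


def clone(p):
    # Placeholder mutation for the demo only.
    return list(p)


pop = [[0.1] * 4, [0.9] * 4, [0.2] * 4, [0.8] * 4]
kids = assortative_mating(pop, [0, 1, 0, 1], 0.2, swap, clone, rng=random.Random(0))
print(len(kids))  # 4
```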

4. The multi-scale key user extraction method based on a multi-factor genetic algorithm according to claim 1, wherein selectively updating the mean absolute errors of each child candidate sequence on the k key user extraction tasks in step (7) comprises the following steps:

(7a) randomly select an unselected child candidate sequence from the child candidate sequence set;

(7b) map the selected child candidate sequence into a key user sequence with the mapping formula;

(7c) judge whether the selected child candidate sequence was obtained by uniform crossover of two parent candidate sequences; if so, execute step (7d), otherwise execute step (7j);

(7d) randomly generate a random number rand1 between 0 and 1;

(7e) judge whether rand1 is smaller than 0.5; if so, execute step (7f), otherwise execute step (7h);

(7f) take the extraction scale of the key user extraction task corresponding to the capability factor of the first parent candidate sequence p_a of the selected child candidate sequence as the boundary value a, select the first a positions of the mapped key user sequence as the key user set of the child candidate sequence, and complete the recommendation process with the key-user-based collaborative filtering recommendation algorithm to obtain predicted ratings for the validation-set users;

(7g) using the mean absolute error formula and these predicted ratings, update the mean absolute error of the selected child candidate sequence on the task corresponding to the capability factor of p_a; execute step (7l);

(7h) take the extraction scale of the key user extraction task corresponding to the capability factor of the second parent candidate sequence p_b of the selected child candidate sequence as the boundary value b, select the first b positions of the mapped key user sequence as the key user set of the child candidate sequence, and complete the recommendation process with the key-user-based collaborative filtering recommendation algorithm to obtain predicted ratings for the validation-set users;

(7i) using the mean absolute error formula and these predicted ratings, update the mean absolute error of the selected child candidate sequence on the task corresponding to the capability factor of p_b; execute step (7l);

(7j) take the extraction scale of the key user extraction task corresponding to the capability factor of the single parent candidate sequence of the selected child candidate sequence as the boundary value c, select the first c positions of the mapped key user sequence as the key user set of the child candidate sequence, and complete the recommendation process with the key-user-based collaborative filtering recommendation algorithm to obtain predicted ratings for the validation-set users;

(7k) using the mean absolute error formula and these predicted ratings, update the mean absolute error of the selected child candidate sequence on the task corresponding to the capability factor of its single parent candidate sequence;

(7l) set the mean absolute error of every task not updated for the selected child candidate sequence to 100000;

(7m) judge whether the number of selected child candidate sequences has reached N; if so, execute step (8), otherwise execute step (7a).
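The selective evaluation of claim 4 can be sketched as follows. It assumes children are stored together with the indices of the parent(s) that produced them, and it stands in for the key-user collaborative filtering plus MAE computation with a caller-supplied `evaluate(sequence, scale)` callable; that callable, `selective_evaluation` and `fake_evaluate` are hypothetical.

```python
import random

PENALTY = 100000  # MAE recorded for tasks a child is not evaluated on


def selective_evaluation(children, parent_skill, scales, evaluate, rng=random):
    """Claim 4: evaluate each child on exactly one task.

    children: list of (sequence, parent_indices).
    parent_skill[i]: capability factor (task index) of parent i.
    evaluate(sequence, scale) -> validation MAE of the first `scale` key users.
    """
    k = len(scales)
    child_mae = []
    for seq, parent_ids in children:
        if len(parent_ids) == 2:
            # (7d)-(7e) crossover child: imitate either parent with probability 0.5.
            donor = parent_ids[0] if rng.random() < 0.5 else parent_ids[1]
        else:
            # (7j) mutation child: imitate its single parent.
            donor = parent_ids[0]
        task = parent_skill[donor]
        row = [PENALTY] * k                      # (7l) default for unevaluated tasks
        row[task] = evaluate(seq, scales[task])  # (7f)-(7k) boundary = task's scale
        child_mae.append(row)
    return child_mae


def fake_evaluate(seq, scale):
    # Hypothetical stand-in: returns the scale so the chosen task is visible.
    return float(scale)


kids = [([0.1, 0.2], (0,)), ([0.3, 0.4], (1,))]
print(selective_evaluation(kids, [0, 1], [1, 2], fake_evaluate))
# [[1.0, 100000], [100000, 2.0]]
```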

5. The multi-scale key user extraction method based on a multi-factor genetic algorithm according to claim 3, wherein the uniform crossover operation in step (6e) comprises the following steps:

(6e1) randomly generate a q-dimensional crossover indicator vector whose components are each randomly 0 or 1, where q is the length of a candidate sequence;

(6e2) randomly select one unselected component of the crossover indicator vector;

(6e3) if the selected component is 1, set the corresponding dimension of the first child candidate sequence c_a to the value of the first parent candidate sequence p_a in that dimension, and the corresponding dimension of the second child candidate sequence c_b to the value of the second parent candidate sequence p_b; otherwise set that dimension of c_a to the value of p_b and that dimension of c_b to the value of p_a;

(6e4) judge whether all components of the crossover indicator vector have been selected; if so, the uniform crossover of the two selected parent candidate sequences is complete, otherwise execute step (6e2).
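The uniform crossover of claim 5 can be sketched as below. The claim visits indicator components in random order, but since each component decides one dimension independently, visiting them in index order gives the same result; the sketch does that. `uniform_crossover` is a hypothetical name.

```python
import random


def uniform_crossover(pa, pb, rng=random):
    """Claim 5: uniform crossover driven by a random 0/1 indicator vector."""
    q = len(pa)
    # (6e1) q-dimensional crossover indicator vector of random 0/1 values.
    indicator = [rng.randint(0, 1) for _ in range(q)]
    ca, cb = [0.0] * q, [0.0] * q
    # (6e2)-(6e4) visit every component; visiting order does not change the result.
    for d in range(q):
        if indicator[d] == 1:
            ca[d], cb[d] = pa[d], pb[d]
        else:
            ca[d], cb[d] = pb[d], pa[d]
    return ca, cb


ca, cb = uniform_crossover([0.1] * 6, [0.9] * 6, rng=random.Random(3))
print(ca, cb)
```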

6. The multi-scale key user extraction method based on a multi-factor genetic algorithm according to claim 3, wherein the basic bit mutation operation in step (6f) comprises the following steps:

(6f1) randomly generate a random integer z in [1, q-1] and take z as the mutation position;

(6f2) randomly generate a random number w in (0, 1) and update the z-th position of the selected parent candidate sequence to w.
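The basic bit mutation of claim 6 can be sketched as follows. The claim does not say whether z is 1-based; this sketch assumes it is, so index z-1 is overwritten and, with z limited to [1, q-1], the last element is never mutated. `random.random()` returns values in [0, 1), which approximates the open interval (0, 1) stated in the claim. `basic_bit_mutation` is a hypothetical name.

```python
import random


def basic_bit_mutation(parent, rng=random):
    """Claim 6: overwrite one position of a copy of the parent with a new value."""
    q = len(parent)
    # (6f1) mutation position z in [1, q-1], read as a 1-based position (assumption).
    z = rng.randint(1, q - 1)  # requires q >= 2
    # (6f2) new value w; random() returns [0, 1), approximating the open (0, 1).
    w = rng.random()
    child = list(parent)  # copy so the parent itself is left unchanged
    child[z - 1] = w
    return child


parent = [0.5] * 5
child = basic_bit_mutation(parent, rng=random.Random(7))
print(sum(a != b for a, b in zip(parent, child)))  # changed positions, at most 1
```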