CN112487302B

CN112487302B - File resource accurate pushing method based on user behaviors

Info

Publication number: CN112487302B
Application number: CN202011219336.9A
Authority: CN
Inventors: 王啸峰; 颜庆国; 朱进; 陈健; 王永梅; 陈莉; 吴建周; 张颖; 孙平; 乔勇; 胡文燕; 史海文; 刘成
Original assignee: State Grid Jiangsu Electric Power Co Ltd
Current assignee: State Grid Jiangsu Electric Power Co Ltd
Priority date: 2020-11-04
Filing date: 2020-11-04
Publication date: 2022-11-11
Anticipated expiration: 2040-11-04
Also published as: CN112487302A

Abstract

The invention provides an archive resource accurate pushing method based on user behaviors. The method runs on a server that comprises a user behavior library and an archive library: the user behavior library contains user behavior information, including user tags and user types, and the archive library contains archive data, including archive tags and archive types. The specific steps are: obtaining the user tags and the archive tags; obtaining the weight of each user tag and each archive tag; performing weighted scoring of the archives according to these weights to obtain the score of archive j relative to user i; and calculating user similarity according to the user tag weights and the archive tag weights. The invention improves the retrieval and utilization efficiency of archives.

Description

File resource accurate pushing method based on user behaviors

Technical Field

The invention relates to the field of archive resource pushing based on user behavior, and in particular to an archive resource accurate pushing method based on user behaviors.

Background

The Semantic Web provides tools for the intelligent use of information resources: it makes information easier to discover, enables complex searches, and supports a new mode of web browsing. When searching the web, a user usually starts from a few key words, but the underlying need is often complex, and the knowledge the user already has spans many aspects and angles. For example, a user who enters "World Trade Organization" into a search engine may want to learn about China's accession to the WTO, yet a generic search returns results that must be filtered many times and may still be of no use. This is because the computer does not know the WTO's organization, main functions, agreements and purpose. Semantic information, by contrast, lets a program more easily distinguish the elements of different web pages, understand a fact such as "the detailed process of China's accession to the WTO", and combine the relevant pieces. Semantic information not only makes retrieval more accurate but can also automate complex processing. Archive management systems hold huge volumes of data with a low utilization rate, so a method is needed that uses semantic analysis and collected user behavior to recommend archives to users accurately.

In the prior art, natural language processing (NLP) has not been integrated with professional archive data. Current natural language recognition is based on NLP techniques and generally performs matching after training on an existing database. Related techniques applied to archives, such as text entity extraction, text classification, key phrase extraction, short text matching, relation extraction, intelligent voice interaction, character recognition and text similarity algorithms, cannot fully achieve accurate identification and pushing between archives and users, nor can they draw on professional archive data and information to push the archives associated with a target archive to users accurately.

Disclosure of Invention

To solve the above problems, the present invention provides an archive resource accurate pushing method based on user behaviors. The method runs on a server comprising a user behavior library and an archive library: the user behavior library contains user behavior information, including user tags and user types, and the archive library contains archive data, including archive tags and archive types. The specific steps are as follows:

Step S100: obtain the user tags $U = [UX_1, UX_2, \dots, UX_m]$ and the archive tags $D = [DX_1, DX_2, \dots, DX_n]$, where $UX_i$ is the $i$-th tag of the user and $DX_j$ is the $j$-th tag of the archive;

Step S200: obtain the weight $Uw(UX_i)$ of each user tag $UX_i$ and the weight $Dw(DX_j)$ of each archive tag $DX_j$;

Step S300: perform weighted scoring of the archives according to the user tag weights $Uw(UX_i)$ and the archive tag weights $Dw(DX_j)$ to obtain the score of archive j relative to user i, and recommend the highest-scoring archives to the user;

Step S400: calculate user similarity according to the user tag weights $Uw(UX_i)$ and the archive tag weights $Dw(DX_j)$, and identify the users similar to the current user.

The weight $Uw(UX_i)$ of user tag $UX_i$ is obtained as follows:

obtain the term frequency TF and the inverse document frequency IDF of user tag $UX_i$ in the user behavior library, where the term frequency TF is:

where $n_i$ is the number of times the word appears overall and $\sum_p n_{p,i}$ is the total number of occurrences of all words;

the inverse document frequency IDF is:

where N is the total number of users of the current user type, N' is the total number of users of the other user types, $d_i$ is the number of users of the current user type who have the tag, and $d_i'$ is the number of users of the other user types who have the tag.

The weight of user tag $UX_i$ is $Uw(UX_i) = tf_i \times idf_i$.
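The TF and IDF formula images are not reproduced in this text, so the following Python sketch only illustrates the shape of the calculation under stated assumptions. It assumes the conventional term frequency $tf_i = n_i / \sum_p n_{p,i}$ and, purely as a hypothetical stand-in for the type-aware IDF, $idf_i = \log_{10}\frac{N + N'}{d_i + d_i' + 1}$. The function names and the sample counts are illustrative, and the sketch is not expected to reproduce the figures of Example 1.

```python
import math
from collections import Counter


def tag_tf(tag_counts: Counter, tag: str) -> float:
    """Assumed conventional TF: occurrences of `tag` divided by all tag occurrences."""
    total = sum(tag_counts.values())
    return tag_counts[tag] / total if total else 0.0


def type_aware_idf(n_same: int, n_other: int, d_same: int, d_other: int) -> float:
    """Hypothetical IDF pooling the current user type (N, d_i) with the other types (N', d_i').

    The +1 keeps the denominator positive for a tag that no user has.
    """
    return math.log10((n_same + n_other) / (d_same + d_other + 1))


def user_tag_weight(tag_counts: Counter, tag: str, n_same: int, n_other: int,
                    d_same: int, d_other: int) -> float:
    """Uw(UX_i) = tf_i * idf_i, as stated in the text."""
    return tag_tf(tag_counts, tag) * type_aware_idf(n_same, n_other, d_same, d_other)


# Illustrative counts only (not taken from Table 1).
counts = Counter({"Jiangsu": 2, "archives bureau": 1, "document archives": 1})
weight = user_tag_weight(counts, "Jiangsu", n_same=100, n_other=400, d_same=20, d_other=29)
print(weight)  # 0.5
```

Here tf = 2/4 = 0.5 and idf = log10(500/50) = 1, so the weight is 0.5. Swapping in the patent's own IDF only requires replacing `type_aware_idf`.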

In step S200, the weight $Dw(DX_j)$ of archive tag $DX_j$ is calculated as follows:

obtain the term frequency TF and the inverse document frequency IDF of archive tag $DX_j$ in the archive library, where the term frequency TF is:

where $t_y$ is the number of times the word appears in the archive title, $p_y$ is the number of times it appears in the first paragraph of the archive, $n_y$ is the number of times it appears in the archive body text, and $\sum_k n_{k,y}$ is the total number of occurrences of all words in the archive;

the inverse document frequency IDF is:

where N is the total number of archives of the current archive type, N' is the total number of archives of the other archive types, $d_j$ is the number of archives of the current type that contain the word, and $d_j'$ is the number of archives of the other types that contain the word.

The weight of archive tag $DX_j$ is $Dw(DX_j) = tf_j \times idf_j$.
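The archive TF formula is likewise not reproduced here. One plausible reading of its definitions is a position-weighted count in which occurrences in the title and first paragraph receive extra weight. The Python sketch below assumes that reading, with hypothetical boost factors of 3 for the title and 2 for the first paragraph, and reuses the same hypothetical pooled IDF as for user tags. All names and constants are assumptions, not the patent's values, so the result differs from the 16.8628661 of Example 3.

```python
import math


def archive_tag_tf(title_count: int, first_para_count: int, body_count: int,
                   total_word_count: int, title_boost: float = 3.0,
                   first_para_boost: float = 2.0) -> float:
    """Hypothetical position-weighted TF built from t_y, p_y, n_y and sum_k n_{k,y}."""
    if total_word_count == 0:
        return 0.0
    weighted = title_boost * title_count + first_para_boost * first_para_count + body_count
    return weighted / total_word_count


def archive_tag_idf(n_same: int, n_other: int, d_same: int, d_other: int) -> float:
    """Hypothetical IDF pooling the current archive type (N, d_j) with the other types (N', d_j')."""
    return math.log10((n_same + n_other) / (d_same + d_other + 1))


def archive_tag_weight(tf: float, idf: float) -> float:
    """Dw(DX_j) = tf_j * idf_j, as stated in the text."""
    return tf * idf


# Type counts follow Example 3 ("research": 10 of 50 document archives,
# 5 of 60 accounting archives); the position counts are made up for illustration.
tf_research = archive_tag_tf(title_count=1, first_para_count=1, body_count=5, total_word_count=100)
idf_research = archive_tag_idf(n_same=50, n_other=60, d_same=10, d_other=5)
print(archive_tag_weight(tf_research, idf_research))
```

Keeping the boosts as keyword parameters makes it easy to tune how much a title hit outweighs a body hit once the real formula is known.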

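In step S300, the score F of archive j relative to user i is $F = \sum_{t \in U \cap D} Uw(t)\,Dw(t)$, where the sum runs over the tags t that appear both in the user tags U and in the archive tags D.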

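In step S400 of the method described above, the similarity between users x and y is calculated as $\mathrm{sim}(x, y) = \dfrac{\sum_t Uw_x(t)\,Uw_y(t)}{\sqrt{\sum_t Uw_x(t)^2}\,\sqrt{\sum_t Uw_y(t)^2}}$, where $Uw_x(t)$ is the weight of tag t for user x.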

The invention has the following beneficial effects:

A corresponding database is built from the user's actions in the archive library, such as searching for and retrieving archives, and is used for autonomous learning and training; at the same time, the archive types the user prefers are grouped and matched using natural semantic recognition. This provides accurate archive pushing, speeds up the user's archive queries, improves the user experience, and helps the user find the relevant archives they prefer.

Detailed Description

The technical solutions of the present application are clearly and completely described below in conjunction with the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, of the embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.

The invention is applied to an archive pushing server based on user behavior. The server comprises a user behavior library and an archive library: the user behavior library contains user behavior information, including user tags and user types, and the archive library contains archive data, including archive tags and archive types;

the server further comprises a processor and a non-transitory computer-readable storage medium storing a computer program which, when executed by the processor, implements the following archive resource accurate pushing method based on user behaviors:

Step S100: obtain the user tags $U = [UX_1, UX_2, \dots, UX_m]$ and the archive tags $D = [DX_1, DX_2, \dots, DX_n]$, where $UX_i$ is the $i$-th tag of the user and $DX_j$ is the $j$-th tag of the archive;

Step S200: obtain the weight $Uw(UX_i)$ of each user tag $UX_i$ and the weight $Dw(DX_j)$ of each archive tag $DX_j$;

Step S300: perform weighted scoring of the archives according to the user tag weights $Uw(UX_i)$ and the archive tag weights $Dw(DX_j)$, and recommend the highest-scoring archives to the user;

Step S400: calculate user similarity from the user tags, and identify the users similar to the current user.

In some embodiments, the weight $Uw(UX_i)$ of user tag $UX_i$ is calculated as follows, where the term frequency TF is:

where $n_i$ is the number of times the word appears overall and $\sum_p n_{p,i}$ is the total number of occurrences of all words;

the inverse document frequency IDF is:

where N is the total number of users of the current user type, N' is the total number of users of the other user types, $d_i$ is the number of users of the current user type who have the tag, and $d_i'$ is the number of users of the other user types who have the tag.

The weight of user tag $UX_i$ is $Uw(UX_i) = tf_i \times idf_i$.

Example 1:

The user system contains two user types, archivist and ordinary user. User A is an archivist, and the occurrence counts of user A's tags are: [Jiangsu, file, 1, document, 1, drawing, 1].

Table 1 gives, for each user type, the number of users and the number of users who have the "Jiangsu" tag:

TABLE 1

The weights of the two user tags "Jiangsu" and "archives bureau" are calculated with the formulas above:


The weight of the "Jiangsu" user tag is $uw_{\text{Jiangsu}} = TF \times IDF = 9.5228787 \times 0.6197887 = 5.9021726$;

The weight of the "archives bureau" tag, calculated in the same way, is $uw_{\text{archives bureau}} = 5.7155976$;

Thus the "Jiangsu" tag is more important to user A than the "archives bureau" tag.

Example 2:

In other embodiments, the weights are set automatically by the algorithm and may change as the user reads more archives or as the user's preferences change. Each user has a behavior library that stores the user's archive types, keywords and archive tags. The weights are calculated from this library; after repeated searches the library changes, and the weights change with it. A separate filtering library is used during recommendation to remove archives that the user dislikes or has already read.

As shown in Table 2, when a new user performs the following operations, the behavior library changes and the tag weights change accordingly:

TABLE 2
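The behavior library and filtering library described above can be sketched as a small Python class. This is an illustration under assumptions: the class and method names are hypothetical, and the tag weight is simplified to a normalized frequency rather than the full TF-IDF weight, so it only shows how recorded actions shift the weights and fill the filter.

```python
from collections import Counter


class BehaviorLibrary:
    """Hypothetical per-user behavior library plus the filtering library described in the text."""

    def __init__(self) -> None:
        self.tag_counts = Counter()   # tag -> number of times seen in the user's actions
        self.filtered_ids = set()     # archives the user has read or disliked

    def record_search(self, keywords: list[str]) -> None:
        self.tag_counts.update(keywords)

    def record_read(self, archive_id: str, archive_tags: list[str]) -> None:
        self.tag_counts.update(archive_tags)
        self.filtered_ids.add(archive_id)   # already read: do not recommend again

    def record_dislike(self, archive_id: str) -> None:
        self.filtered_ids.add(archive_id)

    def tag_weights(self) -> dict[str, float]:
        """Normalized frequencies, used here as a simple stand-in for the TF-IDF weight."""
        total = sum(self.tag_counts.values())
        return {tag: c / total for tag, c in self.tag_counts.items()} if total else {}


lib = BehaviorLibrary()
lib.record_search(["Jiangsu", "history"])
print(lib.tag_weights())          # weights after one search
lib.record_read("A1", ["Jiangsu"])
print(lib.tag_weights())          # "Jiangsu" gains weight after the read
```

After the search both tags weigh 0.5; after reading archive "A1" the "Jiangsu" weight rises to 2/3 and "A1" is excluded from future recommendations.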

In some embodiments, the weight $Dw(DX_j)$ of archive tag $DX_j$ is calculated as follows:

where $t_y$ is the number of times the word appears in the archive title, $p_y$ is the number of times it appears in the first paragraph of the archive, $n_y$ is the number of times it appears in the archive body text, and $\sum_k n_{k,y}$ is the total number of occurrences of all words in the archive;

the inverse document frequency IDF is:

The weight of archive tag $DX_j$ is $Dw(DX_j) = tf_j \times idf_j$;

Example 3:

Assume two archive types, document archives and accounting archives. Archive A is a document archive, and its word distribution is shown in Table 3:

TABLE 3

There are 50 archives of the document type, of which 10 contain "research" and 30 contain "work"; there are 60 archives of the accounting type, of which 5 contain "research" and 20 contain "work";

The weights of the archive tags "research" and "work" are calculated as follows:


The weight of "research" is $dw_{\text{research}} = TF \times IDF = 16.8628661$;

The weight of "work", calculated in the same way, is $dw_{\text{work}} = 6.6285913$;

These results show directly that the "research" tag is more important than the "work" tag for archive A.

In some embodiments, once the weights of the archive tags and the user behavior tags have been obtained, the archives are weighted and scored and the highest-scoring archives are recommended to the user, as follows:

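the score F of archive j relative to user i is $F = \sum_{t \in U \cap D} Uw(t)\,Dw(t)$, where the sum runs over the tags t shared by the user tags U and the archive tags D.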

Example 4:

The tag weights of the first archive are [Jiangsu 0.5, Nanjing 0.1, history 0.9];

the tag weights of the second archive are [Jiangsu 0.9, Nanjing 0.1, history 0.1];

the user tag weights are [Jiangsu 0.11, Nanjing 0.12, history 0.2];

the scores of the first and second archives are:

$F_1 = 0.5 \times 0.11 + 0.1 \times 0.12 + 0.9 \times 0.2 = 0.055 + 0.012 + 0.18 = 0.247$;

$F_2 = 0.9 \times 0.11 + 0.1 \times 0.12 + 0.1 \times 0.2 = 0.131$;

The first archive scores higher because the user leans toward history-related archives.
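Example 4 implies that F is the sum, over the tags an archive shares with the user, of the user tag weight times the archive tag weight. The Python sketch below assumes that form (the function name is hypothetical) and recomputes the example.

```python
def archive_score(user_weights: dict[str, float], archive_weights: dict[str, float]) -> float:
    """Score F of an archive for a user: sum of Uw * Dw over the tags both share."""
    return sum(w * archive_weights[tag] for tag, w in user_weights.items() if tag in archive_weights)


user = {"Jiangsu": 0.11, "Nanjing": 0.12, "history": 0.2}
archive_1 = {"Jiangsu": 0.5, "Nanjing": 0.1, "history": 0.9}
archive_2 = {"Jiangsu": 0.9, "Nanjing": 0.1, "history": 0.1}

f1 = archive_score(user, archive_1)
f2 = archive_score(user, archive_2)
print(round(f1, 3), round(f2, 3))  # 0.247 0.131
```

Tags that appear on only one side contribute nothing, so an archive sharing no tags with the user scores 0.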

After the scores are calculated, the archives are sorted by score, the archives the user dislikes or has already read are removed, and the archives obtained by collaborative filtering are added to form the final list, which is recommended to the user.
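The list-assembly step can be sketched in Python as follows. The function name, the cut-off `top_n`, and the choice to append collaborative-filtering results after the score-ranked archives (rather than interleave them) are assumptions; the text does not fix them.

```python
def build_final_list(scores: dict[str, float], excluded_ids: set[str],
                     cf_ids: list[str], top_n: int = 10) -> list[str]:
    """Rank by score, drop excluded archives, then append collaborative-filtering results."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    final = [a for a in ranked if a not in excluded_ids][:top_n]
    for a in cf_ids:
        # Skip filtered archives and archives already in the list.
        if a not in excluded_ids and a not in final:
            final.append(a)
    return final


scores = {"a1": 0.247, "a2": 0.131, "a3": 0.5}
print(build_final_list(scores, excluded_ids={"a3"}, cf_ids=["a4", "a1"]))
# ['a1', 'a2', 'a4']
```

Archive "a3" scores highest but is dropped as read or disliked, and "a1" from collaborative filtering is not duplicated.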

In other embodiments, the method further includes a collaborative filtering recommendation algorithm, which uses a similarity algorithm to calculate user similarity and recommends the archives preferred by similar users. Specifically: when the current user opens an archive, the system retrieves the other users who have opened that archive, calculates the similarity between the current user and each of them, selects one or more users with the highest similarity, and recommends those users' favorite archives to the current user.
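The collaborative filtering flow just described can be sketched in Python as below. The data layout (`openers` mapping an archive to the users who opened it, `favorites` mapping a user to their preferred archives), the cut-off `k`, and skipping the archive currently open are assumptions; cosine similarity over tag-weight vectors is used because the text names it as the similarity measure.

```python
import math


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two sparse tag-weight vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def collaborative_recommend(current: str, opened: str, openers: dict[str, set[str]],
                            weights: dict[str, dict[str, float]],
                            favorites: dict[str, list[str]], k: int = 1) -> list[str]:
    """Recommend favorites of the k users most similar to `current` among those who opened `opened`."""
    others = [u for u in openers.get(opened, set()) if u != current]
    others.sort(key=lambda u: cosine(weights[current], weights[u]), reverse=True)
    recs: list[str] = []
    for u in others[:k]:
        for archive_id in favorites.get(u, []):
            if archive_id != opened and archive_id not in recs:
                recs.append(archive_id)
    return recs


weights = {
    "x": {"Jiangsu": 0.147, "archives bureau": 0.095, "document archives": 0.1},
    "y": {"Jiangsu": 0.177, "archives bureau": 0.105, "document archives": 0.155},
    "z": {"Jiangsu": 0.09, "archives bureau": 0.175, "document archives": 0.032},
}
openers = {"A": {"x", "y", "z"}}
favorites = {"y": ["B1", "A"], "z": ["C1"]}
print(collaborative_recommend("x", "A", openers, weights, favorites))  # ['B1']
```

With the tag weights of Example 5, user y is the closest match to x, so y's favorite "B1" is recommended.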

The similarity is calculated with the cosine similarity method: $\mathrm{sim}(x, y) = \dfrac{\sum_t Uw_x(t)\,Uw_y(t)}{\sqrt{\sum_t Uw_x(t)^2}\,\sqrt{\sum_t Uw_y(t)^2}}$, where $Uw_x(t)$ is the weight of tag t for user x.

Example 5:

Suppose there are three users x, y and z, and the similarity between user x and each of the other two users is to be calculated:

the tag weights of x are [Jiangsu 0.147, archives bureau 0.095, document archives 0.1];

those of y are [Jiangsu 0.177, archives bureau 0.105, document archives 0.155];

those of z are [Jiangsu 0.09, archives bureau 0.175, document archives 0.032];

Similarity of x and y: $\mathrm{sim}(x, y) = \dfrac{0.051494}{0.201579 \times 0.257641} \approx 0.9915$

Similarity of x and z: $\mathrm{sim}(x, z) = \dfrac{0.033055}{0.201579 \times 0.199372} \approx 0.8225$

The similarity between x and y is therefore higher than that between x and z.
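The cosine calculation of Example 5 can be checked with a short Python sketch. The function name is illustrative; tags missing from one user's vector are treated as weight 0.

```python
import math


def cosine_similarity(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity between two users' tag-weight vectors (missing tags count as 0)."""
    tags = set(a) | set(b)
    dot = sum(a.get(t, 0.0) * b.get(t, 0.0) for t in tags)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


x = {"Jiangsu": 0.147, "archives bureau": 0.095, "document archives": 0.1}
y = {"Jiangsu": 0.177, "archives bureau": 0.105, "document archives": 0.155}
z = {"Jiangsu": 0.09, "archives bureau": 0.175, "document archives": 0.032}

sim_xy = cosine_similarity(x, y)
sim_xz = cosine_similarity(x, z)
print(f"{sim_xy:.4f} {sim_xz:.4f}")  # 0.9915 0.8225
```

Because cosine similarity ignores vector length, it compares the relative mix of a user's interests rather than how active the user is.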

With this implementation, the user's state, needs and management can be monitored in real time and perceived comprehensively and accurately. Using big data analysis, the method derives insight into user needs from large volumes of data, fuses, analyzes and processes the perceived data, integrates with the business system to respond proactively, and can make accurate recommendations both to different types of users and to users with similar characteristics.

While the preferred embodiments of the present application have been described, additional variations and modifications of those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention. It will be apparent to those skilled in the art that various changes and modifications can be made in the present application without departing from the spirit and scope of the application. Thus, if such modifications and variations of the present application fall within the scope of the claims of the present application and their equivalents, the present application is also intended to include such modifications and variations.

Claims

1. An archive resource accurate pushing method based on user behaviors, implemented on a server, wherein the server comprises a user behavior library and an archive library, the user behavior library comprises user tags and user types, and the archive library comprises archive tags and archive types; the specific operation steps are as follows:

step S100, obtaining the user tags $U = [UX_1, UX_2, \dots, UX_m]$ of the user and the archive tags $D = [DX_1, DX_2, \dots, DX_n]$, wherein $UX_i$ is the $i$-th tag of the user and $DX_j$ is the $j$-th tag of the archive;

step S300, performing weighted scoring of the archives according to the weight $Uw(UX_i)$ of user tag $UX_i$ and the weight $Dw(DX_j)$ of archive tag $DX_j$ to obtain the score of archive j relative to user i, and recommending the highest-scoring archives to the user;

step S400, calculating user similarity according to the weight $Uw(UX_i)$ of user tag $UX_i$ and the weight $Dw(DX_j)$ of archive tag $DX_j$, and identifying users similar to the user;

wherein, in step S200, the weight $Uw(UX_i)$ of user tag $UX_i$ is obtained as follows:

obtaining the term frequency TF and the inverse document frequency IDF of user tag $UX_i$ in the user behavior library, wherein the term frequency TF is:

$n_i$ is the number of times the word appears overall, and $\sum_m n_{m,i}$ is the total number of occurrences of all words;

the inverse document frequency IDF is:

wherein N is the total number of users of the current user type, N' is the total number of users of the other user types, $d_i$ is the number of users of the current user type who have the tag, and $d_i'$ is the number of users of the other user types who have the tag;

the weight of user tag $UX_i$ is $Uw(UX_i) = tf_i \times idf_i$;

obtaining the term frequency TF and the inverse document frequency IDF of archive tag $DX_j$ in the archive library, wherein the term frequency TF is:

wherein $t_j$ is the number of times the word appears in the archive title, $p_j$ is the number of times it appears in the first paragraph of the archive, $n_j$ is the number of times it appears in the archive body text, and $\sum_n n_{n,j}$ is the total number of occurrences of all words in the archive;

the inverse document frequency IDF is:

wherein N is the total number of archives of the current archive type, N' is the total number of archives of the other archive types, $d_j$ is the number of archives of the current type that contain the word, and $d_j'$ is the number of archives of the other types that contain the word;

the weight of archive tag $DX_j$ is $Dw(DX_j) = tf_j \times idf_j$.

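2. The archive resource accurate pushing method based on user behaviors according to claim 1, wherein in step S300, the score F of archive j relative to user i is $F = \sum_{t \in U \cap D} Uw(t)\,Dw(t)$, the sum running over the tags t shared by the user tags U and the archive tags D.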

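3. The archive resource accurate pushing method based on user behaviors according to claim 1, wherein in step S400, the similarity between users x and y is $\mathrm{sim}(x, y) = \dfrac{\sum_t Uw_x(t)\,Uw_y(t)}{\sqrt{\sum_t Uw_x(t)^2}\,\sqrt{\sum_t Uw_y(t)^2}}$, where $Uw_x(t)$ is the weight of tag t for user x.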