CN114722183A

CN114722183A - Knowledge pushing method and system for scientific research tasks

Info

Publication number: CN114722183A
Application number: CN202210270393.2A
Authority: CN
Inventors: 王金安; 陈昱旻; 邓建; 何天豪
Original assignee: Beijing Deyan Ruitong Technology Co ltd; Chengdu Aircraft Industrial Group Co Ltd
Current assignee: Beijing Deyan Ruitong Technology Co ltd; Chengdu Aircraft Industrial Group Co Ltd
Priority date: 2022-03-18
Filing date: 2022-03-18
Publication date: 2022-07-08

Abstract

The invention discloses a knowledge pushing method and system for scientific research tasks. Tasks are first classified; user portraits are then depicted, a corresponding portrait label is set for each user, and specific tasks under a given classification are labeled. For a specific task, a hybrid similarity algorithm is used to push knowledge to the user. The hybrid similarity algorithm comprises a label matching algorithm and a text similarity algorithm: the recommendation results of the two algorithms are merged, each algorithm's recommendation score is multiplied by its preset weight, and the products are added, yielding the candidate recommendation list and the recommendation score of the hybrid similarity algorithm. The method takes the scientific research task as its dimension and achieves knowledge recommendation by weighted mixing of the label matching algorithm and the text similarity algorithm, so that knowledge in the knowledge database is accurately pushed to users and working efficiency is effectively improved.

Description

Knowledge pushing method and system for scientific research tasks

Technical Field

The invention belongs to the technical field of data pushing methods, and particularly relates to a knowledge pushing method and system for scientific research tasks.

Background

In the management of scientific research projects, a large amount of data, information, and knowledge is generated. Because this knowledge is relatively dispersed and lacks consistency, much valuable knowledge remains isolated for a long time or is even buried; it cannot be used effectively, and its value as an important asset cannot be realized.

To effectively improve working efficiency, a reasonable and effective knowledge pushing mechanism needs to be established. Knowledge pushing in scientific research project management needs to be task-specific; however, most pushing methods are based on user portraits, and no task-specific knowledge pushing method has been found so far.

A user portrait is built by acquiring user-related information such as the user's professional background, education level, way of acquiring knowledge, and interest preferences, and modeling the user on that basis so that specific labels are assigned to the user. By analyzing user labels, users with given attributes and characteristics are grouped; user data are compared and classified; the user's attribute labels are constructed in multiple dimensions; and users are ranked by importance so that important, core, key, and large-scale users are highlighted, forming user groups with different characteristics.

A task refers to a fixed workflow formed by standardizing and abstracting a certain type of concrete business scenario in the management process.

The prior art of Chinese patent publication No. CN111191122A, published on May 22, 2020, discloses a learning resource recommendation system based on user portraits. It establishes an individual label library and realizes personalized recommendation through individual and group portraits to meet individual requirements. Its drawback is that pushing relies entirely on a label matching algorithm; the algorithm is single, and accurate pushing of scientific research task knowledge cannot be achieved.

The prior art of Chinese patent publication No. CN106776503B, published on May 31, 2017, discloses a method and a device for determining text semantic similarity. A preset label-topic LDA model is trained from training samples and their preset topic labels; a text topic label vector is calculated with the LDA model; and a similarity value is then obtained from that result by a text similarity calculation method.

Disclosure of Invention

The invention aims to provide a knowledge pushing method and system for scientific research tasks that solves the above problems.

The invention is mainly realized by the following technical scheme:

A knowledge pushing method for scientific research tasks comprises: first classifying the tasks; then, according to the task classification, depicting user portraits and setting a corresponding portrait label for each user; and labeling specific tasks under a given classification. For a specific task, a hybrid similarity algorithm is used to push knowledge to the user. The hybrid similarity algorithm comprises a label matching algorithm and a text similarity algorithm: the recommendation results of the two algorithms are merged, each algorithm's recommendation score is multiplied by its preset weight, and the products are added to obtain the candidate recommendation list and the recommendation score of the hybrid similarity algorithm.

To better implement the present invention, further, the recommendation score calculated by the label matching algorithm is given by the following formula:

$$P(u, i) = \sum_{k \in N(u, i)} \omega_{uk} \, r_{ki}$$

wherein: P(u, i) denotes the recommendation score of knowledge i for task u;

N(u, i) represents the set of labels common to task u and knowledge i;

ω_uk represents the degree of association between task u and label k, i.e. the weight of the label relative to the task;

r_ki represents the degree of association between label k and knowledge i, i.e. the weight of the label relative to the knowledge.

To better implement the method, further, the label matching algorithm takes into account the number of matched labels and the label weights to improve the accuracy of the recommendation result, and the label weights are determined according to the order of the labels.
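The label matching score can be sketched as follows. This is a minimal illustration, assuming the score is the sum, over the labels shared by the task and the knowledge item, of the product of the two label weights, which is consistent with the definitions of N(u, i), ω_uk and r_ki. The function name `label_match_score` and the sample labels and weights are hypothetical and are not taken from the patent figures.

```python
def label_match_score(task_labels, knowledge_labels):
    """Sum, over labels shared by the task and the knowledge item,
    the product of the task-side and knowledge-side label weights."""
    # Labels present on both sides play the role of N(u, i).
    common = task_labels.keys() & knowledge_labels.keys()
    return sum(task_labels[k] * knowledge_labels[k] for k in common)


# Hypothetical label weights, chosen only for illustration.
task = {"planning": 0.6, "2019": 0.55}
knowledge = {"planning": 0.5, "summary": 0.4}

# Only "planning" is shared, so only that label contributes to the score.
score = label_match_score(task, knowledge)
```

Because every shared label adds a positive term, both the number of matched labels and their weights raise the score, which is the behavior the paragraph above asks for.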

In order to better implement the method, further, in a text similarity algorithm, a task is represented by a text, corresponding task text feature vectors and knowledge document feature vectors are generated, and then a knowledge recommendation list and recommendation scores corresponding to each knowledge are obtained through a cosine similarity algorithm.

The attribute field values of the task, the task name, and the text of the task description are gathered together; keywords are then extracted by the TF-IDF algorithm, the occurrence frequency of each keyword is calculated, and the result is expressed as a vector. For any two given space vectors A and B, the cosine similarity cos θ, where θ is the angle between A and B, is calculated from the dot product and the vector lengths by the following formula:

$$\cos\theta = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}} \, \sqrt{\sum_{i=1}^{n} B_i^{2}}}$$

wherein: A_i and B_i respectively represent the components of vectors A and B in each dimension, and n is the number of dimensions.

A recommendation score corresponding to each knowledge item under the text similarity algorithm is then obtained.
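The cosine similarity step can be sketched in a few lines of Python. This is an illustrative sketch rather than the implementation used in the invention; the function name `cosine_similarity` and the choice to return 0 for an all-zero vector are assumptions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length numeric vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: a text with no keywords is treated as matching nothing.
        return 0.0
    return dot / (norm_a * norm_b)
```

For keyword frequency vectors, whose components are never negative, the result lies between 0 (no keyword in common) and 1 (identical keyword proportions).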

To better implement the present invention, further, the weights of the label matching algorithm and the text similarity algorithm are 0.6 and 0.4, respectively, so that the recommendation score of the hybrid similarity algorithm = 0.6 × the recommendation score calculated by the label matching algorithm + 0.4 × the recommendation score calculated by the text similarity algorithm.

The invention is mainly realized by the following technical scheme:

A knowledge pushing system for scientific research tasks comprises a task classification module, a user portrait module, a task labeling module, and a hybrid similarity module. The task classification module preliminarily screens task attributes with a task classification algorithm to facilitate the subsequent task-oriented user portrait depiction and task labeling; the user portrait module depicts the user portrait and sets a corresponding portrait label for each user; and the hybrid similarity module merges the recommendation results of the label matching algorithm and the text similarity algorithm, multiplies each algorithm's recommendation score by its preset weight, and adds the products to obtain the candidate recommendation list and the recommendation score of the hybrid similarity algorithm.

The invention classifies the tasks, depicts the user portrait after classification, and pushes knowledge to the user for a specific task by using a hybrid similarity algorithm. The hybrid similarity algorithm integrates the characteristics of the label matching algorithm and the text similarity algorithm, merges the recommendation results obtained by the two algorithms, and weights the recommendation scores according to the preset weights of the two algorithms to obtain the candidate recommendation list and scores.

Task classification: in traditional personalized recommendation schemes, the user's explicit or implicit behavior feedback is collected, processed, and converted into adjustments of the user portrait and preference labels, which then indirectly influence the user's future personalized recommendation results through a label matching algorithm. In a specific scientific research task, however, task classifications differ and the user's role in the task flow differs, so the user portraits differ markedly. Converting the user's behavior feedback on a task directly into user preference labels is therefore inaccurate, because the key dimension of the task is omitted in the process. For this reason, a task classification algorithm is applied to the task attributes to distinguish task types such as development, production, and planning.

User portrait depicting: the user portrait is depicted according to the task classification. A series of labels is generated with a corresponding portrait algorithm by analyzing user characteristics (post, role, and the like), knowledge use preferences, and historical browsing records. This is because the person performing the task is a variable factor that must be given particular weight during the execution of a specific task: for the same task performed by different people, the resulting knowledge recommendations should differ. For example, a new employee should receive more basic and more comprehensive task knowledge recommendations than a more senior employee. Depicting the user portrait is therefore important.

Hybrid similarity algorithm: the recommendation results of the label matching algorithm and the text similarity algorithm are merged, and the recommendation scores are weighted according to the weights preset for the two algorithms to obtain the candidate recommendation list and scores. The weight of the label matching algorithm is 0.6 and the weight of the text similarity algorithm is 0.4, namely: recommended knowledge score = label matching algorithm score × 0.6 + text similarity algorithm score × 0.4. The reason is that labels are set manually and are subjective; for knowledge in particular, people with different knowledge may select different labels when uploading, so the result of label matching is sometimes unreliable and depends to a great extent on how accurately and how generally the labels were chosen. To compensate for the subjectivity and unreliability of label matching, the hybrid similarity algorithm uses the label matching algorithm and the text similarity algorithm together. The text similarity algorithm depends entirely on text content, so its recommendation results are more objective and comprehensive than those of label matching. Its recommendation accuracy, however, depends more on the training of the algorithm and on the machine's understanding of text semantics, and under certain conditions it is not necessarily more accurate than label matching. The hybrid similarity algorithm therefore integrates the characteristics of both algorithms and addresses the accuracy of knowledge pushing by weighting them.
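The merging and weighting described above can be sketched as follows. The function name `hybrid_scores`, the knowledge IDs, and the sample scores are hypothetical. Treating a knowledge item that appears in only one of the two result lists as scoring 0 in the other is an assumption; the text does not say how such items are handled.

```python
def hybrid_scores(label_scores, text_scores, w_label=0.6, w_text=0.4):
    """Merge two {knowledge_id: score} maps into one weighted score map."""
    # The union keeps every knowledge item recommended by either algorithm.
    ids = label_scores.keys() | text_scores.keys()
    return {
        kid: w_label * label_scores.get(kid, 0.0)  # assumed 0 if absent
        + w_text * text_scores.get(kid, 0.0)       # assumed 0 if absent
        for kid in ids
    }


# Hypothetical scores from the two algorithms.
label_scores = {"K1": 0.5, "K2": 0.25}
text_scores = {"K2": 0.5, "K3": 0.25}
merged = hybrid_scores(label_scores, text_scores)
```

Keeping the weights as parameters makes it easy to try splits other than 0.6 and 0.4 if label quality in a given knowledge base turns out to be better or worse than expected.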

Label matching algorithm: labels are characteristic attributes assigned to task types and knowledge documents. Scientific research tasks have very clear label characteristics, such as reporting, evaluation, and summarization, so labels can be assigned to them conveniently. The degree of matching between the labels of a scientific research task and the labels of the knowledge documents is then calculated to produce a knowledge recommendation list and a recommendation score corresponding to each knowledge item.

Text similarity algorithm: the text similarity algorithm is based on the task description text and the knowledge document content. Chinese word segmentation and the TF-IDF (Term Frequency-Inverse Document Frequency) algorithm are used to extract task field information, keywords in the task description, knowledge document attribute information, and keywords in the document content; the corresponding task text feature vectors and knowledge document feature vectors are generated; and a knowledge recommendation list and a recommendation score corresponding to each knowledge item are then obtained by the cosine similarity algorithm.
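The feature vector construction can be illustrated with a small standard-library sketch. It assumes the texts have already been segmented into word lists (Chinese word segmentation itself is not shown), and it uses a smoothed IDF variant chosen here for convenience; the text does not specify which TF-IDF variant is used. The function name `tfidf_vectors` and the sample tokens are hypothetical.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """docs: list of token lists. Returns (vocabulary, TF-IDF vectors).

    Every vector uses the same fixed keyword order, so the task vector and
    the knowledge document vectors can be compared dimension by dimension.
    """
    vocab = sorted({tok for doc in docs for tok in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each token.
    df = Counter(tok for doc in docs for tok in set(doc))
    # Smoothed IDF (assumed variant) so that no weight is zero or undefined.
    idf = {tok: math.log((1 + n_docs) / (1 + df[tok])) + 1 for tok in vocab}
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        total = len(doc) or 1  # avoid division by zero for an empty text
        vectors.append([counts[tok] / total * idf[tok] for tok in vocab])
    return vocab, vectors


# Hypothetical, already segmented texts.
task_tokens = ["annual", "planning", "report"]
knowledge_tokens = ["planning", "summary"]
vocab, (task_vec, knowledge_vec) = tfidf_vectors([task_tokens, knowledge_tokens])
```

A component of 0 means the keyword does not occur in that text; the two resulting vectors are what the cosine similarity step compares.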

The invention has the beneficial effects that:

According to the method, scientific research tasks are taken as dimensions, knowledge recommendation is achieved in a mode of mixing and weighting a label matching algorithm and a text similarity algorithm, knowledge in a knowledge database is accurately pushed to users, and working efficiency is effectively improved.

Drawings

FIG. 1 is a flow chart of the present invention;

FIG. 2 is a schematic diagram of a recommendation score calculation for a tag matching algorithm;

FIG. 3 is a schematic diagram of the final candidate recommendation list.

Detailed Description

Example 1:

Further, the weights of the label matching algorithm and the text similarity algorithm are 0.6 and 0.4, respectively, so that the recommendation score of the hybrid similarity algorithm = 0.6 × the label matching algorithm score + 0.4 × the text similarity algorithm score.

Example 2:

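In this embodiment, optimization is performed on the basis of embodiment 1. The recommendation score calculated by the label matching algorithm is given by the following formula:

$$P(u, i) = \sum_{k \in N(u, i)} \omega_{uk} \, r_{ki}$$

wherein P(u, i) denotes the recommendation score of knowledge i for task u, N(u, i) represents the set of labels common to task u and knowledge i, ω_uk represents the weight of label k relative to task u, and r_ki represents the weight of label k relative to knowledge i.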

Furthermore, in the tag matching algorithm, the number of matched tags and the consideration of tag weights are added to improve the accuracy of the recommendation result, and the tag weights are determined according to the sequence of the tags.

Other parts of this embodiment are the same as embodiment 1, and thus are not described again.

Example 3:

In this embodiment, optimization is performed on the basis of embodiment 1 or 2. In the text similarity algorithm, a task is represented as text, the corresponding task text feature vectors and knowledge document feature vectors are generated, and a knowledge recommendation list and a recommendation score corresponding to each knowledge item are then obtained by the cosine similarity algorithm.

Further, the attribute field values of the task, the task name, and the text of the task description are gathered together; keywords are then extracted by the TF-IDF algorithm, the occurrence frequency of each keyword is calculated, and the result is expressed as a vector. For any two given space vectors A and B, the cosine similarity cos θ, where θ is the angle between A and B, is calculated from the dot product and the vector lengths as follows:

$$\cos\theta = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}} \, \sqrt{\sum_{i=1}^{n} B_i^{2}}}$$

wherein A_i and B_i respectively represent the components of vectors A and B in each dimension, and n is the number of dimensions.

The rest of this embodiment is the same as embodiment 1 or 2, and therefore, the description thereof is omitted.

Example 4:

A knowledge pushing system for scientific research tasks, as shown in FIG. 1, comprises a task classification module, a user portrait module, a task labeling module, and a hybrid similarity module. The task classification module preliminarily screens task attributes with a task classification algorithm to facilitate the subsequent task-oriented user portrait depiction and task labeling; the user portrait module depicts the user portrait and sets a corresponding portrait label for each user; and the hybrid similarity module merges the recommendation results of the label matching algorithm and the text similarity algorithm, multiplies each algorithm's recommendation score by its preset weight, and adds the products to obtain the candidate recommendation list and the recommendation score of the hybrid similarity algorithm.

According to the method, scientific research tasks are used as dimensions, knowledge recommendation is achieved in a mode of mixed weighting of a tag matching algorithm and a text similarity algorithm, knowledge in a knowledge database is accurately pushed to users, and working efficiency is effectively improved.

Example 5:

A knowledge pushing method for scientific research tasks comprises classifying the tasks, depicting the user portrait after classification, and pushing knowledge to the user for a specific task by using a hybrid similarity algorithm. The hybrid similarity algorithm integrates the characteristics of the label matching algorithm and the text similarity algorithm, merges the recommendation results obtained by the two algorithms, and weights the recommendation scores according to the preset weights of the two algorithms to obtain the candidate recommendation list and scores.

Further, as shown in fig. 1, after a task is initiated, the task attributes are first classified, for example: "development", "production", "planning", etc.

The user portrait is depicted, and a corresponding portrait label is set for each user, by extracting the user's post and role attributes and the feature labels of the projects and tasks in which the user has participated.

Specific tasks under a given classification are then labeled, for example 'research and development', '2019', and so on.

Further, the hybrid similarity algorithm is then run, with the label matching algorithm and the text similarity algorithm performed at the same time.

When the label matching algorithm is performed, the number of matched labels and the label weights are taken into account to improve the accuracy of the recommendation result. The weight of a label may be determined according to the order of the labels, for example a weight of 0.6 for the first label, 0.55 for the second label, 0.45 for the third label, and so on.
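Order-based label weighting can be sketched as a simple lookup by position. The list `ORDER_WEIGHTS` holds the three weights quoted above; the function name `weights_by_order`, the sample labels, and the `default` weight used for any label past the third are assumptions, since the text gives no values beyond the third label.

```python
# Weights quoted in the text for the first, second and third label.
ORDER_WEIGHTS = [0.6, 0.55, 0.45]


def weights_by_order(labels, order_weights=ORDER_WEIGHTS, default=0.0):
    """Map each label to a weight chosen by its position in the list.

    Labels beyond the listed positions receive `default`. This fallback is
    an assumption: the text does not state weights past the third label.
    """
    return {
        label: (order_weights[i] if i < len(order_weights) else default)
        for i, label in enumerate(labels)
    }


# Hypothetical ordered labels of one task.
task_weights = weights_by_order(["planning", "2019", "annual", "draft"])
```

The resulting mapping has the shape expected for the task-side weights ω_uk in the label matching score; the knowledge-side weights r_ki could be produced the same way from each document's ordered labels.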

As shown in FIG. 2, assume that there is a task named "2019 annual planning". The figure shows how the labels of the task template match three pieces of knowledge, together with the label weights; the value on each arrow indicates the weight of the label relative to the task or the knowledge. The recommendation score corresponding to each knowledge item under the label matching algorithm is obtained from these values.

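The calculation formula of the label matching algorithm score is as follows:

$$P(u, i) = \sum_{k \in N(u, i)} \omega_{uk} \, r_{ki}$$

wherein P(u, i) denotes the recommendation score of knowledge i for task u; N(u, i) represents the set of labels common to task u and knowledge i; ω_uk represents the degree of association between task u and label k, i.e. the weight of the label relative to the task; and r_ki represents the degree of association between label k and knowledge i, i.e. the weight of the label relative to the knowledge.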

Further, in the process of text similarity calculation, the theoretical basis is a cosine similarity formula.

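For any two given space vectors A and B, the cosine similarity cos θ, where θ is the angle between A and B, is calculated from the dot product and the vector lengths as follows:

$$\cos\theta = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}} \, \sqrt{\sum_{i=1}^{n} B_i^{2}}}$$

wherein A_i and B_i respectively represent the components of vectors A and B in each dimension, and n is the number of dimensions.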

Cosine similarity measures the similarity between two vectors by the cosine of the angle between them. The formula applies to a vector space of any dimension and is commonly used to compare vectors in high-dimensional spaces. In text comparison, the topical similarity of two texts can be calculated with the cosine similarity formula by representing each text as a multi-dimensional keyword vector in which the value in each dimension corresponds to the frequency with which a keyword occurs in the text.

When the algorithm is applied to task-oriented recommendation, the task is first represented as text: the attribute field values of the task and text content such as the task name and task description are gathered together, keywords are extracted by the TF-IDF algorithm, the occurrence frequency of each keyword is calculated, and the result is expressed as a vector. For example, the text vector for task A may be represented as:

task A: (1, 1, 2, 1, 3, 1, 0, … … 1, 0)

Wherein the value of each dimension represents the frequency of occurrence of a keyword, the position of each keyword in the vector is fixed, and the set of all keywords is derived from the corpus provided by the algorithm tool. A dimension value of 0 indicates that the keyword does not appear in the text.

Similarly, a knowledge document can be represented as a vector B of the same dimension:

knowledge B: (0, 1, 0, 2, 1, 0, 0, … … 1, 1)

Therefore, the correlation between the two can be calculated by the cosine similarity formula; because the frequency vectors have no negative components, the value lies between 0 and 1. A recommendation score corresponding to each knowledge item under the text similarity algorithm is then obtained.
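As a worked illustration only, take the first seven components shown for task A and knowledge B and ignore the elided remainder. The truncation is an assumption made purely to keep the arithmetic short; the full vectors would give a different value. The truncated vectors are written A' and B':

```latex
\begin{aligned}
A' &= (1, 1, 2, 1, 3, 1, 0), \qquad B' = (0, 1, 0, 2, 1, 0, 0) \\
A' \cdot B' &= 0 + 1 + 0 + 2 + 3 + 0 + 0 = 6 \\
\lVert A' \rVert &= \sqrt{1 + 1 + 4 + 1 + 9 + 1 + 0} = \sqrt{17} \\
\lVert B' \rVert &= \sqrt{0 + 1 + 0 + 4 + 1 + 0 + 0} = \sqrt{6} \\
\cos\theta &= \frac{6}{\sqrt{17}\,\sqrt{6}} = \frac{6}{\sqrt{102}} \approx 0.594
\end{aligned}
```

Only the dimensions in which both vectors are non-zero contribute to the dot product, while every non-zero dimension contributes to the vector lengths, so keywords that occur in only one of the two texts lower the similarity.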

Further, the label matching algorithm score is multiplied by its weight of 0.6, the text similarity algorithm score is multiplied by its weight of 0.4, and the two products are added to obtain the final value of the hybrid similarity algorithm.

The knowledge items are sorted by this relevance value to obtain the candidate recommendation list, as shown in FIG. 3, where ID is the knowledge ID number and score is the relevance value of the knowledge to the task.
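The sorting step can be sketched as follows, assuming the final scores are held in a mapping from knowledge ID to relevance value. The function name `to_candidate_list`, the `top_n` parameter, the tie-breaking by ID, and the sample values are hypothetical; the `id` and `score` field names follow the description of FIG. 3.

```python
def to_candidate_list(scores, top_n=None):
    """Turn {knowledge_id: score} into a list sorted by descending score.

    Ties are broken by knowledge ID so that the order is deterministic
    (an assumption; the text does not say how ties are handled).
    """
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], str(kv[0])))
    # Slicing with top_n=None keeps the whole list.
    return [{"id": kid, "score": score} for kid, score in ranked[:top_n]]


# Hypothetical final scores for three knowledge items.
final_scores = {"K1": 0.30, "K2": 0.35, "K3": 0.10}
candidates = to_candidate_list(final_scores, top_n=2)
```

Limiting the list with `top_n` is one way to keep the pushed knowledge short enough to be useful to the person carrying out the task.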

The above description is only a preferred embodiment of the present invention, and is not intended to limit the present invention in any way, and all simple modifications and equivalent variations of the above embodiments according to the technical spirit of the present invention are included in the scope of the present invention.

Claims

1. A knowledge pushing method for scientific research tasks, characterized in that the tasks are first classified; user portraits are then depicted according to the task classification, a corresponding portrait label is set for each user, and specific tasks under a given classification are labeled; for a specific task, a hybrid similarity algorithm is used to push knowledge to the user; the hybrid similarity algorithm comprises a label matching algorithm and a text similarity algorithm, the recommendation results of the label matching algorithm and the text similarity algorithm are merged, the recommendation score of each algorithm is multiplied by its preset weight, and the products are added to obtain the candidate recommendation list and the recommendation score of the hybrid similarity algorithm.

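2. The knowledge pushing method for scientific research tasks as claimed in claim 1, wherein the recommendation score calculated by the label matching algorithm is given by the following formula:

$$P(u, i) = \sum_{k \in N(u, i)} \omega_{uk} \, r_{ki}$$

wherein: P(u, i) denotes the recommendation score of knowledge i for task u; N(u, i) represents the set of labels common to task u and knowledge i; ω_uk represents the degree of association between task u and label k, i.e. the weight of the label relative to the task; and r_ki represents the degree of association between label k and knowledge i, i.e. the weight of the label relative to the knowledge.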

3. The knowledge pushing method oriented to scientific research tasks as claimed in claim 2, wherein the number of matched tags and the consideration of tag weights are added to the tag matching algorithm to improve the accuracy of the recommendation result, and the tag weights are determined according to the order of the tags.

4. The knowledge pushing method oriented to scientific research tasks as claimed in claim 1, wherein in a text similarity algorithm, the tasks are represented by texts, corresponding task text feature vectors and knowledge document feature vectors are generated, and then a knowledge recommendation list and recommendation scores corresponding to each knowledge are obtained through a cosine similarity algorithm.

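5. The knowledge pushing method for scientific research tasks as claimed in claim 4, wherein the attribute field values of the task, the task name, and the text of the task description are gathered together, keywords are then extracted by the TF-IDF algorithm, the occurrence frequency of each keyword is calculated, and the result is expressed as a vector; for any two given space vectors A and B, the cosine similarity cos θ, where θ is the angle between A and B, is calculated from the dot product and the vector lengths by the following formula:

$$\cos\theta = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert} = \frac{\sum_{i=1}^{n} A_i B_i}{\sqrt{\sum_{i=1}^{n} A_i^{2}} \, \sqrt{\sum_{i=1}^{n} B_i^{2}}}$$

wherein: A_i and B_i respectively represent the components of vectors A and B in each dimension, and n is the number of dimensions.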

6. The knowledge pushing method for scientific research tasks according to any one of claims 1 to 5, wherein the weights of the label matching algorithm and the text similarity algorithm are 0.6 and 0.4, respectively, so that the recommendation score of the hybrid similarity algorithm = 0.6 × the recommendation score calculated by the label matching algorithm + 0.4 × the recommendation score calculated by the text similarity algorithm.

7. A knowledge pushing system for scientific research tasks, characterized by comprising a task classification module, a user portrait module, a task labeling module, and a hybrid similarity module, wherein the task classification module preliminarily screens task attributes with a task classification algorithm to facilitate the subsequent task-oriented user portrait depiction and task labeling; the user portrait module depicts the user portrait and sets a corresponding portrait label for each user; and the hybrid similarity module merges the recommendation results of the label matching algorithm and the text similarity algorithm, multiplies the recommendation score of each algorithm by its preset weight, and adds the products to obtain the candidate recommendation list and the recommendation score of the hybrid similarity algorithm.