CN113641791A - Expert recommendation method, electronic device and storage medium - Google Patents

Expert recommendation method, electronic device and storage medium

Info

Publication number
CN113641791A
CN113641791A (application CN202110925509.7A)
Authority
CN
China
Prior art keywords
user
vector
target
data
recommendation method
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110925509.7A
Other languages
Chinese (zh)
Inventor
李涵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhuo Erzhi Lian Wuhan Research Institute Co Ltd
Original Assignee
Zhuo Erzhi Lian Wuhan Research Institute Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhuo Erzhi Lian Wuhan Research Institute Co Ltd filed Critical Zhuo Erzhi Lian Wuhan Research Institute Co Ltd
Priority to CN202110925509.7A priority Critical patent/CN113641791A/en
Publication of CN113641791A publication Critical patent/CN113641791A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/3332Query translation
    • G06F16/3335Syntactic pre-processing, e.g. stopword elimination, stemming
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9536Search customisation based on social or collaborative filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/22Matching criteria, e.g. proximity measures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computational Linguistics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application relates to the field of data processing, and provides an expert recommendation method, electronic equipment and a storage medium, wherein the expert recommendation method comprises the following steps: acquiring problem data and determining a problem vector corresponding to the problem data; constructing a user tag heterogeneous network according to the historical answer records of the target user; obtaining a user vector according to the user label heterogeneous network; performing model training according to the problem vector and the user vector to obtain a trained predictive neural model; inputting a target problem into the prediction neural model to obtain the similarity probability of the target user and the target problem; and generating recommendation information according to the similarity probability. The method provided by the application can automatically determine the domain expert corresponding to the target problem in the target user based on the prediction neural model, and improves the accuracy of recommending the domain expert.

Description

Expert recommendation method, electronic device and storage medium
Technical Field
The present application relates to the field of data analysis, and in particular, to an expert recommendation method, an electronic device, and a storage medium.
Background
With the development of internet technology, sharing, transmitting and acquiring information through social networking software has become one of the main ways in which netizens interact. Netizens can post questions on the network, for example on a question-and-answer platform, and wait for experts to answer them. In the prior art, whether a user is an authoritative expert in a certain field is generally judged from the authentication content filled in by the user. In practical applications, however, the number of users who fill in authentication content is small, and the authentication content they fill in covers only some fields, so the accuracy is low, the field coverage of such domain expert discovery methods is low, and the accuracy of the recommended domain experts is not high.
Disclosure of Invention
In view of the above, the present disclosure provides an expert recommendation method, an electronic device, and a computer storage medium, aiming to improve the accuracy of recommending domain experts.
A first aspect of the present application provides an expert recommendation method, including:
acquiring problem data and determining a problem vector corresponding to the problem data;
constructing a user tag heterogeneous network according to the historical answer records of the target user;
obtaining a user vector according to the user label heterogeneous network;
performing model training according to the problem vector and the user vector to obtain a trained predictive neural model;
inputting a target problem into the prediction neural model to obtain the similarity probability of the target user and the target problem;
and generating recommendation information according to the similarity probability.
According to an optional embodiment of the present application, the determining the problem vector corresponding to the problem data includes:
performing data cleaning on the problem data;
and performing data conversion on the problem data after data cleaning to obtain a problem vector corresponding to the problem data.
According to an alternative embodiment of the application, the data cleansing comprises removing tags, filtering stop words, clearing code segments.
According to an optional embodiment of the present application, the user tag heterogeneous network includes a node set and an edge set, where the node set includes a user set and a tag set, and the edge set includes a relationship between a user and a tag and a relationship between a tag and a tag.
According to an optional embodiment of the present application, the obtaining a user vector according to the user tag heterogeneous network includes:
and learning the user label heterogeneous network based on the LINE model to obtain a user vector.
According to an optional embodiment of the present application, the performing model training according to the problem vector and the user vector to obtain a trained predictive neural model includes:
inputting the problem vector and the user vector into a predictive neural model to be trained to obtain problem features and user features;
calculating the similarity probability of the problem feature and the user feature;
determining a loss function of the predictive neural model according to the similarity probability;
and adjusting the model parameters of the prediction neural model according to the loss function to obtain the trained prediction neural model.
According to an alternative embodiment of the present application, the calculating the similarity probability of the problem feature and the user feature comprises:
calculating cosine similarity of the problem feature and the user feature by using a cosine function;
and processing the cosine similarity by using a softmax function to obtain a similarity probability.
According to an optional embodiment of the present application, the generating recommendation information according to the similarity probability includes:
and if a plurality of target users exist, sequencing the target users according to the similarity probability of each target user and the target problem, and generating an expert recommendation table.
A second aspect of the present application provides an electronic device, comprising:
a memory to store at least one instruction;
a processor configured to implement the expert recommendation method as described above when executing the at least one instruction.
A third aspect of the present application provides a computer-readable storage medium having stored therein at least one instruction which, when executed by a processor, implements the expert recommendation method as described above.
According to the technical scheme, the problem vector corresponding to the problem data is determined by acquiring the problem data; constructing a user tag heterogeneous network according to the historical answer records of target users, wherein the target users are users corresponding to the question data; then, according to the user label heterogeneous network, a user vector is obtained; then, performing model training according to the problem vector and the user vector to obtain a trained predictive neural model; finally, inputting the target problem into the prediction neural model to obtain the similarity probability of the target user and the target problem; and generating recommendation information according to the similarity probability, automatically determining a domain expert corresponding to the target problem in the target user based on a predictive neural model, helping a questioner to find the expert corresponding to the solved problem, avoiding the problem of decreased liveness of a question-answering platform caused by unmanned answering of a large number of problems, and improving the accuracy of recommending the domain expert.
Drawings
Fig. 1 is a schematic flow chart of an expert recommendation method provided in an embodiment of the present application;
fig. 2 is a schematic view of a scenario of a user tag heterogeneous network according to an embodiment of the present disclosure;
FIG. 3 is a schematic structural diagram of a predictive neural model provided in an embodiment of the present application;
FIG. 4 is a schematic flow chart illustrating inputting a target problem into the predictive neural model according to an embodiment of the present application;
fig. 5 is a schematic block diagram of a structure of an electronic device according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The schematic flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
The embodiment of the application provides an expert recommendation method, electronic equipment and a computer-readable storage medium.
Some embodiments of the present application will be described in detail below with reference to the accompanying drawings. The embodiments described below and the features of the embodiments can be combined with each other without conflict.
Referring to fig. 1, fig. 1 is a schematic flowchart of an expert recommendation method according to an embodiment of the present application. As shown in fig. 1, the expert recommendation method specifically includes steps S11 to S16, and the order of the steps in the schematic flowchart may be changed or some steps may be omitted according to different requirements.
And step S11, acquiring the problem data and determining the problem vector corresponding to the problem data.
The question data is data related to a question posed by a user. The question data may include three elements, a question header, a question label, and a question body. For example, the problem data on the target platform may be obtained by obtaining data in an Extensible Markup Language (XML) format on the target platform.
If expert recommendation is to be performed for a problem posed by a user on a certain platform, historical problem data on the platform can be obtained, wherein the historical problem data relates to problems previously posed by users on the platform. A time interval may be preset, and historical problem data within that time interval may be obtained from the platform. For example, the time interval may be set to two years, and historical problem data on the platform within the last two years from the current time point is acquired. Presetting the time interval avoids selecting problems submitted so long ago that the users who answered them are no longer active on the platform, prevents invalid experts from being recommended to the user, and improves the accuracy of expert recommendation.
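By way of a non-limiting illustration, the acquisition of problem data within a preset time interval might be sketched as follows in Python; the XML element and attribute names ("row", "CreationDate", "Title", "Tags", "Body") are assumptions modeled on typical question-and-answer data dumps, not the format of any particular target platform.

import xml.etree.ElementTree as ET
from datetime import datetime, timedelta

def load_recent_questions(xml_path, years=2):
    # Keep only questions whose creation date falls within the preset time interval.
    cutoff = datetime.now() - timedelta(days=365 * years)
    questions = []
    for _, row in ET.iterparse(xml_path, events=("end",)):
        if row.tag != "row":
            continue
        created = datetime.fromisoformat(row.get("CreationDate", "1970-01-01"))
        if created >= cutoff:
            questions.append({
                "title": row.get("Title", ""),
                "tags": row.get("Tags", ""),
                "body": row.get("Body", ""),
            })
        row.clear()  # free memory while streaming a large XML dump
    return questions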
In some embodiments of the present application, determining the problem vector to which the problem data corresponds includes:
performing data cleaning on the problem data;
and performing data conversion on the problem data after data cleaning to obtain a problem vector corresponding to the problem data.
Data cleaning is performed on the problem data to extract the key problem data, and data conversion is then performed on the key problem data to obtain the problem vector.
The problem data after data cleaning can be converted from a word sequence format into a vector format by using a word vector tool, so as to obtain the problem vector corresponding to the problem data. For example, if the problem data is English data, the format conversion can be performed by using Google's open-source word vector tool Word2vec; if the problem data is Chinese data, the open-source word vector tool Word2vec and the GloVe tool can be used.
In some embodiments of the present application, the data cleansing includes removing tags, filtering stop words, clearing code segments.
For example, Hypertext Markup Language (HTML) tags in the problem data may be removed.
Stop words may include English articles, prepositions, adverbs, conjunctions, etc., such as "a", "the" and "or"; for Chinese text they may include common function words corresponding to "in", "inside", "also", "of", "it", "is", etc. For example, stop word filtering may be performed on the question data using a preset stop word table. The stop word table may, for example, contain 891 English stop words.
Because the acquired problem data in the XML format contains many meaningless code segments, the code segments in the problem data are removed. The code segment format to be removed can be preset, and the problem data is cleaned according to that format. For example, "<code>" and "</code>" are determined as the code segment delimiters to be removed, and code segments enclosed by "<code>" and "</code>" in the problem data are removed.
In some embodiments of the present application, the data cleansing may further include stemming if the question data is in English. Stemming is the process of removing affixes to obtain roots. For example, the word "shaped" becomes "shape" after stemming. Through stemming of the problem data, a word sequence which consists only of word stems and contains the important words is obtained, so that the efficiency of training the predictive neural model is improved.
By cleaning the problem data, useless problem data can be removed so that only the problem data containing useful information is retained, which improves the efficiency of training the predictive neural model.
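By way of a non-limiting illustration, the cleaning and vectorization described above might be sketched as follows; the regular expressions, the tiny stop word set, the Porter stemmer, the gensim Word2Vec tool, and the averaging of word vectors into a fixed-length problem vector are assumptions of this sketch rather than the exact toolchain of the application.

import re
import numpy as np
from nltk.stem import PorterStemmer
from gensim.models import Word2Vec

STOP_WORDS = {"a", "the", "or", "of", "is", "in"}  # placeholder for a full stop word table
stemmer = PorterStemmer()

def clean_question(text):
    text = re.sub(r"<code>.*?</code>", " ", text, flags=re.S)  # clear code segments
    text = re.sub(r"<[^>]+>", " ", text)                       # remove HTML tags
    tokens = re.findall(r"[a-zA-Z]+", text.lower())
    return [stemmer.stem(t) for t in tokens if t not in STOP_WORDS]  # stop word filtering + stemming

def question_vector(tokens, w2v):
    # Average the word vectors of the remaining stems into one problem vector.
    vecs = [w2v.wv[t] for t in tokens if t in w2v.wv]
    return np.mean(vecs, axis=0) if vecs else np.zeros(w2v.vector_size)

corpus = [clean_question(q["body"]) for q in questions]         # 'questions' from the acquisition sketch above
w2v = Word2Vec(sentences=corpus, vector_size=100, min_count=1)  # train word vectors on the cleaned corpus
q_vecs = [question_vector(toks, w2v) for toks in corpus]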
And step S12, constructing the user label heterogeneous network according to the historical answer records of the target user.
The target user is a user who has answered questions on the target platform. The target platform is the platform for which expert recommendation is currently needed.
The historical answer records and the question data have corresponding relations, the question data comprise a plurality of questions, and each question at least comprises one corresponding historical answer record.
The user label heterogeneous network can comprise a node set and an edge set, wherein the node set comprises a user set and a label set, and the edge set comprises a relationship between a user and a label and a relationship between a label and a label.
For example, the question data includes a plurality of questions, and the plurality of questions constitute a question set Q = {q1, q2, ..., ql}. The target users form a user set U = {u1, u2, ..., um}. The plurality of questions correspond to a plurality of question labels, which form a label set T = {t1, t2, ..., tn}. Based on the user set U and the label set T, a user label heterogeneous network can be constructed, where the network is an undirected graph and reflects the relationship between users and labels at the network level.
The heterogeneous network is denoted as G = (V, E), where the node set V includes two types of nodes, i.e., the user set U and the label set T, and the edge set E is composed of relationships between users and labels and relationships between labels and labels.
The user's relationship to the tags may be expressed in terms of the number of times each tag appears in all questions answered by a user. For example, tag a appears 6 times in all questions answered by the user, and the relationship of the user to tag a is represented by 6.
The tag-to-tag relationship may be expressed in terms of the number of times two tags appear in a question at the same time. For example, the number of times that the label a and the label B appear in the problem set is determined, and if the number of times that the label a and the label B appear in one problem at the same time is 8, the relationship between the label a and the label B is represented by 8.
As shown in fig. 2, fig. 2 is a schematic view of a scenario of a user tag heterogeneous network according to an embodiment of the present application, where there are three users (Alice, Bob, May), where the questions answered by the three users include four question tags (Windows, Linux, Bash, Mac).
The relationship between the user Alice and the question label Windows is 15, that is, the user Alice has answered 15 questions with the question label Windows; the relationship between the user Alice and the question label Linux is 27, that is, the user Alice has answered 27 questions with the question label Linux; the relationship between the user Bob and the question label Windows is 3, that is, the user Bob has answered 3 questions with the question label Windows; the relationship between the user Bob and the question label Mac is 7, that is, the user Bob has answered 7 questions with the question label Mac; the relationship between the user May and the question label Linux is 30, that is, the user May has answered 30 questions with the question label Linux; the relationship between the user May and the question label Mac is 11, that is, the user May has answered 11 questions with the question label Mac.
The relationship between the question label Windows and the question label Bash is 4, that is, the number of times the question label Windows and the question label Bash appear in the same question is 4; the relationship between the question label Bash and the question label Mac is 8, that is, the number of times the question label Bash and the question label Mac appear in the same question is 8; the relationship between the question label Bash and the question label Linux is 9, that is, the number of times the question label Bash and the question label Linux appear in the same question is 9; the relationship between the question label Mac and the question label Linux is 18, that is, the number of times the question label Mac and the question label Linux appear in the same question is 18.
A user answers a question to which a question tag is attached, so the question tag becomes associated with the user. By constructing the user label heterogeneous network and utilizing the sparse labels, the user characteristics can be better captured, thereby improving the efficiency of training the predictive neural model.
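By way of a non-limiting illustration, the user label heterogeneous network could be assembled as follows; the use of the networkx library and the record layout (one (user, tags of the answered question) pair per answer record) are assumptions of this sketch.

from itertools import combinations
import networkx as nx

def build_user_tag_graph(answer_records, question_tags):
    # answer_records: iterable of (user, tags_of_answered_question) pairs.
    # question_tags: iterable of tag lists, one per question in the question set.
    g = nx.Graph()  # undirected graph with user nodes and tag nodes
    for user, tags in answer_records:
        for tag in tags:
            # user-tag weight: how often the tag appears in questions answered by the user
            w = g.get_edge_data(user, tag, {"weight": 0})["weight"]
            g.add_edge(user, tag, weight=w + 1)
    for tags in question_tags:
        for t1, t2 in combinations(sorted(set(tags)), 2):
            # tag-tag weight: how often two tags appear in the same question
            w = g.get_edge_data(t1, t2, {"weight": 0})["weight"]
            g.add_edge(t1, t2, weight=w + 1)
    return g

# Example mirroring Fig. 2: Alice answered 15 Windows questions and 27 Linux questions,
# and the tags Windows and Bash co-occur in 4 questions.
records = [("Alice", ["Windows"])] * 15 + [("Alice", ["Linux"])] * 27
g = build_user_tag_graph(records, question_tags=[["Windows", "Bash"]] * 4)
print(g["Alice"]["Windows"]["weight"], g["Windows"]["Bash"]["weight"])  # 15 4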
And step S13, obtaining a user vector according to the user label heterogeneous network.
The network embedding method can be used for learning the user label heterogeneous network to obtain the vector representation of the user, namely the user vector.
Illustratively, the user tag heterogeneous network can be learned based on a Large-scale Information Network Embedding (LINE) model, and the user vector is obtained based on the first-order similarity and second-order similarity of nodes in the user tag heterogeneous network.
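By way of a non-limiting illustration, a heavily simplified first-order LINE-style embedding step might look as follows in PyTorch, with nodes (users and labels) already mapped to integer indices; the full LINE model additionally optimizes a second-order objective with edge sampling, which is omitted here.

import torch
import torch.nn as nn
import torch.nn.functional as F

def train_line(edges, num_nodes, dim=128, epochs=200, lr=0.01, neg=5):
    # edges: list of (src_index, dst_index, weight) triples from the heterogeneous network.
    emb = nn.Embedding(num_nodes, dim)
    opt = torch.optim.Adam(emb.parameters(), lr=lr)
    src = torch.tensor([e[0] for e in edges])
    dst = torch.tensor([e[1] for e in edges])
    w = torch.tensor([float(e[2]) for e in edges])
    for _ in range(epochs):
        opt.zero_grad()
        # connected nodes should have high inner-product similarity, weighted by edge weight
        pos = (emb(src) * emb(dst)).sum(dim=1)
        loss = -(w * F.logsigmoid(pos)).mean()
        # negative sampling pushes apart randomly paired nodes
        neg_src = src.repeat_interleave(neg)
        neg_dst = torch.randint(0, num_nodes, (len(edges) * neg,))
        loss = loss - F.logsigmoid(-(emb(neg_src) * emb(neg_dst)).sum(dim=1)).mean()
        loss.backward()
        opt.step()
    return emb.weight.detach()  # row i is the embedding (user vector or tag vector) of node i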
And step S14, performing model training according to the problem vector and the user vector to obtain a trained predictive neural model.
For example, a predictive neural model is constructed in advance. The predictive neural model may include two fully-connected deep neural networks (DNNs) with the same structure but different parameters, where each DNN may include two hidden layers, for example each hidden layer includes 300 neurons, the output layer includes 128 neurons, and the feature dimension of the output is 128. One DNN takes the problem vector as input and outputs the problem feature; the other DNN takes the user vector as input and outputs the user feature.
Fig. 3 is a schematic structural diagram of a predictive neural model according to an embodiment of the present disclosure. The predictive neural model includes two DNNs with the same structure but different parameters, where each DNN may include two hidden layers, each hidden layer includes 300 neurons, and the output layer includes 128 neurons. The DNN whose input is the user vector outputs the user feature with feature dimension 128; the DNN whose input is the problem vector outputs the problem feature with feature dimension 128. Fig. 3 shows a scenario in which three question vectors (question vector Q1, question vector Q2, question vector Q3) are input into the DNN.
The predictive neural model further includes a cosine similarity calculation module and a similarity calculation module. The cosine similarity calculation module takes the outputs of the two DNNs as inputs, that is, takes the user feature and the problem feature as inputs, and outputs a cosine similarity value; the similarity calculation module takes the output of the cosine similarity calculation module as input, that is, takes the cosine similarity value as input, and outputs a similarity probability value.
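By way of a non-limiting illustration, the two-tower structure of Fig. 3 might be sketched in PyTorch as follows; the layer sizes follow the description above, while the choice of PyTorch, the tanh activation, and the class names are assumptions of this sketch.

import torch
import torch.nn as nn
import torch.nn.functional as F

class Tower(nn.Module):
    # One fully-connected DNN: two hidden layers of 300 neurons, 128-dimensional output feature.
    def __init__(self, in_dim, hidden=300, out_dim=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_dim, hidden), nn.Tanh(),
            nn.Linear(hidden, hidden), nn.Tanh(),
            nn.Linear(hidden, out_dim),
        )

    def forward(self, x):
        return self.net(x)

class ExpertMatchModel(nn.Module):
    # Two towers with the same structure but separate parameters, compared by cosine similarity.
    def __init__(self, q_dim, u_dim):
        super().__init__()
        self.q_tower = Tower(q_dim)
        self.u_tower = Tower(u_dim)

    def forward(self, q_vec, u_vec):
        yq = self.q_tower(q_vec)  # problem feature y_Q
        yu = self.u_tower(u_vec)  # user feature y_U
        return F.cosine_similarity(yu, yq, dim=-1)  # cos(y_U, y_Q)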
In some embodiments of the present application, performing model training according to the problem vector and the user vector, and obtaining a trained predictive neural model includes:
inputting the problem vector and the user vector into a predictive neural model to be trained to obtain problem features and user features;
calculating the similarity probability of the problem feature and the user feature;
determining a loss function of the predictive neural model according to the similarity probability;
and adjusting the model parameters of the prediction neural model according to the loss function to obtain the trained prediction neural model.
In some embodiments of the present application, calculating the similarity probability of the problem feature and the user feature comprises:
calculating cosine similarity of the problem feature and the user feature by using a cosine function;
and processing the cosine similarity by using a softmax function to obtain a similarity probability.
The cosine function formula may be:
cos(y_U, y_Q) = (y_U · y_Q) / (||y_U|| × ||y_Q||)
wherein y_U represents the output of the DNN whose input is the user vector, and y_Q represents the output of the DNN whose input is the problem vector.
And converting the cosine similarity into a value of 0-1 by using a softmax function to obtain the similarity probability.
The softmax function may be:
P(q | u) = exp(cos(y_U, y_Q)) / Σ_{q' ∈ D} exp(cos(y_U, y_Q'))
wherein D is a candidate question set consisting of the question q and r questions randomly drawn from the question set that were not answered by the user. In the question-and-answer community, a questioner presents a new question and waits for other users to answer; the questioner may take one of the answers and set it as the "best answer". K represents the total number of questions for which the answer of user U was selected as the best answer.
In some embodiments, the accuracy of the model is improved by defining a loss function as follows:
Loss = -log Π_{k=1..K} P(q_k | u) = -Σ_{k=1..K} log P(q_k | u)
wherein q_k denotes the k-th question for which the answer of user U was selected as the best answer, and K represents the total number of such questions.
In the training process, the Loss is minimized when the cosine similarity between the features of questions with best answers and the user features is maximal and the cosine similarity between the features of non-best-answer questions and the user features is minimal. If a user has answered many questions, one group of data can be split into multiple groups of data, for example with K = 10, so as to facilitate the training of the neural network.
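Continuing the same illustrative sketch and reusing ExpertMatchModel from above, the softmax-over-candidates loss could be computed as follows; the assumption is that each training pair holds one best-answered question and r randomly drawn questions the user did not answer.

import torch
import torch.nn.functional as F

def candidate_loss(model, u_vec, q_pos, q_negs):
    # u_vec: (d_u,) user vector; q_pos: (d_q,) best-answered question vector;
    # q_negs: (r, d_q) vectors of questions the user did not answer.
    cand = torch.cat([q_pos.unsqueeze(0), q_negs], dim=0)  # (1 + r, d_q) candidate set
    sims = model(cand, u_vec.expand(cand.size(0), -1))     # cosine similarity per candidate
    probs = F.softmax(sims, dim=0)                         # similarity probabilities
    return -torch.log(probs[0])                            # the positive candidate is index 0

# The total Loss sums the negative log-probabilities over the K best-answered questions of the user:
# loss = sum(candidate_loss(model, u_vec, qp, qn) for qp, qn in training_samples)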
And step S15, inputting the target problem into the prediction neural model to obtain the similarity probability of the target user and the target problem.
Wherein inputting the target problem into the predictive neural model comprises: determining a question vector corresponding to a target question input by a questioner; determining user vectors corresponding to a plurality of target users who have answered the questions in the platform; and inputting the problem vector and the user vector into a prediction neural model to obtain the similarity probability of each target user and the target problem. For example, as shown in fig. 4, a user tag heterogeneous network corresponding to each target user is determined, a user vector corresponding to each target user is determined based on the user tag heterogeneous network and a LINE model, a question vector corresponding to a question text is determined according to the question text input by a questioner, and the user vector and the question vector are input into a predictive neural model, so that the similarity probability between each target user and the target question is obtained.
And step S16, generating recommendation information according to the similarity probability.
In some embodiments of the present application, the generating recommendation information according to the similarity probability includes: and if a plurality of target users exist, sequencing the target users according to the similarity probability of each target user and the target problem, and generating an expert recommendation table. The ranking of the expert recommendation table is used to reflect the degree to which the expert is appropriate to answer the question.
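By way of a non-limiting illustration, steps S15 and S16 might be sketched as scoring every target user against the new target question and sorting the results; the helper names (question_vector, user_vectors, model) refer to the earlier sketches, and the softmax over users that turns the scores into probabilities is likewise an assumption of this illustration.

import torch
import torch.nn.functional as F

def recommend_experts(model, target_q_tokens, w2v, user_vectors, top_k=10):
    # user_vectors: dict mapping each target user to the LINE embedding of that user.
    q_vec = torch.as_tensor(question_vector(target_q_tokens, w2v), dtype=torch.float32)
    users = list(user_vectors)
    with torch.no_grad():
        sims = [model(q_vec, torch.as_tensor(user_vectors[u], dtype=torch.float32)) for u in users]
        probs = F.softmax(torch.stack(sims), dim=0)  # similarity probability per target user
    ranked = sorted(zip(users, probs.tolist()), key=lambda kv: kv[1], reverse=True)
    return ranked[:top_k]  # expert recommendation table, most suitable experts first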
According to the expert recommendation method provided by the embodiment, the problem data is obtained, and the problem vector corresponding to the problem data is determined; constructing a user tag heterogeneous network according to the historical answer records of target users, wherein the target users are users corresponding to the question data; then, according to the user label heterogeneous network, a user vector is obtained; then, performing model training according to the problem vector and the user vector to obtain a trained predictive neural model; finally, inputting the target problem into the prediction neural model to obtain the similarity probability of the target user and the target problem; and generating recommendation information according to the similarity probability, automatically determining a domain expert corresponding to the target problem in the target user based on a predictive neural model, helping a questioner to find the expert corresponding to the solved problem, avoiding the problem of decreased liveness of a question-answering platform caused by unmanned answering of a large number of problems, and improving the accuracy of recommending the domain expert.
Referring to fig. 5, fig. 5 is a schematic block diagram of an electronic device according to an embodiment of the present disclosure. The electronic device 50 may be a server or a terminal device.
The Network in which the electronic device 50 is located includes, but is not limited to, the internet, a wide area Network, a metropolitan area Network, a local area Network, a Virtual Private Network (VPN), and the like.
As shown in fig. 5, the electronic device 50 includes a communication interface 501, a memory 502, a processor 503, an Input/Output (I/O) interface 504, and a bus 505. The processor 503 is coupled to the communication interface 501, the memory 502, and the I/O interface 504, respectively, via the bus 505.
The communication interface 501 is used for communication. The communication interface 501 may be an existing interface of the electronic device 50 or may be a newly established interface of the electronic device 50. Communication interface 501 may be a Network interface, such as a Wireless Local Area Network (WLAN) interface, a cellular Network communication interface, or a combination thereof.
The memory 502 may be used to store an operating system and computer programs. For example, the memory 502 stores a program corresponding to the expert recommendation method described above.
It should be understood that the memory 502 may include a program storage area and a data storage area. Wherein, the storage program area can be used for storing an operating system, application programs (such as expert recommendation methods) required by at least one method, and the like; the storage data area may store data created according to the use of the electronic device 50, and the like. In addition, the memory 502 may include volatile memory and may also include non-volatile memory, such as a hard disk, a memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), at least one magnetic disk storage device, a Flash memory device, or other storage device.
The processor 503 provides the computational and control capabilities that support the operation of the overall computer device. For example, the processor 503 is configured to execute a computer program stored in the memory 502 to implement the steps of the expert recommendation method described above.
It should be understood that the processor 503 may be a Central Processing Unit (CPU), or may be another general purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, etc. A general purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.
The I/O interface 504 is used to provide a channel for user input or output, for example, the I/O interface 504 may be used to connect various input and output devices (mouse, keyboard, 3D touch device, etc.), displays, so that a user may enter information, or visualize information.
The bus 505 is used at least for providing a channel for mutual communication between the communication interface 501, the memory 502, the processor 503 and the I/O interface 504 in the electronic device 50.
Those skilled in the art will appreciate that the architecture shown in fig. 5 is merely a block diagram of some of the structures associated with the disclosed aspects and is not intended to limit the computing devices to which the disclosed aspects apply, as particular computing devices may include more or less components than those shown, or may combine certain components, or have a different arrangement of components.
Wherein, in one embodiment, the processor 503 executes the computer program stored in the memory 502 to implement the expert recommendation method, and implements the following steps:
acquiring problem data and determining a problem vector corresponding to the problem data;
constructing a user tag heterogeneous network according to the historical answer records of the target user;
obtaining a user vector according to the user label heterogeneous network;
performing model training according to the problem vector and the user vector to obtain a trained predictive neural model;
inputting a target problem into the prediction neural model to obtain the similarity probability of the target user and the target problem;
and generating recommendation information according to the similarity probability.
Specifically, the specific implementation method of the instructions by the processor 503 may refer to the description of the relevant steps in the aforementioned expert recommendation method embodiment, which is not described herein again.
Embodiments of the present application further provide a computer-readable storage medium, where a computer program is stored on the computer-readable storage medium, where the computer program includes program instructions, and a method implemented when the program instructions are executed may refer to various embodiments of the expert recommendation method in the present application.
The computer-readable storage medium may be an internal storage unit of the electronic device according to the foregoing embodiment, for example, a hard disk or a memory of the electronic device. The computer readable storage medium may also be an external storage device of the electronic device, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, provided on the electronic device.
Further, the computer-readable storage medium may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function, and the like; the storage data area may store data created according to use of the electronic device, and the like.
The electronic device and the computer-readable storage medium provided by the foregoing embodiment determine the problem vector corresponding to the problem data by acquiring the problem data; constructing a user tag heterogeneous network according to the historical answer records of target users, wherein the target users are users corresponding to the question data; then, according to the user label heterogeneous network, a user vector is obtained; then, performing model training according to the problem vector and the user vector to obtain a trained predictive neural model; finally, inputting the target problem into the prediction neural model to obtain the similarity probability of the target user and the target problem; and generating recommendation information according to the similarity probability, automatically determining a domain expert corresponding to the target problem in the target user based on a predictive neural model, helping a questioner to find the expert corresponding to the solved problem, avoiding the problem of decreased liveness of a question-answering platform caused by unmanned answering of a large number of problems, and improving the accuracy of recommending the domain expert.
It is also to be understood that the terminology used in the description of the present application herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in the specification of the present application and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise.
It should also be understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items. It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments. While the invention has been described with reference to specific embodiments, the scope of the invention is not limited thereto, and those skilled in the art can easily conceive various equivalent modifications or substitutions within the technical scope of the invention. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. An expert recommendation method, comprising:
acquiring problem data and determining a problem vector corresponding to the problem data;
constructing a user tag heterogeneous network according to the historical answer records of the target user;
obtaining a user vector according to the user label heterogeneous network;
performing model training according to the problem vector and the user vector to obtain a trained predictive neural model;
inputting a target problem into the prediction neural model to obtain the similarity probability of the target user and the target problem;
and generating recommendation information according to the similarity probability.
2. The expert recommendation method of claim 1 wherein the determining the problem vector corresponding to the problem data comprises:
performing data cleaning on the problem data;
and performing data conversion on the problem data after data cleaning to obtain a problem vector corresponding to the problem data.
3. The expert recommendation method of claim 2 wherein the data cleansing includes removing tags, filtering stop words, clearing code segments.
4. The expert recommendation method of claim 1 wherein the user-tag heterogeneous network comprises a node set and an edge set, wherein the node set comprises a user set and a tag set, and the edge set comprises a user-to-tag relationship and a tag-to-tag relationship.
5. The expert recommendation method of claim 4 wherein the obtaining a user vector based on the user tag heterogeneous network comprises:
and learning the user label heterogeneous network based on the LINE model to obtain a user vector.
6. The expert recommendation method of claim 1 wherein the model training based on the problem vector and the user vector to obtain a trained predictive neural model comprises:
inputting the problem vector and the user vector into a predictive neural model to be trained to obtain problem features and user features;
calculating the similarity probability of the problem feature and the user feature;
determining a loss function of the predictive neural model according to the similarity probability;
and adjusting the model parameters of the prediction neural model according to the loss function to obtain the trained prediction neural model.
7. The expert recommendation method of claim 6 wherein the calculating the similarity probability of the problem feature and the user feature comprises:
calculating cosine similarity of the problem feature and the user feature by using a cosine function;
and processing the cosine similarity by using a softmax function to obtain a similarity probability.
8. The expert recommendation method of any one of claims 1 to 7 wherein the generating recommendation information based on the similarity probability comprises:
and if a plurality of target users exist, sequencing the target users according to the similarity probability of each target user and the target problem, and generating an expert recommendation table.
9. An electronic device, comprising a memory and a processor;
the memory is to store at least one instruction;
the processor is configured to implement the expert recommendation method of any of claims 1-8 when executing the at least one instruction.
10. A computer-readable storage medium having stored therein at least one instruction which, when executed by a processor, implements the expert recommendation method of any one of claims 1 to 8.
CN202110925509.7A 2021-08-12 2021-08-12 Expert recommendation method, electronic device and storage medium Pending CN113641791A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110925509.7A CN113641791A (en) 2021-08-12 2021-08-12 Expert recommendation method, electronic device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110925509.7A CN113641791A (en) 2021-08-12 2021-08-12 Expert recommendation method, electronic device and storage medium

Publications (1)

Publication Number Publication Date
CN113641791A true CN113641791A (en) 2021-11-12

Family

ID=78421178

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110925509.7A Pending CN113641791A (en) 2021-08-12 2021-08-12 Expert recommendation method, electronic device and storage medium

Country Status (1)

Country Link
CN (1) CN113641791A (en)


Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102004774A (en) * 2010-11-16 2011-04-06 清华大学 Personalized user tag modeling and recommendation method based on unified probability model
CN109062914A (en) * 2017-06-12 2018-12-21 东软集团股份有限公司 User's recommended method and device, storage medium and server
CN108021616A (en) * 2017-11-06 2018-05-11 大连理工大学 A kind of community's question and answer expert recommendation method based on Recognition with Recurrent Neural Network
CN108256958A (en) * 2017-12-22 2018-07-06 华南师范大学 A kind of mixing Collaborative Recommendation algorithm based on WUDiff and RMF
CN108804689A (en) * 2018-06-14 2018-11-13 合肥工业大学 The label recommendation method of the fusion hidden connection relation of user towards answer platform
CN110188272A (en) * 2019-05-27 2019-08-30 南京大学 A kind of community's question and answer web site tags recommended method based on user context
CN110162711A (en) * 2019-05-28 2019-08-23 湖北大学 A kind of resource intelligent recommended method and system based on internet startup disk method
CN111291260A (en) * 2020-01-20 2020-06-16 王程 Multi-information-driven approximate fusion network recommendation propagation method
CN111522889A (en) * 2020-04-24 2020-08-11 腾讯科技(深圳)有限公司 User interest tag expansion method and device, electronic equipment and storage medium
CN112100464A (en) * 2020-10-14 2020-12-18 济南大学 Question-answering community expert recommendation method and system combining dynamic interest and professional knowledge
CN112215837A (en) * 2020-10-26 2021-01-12 北京邮电大学 Multi-attribute image semantic analysis method and device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Sun Maosong et al.: "Frontiers of Natural Language Processing Research", Shanghai Jiao Tong University Press, pages 442-446 *
Huang Hui et al.: "Expert Discovery Method for Community Question Answering Based on User-Tag Heterogeneous Network", Computer Engineering, vol. 46, no. 2, pages 53-58 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116775996A (en) * 2023-06-21 2023-09-19 广州视景医疗软件有限公司 Visual training project recommending method and device based on user feedback

Similar Documents

Publication Publication Date Title
CN109783817B (en) Text semantic similarity calculation model based on deep reinforcement learning
CN110032635B (en) Problem pair matching method and device based on depth feature fusion neural network
Kim et al. AI for design: Virtual design assistant
CN104714931A (en) Method and system for selecting a structure to represent tabular information
CN111259154B (en) Data processing method and device, computer equipment and storage medium
CN112052668A (en) Training method of address text recognition model, and address prediction method and device
CN114547267A (en) Intelligent question-answering model generation method and device, computing equipment and storage medium
CN112463989A (en) Knowledge graph-based information acquisition method and system
CN113705196A (en) Chinese open information extraction method and device based on graph neural network
CN111309852B (en) Method, system, device and storage medium for generating visual decision tree set model
CN113761220A (en) Information acquisition method, device, equipment and storage medium
CN113641791A (en) Expert recommendation method, electronic device and storage medium
CN109284491A (en) Medicine text recognition method, sentence identification model training method
CN117131933A (en) Multi-mode knowledge graph establishing method and application
CN116821373A (en) Map-based prompt recommendation method, device, equipment and medium
CN114491076B (en) Data enhancement method, device, equipment and medium based on domain knowledge graph
CN115587192A (en) Relationship information extraction method, device and computer readable storage medium
CN112800244B (en) Method for constructing knowledge graph of traditional Chinese medicine and national medicine
CN113407704A (en) Text matching method, device and equipment and computer readable storage medium
CN112749556B (en) Multi-language model training method and device, storage medium and electronic equipment
CN111507098B (en) Ambiguous word recognition method and device, electronic equipment and computer-readable storage medium
CN103500219B (en) The control method that a kind of label is adaptively precisely matched
CN105808522A (en) Method and apparatus for semantic association
CN104281670B (en) The real-time incremental formula detection method and system of a kind of social networks event
CN113221578B (en) Disease entity retrieval method, device, equipment and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination