CN112765305A - Method and device for analyzing interest topic of author, electronic equipment and storage medium

Method and device for analyzing interest topic of author, electronic equipment and storage medium

Info

Publication number
CN112765305A
Authority
CN
China
Prior art keywords
author
document
authors
word
topic
Prior art date
Legal status
Granted
Application number
CN202011625275.6A
Other languages
Chinese (zh)
Other versions
CN112765305B (en)
Inventor
徐硕
李玲
翟东升
Current Assignee
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN202011625275.6A priority Critical patent/CN112765305B/en
Priority claimed from CN202011625275.6A external-priority patent/CN112765305B/en
Publication of CN112765305A publication Critical patent/CN112765305A/en
Application granted granted Critical
Publication of CN112765305B publication Critical patent/CN112765305B/en
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval of unstructured textual data
    • G06F16/31: Indexing; Data structures therefor; Storage structures
    • G06F16/313: Selection or weighting of terms for indexing
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/279: Recognition of textual entities
    • G06F40/284: Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the present application provide a method and a device for analyzing an author's interest topics, an electronic device and a storage medium, and relate to the technical field of information analysis. The method includes the following steps: obtaining at least one document in a target field, and determining the contribution weight of each author in the document, the topic expressed by each word in the document, and the words each author is responsible for in the document; obtaining the topics expressed by each author in the document from the topic expressed by each word, the words each author is responsible for, and the contribution weight of each author; and determining the author's interest topics from the topics expressed in the content the author is responsible for across the relevant documents. With the embodiments of the present application, the interest topics of authors can be discovered under the premise that the co-authors of a multi-author article contribute unequally, so that the interest topics of researchers are reflected reasonably, research hotspots and trends in a subject field can be explored, and personalized academic research can be promoted.

Description

Method and device for analyzing interest topic of author, electronic equipment and storage medium
Technical Field
The present application relates to the field of information analysis technologies, and in particular, to a method and an apparatus for analyzing an interest topic of an author, an electronic device, and a storage medium.
Background
Nowadays, scientific and technical literature, as the main carrier of academic achievements, gathers a great deal of human intelligence and serves as a window for spreading knowledge and conducting academic communication. Scientific and technical literature resources contain a large amount of characteristic information, such as potential semantic relations between words, relations between document topics and authors (the research interests of the authors), and the rise, maturity and decline of research hotspots.
In the area of mining the research interests of researchers, Rosen-Zvi et al. introduced an author latent variable into the LDA (Latent Dirichlet Allocation) model, replaced the document-topic distribution of LDA with an author-topic distribution, and proposed the AT (Author-Topic) model. The model can mine the relation between authors and topics, that is, the research interests of researchers.
However, when the AT model and other similar models model author interests, every author of a document is assumed to contribute equally. This is inconsistent with reality, so the interest topics of the authors cannot be analyzed accurately.
Disclosure of Invention
Embodiments of the present invention provide a method, an apparatus, an electronic device and a storage medium for analyzing an author's interest topics, which overcome the above problems or at least partially solve them.
In a first aspect, a method for analyzing an author's interest topics is provided, the method comprising:
acquiring at least one document in a target field, and determining the contribution weight of each author in the document, the contribution weight being the normalized contribution value of the author;
for each document, determining the topic expressed by each word in the document and the words each author is responsible for in the document, and obtaining the topics expressed by each author in the document from the topic expressed by each word, the words each author is responsible for, and the contribution weight of each author;
for each author, determining the relevant documents the author is responsible for from the at least one document, and obtaining the topics the author expresses in the content he or she is responsible for in those documents, so as to determine the author's interest topics.
Further, determining the contribution weight of each author in the document comprises:
acquiring the authors of the document and the contribution value of each author;
determining an initial weight for each author according to the number of authors in the document and the contribution value of each author;
and normalizing the initial weights of the authors in the document to obtain the final weight of each author in the document.
Further, determining an initial weight for each author according to the number of authors in the document and the contribution value of each author comprises:
if the number of authors in the document does not exceed a preset number, sorting the authors of the document in descending order of contribution value to obtain the rank of each author;
and calculating the initial weight of each author from the author's rank with a preset weight algorithm.
Further, determining an initial weight for each author according to the number of authors in the document and the contribution value of each author further comprises:
if the number of authors in the document exceeds the preset number, sorting the authors of the document in descending order of contribution value to obtain the rank of each author;
for authors whose rank is smaller than or equal to the preset number, calculating their initial weights from their ranks with the preset weight algorithm;
for authors whose rank is larger than the preset number, taking a preset multiple of the initial weight of the first author as their initial weight;
the first author being the author whose rank in the document is first.
Further, determining the topic expressed by each word in the document comprises:
assigning topics to all words in the document a preset number of times; after each round of assignment, for any word in the document, calculating the probability that the word will be assigned to the target topic, and to each topic other than the target topic, in the next round, according to the number of times the word occurs in the document, the number of words assigned to the target topic in the document after the current round, and the number of occurrences of the word that are assigned to the target topic;
assigning topics to the word in the next round according to these probabilities, until the number of assignment rounds reaches a preset threshold;
obtaining the topic assigned to the word when the number of assignment rounds reaches the preset threshold;
wherein the target topic is the topic assigned to the word when it first appears in the assignment.
Further, determining the words each author is responsible for in the document comprises:
assigning authors to all words in the document a preset number of times; after each round of assignment, for any word in the document, calculating the probability that the word will be assigned to the target author, and to each author other than the target author, in the next round, according to the number of times the word occurs in the document, the number of words assigned to the target author in the document after the current round, and the number of occurrences of the word that are assigned to the target author;
assigning authors to the word in the next round according to these probabilities, until the number of assignment rounds reaches the preset threshold;
obtaining the author assigned to the word when the number of assignment rounds reaches the preset threshold;
wherein the target author is the author assigned to the word when it first appears in the assignment.
Further, obtaining the topics expressed by each author in the document from the topic expressed by each word in the document, the words each author is responsible for and the contribution weight of each author comprises:
for any author in the document, sampling, according to the author's final weight, the topics expressed by words in the document and the words the author is responsible for;
taking the words the author is responsible for as target words, and determining the topics the author expresses in the document from the topics expressed by the target words.
Further, determining the relevant documents the author is responsible for from the at least one document and obtaining the topics the author expresses in the content he or she is responsible for, so as to determine the author's interest topics, comprises:
acquiring the relevant documents the author is responsible for, and determining the topics the author expresses in the content he or she is responsible for in those documents;
determining the author's interest topics among the topics expressed by the author;
calculating the probability of occurrence of each topic according to the number of times it occurs in the relevant documents the author is responsible for, and taking the topics whose probability exceeds a preset probability value as the author's interest topics.
In a second aspect, an apparatus for analyzing an author's interest topics is provided, comprising:
a first acquisition module, configured to acquire at least one document in a target field and determine the contribution weight of each author in the document, the contribution weight being the normalized contribution value of the author;
a determining module, configured to determine, for each document, the topic expressed by each word in the document and the words each author is responsible for, and to obtain the topics expressed by each author in the document from the topic expressed by each word, the words each author is responsible for, and the contribution weight of each author;
and a second acquisition module, configured to determine, for each author, the relevant documents the author is responsible for from the at least one document, and to acquire the topics the author expresses in the content he or she is responsible for in those documents, so as to determine the author's interest topics.
In a third aspect, an embodiment of the present invention provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the steps of the method provided in the first aspect.
In a fourth aspect, an embodiment of the present invention provides a computer-readable storage medium, on which a computer program is stored, which, when executed by a processor, implements the steps of the method as provided in the first aspect.
According to the method, device, electronic device and storage medium for analyzing an author's interest topics, at least one document in a target field is acquired, the contribution weight of each author in the document is determined, and the topic expressed by each word in the document and the words each author is responsible for are determined; the topics expressed by each author in the document are obtained from the topic expressed by each word, the words each author is responsible for, and the contribution weight of each author; and the author's interest topics are determined from the topics expressed in the content the author is responsible for across the relevant documents. With the embodiments of the present application, the interest topics of authors can be discovered under the premise that the co-authors of a multi-author article contribute unequally, so that the interest topics of researchers are reflected reasonably, research hotspots and trends in a subject field can be explored, and personalized academic research can be promoted.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings used in the description of the embodiments of the present application will be briefly described below.
FIG. 1 is a schematic illustration of a document distribution with different author numbers provided in an embodiment of the present application;
FIG. 2 is a flowchart illustrating a method for analyzing an author's interest topic according to an embodiment of the present application;
FIG. 3 is a diagram illustrating word distribution in a document provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of the initial topic assignment of words provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of topic assignment of a word after an iterative process is completed according to an embodiment of the present application;
FIG. 6 is a schematic diagram of the initial author assignment of words provided by an embodiment of the present application;
FIG. 7 is a diagram illustrating author assignment of words after iterative processing is completed according to an embodiment of the present application;
FIG. 8 is a diagram of an author interest disclosure model provided in accordance with an embodiment of the present application;
FIG. 9 is a schematic structural diagram of an apparatus for analyzing a topic of interest of an author provided in an embodiment of the present application;
fig. 10 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to the embodiments of the present application, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are exemplary only for the purpose of explaining the present application and are not to be construed as limiting the present invention.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It will be understood that when an element is referred to as being "connected" or "coupled" to another element, it can be directly connected or coupled to the other element or intervening elements may also be present. Further, "connected" or "coupled" as used herein may include wirelessly connected or wirelessly coupled. As used herein, the term "and/or" includes all or any element and all combinations of one or more of the associated listed items.
The application provides a method and a device for analyzing an interest topic of an author, an electronic device and a storage medium, and aims to solve the above technical problems in the prior art.
To make the objects, technical solutions and advantages of the present application more clear, embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
The following describes the technical solutions of the present application and how to solve the above technical problems with specific embodiments. The following several specific embodiments may be combined with each other, and details of the same or similar concepts or processes may not be repeated in some embodiments. Embodiments of the present application will be described below with reference to the accompanying drawings.
First, the present application can be applied in various scientific research literature exploration scenarios, such as data mining, machine learning, information analysis, policy making, personalized academic recommendation and scientific evaluation of authors. Scientific and technical literature, as the main carrier of academic achievements, gathers a great deal of human intelligence and is a window for spreading knowledge and conducting academic communication. Price's law of exponential growth of scientific literature and the logistic growth model show that the amount of scientific literature is growing exponentially, which poses great challenges to the detection and tracking of scientific knowledge and topics. Scientific and technical literature resources contain a large amount of implicit information, such as potential semantic relations between words and relations between document topics and authors (the research interests of the authors), and can reflect research hotspots and trends in the current subject field to a certain extent. Research shows that automatically revealing document topics and mining authors' research interests can provide good support for researchers, academic exchange platforms and scientific research management institutions.
In the area of mining the research interests of science and technology workers, Rosen-Zvi et al. introduced an author latent variable into the LDA (Latent Dirichlet Allocation) model, replaced the document-topic distribution of LDA with an author-topic distribution, and proposed the AT (Author-Topic) model. The model can mine the relation between authors and topics, that is, the research interests of researchers. Science and technology have developed rapidly in the information age, and the form of scientific research has gradually changed from individual research to multi-party, collaborative, team-based research, which is reflected in the continuously rising number of authors of the scientific papers that describe research achievements. It is well known that for most scientific achievements the contribution of each author is different. However, the AT model and other similar models implicitly assume equal contributions when modeling author interests.
Fig. 1 is a schematic diagram of the distribution of documents with different numbers of authors provided in an embodiment of the present application. As shown in fig. 1, most documents are completed by 2-5 authors, which shows how common multi-author academic documents are. When multiple authors are jointly responsible for one document, the contribution weight of each author in the document must be determined in order to analyze each author's interest topics more clearly.
In the present application, a contribution weight re-allocation mechanism is introduced into the process of revealing author interests. Under the premise that each author contributes unequally to a multi-author article, the interest topics of each author are discovered, providing more scientific decision support for personalized academic recommendation systems, recruitment and promotion of scholars, scientific research rewards and fund allocation. Specifically, the invention proposes an author interest revealing model that introduces a contribution weight allocation mechanism on the basis of the AT model, named the ATcredit model; the idea is also applicable to other similar interest-revealing topic models.
It should be understood that the method for analyzing an author's interest topics provided in the present application can be applied to any computer or system having such an analysis function, for example to analyzing the interest topics of authors in the field of bioscience.
In order to solve the above problems, an embodiment of the present application provides a method for analyzing an author's interest topics. The method is described in detail below through specific embodiments and application scenarios with reference to the drawings. Fig. 2 is a schematic flow chart of the method for analyzing an author's interest topics provided by this embodiment; as shown in fig. 2, the method includes:
s201, obtaining at least one document of the target field, and determining the contribution weight of each author in the document.
In this embodiment, the target documents and their related information are acquired by a computer, or officially published documents are retrieved through the Internet, and the documents in the target field, the author list of each document and the contribution value of each author are selected. For example, the SynBio data set provided by the organizers of the 2018-2019 emerging-technology prediction competition can be used: statistics show that the 2580 academic papers in this data set are research documents related to the biological field, and the author lists of all documents and the contribution weights of the authors in each document are then collected.
S202, for each document, determining the topic expressed by each word in the document and the words each author is responsible for; and obtaining the topics expressed by each author in the document from the topic expressed by each word, the words each author is responsible for, and the contribution weight of each author.
After the documents in the target field are obtained, they need to be preprocessed: stop words are filtered out and the remaining words are kept, the authors of each document are determined, and the contribution weight of each author is obtained by analysis and calculation from the author information. The topic expressed by each word in a document and the words each author is responsible for are then analyzed with a preset author-topic model: the words assigned to each author and the topics of those words are sampled according to the authors' weights, and the topics expressed by each author in the document are determined. For example, suppose a document is jointly written by two authors, with Zhang San mainly responsible (weight 80%) and Li Si secondary (weight 20%). When analyzing the topic of a word and the author of that word, sampling is performed in proportion to these weights: words associated with Zhang San, and the topics associated with those words, are sampled with higher probability, and the topics Zhang San expresses in the document are determined through this high-probability sampling. Words associated with Li Si, and the topics associated with them, are sampled with relatively low probability, but the topics Li Si expresses in the document can still be determined.
Specifically, the target documents are segmented into words and preset words are filtered out to obtain the processed text information; the related information of the target documents is processed to obtain the author name list and the weight of each author in each target document.
In this embodiment, after the target documents are collected they need to be preprocessed, that is, redundant information and stop words are removed and sentences are segmented; after preprocessing, the cleaned text information is obtained. For example, after a target document is collected, its sentences are first segmented into words, and then stop words, numbers and words whose frequency is lower than a preset frequency are filtered out. Filtering is done by comparison against a pre-built stop-word list to decide which words are stop words; in English, words such as "first", "and" and "but" are usually stop words and are removed, but sometimes "and" is not a stop word and has to be judged through more involved analysis, for example from the surrounding context. After the documents are preprocessed, the text information is formed; it contains the remaining words, and a dictionary (vocabulary) can also be built from it, as sketched below.
Processing the related information of the target documents means disambiguating the author names in each document in the target field, that is, distinguishing whether authors with the same name in different documents are the same person, re-determining the number of signed authors of each target document, and calculating the weight of each author from the number of authors and the contribution values. For example, given two documents in which a male author named Zhang San signed the first document and a female author also named Zhang San signed the second, it must be determined whether the two authors named Zhang San are the same person, and they are disambiguated accordingly.
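The preprocessing step can be illustrated with a minimal sketch. The function below tokenizes English text, removes stop words and digits, and drops words below a frequency threshold; the function name, the regular expression and the threshold value are illustrative assumptions, not part of the patented method.

```python
import re
from collections import Counter

def preprocess(documents, stop_words, min_freq=2):
    """Tokenize each document, drop stop words and rare words, build a vocabulary.

    documents: list of raw text strings; stop_words: set of words to remove;
    words occurring fewer than min_freq times in the whole corpus are dropped.
    """
    tokenized = [re.findall(r"[a-z]+", doc.lower()) for doc in documents]
    freq = Counter(w for doc in tokenized for w in doc)
    cleaned = [[w for w in doc if w not in stop_words and freq[w] >= min_freq]
               for doc in tokenized]
    vocabulary = sorted({w for doc in cleaned for w in doc})
    return cleaned, vocabulary
```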
S203, for each author, determining related documents responsible for the author from at least one document, and acquiring topics expressed by the content responsible for the related documents by the author to determine interest topics of the author.
In the present application, in order to determine the topics an author is interested in, the other documents related to that author need to be collected, the topics the author expresses in those documents are analyzed, and the expressed topics with the largest probability of occurrence are selected as the author's interest topics, as sketched below. For example, author A expresses topic 1 in a first document and topic 2 in a second document, where topic 1 and topic 2 are two different topics. According to the method, device, electronic device and storage medium for analyzing an author's interest topics, at least one document in a target field is acquired, the contribution weight of each author in the document is determined, the topic expressed by each word and the words each author is responsible for are determined, the topics expressed by each author are obtained from these together with the contribution weights, and the author's interest topics are determined from the topics expressed in the content the author is responsible for across the relevant documents. With the embodiments of the present application, the interest topics of authors can be discovered under the premise that each author contributes unequally to a multi-author article, so that the interest topics of researchers are reflected reasonably, research hotspots and trends in a subject field can be explored, and personalized academic research can be promoted.
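The aggregation of an author's per-document topics into interest topics can be sketched as follows, assuming the topics the author expresses in each responsible document have already been obtained; the function name and the probability threshold are illustrative.

```python
from collections import Counter

def interest_topics(expressed_topics, min_prob=0.2):
    """expressed_topics: list of topic ids, one per piece of content the
    author is responsible for across the relevant documents. Returns the
    topics whose relative frequency of occurrence exceeds min_prob."""
    counts = Counter(expressed_topics)
    total = sum(counts.values())
    return {topic: count / total for topic, count in counts.items()
            if count / total > min_prob}

# Example: author A expresses topic 1 in three documents and topic 2 in one
print(interest_topics([1, 1, 1, 2]))   # {1: 0.75, 2: 0.25} with the default threshold
```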
On the basis of the above embodiment, as an alternative embodiment, determining the contribution weight of each author in the document includes:
acquiring authors and contribution values of each author in the literature;
determining an initial weight of each author according to the number of authors in the document and the contribution value of each author;
and normalizing the initial weight of each author in the literature to obtain the final weight of each author in the literature.
After the related information of the target documents is acquired, it needs to be processed; in particular, the authors of the target documents need to be disambiguated, because authors with the same name may exist and it must be distinguished whether they are the same person. The methods adopted in the embodiment of the present application include, but are not limited to, rule-based scoring and clustering; manual disambiguation, automatic disambiguation and other methods can also be used.
For example, rule-based scoring and clustering is used to determine whether authors with the same name in different target documents are the same person: author A and author B have the same name and are signed as authors of different documents, so it must be decided whether they are one person or merely share a name. The rule-based scoring and clustering method judges according to several rules. Rule 1: compare the e-mail addresses of the authors given in the documents; if the addresses are the same, the two records can be taken as the same person. If the names are the same but the addresses differ, judge by the work addresses given in the documents; if the work addresses are the same, the probability that the two records refer to the same person is high. Rule 2: judge by the collaborators the two records frequently work with; several authors often collaborate together, and if the collaborators of the two records overlap, the records may refer to the same person. There is also the situation that some authors like to cite their own previous work; if the references cited by two same-name records are consistent and both cite the same earlier papers, the records may refer to the same person. Based on these rules, the similarity under each rule is scored: for example, when judging by e-mail, identical addresses give high similarity and score 100 points, while different e-mail addresses but the same work unit give lower similarity and score 80 points. The scores of the rules are accumulated for each pair of records, and whether the accumulated score is high enough determines whether the two records are the same person. For instance, all records named XX (the same name) are put into one table, the information of each record is analyzed, cluster analysis is performed according to the rules, and the records clustered together are taken as one person while the same-name records clustered elsewhere are taken as other persons. After disambiguation, the author set is obtained and the number of authors signed on each target document is re-determined, which makes it convenient to determine the authors' weights later.
After the related information of the target documents is obtained and processed in this way, ambiguous authors are resolved, the number of authors signed on each target document is determined, and the contribution weight of each author can be determined, which makes it convenient to analyze the content each author is responsible for and avoids unnecessary trouble caused by identically named authors. A simplified scoring sketch is given below.
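The sketch below follows the rule-based scoring described above; the individual rules, the point values and the decision threshold are illustrative assumptions rather than the exact scoring scheme of the embodiment.

```python
def similarity_score(a, b):
    """Score how likely two same-name author records refer to one person.
    a, b: dicts with optional keys 'email', 'affiliation', 'coauthors' (set)
    and 'references' (set)."""
    score = 0
    if a.get("email") and a.get("email") == b.get("email"):
        score += 100                                 # rule 1: same e-mail address
    elif a.get("affiliation") and a.get("affiliation") == b.get("affiliation"):
        score += 80                                  # rule 1 fallback: same work address
    score += 10 * len(a.get("coauthors", set()) & b.get("coauthors", set()))    # rule 2: shared collaborators
    score += 5 * len(a.get("references", set()) & b.get("references", set()))   # rule 2: shared (self-)citations
    return score

def same_person(a, b, threshold=100):
    """Records whose accumulated score reaches the threshold are clustered together."""
    return similarity_score(a, b) >= threshold
```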
After the number of authors in a target document is determined, the authors need to be sorted by the size of their contributions to the document, and then a preset contribution weight algorithm is used to estimate the contribution weight of each author. The contribution weight algorithms include, but are not limited to, arithmetic counting, geometric counting, harmonic counting, network-based counting, axiomatic counting and golden number counting; one of them is selected for the calculation.
The concrete formulas of the following counting methods are given as images in the original publication and are not reproduced here.
Arithmetic counting: the collaborators in the author list are allotted contribution scores that decrease linearly with rank, so the difference in contribution between two adjacent collaborators is constant. Specifically, for a paper m, A_m is the number of co-authors, c_{m,i} is the contribution of the author ranked i, λ is a free parameter whose value can be set according to the actual situation, and i denotes the author's rank.
Geometric counting: the contribution scores of the collaborators in the author list form a geometric series, so the contribution ratio between two adjacent collaborators is λ (λ ≥ 1).
Harmonic counting: each collaborator in the author list receives a contribution weight that decreases with rank, and the contribution ratio between two adjacent collaborators follows from these weights; the simplified form used later in this description gives the author ranked i an initial weight of 1/i.
Network-based counting: the method consists of two steps. The first step is fractional counting; in the second step, every collaborator after the first author in the author list assigns a portion λ (λ ∈ [0,1]) of its own contribution weight to the preceding author.
Axiomatic counting: the method divides the A_m authors into G_m (G_m ≤ A_m) groups, where g_{m,k} denotes the ordered elements of the k-th group, and uses a parameter λ (λ ∈ [0,1]).
Golden number counting: the method allots the contribution of each collaborator in the author list on the basis of the golden number, again with a parameter λ (λ ∈ [0,1]).
In addition, when there are multiple first authors or corresponding authors with equal contributions, before applying any of the above contribution calculation methods (except axiomatic counting), all such collaborators are treated as first authors, the collaborators are re-ordered, their contribution weights are calculated, and those weights are then averaged.
On the basis of the above embodiment, as an alternative embodiment, determining the initial weight of each author according to the number of authors and the contribution value of each author in the document includes:
if the number of the authors in the document does not exceed the preset number value, performing descending order arrangement on the authors in the document according to the contribution value of each author in the document to obtain an ordering result of each author in the document;
and calculating the initial weight of each author according to a preset weight algorithm according to the sequencing result of each author.
Before the weights are calculated, the authors are sorted by the size of their contributions to the target document, with the author who contributed most ranked first; authors with equal contributions are grouped together and their order within the group is set randomly. After the authors are ranked, the initial weight of each author is calculated from its rank with the contribution weight algorithm; the initial weights of authors with equal contributions are summed and then shared equally among those authors. After the initial weights are calculated, normalization is needed, because the sum of the initial weights produced by the contribution weight algorithm is greater than 1. The final weight of an author is obtained by taking the sum of the initial weights of all authors in the target document as the denominator and the author's own initial weight as the numerator; the resulting value is that author's final contribution weight in the target document.
In the embodiment of the present application, harmonic counting is used for the analysis, where i is the author's rank after sorting by contribution and λ is a free parameter that is usually taken to infinity for convenience, so that the initial weight of the author ranked i is 1/i. The final weight is then obtained by normalization; for example, the final weight of the first author is
1 / (1 + 1/2 + 1/3 + ... + 1/A_m),
where A_m is the number of authors of the document, and the final weights of the other authors in the target document are obtained analogously.
After the number of authors in the target document is determined, the weight of each author is calculated with the contribution weight algorithm according to the authors' contributions, so that the contribution weight of each author in the target document is obtained. This improves the accuracy of the contribution weight allocation and facilitates the later analysis of the authors' interest topics. A minimal sketch of this computation is given below.
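The sketch below implements the harmonic weighting and normalization described above, under the simplification that the free parameter is taken to infinity so the author ranked i starts with weight 1/i; the function name is illustrative.

```python
def harmonic_contribution_weights(num_authors):
    """Initial weight 1/i for the author ranked i, normalized so the
    final weights of all authors in the document sum to 1."""
    initial = [1.0 / i for i in range(1, num_authors + 1)]
    total = sum(initial)
    return [w / total for w in initial]

# Example: a document with three ranked co-authors
# initial weights 1, 1/2, 1/3 -> final weights ~0.545, 0.273, 0.182
print(harmonic_contribution_weights(3))
```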
On the basis of the above embodiment, as an alternative embodiment, determining the initial weight of each author according to the number of authors and the contribution value of each author in the document, further includes:
if the number of the authors in the document exceeds a preset number value, performing descending order arrangement on the authors in the document according to the contribution value of each author in the document to obtain an ordering result of each author in the document;
when the sorting result of the authors is smaller than or equal to the preset quantity value, calculating to obtain the initial weight of the authors of which the sorting result is smaller than or equal to the preset quantity according to a preset weight algorithm according to the sorting result of the authors of which the sorting result is smaller than or equal to the preset quantity value in the sorting results of the authors;
when the sorting result of the author is larger than the preset quantity value, taking the preset multiple of the initial weight of the first author as the initial weight of all authors of which the sorting result is larger than the preset quantity value;
the first author is the author in the document whose ranking result is the first.
In the embodiment of the present application, after the number of authors in a target document is determined, it is checked whether that number exceeds a preset value. Collaborators beyond the preset value are called super collaborators, and for a paper that has super collaborators (i.e. more collaborators than the preset value) the contribution weights need to be re-allocated as follows (the concrete formula is given as an image in the original publication):
c_{m,i} denotes the weight of the author ranked i. Taking harmonic counting as an example and assuming a preset value of 10, a document with more than 10 authors has super collaborators. Suppose the document has 11 signed authors, exceeding the preset value of 10. The contributions of the authors are obtained first and all authors of the target document are ranked; the first-ranked author has initial weight 1, the second author 1/2, the third author 1/3, and so on, but only the weights of the first 10 authors are calculated this way. The contribution weight of the 11th author is a preset multiple of the first author's initial weight; assuming the preset multiple is 0.05, the 11th author's initial weight is 0.05, and every author ranked after the 11th also gets 0.05. In this way the initial weights of all authors in the target document are determined.
In the embodiment of the present application, the contribution weights are analyzed according to the number of authors in the document because, when the number of authors exceeds the preset value, ranking the authors by contribution and computing the weights directly would make the weights of the last-ranked authors very small. So that every author can still be represented, the preset multiple of the first author's contribution weight is used as the contribution weight of the authors beyond the preset value, which lets every author participate in the topic analysis and express the topics he or she wants to express. A short sketch of this re-allocation follows.
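The re-allocation for documents with super collaborators can be sketched as below, following the worked example above (preset number 10, preset multiple 0.05); the function name and defaults are illustrative.

```python
def capped_harmonic_weights(num_authors, preset_count=10, preset_multiple=0.05):
    """Authors ranked within preset_count get the harmonic initial weight 1/i;
    every author ranked beyond it gets preset_multiple times the first
    author's initial weight. Weights are then normalized to sum to 1."""
    initial = [1.0 / i for i in range(1, min(num_authors, preset_count) + 1)]
    if num_authors > preset_count:
        initial += [preset_multiple * initial[0]] * (num_authors - preset_count)
    total = sum(initial)
    return [w / total for w in initial]

# Example: 11 signed authors -> the 11th author's initial weight is 0.05
print(capped_harmonic_weights(11))
```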
In the embodiment of the application, a Gibbs sampling algorithm is adopted to calculate the topic z_{m,n} of the n-th word in the target document m and the author x_{m,n} of the n-th word in the target document m.
The Gibbs sampling step draws the topic and the author of a word occurrence with probability
Pr(z_{m,n} = k, x_{m,n} = a | w, z_{-(m,n)}, x_{-(m,n)}, a_m, c_m, α, β) ∝ c_{m,a} · (n_{k,ω_{m,n}}^{-(m,n)} + β_{ω_{m,n}}) / (Σ_{v=1}^{V} (n_{k,v}^{-(m,n)} + β_v)) · (n_{a,k}^{-(m,n)} + α_k) / (Σ_{k'=1}^{K} (n_{a,k'}^{-(m,n)} + α_{k'})),
where:
Pr denotes the conditional probability being calculated;
w denotes the word vector of the text information, and ω_{m,n} denotes the n-th word in the target document m;
z_{-(m,n)} denotes all topic assignments except the topic assigned to the n-th word in the target document m, and z_{m,n} denotes the topic of the n-th word in the target document m;
x_{-(m,n)} denotes all author assignments except the author assigned to the n-th word in the target document m, and x_{m,n} denotes the author of the n-th word in the target document m;
a_m denotes the author variable (the set of authors) of the target document, and c_m denotes the vector of contribution weights of the authors in the target document; λ is the parameter used when calculating the author weights of the target document;
K denotes the number of topics of the content of the target document, V denotes the number of distinct words in the processed text information, and A_m denotes the number of authors responsible for the target document;
n_{k,v}^{-(m,n)} denotes the number of times word v has been assigned topic z_{m,n} = k, where the superscript -(m,n) means the current assignment is not counted; β is the Dirichlet prior parameter vector, and β_v is the topic parameter of word v;
the denominator Σ_v (n_{k,v}^{-(m,n)} + β_v) sums, over all words v, the number of times they are assigned topic z_{m,n} together with their topic parameters;
n_{a,k}^{-(m,n)} denotes the number of times a word has been assigned both topic z_{m,n} = k and author x_{m,n} = a; α is a Dirichlet prior parameter vector, and α_k is the author parameter of topic k;
the denominator Σ_{k'} (n_{a,k'}^{-(m,n)} + α_{k'}) sums, over all topics k', the number of times they are assigned together with author x_{m,n} plus the author parameters of all topics;
c_{m,x_{m,n}} denotes the contribution weight of author x_{m,n} in the target document m.
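A compact sketch of one sampling step of the weighted author-topic model described above is given below. The count arrays, the scalar priors and the function name are illustrative assumptions; the proportionality follows the formula above, with the author's contribution weight multiplying the probability.

```python
import numpy as np

def sample_topic_and_author(word_id, doc_authors, author_weights,
                            n_topic_word, n_author_topic, alpha, beta, rng):
    """Draw a (author, topic) pair for one word occurrence.

    n_topic_word[k, v]: count of word v assigned to topic k (current word excluded)
    n_author_topic[a, k]: count of words assigned to author a and topic k (excluded)
    doc_authors: author ids of the current document
    author_weights: contribution weights of those authors (summing to 1)
    alpha, beta: symmetric Dirichlet priors."""
    K, V = n_topic_word.shape
    probs = np.zeros((len(doc_authors), K))
    # topic term: how strongly each topic is associated with this word
    word_term = (n_topic_word[:, word_id] + beta) / (n_topic_word.sum(axis=1) + V * beta)
    for j, a in enumerate(doc_authors):
        # author term: how strongly each topic is associated with this author
        author_term = (n_author_topic[a] + alpha) / (n_author_topic[a].sum() + K * alpha)
        probs[j] = author_weights[j] * word_term * author_term
    flat = probs.ravel() / probs.sum()
    idx = rng.choice(flat.size, p=flat)
    return doc_authors[idx // K], idx % K   # (sampled author, sampled topic)

# Usage: rng = np.random.default_rng(0); the count arrays come from the current assignment state.
```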
Fig. 3 is a schematic diagram of the word distribution in the documents provided by an embodiment of the present application. As shown in fig. 3, document 1 contains 4 occurrences of "bank", 6 of "money", 6 of "loan" and other words, and document 2 contains 5 occurrences of "bank", 7 of "money", 4 of "loan" and other words; this is only an illustrative sample, and a document contains many more words. It shows that the number of words must be determined before topic assignment is performed, and that identical words can be grouped together for counting.
On the basis of the above embodiment, as an alternative embodiment, determining the topic of each word in the document expressed in the document includes:
allocating themes of preset times to all words in the document, and after allocating themes to all words in the document every time, calculating the probability that the words are allocated to the target theme and other themes except the target theme when the theme is allocated next time according to the number of the words appearing in the document, the number of the words allocated to the target theme in the document after the theme is allocated at this time, and the number of the words allocated as the target theme in the document for any word in the document;
according to the probability that the words are allocated to the target theme and other themes except the target theme when the themes are allocated next time, allocating the themes for the next time on the words until the allocation times reach a preset threshold value;
obtaining a theme distributed when the distribution frequency of the words reaches a preset threshold value;
wherein, the target theme is the theme distributed when the word appears for the first time in the distribution.
Fig. 4 is a schematic diagram of the initial topic assignment of words provided in an embodiment of the present application. As shown in fig. 4, suppose there are topic 1 and topic 2; the word "bank" is assigned topic 1, and the word "money" is assigned topic 1 and topic 2. Specifically, in the embodiment of the present application, the topic a word expresses must be determined, so a topic is assigned to the word and the assignment is iterated until the calculated probabilities converge, at which point the word's topic is determined. In the general case, after the document has been preprocessed and segmented into words, the topic each word expresses is unknown and further analysis is needed. First, the number of words and the number and kinds of topics are determined; then a topic is randomly assigned to each word. This first assignment is the initialization, after which every word carries a topic. In the second round, topics are re-assigned according to calculated probabilities. Taking fig. 4 as an example, the first occurrence of "bank" was assigned topic 1 in the first round; the number of occurrences of "bank" assigned topic 1 is counted as 4, and the number of all words assigned topic 1 is counted as 11, so the probability that this occurrence of "bank" is assigned topic 1 is 4/11, and topic 2 is assigned with probability 7/11. If there are more topics, topic 1 is still assigned with probability 4/11 and the remaining topics are assigned according to the probabilities calculated from their assignment counts. The probability for the second occurrence of "bank" is then calculated and its topic re-assigned according to that probability, and so on for all words; completing the second round of topic assignment is recorded as one iteration. After the preset number of iterations is completed, the topic of each word in the text information is estimated.
The probability calculation corresponds to the topic part of the Gibbs sampling formula above, i.e. the probability of assigning topic z_{m,n} to the n-th word in the target document m:
Pr(z_{m,n} = k | ...) ∝ (n_{k,ω_{m,n}}^{-(m,n)} + β_{ω_{m,n}}) / (Σ_{v=1}^{V} (n_{k,v}^{-(m,n)} + β_v)).
The probability of assigning topic z_{m,n} determines which topic the n-th word in the target document m is likely to express; each word carries only one topic in a given assignment, but there are many different topics it could express, and this probability decides which one.
n_{k,ω_{m,n}}^{-(m,n)} denotes the number of times the n-th word in the target document m has been assigned topic z_{m,n}, which can be understood as the number of occurrences of the same word that are assigned this topic. β_{ω_{m,n}}, the topic parameter of the word, is a preset parameter whose value is usually 0.01; it prevents a topic from getting zero probability when some word has never been assigned to it. If only the raw count were used, a topic with count 0 would give a numerator of 0 and a resulting probability of 0, so the topic parameter is set to avoid probabilities of exactly 0. The superscript -(m,n) means that the current assignment is not counted.
The denominator Σ_v (n_{k,v}^{-(m,n)} + β_v) is the number of times the different words are assigned topic z_{m,n}, i.e. the total number of word occurrences carrying this topic: for example, the number of occurrences of "bank" assigned topic z_{m,n} plus the number of occurrences of "money" assigned topic z_{m,n}, and so on, summed together with the topic parameters of the different words.
The value so calculated is the probability that this word occurrence is assigned topic z_{m,n}, and topics are assigned according to the calculated probabilities; this is the expected estimation procedure. Probability calculation and topic assignment are completed for all words, and the process is iterated a preset number of times; one iteration starts from the first word, computes its topic probability and re-assigns its topic according to that probability, and continues until the last word has been re-assigned.
Fig. 5 is a schematic diagram of the topic assignment of words after the iterations are completed according to an embodiment of the present application. As shown in fig. 5, in document 1, after the preset number of iterations, the topic assigned to the word bank is topic 1 and the topic assigned to the word money is topic 1, so it can be determined that the topic of the word bank is topic 1 and the topic of the word money is topic 1.
In the embodiment of the application, topics are assigned to words, the probability of each assigned topic is calculated, topics are reassigned according to those probabilities, and the iterations proceed in turn until the result converges and tends to be constant, so that the topic of each word is determined, laying a foundation for subsequently determining the topics of interest of the authors.
On the basis of the above embodiment, as an alternative embodiment, determining the word in the document for which each author is responsible includes:
the method comprises the steps of allocating authors to all words in the document a preset number of times, and, after each allocation of authors to all words in the document, for any word in the document, calculating the probability that the word is allocated to the target author and to authors other than the target author when authors are allocated next time, according to the number of the words appearing in the document, the number of the words allocated to the target author in the document after this allocation of authors, and the number of the words allocated to the target author in the document;
according to the probability that the word is allocated to the target author and other authors except the target author when the author is allocated next time, allocating the author to the word next time until the allocation frequency reaches a preset threshold value;
acquiring an author to be distributed when the distribution frequency of the words reaches a preset threshold value;
wherein the target author is the author to which the word is assigned when it first appears in the assignment.
Fig. 6 is a schematic diagram of the initial author assignment of words provided in this embodiment of the present application. As shown in fig. 6, suppose the authors are author 1 and author 2; the word bank has been assigned author 1 and author 2, and the word money has been assigned author 1 and author 2. After the topic of a word has been determined, the author corresponding to the word needs to be determined; authors are assigned to words and the assignment is iterated until the calculated probabilities converge, at which point the author of each word can be determined. Specifically, in the general case, after the relevant documents are obtained, only the content of the documents and the contribution value of each author are known, and the content each author is specifically responsible for is not known, so a further step of analysis is required. First, the number of words and the number of authors are determined; then each word is randomly assigned to one author. This first assignment is the initialization, so that every word carries an author. Authors are then assigned a second time, and this second assignment is made according to the calculated author probabilities, in the same way as the topic assignment of words.
The probability calculation formula corresponds, on the same principle, to the following part of the Gibbs sampling algorithm:
$$p\left(z_{m,n}, x_{m,n} \mid \mathbf{z}_{\neg(m,n)}, \mathbf{x}_{\neg(m,n)}, \mathbf{w}\right) \propto \frac{n_{z_{m,n},x_{m,n},-1}^{(w_{m,n})} + \alpha}{\sum_{v=1}^{V}\left(n_{z_{m,n},x_{m,n},-1}^{(v)} + \alpha\right)}$$
This formula gives the probability that the nth word in the target document m is simultaneously assigned the topic z_{m,n} and the author x_{m,n}; according to this probability, it is determined which author is responsible for the word that is assigned the topic z_{m,n}.
$n_{z_{m,n},x_{m,n},-1}^{(w_{m,n})}$
indicates the number of times the word that is the same as the nth word in the target document m is simultaneously assigned the topic z_{m,n} and the author x_{m,n}; it can also be understood as the number of word tokens identical to the nth word that are assigned the author x_{m,n} and whose topic is z_{m,n}.
$\alpha$
is the author parameter for the topic z_{m,n} of the nth word in the target document m, a preset parameter generally set to 0.01. It is used to avoid words being left without an author when authors are assigned: if a word occurs only once and was not assigned the author in the first assignment, the statistics would be computed from a value of 0, the numerator would be 0 and the resulting probability would be 0; setting the author parameter avoids a probability of 0. The subscript -1 indicates that the current assignment of the author x_{m,n} is not counted, meaning that the probability is calculated first and the assignment is then made according to that probability.
$\sum_{v=1}^{V}\left(n_{z_{m,n},x_{m,n},-1}^{(v)} + \alpha\right)$
indicates the number of times the different words among all words are simultaneously assigned the topic z_{m,n} and the author x_{m,n}; it can be understood as the number of word tokens, over all distinct words, assigned the author x_{m,n} and whose topic is z_{m,n}. For example, the number of bank tokens and the number of money tokens simultaneously assigned the topic z_{m,n} and the author x_{m,n} are counted; summing, over these different words, the number of tokens simultaneously assigned the topic z_{m,n} and the author x_{m,n} together with the parameters of the different words gives the total number of words simultaneously assigned the topic z_{m,n} and the author x_{m,n}.
The probability that the nth word in the target document m is simultaneously assigned the topic z_{m,n} and the author x_{m,n} can thus be calculated, namely
$p\left(z_{m,n}, x_{m,n} \mid \mathbf{z}_{\neg(m,n)}, \mathbf{x}_{\neg(m,n)}, \mathbf{w}\right)$
denotes the probability distribution over the authors x_{m,n} in the target document m. The author is assigned according to the calculated probability; the probability calculation and author assignment are completed for all words, and the iteration is repeated a preset number of times, where one iteration means calculating the author probability starting from the first word and assigning authors according to the probabilities until the last word has been reassigned.
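A minimal sketch of one such author-reassignment iteration is given below, again only as an illustration: the author parameter of 0.01 and the exclusion of the current assignment follow the description above, while the restriction of candidates to the document's own author list, the data structures and the function name are assumptions.

```python
import random
from collections import defaultdict

ALPHA = 0.01   # author parameter, prevents probabilities of 0

def author_pass(docs, doc_authors, z, x, vocab):
    """One iteration: reassign the author of every word token, given the token's current topic."""
    # n[(a, k)][w] = tokens of word w currently assigned author a together with topic k.
    n = defaultdict(lambda: defaultdict(int))
    for doc, topics, authors in zip(docs, z, x):
        for w, k, a in zip(doc, topics, authors):
            n[(a, k)][w] += 1
    for m, doc in enumerate(docs):
        for i, w in enumerate(doc):
            k = z[m][i]
            n[(x[m][i], k)][w] -= 1                     # exclude the current assignment
            weights = []
            for a in doc_authors[m]:                    # only the authors of this document
                numerator = n[(a, k)][w] + ALPHA
                denominator = sum(n[(a, k)][v] + ALPHA for v in vocab)
                weights.append(numerator / denominator)
            x[m][i] = random.choices(doc_authors[m], weights=weights)[0]
            n[(x[m][i], k)][w] += 1
    return x

# Hypothetical usage with the toy corpus from the previous sketch.
docs = [["bank", "money", "bank", "loan"], ["bank", "river", "money", "bank"]]
doc_authors = [["author 1", "author 2"], ["author 2"]]
vocab = sorted({w for d in docs for w in d})
z = [[random.randrange(2) for _ in d] for d in docs]
x = [[random.choice(doc_authors[m]) for _ in d] for m, d in enumerate(docs)]
x = author_pass(docs, doc_authors, z, x, vocab)
```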
Fig. 7 is a schematic diagram of the author assignment of words after the iterations are completed according to an embodiment of the present application. As shown in fig. 7, after the preset number of iterations in document 1, the author assigned to the word bank is author 2, and most occurrences of the word money are assigned author 2 while a few are assigned author 1, so it can be determined that the author of the word bank is author 2 and the author of the word money is author 2.
In the embodiment of the application, authors are assigned to words, the probability of each assigned author is calculated, authors are reassigned according to those probabilities, and the iterations proceed in turn until the result converges and tends to be constant, so that the author of each word, and thus the part of the document each author is responsible for, is determined, laying a foundation for subsequently determining the topics in which the authors are interested.
On the basis of the above embodiment, as an alternative embodiment, obtaining the topic expressed by the content responsible for each author in the document according to the topic expressed by each word in the document, the word responsible for each author in the document and the contribution weight of each author in the document includes:
for any author in the literature, selecting a topic expressed in the literature by each word and a word responsible for each author in the literature according to the final weight of the author;
and taking the word responsible by the author as a target word, and determining the topic expressed in the literature by the author according to the topic expressed in the literature by the target word.
After the topic of each word and the author of each word have been determined, the embodiment of the application samples according to the final weights of the authors. An author with a larger final weight has a larger sampling probability, and the words associated with that author are sampled more often, so that the topics of the content the author is responsible for in the document are determined from the topics of the sampled words. Because such an author contributes more, the topics of the content the author is responsible for may be more diversified. The author weight serves as the sampling probability, which highlights the contribution of the author in the document and better indicates the topics of the content the author is responsible for. Specifically, if a document is jointly the responsibility of two authors, with Zhang San mainly responsible and carrying a weight of 80% and Li Si carrying a weight of 20%, then when the topics of the words and the authors of the words are analyzed, sampling is performed in proportion to these weights. The words associated with Zhang San and the topics associated with those words are sampled with higher probability, and through this high-probability sampling the topics Zhang San expresses in the document are determined; Zhang San is likely to be responsible for more content and to be associated with more topics, so the topics in which Zhang San is interested may be more diversified. Meanwhile, the words associated with Li Si and the topics associated with those words are sampled with relatively low probability, but the topics Li Si expresses in the document can still be determined; Li Si may also be interested in many topics, but in this document the topics Li Si is responsible for are fewer, and the topics of interest that stand out may be relatively few.
After the topic of each word and the author of each word are determined, sampling analysis is performed according to the final weights of the authors, so that authors with larger weights have higher sampling probabilities; this highlights the contribution of each author in the document and indicates the topics of the content each author is responsible for.
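The weight-driven sampling can be sketched as follows, purely as an illustration: the 0.8/0.2 weights correspond to the Zhang San / Li Si example above, and the data layout and helper name are assumptions.

```python
import random
from collections import Counter

# Hypothetical final weights and per-word (author, topic) assignments from the iterations above.
final_weights = {"Zhang San": 0.8, "Li Si": 0.2}
word_author = ["Zhang San", "Zhang San", "Li Si", "Zhang San"]   # author of each word token
word_topic  = [1, 2, 3, 1]                                       # topic of each word token

def topics_by_author(final_weights, word_author, word_topic, num_samples=1000):
    """Sample word tokens in proportion to their author's final weight and tally, per author,
    the topics expressed by the content that author is responsible for."""
    tallies = {a: Counter() for a in final_weights}
    probs = [final_weights[a] for a in word_author]
    for _ in range(num_samples):
        i = random.choices(range(len(word_author)), weights=probs)[0]
        tallies[word_author[i]][word_topic[i]] += 1
    return tallies

print(topics_by_author(final_weights, word_author, word_topic))
```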
Fig. 8 is a schematic diagram of the author interest disclosure model provided in an embodiment of the present application. As shown in fig. 8, the author interest disclosure model uses the Gibbs sampling formulas to calculate the topic z_{m,n} of the nth word in the target document m and the author x_{m,n} of the nth word in the target document m. Table 1 gives a description of each parameter in the author interest disclosure model.
Table 1: Description of each parameter in the author interest disclosure model
On the basis of the above embodiment, as an alternative embodiment, determining relevant documents in charge of an author from at least one document, and obtaining topics expressed by the content in charge of the relevant documents by the author to determine topics of interest to the author includes:
acquiring relevant documents responsible for authors, and determining topics expressed by the authors in the content responsible for the relevant documents;
determining an interest topic of an author in the topics expressed by the author according to the topics expressed by the author;
calculating the probability of the occurrence of the interest topic of the author according to the occurrence times of the interest topic of the author in the related documents in charge of the author, and taking the topic with the probability exceeding a preset threshold value as the interest topic of the author.
After the topics expressed by an author in individual documents have been determined, the embodiment of the present application collects the topics expressed by the author across the documents related to that author, summarizes them, and selects the topics with a higher probability of occurrence as the author's topics of interest. For example, author A expresses topic 1 and topic 2 in a first document and topic 3 and topic 4 in a second document; to determine the topics of interest of author A, more documents related to the author are collected, the topics the author expresses in the different documents are gathered, the number of occurrences of each topic is counted, the probability of each topic is calculated from these counts, and the topics whose probability exceeds a preset threshold are taken as the author's topics of interest.
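As a simple illustration of this aggregation step, the sketch below counts an author's topics across related documents and keeps those above a threshold; the threshold value of 0.2, the function name and the example lists are assumptions.

```python
from collections import Counter

def interest_topics(topics_per_document, threshold=0.2):
    """Aggregate the topics an author expresses across the related documents the author is
    responsible for, and keep the topics whose occurrence probability exceeds the threshold."""
    counts = Counter()
    for topics in topics_per_document:          # one list of expressed topics per document
        counts.update(topics)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items() if c / total > threshold}

# Hypothetical example for author A: topics 1 and 2 in the first document,
# topics 3 and 4 in the second, and topic 1 again in a third.
print(interest_topics([[1, 2], [3, 4], [1]]))   # only topic 1 (probability 0.4) exceeds 0.2
```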
The author interest disclosure model incorporating the contribution-weight assignment mechanism proposed by the present invention (the AT_credit model) is used to find the topics of interest of each researcher in the data set. Taking two highly productive scholars of the University of Toronto, Boone, Charles and Andrews, Breda J., as an example, Table 2 lists the topics of interest and the corresponding probabilities calculated by various algorithms. As shown in Table 2, for Boone, Charles and Andrews, Breda J., the top-3 topics of interest found by the AT_credit model and their corresponding probabilities are given; for example, the topics with a probability greater than 10.00% are taken as the topics of interest of each scholar. Table 3 lists 3 in-depth domain topics mined by the AT_credit model; as shown in Table 3, each domain topic is represented by its 10 most relevant words.
Table 2: Topics of interest and topic probabilities calculated by various algorithms
Table 3: The 3 in-depth domain topics mined by the AT_credit model
From Table 3, it can be found that the research interests of Boone, Charles are mainly focused on "genetic interaction", while the interests of Andrews, Breda J.
Fig. 9 is a schematic structural diagram of an apparatus for analyzing an interest topic of an author according to an embodiment of the present application. As shown in fig. 9, the apparatus may include a first obtaining module 301, a determining module 302 and a second obtaining module 303, specifically:
a first obtaining module 301, configured to obtain at least one document in a target field, and determine a contribution weight of each author in the document; the contribution weight is a normalized result of the contribution value of the author;
a determining module 302, configured to determine, for each document, a topic in the document that each word in the document expresses in the document and a word in the document that each author is responsible for; obtaining a topic expressed by each author in the document according to the topic expressed by each word in the document, the word responsible for each author in the document and the contribution weight of each author in the document;
and a second obtaining module 303, configured to, for each author, determine, from at least one document, a relevant document for which the author is responsible, and obtain a topic expressed by the content for which the author is responsible in the relevant document, so as to determine an interest topic of the author.
The apparatus for analyzing an author's interest topic provided in the embodiment of the present invention specifically executes the process of the method embodiment; for details, refer to the contents of the method embodiment for analyzing an author's interest topic, which are not repeated here. The apparatus for analyzing an author's interest topic acquires at least one document in a target field and determines the contribution weight of each author in the document, the topic expressed by each word in the document and the words each author is responsible for in the document; it obtains the topics expressed by each author in the document according to the topics expressed by each word in the document, the words each author is responsible for in the document and the contribution weight of each author in the document, and determines the topics of interest of an author according to the topics expressed by the content the author is responsible for in the relevant documents. According to the embodiment of the application, the topics of interest of authors can be found on the premise that each author contributes unequally to a multi-author article, the topics of interest of scientific researchers are reasonably reflected, the exploration of research hotspots and trends in the subject field is facilitated, and personalized academic research can be promoted.
Further, the first obtaining module 301 includes:
the preprocessing module is used for acquiring authors and contribution values of each author in the document;
determining an initial weight of each author according to the number of authors in the document and the contribution value of each author;
and normalizing the initial weight of each author in the literature to obtain the final weight of each author in the literature.
Further, a pre-processing module comprising:
the first weight calculation module is used for performing descending arrangement on the authors in the document according to the contribution values of the authors in the document to obtain the ordering result of each author in the document if the number of the authors in the document does not exceed the preset number value;
and calculating the initial weight of each author according to a preset weight algorithm according to the sequencing result of each author.
Further, the preprocessing module further comprises:
the second weight calculation module is used for performing descending arrangement on the authors in the document according to the contribution values of the authors in the document to obtain the ordering result of each author in the document if the number of the authors in the document exceeds a preset number value;
when the sorting result of the authors is smaller than or equal to the preset quantity value, calculating to obtain the initial weight of the authors of which the sorting result is smaller than or equal to the preset quantity according to a preset weight algorithm according to the sorting result of the authors of which the sorting result is smaller than or equal to the preset quantity value in the sorting results of the authors;
when the sorting result of the author is larger than the preset quantity value, taking the preset multiple of the initial weight of the first author as the initial weight of all authors of which the sorting result is larger than the preset quantity value;
the first author is the author in the document whose ranking result is the first.
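For illustration only, the weight computation described by the preprocessing module above can be sketched as follows; the 1/rank rule for the "preset weight algorithm", the preset count of 2, the preset multiple of 0.01 and the example contribution values are all assumptions, since the embodiment only requires some preset weight algorithm, preset number value and preset multiple.

```python
def author_contribution_weights(contributions, preset_count=2, preset_multiple=0.01):
    """Sketch of the contribution-weight step: rank authors by contribution value in descending
    order, give the first `preset_count` ranks an initial weight from a rank-based rule
    (1/rank here, purely illustrative), give later ranks a preset multiple of the first
    author's initial weight, then normalize to obtain the final weights."""
    ranked = sorted(contributions.items(), key=lambda kv: kv[1], reverse=True)
    first_weight = 1.0                       # the 1/rank rule at rank 1 (illustrative)
    initial = {}
    for rank, (author, _) in enumerate(ranked, start=1):
        if rank <= preset_count:
            initial[author] = 1.0 / rank     # hypothetical "preset weight algorithm"
        else:
            initial[author] = preset_multiple * first_weight
    total = sum(initial.values())
    return {author: w / total for author, w in initial.items()}

# Hypothetical contribution values for the authors of one document.
print(author_contribution_weights({"author A": 5.0, "author B": 3.0, "author C": 1.0}))
```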
Further, the determining module 302 includes:
the theme determining module is used for allocating themes to all words in the document a preset number of times, and, after each allocation of themes to all words in the document, for any word in the document, calculating the probability that the word is allocated to the target theme and to themes other than the target theme when themes are allocated next time, according to the number of the words appearing in the document, the number of the words allocated to the target theme in the document after this allocation of themes, and the number of the words allocated to the target theme in the document;
according to the probability that the words are allocated to the target theme and other themes except the target theme when the themes are allocated next time, allocating the themes for the next time on the words until the allocation times reach a preset threshold value;
obtaining a theme distributed when the distribution frequency of the words reaches a preset threshold value;
wherein, the target theme is the theme distributed when the word appears for the first time in the distribution.
Further, the determining module 302 further includes:
the author confirming module is used for allocating authors of preset times to all words in the document, and after the author allocation to all the words in the document is completed each time, for any word in the document, calculating the probability that the word is allocated to the target author and other authors except the target author when the author is allocated next time according to the number of the words appearing in the document, the number of the words allocated to the target author in the document after the author is allocated at this time, and the number of the words allocated to the target author in the document;
according to the probability that the word is allocated to the target author and other authors except the target author when the author is allocated next time, allocating the author to the word next time until the allocation frequency reaches a preset threshold value;
acquiring an author to be distributed when the distribution frequency of the words reaches a preset threshold value;
wherein the target author is the author to which the word is assigned when it first appears in the assignment.
Further, the preprocessing module further comprises:
the interest topic module is used for selecting a topic expressed by a word in the document and a word responsible for the author in the document for any author in the document according to the final weight of the author;
and taking the word responsible by the author as a target word, and determining the topic expressed in the literature by the author according to the topic expressed in the literature by the target word.
Further, the second obtaining module 303 includes:
the document acquisition module is used for acquiring relevant documents responsible for the author and determining a theme expressed by the content responsible for the relevant documents by the author;
determining an interest topic of an author in the topics expressed by the author according to the topics expressed by the author;
calculating the probability of the occurrence of the interest topic of the author according to the occurrence times of the interest topic of the author in the related documents in charge of the author, and taking the topic with the probability exceeding a preset threshold value as the interest topic of the author.
An embodiment of the present application provides an electronic device, including a memory and a processor, with at least one program stored in the memory for execution by the processor; when executed by the processor, the program implements: acquiring at least one document in a target field and determining the contribution weight of each author in the document, the topic expressed by each word in the document and the words each author is responsible for in the document; obtaining the topics expressed by each author in the document according to the topics expressed by each word in the document, the words each author is responsible for in the document and the contribution weight of each author in the document; and determining the topics of interest of an author according to the topics expressed by the content the author is responsible for in the relevant documents. According to the embodiment of the application, the topics of interest of authors can be found on the premise that each author contributes unequally to a multi-author article, the topics of interest of scientific researchers are reasonably reflected, the exploration of research hotspots and trends in the subject field is facilitated, and personalized academic research can be promoted.
In an alternative embodiment, an electronic device is provided, as shown in fig. 10, the electronic device 4000 shown in fig. 10 comprising: a processor 4001 and a memory 4003. Processor 4001 is coupled to memory 4003, such as via bus 4002. Optionally, the electronic device 4000 may further comprise a transceiver 4004. In addition, the transceiver 4004 is not limited to one in practical applications, and the structure of the electronic device 4000 is not limited to the embodiment of the present application.
The Processor 4001 may be a CPU (Central Processing Unit), a general-purpose Processor, a DSP (Digital Signal Processor), an ASIC (Application Specific Integrated Circuit), an FPGA (Field Programmable Gate Array) or other Programmable logic device, a transistor logic device, a hardware component, or any combination thereof. Which may implement or perform the various illustrative logical blocks, modules, and circuits described in connection with the disclosure. The processor 4001 may also be a combination that performs a computational function, including, for example, a combination of one or more microprocessors, a combination of a DSP and a microprocessor, or the like.
Bus 4002 may include a path that carries information between the aforementioned components. The bus 4002 may be a PCI (Peripheral Component Interconnect) bus, an EISA (Extended Industry Standard Architecture) bus, or the like. The bus 4002 may be divided into an address bus, a data bus, a control bus, and the like. For ease of illustration, only one thick line is shown in FIG. 10, but this is not intended to represent only one bus or type of bus.
The Memory 4003 may be a ROM (Read Only Memory) or other types of static storage devices that can store static information and instructions, a RAM (Random Access Memory) or other types of dynamic storage devices that can store information and instructions, an EEPROM (Electrically Erasable Programmable Read Only Memory), a CD-ROM (Compact Disc Read Only Memory) or other optical Disc storage, optical Disc storage (including Compact Disc, laser Disc, optical Disc, digital versatile Disc, blu-ray Disc, etc.), a magnetic Disc storage medium or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer, but is not limited to these.
The memory 4003 is used for storing application codes for executing the scheme of the present application, and the execution is controlled by the processor 4001. Processor 4001 is configured to execute application code stored in memory 4003 to implement what is shown in the foregoing method embodiments.
The present application provides a computer-readable storage medium on which a computer program is stored; when run on a computer, the program enables the computer to execute the corresponding content in the foregoing method embodiments. Compared with the prior art, at least one document in a target field is acquired, and the contribution weight of each author in the document, the topic expressed by each word in the document and the words each author is responsible for in the document are determined; the topics expressed by each author in the document are obtained according to the topics expressed by each word in the document, the words each author is responsible for in the document and the contribution weight of each author in the document; and the topics of interest of an author are determined according to the topics expressed by the content the author is responsible for in the relevant documents. According to the embodiment of the application, the topics of interest of authors can be found on the premise that each author contributes unequally to a multi-author article, the topics of interest of scientific researchers are reasonably reflected, the exploration of research hotspots and trends in the subject field is facilitated, and personalized academic research can be promoted.
It should be understood that, although the steps in the flowcharts of the figures are shown in an order indicated by the arrows, these steps are not necessarily performed in that order. Unless explicitly stated herein, the execution order of the steps is not strictly limited, and the steps may be performed in other orders. Moreover, at least a portion of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily performed at the same moment but may be performed at different moments, and their execution order is not necessarily sequential; they may be performed in turn or alternately with other steps or with at least a portion of the sub-steps or stages of other steps.
The foregoing describes only some embodiments of the present invention. It should be noted that, for those skilled in the art, various improvements and refinements can be made without departing from the principle of the present invention, and these improvements and refinements shall also fall within the protection scope of the present invention.

Claims (11)

1. A method for analyzing interest topics of authors is characterized by comprising the following steps:
acquiring at least one document of a target field, and determining the contribution weight of each author in the document; the contribution weight is a normalized result of the contribution value of the author;
for each document, determining a topic in the document that each word in the document expresses in the document, and a word in the document that each author is responsible for; obtaining the topic expressed by each author in the document according to the topic expressed by each word in the document, the word responsible for each author in the document and the contribution weight of each author in the document;
for each author, determining related documents responsible for the author from the at least one document, and obtaining topics expressed by the content responsible for the related documents by the author to determine interest topics of the author.
2. The method of claim 1, wherein the determining the contribution weight of each author in the document comprises:
acquiring the contribution value of the author and each author in the document;
determining an initial weight of each author according to the number of authors in the document and the contribution value of each author;
normalizing the initial weight of each author in the document to obtain the final weight of each author in the document.
3. The method for analyzing author interest topic according to claim 2, wherein the determining an initial weight of each author according to the number of authors and contribution value of each author in the document comprises:
if the number of the authors in the document does not exceed a preset number value, performing descending order arrangement on the authors in the document according to the contribution value of each author in the document to obtain an ordering result of each author in the document;
and calculating the initial weight of each author according to a preset weight algorithm according to the sequencing result of each author.
4. The method of analyzing author interest topic of claim 2, wherein the determining an initial weight of each author based on the number of authors and the contribution value of each author in the document further comprises:
if the number of the authors in the literature exceeds a preset number value, performing descending order arrangement on the authors in the literature according to the contribution value of each author in the literature to obtain an ordering result of each author in the literature;
when the sorting result of the authors is smaller than or equal to the preset quantity value, calculating to obtain the initial weight of the authors of which the sorting result is smaller than or equal to the preset quantity according to a preset weight algorithm according to the sorting result of the authors of which the sorting result is smaller than or equal to the preset quantity value in the sorting results of the authors;
when the sorting result of the authors is greater than a preset quantity value, taking a preset multiple of the initial weight of the first author as the initial weight of all the authors of which the sorting result is greater than the preset quantity value;
the first author is the author in the document whose ranking result is first.
5. The method for analyzing the author's topic of interest as recited in claim 1, wherein the determining the topic of each word in the document expressed in the document comprises:
allocating themes to all words in the document a preset number of times, and, after each allocation of themes to all words in the document, for any word in the document, calculating the probability that the word is allocated to the target theme and to other themes except the target theme when themes are allocated next time, according to the number of the words appearing in the document, the number of the words allocated to the target theme in the document after this allocation of themes, and the number of the words allocated to the target theme in the document;
according to the probability that the word is allocated to the target theme and other themes except the target theme when the theme is allocated next time, allocating the theme for the next time on the word until the allocation frequency reaches a preset threshold value;
obtaining a theme distributed when the distribution frequency of the words reaches a preset threshold value;
and the target theme is the theme distributed when the word appears for the first time in the distribution.
6. The method for analyzing interest topics of authors as claimed in claim 1, wherein the determining words in the document for which each author is responsible comprises:
assigning authors of a preset number of times to all words in the document, and after assigning authors to all words in the document is completed each time, calculating the probability that a word is assigned to a target author and other authors except the target author when an author is assigned next time according to the number of the words appearing in the document, the number of the words assigned to the target author in the document after the authors are assigned this time, and the number of the words assigned to the target author in the document for any word in the document;
according to the probability that the word is allocated to the target author and other authors except the target author when the author is allocated next time, allocating authors for the next time until the allocation times reach a preset threshold value;
acquiring an author to be distributed when the distribution frequency of the words reaches a preset threshold value;
wherein the target author is an author assigned to the word when the word first appears in the assignment.
7. The method for analyzing interest topics of authors according to claim 2, wherein the obtaining of the topics expressed by the content responsible for each author in the document according to the topics expressed by each word in the document, the words responsible for each author in the document and the contribution weight of each author in the document comprises:
for any author in the literature, selecting a topic expressed by a word in the literature and a word responsible for the author in the literature according to the final weight of the author;
and taking the word responsible by the author as a target word, and determining the expressed topic of the author in the literature according to the expressed topic of the target word in the literature.
8. The method for analyzing interest topics of authors according to claim 1, wherein the determining related documents responsible for authors from the at least one document, and obtaining topics expressed by contents responsible for related documents of authors to determine interest topics of authors comprises:
acquiring relevant documents responsible for the author, and determining a theme expressed by the content of the author in the responsibility of the relevant documents;
determining an interest topic of an author in the topics expressed by the author according to the topics expressed by the author;
calculating the occurrence probability of the interest topic of the author according to the occurrence times of the interest topic of the author in related documents in charge of the author, and taking the topic of which the probability exceeds a preset probability value as the interest topic of the author.
9. An apparatus for analyzing interest topics of authors, comprising:
the system comprises a first acquisition module, a second acquisition module and a third acquisition module, wherein the first acquisition module is used for acquiring at least one document of a target field and determining the contribution weight of each author in the document; the contribution weight is a normalized result of the contribution value of the author;
a determining module for determining, for each document, a topic of each word in the document expressed in the document and a word in charge of each author in the document; obtaining the topic expressed by each author in the document according to the topic expressed by each word in the document, the word responsible for each author in the document and the contribution weight of each author in the document;
and the second acquisition module is used for determining, for each author, related documents for which the author is responsible from the at least one document, and acquiring the topics expressed by the content the author is responsible for in the related documents, to determine the topics of interest of the author.
10. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, characterized in that the processor implements the steps of the method for analyzing a topic of interest of an author as claimed in any one of claims 1 to 8 when executing said program.
11. A computer-readable storage medium, characterized in that it stores computer instructions that make the computer execute the steps of the method for analyzing the author's interest topic according to any one of claims 1 to 8.
CN202011625275.6A 2020-12-31 Method and device for analyzing interest subject of author, electronic equipment and storage medium Active CN112765305B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011625275.6A CN112765305B (en) 2020-12-31 Method and device for analyzing interest subject of author, electronic equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011625275.6A CN112765305B (en) 2020-12-31 Method and device for analyzing interest subject of author, electronic equipment and storage medium

Publications (2)

Publication Number Publication Date
CN112765305A true CN112765305A (en) 2021-05-07
CN112765305B CN112765305B (en) 2024-05-14




Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103605671A (en) * 2013-10-29 2014-02-26 中国科学技术信息研究所 Scientific research information evolution analyzing method and device

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
余传明 et al.: "Dynamic discovery of author research interests based on a composite topic evolution model", 《山东大学学报(理学版)》 (Journal of Shandong University (Natural Science)), no. 9, pages 23-33 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113515638A (en) * 2021-09-14 2021-10-19 北京邮电大学 Student clustering-oriented research interest mining method and device and storage medium
CN113515638B (en) * 2021-09-14 2021-12-07 北京邮电大学 Student clustering-oriented research interest mining method and device and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant