CN115238163A - Information pushing method and device based on document data, storage medium and terminal

Info

Publication number: CN115238163A
Application number: CN202110444526.9A
Authority: CN
Inventors: 江明; 李永智; 谷俊
Original assignee: Shanghai Biguan Data Technology Co ltd; Shanghai Education Talent Exchange Service Center
Current assignee: Shanghai Biguan Data Technology Co ltd; Shanghai Education Talent Exchange Service Center
Priority date: 2021-04-23
Filing date: 2021-04-23
Publication date: 2022-10-25

Abstract

An information pushing method and device based on literature data, a storage medium, and a terminal are provided. The information pushing method based on literature data comprises the following steps: crawling literature data published by a plurality of users, wherein the literature data comprises literature topics and cited literature data; if the authors of different literature data have the same name, identifying the same-name authors at least according to the topic similarity of the literature data; extracting, according to the crawled literature data, published literature data and cited literature data of at least some of the plurality of users, wherein the published literature data comprises the number of published documents and the cited literature data comprises the number of cited documents; calculating an evaluation result for the at least some users at least according to their published literature data and cited literature data; and pushing information according to the evaluation result of each user, wherein the pushed information comprises the user and/or the literature data of the user. The technical scheme of the invention enables truthful and accurate information pushing based on literature data.

Description

Information pushing method and device based on document data, storage medium and terminal

Technical Field

The present invention relates to the field of data processing technologies, and in particular, to an information pushing method and apparatus, a storage medium, and a terminal based on document data.

Background

The H index is a hybrid quantitative indicator that can be used to evaluate both the quantity and the level of a researcher's academic output. A user has an H index of H if H of the Np papers published by the user have each been cited at least H times, and each of the remaining Np-H papers has been cited no more than H times.

However, in the prior art, the H index is insensitive to a talent's publication volume and highly cited works, so talent evaluation and talent recommendation based on the H index cannot reflect the real situation.

Disclosure of Invention

The technical problem solved by the invention is how to realize information pushing based on literature data truthfully and accurately.

In order to solve the above technical problem, an embodiment of the present invention provides an information pushing method based on literature data, where the information pushing method based on literature data includes: crawling literature data published by a plurality of users, wherein the literature data comprises literature topics and cited literature data; if the authors of different literature data have the same name, identifying the same-name authors at least according to the topic similarity of the literature data; extracting published literature data and cited literature data of at least some of the plurality of users according to the crawled literature data, wherein the published literature data comprises the number of published documents and the cited literature data comprises the number of cited documents; calculating the evaluation result of the at least some users at least according to the literature data published by the at least some users and the cited literature data; and pushing information according to the evaluation result of each user, wherein the pushed information comprises the user and/or the literature data of the user.

Optionally, the calculating the evaluation result of the at least part of users according to at least the literature data published by the at least part of users and the cited literature data includes: calculating a total number of cited documents using the cited document data; determining a zero cited document quantity in the published document data and calculating a weighted sum of the zero cited document quantity and a remaining document quantity; and calculating the evaluation result of at least part of users according to the total amount of the cited documents and the weighted sum.

Optionally, the calculating the evaluation result of the at least part of users according to at least the literature data published by the at least part of users and the cited literature data includes: calculating the total amount of cited documents by using the cited document data; determining a zero cited document quantity in the published document data and calculating a weighted sum of the zero cited document quantity and the remaining document quantity; calculating an H index of the at least some users; and combining the H index with the total amount of the cited documents and the weighted sum to obtain the evaluation result of at least part of users.

Optionally, the determining of the published document data and the cited document data of the at least some users according to the crawled document data comprises: establishing a publication database according to the plurality of crawled documents, wherein the publication database comprises the document data published by each user; establishing a citation database according to the plurality of crawled documents, wherein the citation database comprises the document data cited by the document data published by each user; and determining the number of published documents and the number of cited documents of the at least some users according to the publication database and the citation database.

Optionally, the crawling of the document data published by the plurality of users includes: standardizing the authors and the publishing institutions in the literature data according to a preset format; and/or, if the authors of different literature data have the same name, identifying the same-name authors at least according to the topic similarity of the literature data.

Optionally, the identifying of the same-name authors at least according to the topic similarity of the literature data includes: calculating the topic similarity among the literature data; if the topic similarity is smaller than a first preset threshold, determining the proportion of documents published by the same-name authors in the same time period in which the same-name institution appears; if that proportion is larger than a second preset threshold, determining the proportion of shared collaborators in the documents published by the same-name authors in the same time period and at the same institution; and if that proportion is larger than a third preset threshold, determining that the same-name authors are the same author, otherwise determining that the same-name authors are different authors.

Optionally, the literature data includes title, year, source, institution, keywords, and abstract; the types of documents include papers, patents, books, and conference reports.

In order to solve the above technical problem, an embodiment of the present invention further discloses an information pushing device based on literature data, where the information pushing device based on literature data includes: the document data crawling module is used for crawling document data published by a plurality of users, and the document data comprises document subjects and cited document data; the homonymy identification module is used for identifying homonymy authors according to the topic similarity of the literature data at least if the authors of different literature data have the same name; the literature data determining module is used for determining published literature data and cited literature data of at least part of users according to a plurality of crawled literature data, the published literature data comprises published literature quantity, and the cited literature data comprises cited literature quantity; the evaluation result calculation module is used for calculating the evaluation result of the at least part of users according to the literature data published by the at least part of users and the cited literature data; and the pushing module is used for pushing information according to the evaluation result of each user, and the pushed information comprises the user and/or the literature data of the user.

The embodiment of the invention also discloses a storage medium, wherein a computer program is stored on the storage medium, and the computer program executes the steps of the information pushing method based on the literature data when being executed by a processor.

The embodiment of the invention also discloses a terminal, which comprises a memory and a processor, wherein the memory is stored with a computer program capable of running on the processor, and the processor executes the step of the information push method based on the literature data when running the computer program.

Compared with the prior art, the technical scheme of the embodiment of the invention has the following beneficial effects:

In the technical scheme of the invention, the evaluation results of at least some users are calculated from the document data published by those users and the cited document data, wherein the published document data comprises the number of published documents and the cited document data comprises the number of cited documents, and the evaluation results are used for information pushing. Compared with the H index in the prior art, the calculated evaluation result reflects both the user's publication volume and citation frequency, so that it captures the user's publication data and highly cited works. In addition, by distinguishing same-name authors of the documents, the accuracy of the document data is improved, the real situation of the user is reflected more faithfully, and the accuracy of information pushing (such as talent recommendation) is improved.

Drawings

Fig. 1 is a flowchart of an information pushing method based on document data according to an embodiment of the present invention;

Fig. 2 is a flowchart of one embodiment of step S103 shown in Fig. 1;

Fig. 3 is a schematic structural diagram of an information pushing apparatus based on document data according to an embodiment of the present invention.

Detailed Description

As described in the background art, the H index in the prior art is insensitive to a talent's publication volume and highly cited works, so talent evaluation and talent recommendation based on the H index cannot reflect the real situation.

In the technical scheme of the invention, the evaluation results of at least some users are calculated from the document data published by those users and the cited document data, wherein the published document data comprises the number of published documents and the cited document data comprises the number of cited documents, and the evaluation results are used for information pushing. Compared with the H index in the prior art, the calculated evaluation result reflects both the user's publication volume and citation frequency, so that it captures the user's publication data and highly cited works. In addition, by distinguishing same-name authors of the documents, the accuracy of the document data is improved, the real situation of the user is reflected more faithfully, and the accuracy of information pushing (such as talent recommendation) is improved.

In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below.

Fig. 1 is a flowchart of an information push method based on literature data according to an embodiment of the present invention.

The information pushing method based on the literature data according to the embodiment of the present invention may be executed by a computing device, and the computing device may be any appropriate terminal, such as a mobile phone, a computer, an internet of things device, a server, and the like, but is not limited thereto.

The document referred to in the embodiment of the present invention may be a thesis, or may be any document data with reference documents, such as a patent, a book, a conference report, and the like, and the embodiment of the present invention is not limited to this.

Specifically, the information pushing method based on the literature data may include the following steps:

step S101: crawling literature data published by a plurality of users, wherein the literature data comprises literature subjects and cited literature data;

step S102: if the authors of different document data are homonymous, identifying the homonymous authors according to the topic similarity of the document data at least;

step S103: extracting published document data and cited document data of at least some users in the plurality of users according to the crawled document data, wherein the published document data comprises the published document number, and the cited document data comprises the cited document number;

step S104: calculating the evaluation result of the at least part of users according to the literature data published by the at least part of users and the cited literature data;

step S105: and pushing information according to the evaluation result of each user, wherein the pushed information comprises the user and/or the literature data of the user.

It should be noted that the sequence numbers of the steps in this embodiment do not represent a limitation on the execution sequence of the steps.

The evaluation result in this embodiment may be used to evaluate the academic output quantity and the academic output level of the user, and may be referred to as an H++ index or by any other implementable name, which is not limited in this embodiment of the present invention.

In a specific implementation of step S101, document data of multiple users may be crawled; specifically, document data may be collected from multiple document source libraries by using a crawler technology. In a specific application scenario, the multiple users may be users belonging to the same discipline. By calculating the evaluation results of the multiple users in subsequent steps, users with a higher academic level within the discipline can then be identified for talent recommendation.

Specifically, the crawled document data may include titles, years, sources, organizations, keywords, and abstracts, or other public information, which is not limited by the embodiments of the present invention.

After the document data is crawled, the crawled document data can be subjected to operations such as duplicate removal and standardization. For specific implementation of crawler technology and data cleansing, reference may be made to the prior art, and embodiments of the present invention are not limited thereto.

Because the authors of different documents may have the same name but be different persons, same-name authors need to be distinguished in order to avoid statistical errors and ensure the accuracy of subsequent calculation. In a specific implementation of step S102, whether same-name authors are the same person may be determined based at least on the topic similarity of the documents. The topic similarity may be calculated from the titles, keywords, and abstracts of the documents, and natural language processing tools and algorithms, such as ICTCLAS (the Chinese lexical analysis system of the Institute of Computing Technology), jieba word segmentation, and the term frequency-inverse document frequency (TF-IDF) algorithm, may be used in the specific calculation, which is not limited in the embodiment of the present invention.

If the similarity of the subjects of the two documents reaches a preset threshold, for example, 60%, it can be determined that the two articles are articles in the same field, and then the authors with the same name can be verified to be the same person. And if the topic similarity is found to be lower than a preset threshold, the authors with the same name are not the same person.
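The thresholding described above can be illustrated with a minimal sketch. It assumes whitespace tokenization in place of a Chinese word segmenter such as jieba or ICTCLAS, a smoothed IDF weighting, and cosine similarity as the similarity measure; none of these choices is specified by the embodiment, and the names `tfidf_vectors`, `cosine`, `topic_similarity`, and `SAME_FIELD_THRESHOLD` are hypothetical.

```python
import math
from collections import Counter

# Assumed value: the 60% example threshold mentioned in the text.
SAME_FIELD_THRESHOLD = 0.6


def tfidf_vectors(docs):
    """Build a sparse TF-IDF vector (term -> weight) for each tokenized document."""
    n = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append({
            # Smoothed IDF (an assumption) keeps shared terms from vanishing.
            term: (count / len(doc)) * (math.log((1 + n) / (1 + df[term])) + 1.0)
            for term, count in tf.items()
        })
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(term, 0.0) for term, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def topic_similarity(text_a, text_b):
    """Similarity of two texts built from title, keywords, and abstract."""
    docs = [text_a.lower().split(), text_b.lower().split()]
    vec_a, vec_b = tfidf_vectors(docs)
    return cosine(vec_a, vec_b)


def same_field(text_a, text_b, threshold=SAME_FIELD_THRESHOLD):
    """True when the similarity reaches the preset threshold."""
    return topic_similarity(text_a, text_b) >= threshold
```

In practice the input text for each document would be the concatenation of its title, keywords, and abstract after word segmentation.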

After the data cleaning in the step S102, the crawled document data has higher accuracy, and a foundation is laid for the accuracy of the subsequent evaluation result calculation.

In the implementation of steps S103-S104, the published document data of the at least some users can be determined, such as the number, titles, years, sources, institutions, keywords, and abstracts of the documents they have published, together with the citation data of those documents, such as the number of citations and the titles, years, sources, institutions, keywords, and abstracts. When calculating the evaluation result of the at least some users, at least the number of published documents and the number of cited documents can be used. The user's publication volume and citation frequency are thereby reflected in the user's evaluation result.

Specifically, the evaluation result may be in the form of a numerical value, such as a score, or may be in the form of a grade. The evaluation result is positively correlated with the number of published documents and the number of cited documents, and the more the number of published documents of the user is, the higher the evaluation result of the user is; the larger the number of cited documents of the user, the higher the evaluation result of the user.

The evaluation result of the embodiment of the invention reflects both the user's publication volume and the citation count. For users who have published few documents but whose documents are highly cited, the embodiment of the invention can yield a higher evaluation result, thereby addressing the prior-art problems of unscientific talent evaluation in niche research fields and of citation lag.

Further, in the specific implementation of step S105, the accuracy of talent recommendation can be improved by pushing information to the user according to the evaluation result of the user.

In a specific embodiment of the present invention, the calculated evaluation results of each user may be stored to form a talent database. The talent database comprises the identification of each user and the corresponding evaluation result. Further, the data in the talent database can be updated regularly.
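As a rough illustration of how a talent database might drive the push step, the sketch below stores one evaluation result per user identifier and selects the highest-scoring users. The in-memory dictionary, the `top_k` and `min_score` parameters, and the function names are assumptions made for the example; the embodiment does not prescribe a storage format or a selection rule.

```python
def build_talent_database(evaluation_results):
    """Map each user identifier to its evaluation result.

    `evaluation_results` is an iterable of (user_id, score) pairs; a later
    pair for the same user overwrites the earlier one, which mimics a
    periodic update of the database.
    """
    talent_db = {}
    for user_id, score in evaluation_results:
        talent_db[user_id] = float(score)
    return talent_db


def select_for_push(talent_db, top_k=3, min_score=0.0):
    """Return up to `top_k` user identifiers, best evaluation result first."""
    # Sort by descending score; ties are broken by user identifier so the
    # result is deterministic.
    ranked = sorted(talent_db.items(), key=lambda item: (-item[1], item[0]))
    return [user_id for user_id, score in ranked if score >= min_score][:top_k]
```

For example, with results for three users, `select_for_push(db, top_k=2)` returns the two users with the highest evaluation results, whose profiles or documents could then be pushed.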

In one non-limiting embodiment of the present invention, step S104 shown in Fig. 1 may include the following steps: calculating the total amount of cited documents by using the cited document data; determining the number of zero-citation documents in the published document data and calculating a weighted sum of the number of zero-citation documents and the number of remaining documents; and calculating the evaluation result of the at least some users according to the total amount of cited documents and the weighted sum.

In this embodiment, the documents published by the user may include zero-citation documents, that is, documents with no citations. Zero-citation documents should have little influence on the evaluation result of the user, so the evaluation result is calculated from a weighted sum of the number of zero-citation documents and the number of remaining documents, which reduces the influence of the zero-citation documents. The weighted sum reflects the amount of documents published by the user.

In addition, the total amount of cited documents can be calculated, namely, the total amount of the cited documents can be obtained by summing the number of the cited documents of all the documents published by the users to be evaluated. The evaluation result of the user can be calculated based on the total amount of the cited documents and the weighted sum.

The total amount of cited documents and the weighted sum can be combined by any feasible mathematical operation to obtain the evaluation result of the user. For example, the total amount of cited documents and the weighted sum can be directly added or multiplied; alternatively, because the total amount of cited documents may be large, an operation such as a square root, cube root, or logarithm may first be applied to it in order to avoid an excessive influence on the evaluation result, and the result may then be added to or multiplied by the weighted sum, which is not limited in the embodiment of the present invention.
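One possible instance of this calculation is sketched below. It assumes a weight of 0.5 for zero-citation documents and 1.0 for the remaining documents, and it combines the square root of the total citation count with the weighted sum by addition; these weights and the combination rule are illustrative assumptions, since the embodiment leaves them open. The function names are hypothetical.

```python
import math


def weighted_publication_count(citations_per_doc, zero_weight=0.5, cited_weight=1.0):
    """Weighted sum of zero-citation documents and the remaining documents."""
    zero_cited = sum(1 for count in citations_per_doc if count == 0)
    remaining = len(citations_per_doc) - zero_cited
    return zero_weight * zero_cited + cited_weight * remaining


def evaluation_score(citations_per_doc, zero_weight=0.5, cited_weight=1.0):
    """Combine the damped total citation count with the weighted sum.

    `citations_per_doc` holds, for each document published by the user,
    the number of times that document has been cited.
    """
    total_citations = sum(citations_per_doc)
    weighted = weighted_publication_count(citations_per_doc, zero_weight, cited_weight)
    # Square root damps very large citation totals (one of the options
    # the text mentions alongside cube root and logarithm).
    return math.sqrt(total_citations) + weighted
```

For a user whose four documents are cited 9, 0, 16, and 0 times, the total is 25 (square root 5) and the weighted sum is 0.5 * 2 + 1.0 * 2 = 3, giving a score of 8.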

In another non-limiting embodiment of the present invention, step S104 shown in Fig. 1 may include the following steps: calculating the total amount of cited documents by using the cited document data; determining the number of zero-citation documents in the published document data and calculating a weighted sum of the number of zero-citation documents and the number of remaining documents; calculating an H index of the at least some users; and combining the H index with the total amount of cited documents and the weighted sum to obtain the evaluation result of the at least some users.

Unlike the previous embodiment, this embodiment calculates the H index of the at least some users in addition to the total amount of cited documents and the above-mentioned weighted sum. The evaluation result of the at least some users is then calculated based on the H index, the total amount of cited documents, and the weighted sum.

The H index, the total amount of cited documents, and the weighted sum may be combined by any feasible mathematical operation to obtain the evaluation result of the user. For a specific implementation of calculating the H index, reference may be made to the prior art, and details are not described herein again.
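A minimal sketch of this variant follows. The H index is computed in the standard way from per-document citation counts; adding it to the square root of the total citation count and to the weighted sum is only one assumed combination, and the zero-citation weight of 0.5 is likewise an assumption. The function names are hypothetical.

```python
import math


def h_index(citations_per_doc):
    """Largest h such that h documents each have at least h citations."""
    ranked = sorted(citations_per_doc, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank
        else:
            break
    return h


def combined_score(citations_per_doc, zero_weight=0.5):
    """Assumed combination: H index + sqrt(total citations) + weighted sum."""
    zero_cited = sum(1 for count in citations_per_doc if count == 0)
    remaining = len(citations_per_doc) - zero_cited
    weighted = zero_weight * zero_cited + remaining
    return h_index(citations_per_doc) + math.sqrt(sum(citations_per_doc)) + weighted
```

The sketch also shows the insensitivity discussed in the background: users with citation counts `[100, 100]` and `[2, 2]` both have an H index of 2, yet the combined score of the first is larger because the total citation count enters the result.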

In an embodiment of the present invention, referring to fig. 2, step S103 shown in fig. 1 may include the following steps:

step S201: establishing a publication database according to the plurality of crawled documents, wherein the publication database comprises the document data published by each user;

step S202: establishing a citation database according to the plurality of crawled documents, wherein the citation database comprises the document data cited by the document data published by each user;

step S203: determining the number of published documents and the number of cited documents of the at least some users according to the publication database and the citation database.

In this embodiment, the crawled documents may form a talent document library, such as a talent paper database. On this basis, the data in the talent document library can be split into a publication database and a citation database. The publication database mainly stores all document data published by talents (including information such as titles, publication years, source journals, affiliated institutions, keywords, and abstracts). The citation database mainly stores data on the other documents cited in the documents published by talents, thereby recording which works each talent cites; conversely, the citations received by a designated talent and their number can be screened out from it.

Specifically, in subsequent steps, the published document data of the at least some users can be determined from the publication database, and the citation data of the at least some users can be determined from the citation database.
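The split into two databases and the reverse lookup of received citations can be sketched as follows. The record layout (`id`, `authors`, `references`) and the function names are assumptions for illustration; the embodiment does not define a schema.

```python
from collections import defaultdict


def build_databases(records):
    """Split crawled records into a publication database and a citation database.

    Each record is assumed to be a dict with keys "id", "authors", and
    "references" (identifiers of the documents it cites).
    """
    publication_db = defaultdict(list)  # author -> ids of published documents
    citation_db = {}                    # document id -> ids of documents it cites
    for record in records:
        for author in record["authors"]:
            publication_db[author].append(record["id"])
        citation_db[record["id"]] = list(record["references"])
    return dict(publication_db), citation_db


def citation_counts(citation_db):
    """Invert the citation database: document id -> number of citing documents."""
    counts = defaultdict(int)
    for references in citation_db.values():
        for cited_id in set(references):  # count each citing document once
            counts[cited_id] += 1
    return counts


def user_statistics(author, publication_db, citation_db):
    """Number of published documents and citations received, for one user."""
    counts = citation_counts(citation_db)
    doc_ids = publication_db.get(author, [])
    per_doc = [counts.get(doc_id, 0) for doc_id in doc_ids]
    return {
        "published": len(doc_ids),
        "citations_per_doc": per_doc,
        "total_citations": sum(per_doc),
    }
```

The `citations_per_doc` list is the kind of input that an evaluation calculation over publication and citation counts would consume.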

In one non-limiting embodiment of the present invention, step S101 shown in fig. 1 may be followed by the following steps: and standardizing the authors and the issuing institutions in the literature data according to a preset format.

The embodiment of the invention can be used to clean the crawled data. The authors and the publishing institutions in the literature data can be standardized according to a preset format. Examples include formatting the different ways an author's name is abbreviated across documents and standardizing the descriptions of the same author, distinguishing and processing the different papers of same-name authors, and standardizing the different ways institutions are described in papers.

Furthermore, distinguishing the different papers of same-name authors can be implemented by the following steps: calculating the topic similarity between the literature data; if the topic similarity is smaller than a first preset threshold, determining the proportion of documents published by the same-name authors in the same time period in which the same-name institution appears; if that proportion is larger than a second preset threshold, determining the proportion of shared collaborators in the documents published by the same-name authors in the same time period and at the same institution; and if that proportion is larger than a third preset threshold, determining that the same-name authors are the same author, otherwise determining that the same-name authors are different authors.

In a specific implementation, author names are written in different ways, for example, given name before family name, family name before given name, abbreviated names, and the like. In this embodiment, a preliminary identification of author names may therefore be performed first, followed by the disambiguation of same-name authors. First, the author information in all documents is extracted and all authors are sorted; authors with exactly the same name are merged in a first pass over the sorted list to form a preliminary author set. The remaining authors are then arranged in alphabetical order of their names and compared pairwise; if, when two strings are compared, the shorter string is completely matched in sequence by the longer one, the two names are considered to possibly refer to the same person. For further confirmation, the family name and given name in the two original names are extracted and compared under the different arrangements of family name and given name (the comparison method is the same as above, but without the alphabetical reordering). If a shortest matching arrangement (e.g., Thomas Huang and T. Huang) appears after traversing all the arrangements, the two names are further confirmed to be the same person, merged in a second pass, and added to the preliminary author set.
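The second-pass name comparison can be approximated by the sketch below. It simplifies the procedure described above: names are split into lowercase tokens, every arrangement of one name's tokens is tried against the other, and a token pair is accepted when the tokens are equal or one is a single-letter initial of the other. Restricting abbreviations to single-letter initials and requiring at least one exact token match are assumptions added here to avoid merging names such as "Li" and "Lin"; the function names are hypothetical.

```python
from itertools import permutations


def name_tokens(name):
    """Lowercase the name and split it into tokens, dropping commas and periods."""
    cleaned = name.lower().replace(",", " ").replace(".", " ")
    return cleaned.split()


def tokens_compatible(a, b):
    """True if the tokens are equal or one is a one-letter initial of the other."""
    if a == b:
        return True
    short, long_ = sorted((a, b), key=len)
    return len(short) == 1 and long_.startswith(short)


def possibly_same_name(name_a, name_b):
    """Check whether two written names may denote the same person."""
    tokens_a, tokens_b = name_tokens(name_a), name_tokens(name_b)
    if not tokens_a or len(tokens_a) != len(tokens_b):
        return False
    # Try every arrangement of family name and given name(s).
    for arrangement in permutations(tokens_b):
        pairs = list(zip(tokens_a, arrangement))
        all_compatible = all(tokens_compatible(x, y) for x, y in pairs)
        one_exact = any(x == y for x, y in pairs)
        if all_compatible and one_exact:
            return True
    return False
```

With these rules, "Thomas Huang", "T. Huang", and "Huang, Thomas" are treated as candidates for merging, while "Thomas Huang" and "Tim Huang" are not.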

Within the preliminary author set, same-name authors are then distinguished. Specifically, all articles of a same-name author are traversed, and the similarity of the articles is compared pairwise to confirm whether they belong to the same field. The similarity comparison may use titles, keywords, and abstracts, and may be computed, for example, using ICTCLAS, jieba word segmentation, and the TF-IDF algorithm. If the topic similarity exceeds a first preset threshold (for example, 60%), the two articles can be determined to be in the same field, and the same-name authors are preliminarily verified to be the same person. If the topic similarity is below the first preset threshold, the same-name authors may not be the same person. After the above steps are finished, further processing is needed.

The institution with which the author is affiliated is then identified, and same-name processing is carried out according to the author's institution given in each article. If the same-name authors have been preliminarily verified to be the same person, the periods during which the author was at each institution are identified from the publication years of the articles. If the preliminary verification indicates that they are not the same person, the institution needs to be examined further: the proportion of documents in which the same-name institution appears within the same time period (for example, 5 years) is determined. If the proportion exceeds a second preset threshold (for example, 0.5), there are two possibilities: either two same-name persons with different specialties work at the same institution, or they are the same person whose research direction has changed. Therefore, their collaborators need to be examined further.

If the proportion of shared collaborators within the same time period and the same institution exceeds a third preset threshold (e.g., 30%), the authors are considered to be the same person whose research direction has changed; if the proportion is below the third preset threshold, the two are taken to be different authors at the same institution. After this processing, same-name authors of different document data can be distinguished according to the above rules, so that authors can be classified and their institutions identified.
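The three-threshold decision described above can be written as a small cascade. The sketch takes the three ratios as precomputed inputs and uses the example thresholds from the text (60%, 0.5, and 30%). Treating an institution ratio at or below the second threshold as "different authors" is an assumption, since the text only states the outcome when that threshold is exceeded; the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Thresholds:
    """Example thresholds taken from the description."""
    topic: float = 0.6         # first preset threshold (topic similarity)
    institution: float = 0.5   # second preset threshold (same institution ratio)
    collaborator: float = 0.3  # third preset threshold (shared collaborator ratio)


def same_author(topic_similarity, institution_ratio, collaborator_ratio,
                thresholds=Thresholds()):
    """Decide whether two same-name author records refer to one person."""
    # Step 1: similar topics are taken as preliminary evidence of one person.
    if topic_similarity >= thresholds.topic:
        return True
    # Step 2: dissimilar topics and no shared institution in the period.
    # Assumption: this case is treated as different authors.
    if institution_ratio <= thresholds.institution:
        return False
    # Step 3: same institution but different topics; shared collaborators
    # separate "one person who changed direction" from "two colleagues".
    return collaborator_ratio > thresholds.collaborator
```

For instance, `same_author(0.2, 0.7, 0.4)` models a researcher who stayed at one institution with largely the same collaborators while changing research direction, and the cascade returns `True`.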

The embodiment of the present invention further discloses an information pushing device 30 based on literature data, where the information pushing device 30 based on literature data may include:

a literature data crawling module 301, configured to crawl literature data published by multiple users, where the literature data includes literature topics and cited literature data;

a same-name identification module 302, configured to, if the authors of different document data have the same name, identify the same-name authors at least according to the topic similarity of the document data;

a document data determining module 303, configured to determine published document data and cited document data of the at least some users according to the crawled document data, where the published document data includes a published document number, and the cited document data includes a cited document number;

an evaluation result calculation module 304, configured to calculate an evaluation result of the at least part of users according to at least the literature data published by the at least part of users and the cited literature data;

the pushing module 305 is configured to push information according to the evaluation result of each user, where the information includes the user and/or literature data of the user.

In the embodiment of the invention, the evaluation results of at least some users are calculated from the document data published by those users and the cited document data, wherein the published document data comprises the number of published documents and the cited document data comprises the number of cited documents, and the evaluation results are used for information pushing. Compared with the H index in the prior art, the calculated evaluation result reflects both the user's publication volume and citation frequency, so that it captures the user's publication data and highly cited works. In addition, by distinguishing same-name authors of the documents, the accuracy of the document data is improved, the real situation of the user is reflected more faithfully, and the accuracy of information pushing (such as talent recommendation) is improved.

For more details of the operation principle and the operation mode of the information pushing apparatus 30 based on literature data, reference may be made to the relevant descriptions in fig. 1 to fig. 2, and details are not repeated here.

The embodiment of the invention also discloses a storage medium, which is a computer-readable storage medium and stores a computer program thereon, and the computer program can execute the steps of the method shown in fig. 1 or fig. 2 when running. The storage medium may include ROM, RAM, magnetic or optical disks, etc. The storage medium may further include a non-volatile memory (non-volatile) or a non-transitory memory (non-transient), and the like.

The embodiment of the invention also discloses a terminal which can comprise a memory and a processor, wherein the memory is stored with a computer program which can run on the processor. The processor, when executing the computer program, may perform the steps of the method shown in fig. 1 or fig. 2. The terminal includes, but is not limited to, a mobile phone, a computer, a tablet computer and other terminal devices.

A terminal in the embodiments of the present application may refer to various forms of terminals, access terminals, subscriber units, subscriber stations, mobile stations (MS), remote stations, remote terminals, mobile devices, user terminals, terminal equipment, wireless communication devices, user agents, or user equipment. The terminal device may also be a cellular phone, a cordless phone, a Session Initiation Protocol (SIP) phone, a Wireless Local Loop (WLL) station, a Personal Digital Assistant (PDA), a handheld device with a wireless communication function, a computing device or other processing device connected to a wireless modem, a vehicle-mounted device, a wearable device, a terminal device in a future 5G network, or a terminal device in a future evolved Public Land Mobile Network (PLMN), and the like, which is not limited in the embodiments of the present application.

It should be understood that the term "and/or" herein merely describes an association between associated objects and indicates that three relationships are possible; for example, "A and/or B" may mean: A exists alone, A and B exist simultaneously, or B exists alone. In addition, the character "/" in this document indicates that the objects before and after it are in an "or" relationship.

The "plurality" appearing in the embodiments of the present application means two or more.

Terms such as "first" and "second" appearing in the embodiments of the present application are used only to illustrate and distinguish the objects described; they do not imply any particular limitation on the number of devices in the embodiments of the present application, and do not constitute any limitation on the embodiments of the present application.

The term "connect" in the embodiments of the present application refers to various connection manners, such as direct connection or indirect connection, to implement communication between devices, which is not limited in this embodiment of the present application.

It should be understood that, in the embodiments of the present application, the processor may be a Central Processing Unit (CPU), or may be another general-purpose processor, a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor or the like.

It will also be appreciated that the memory in the embodiments of the present application may be either volatile memory or nonvolatile memory, or may include both volatile and nonvolatile memory. The nonvolatile memory may be a Read-Only Memory (ROM), a Programmable ROM (PROM), an Erasable PROM (EPROM), an Electrically Erasable PROM (EEPROM), or a flash memory. The volatile memory may be a Random Access Memory (RAM), which acts as an external cache. By way of example and not limitation, many forms of RAM are available, such as Static RAM (SRAM), Dynamic RAM (DRAM), Synchronous DRAM (SDRAM), Double Data Rate SDRAM (DDR SDRAM), Enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), and Direct Rambus RAM (DR RAM).

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, the above-described embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions or computer programs. The procedures or functions described in accordance with the embodiments of the present application are produced in whole or in part when the computer instructions or the computer program are loaded or executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another computer readable storage medium, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire or wirelessly. The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device such as a server, data center, etc. that contains one or more collections of available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium. The semiconductor medium may be a solid state disk.

In the several embodiments provided in the present application, it should be understood that the disclosed method, apparatus and system may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative; for example, the division of the unit is only a logic function division, and there may be another division manner in actual implementation; for example, various elements or components may be combined or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may be physically included alone, or two or more units may be integrated into one unit. The integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.

The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute some steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk.

Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.

Claims

1. An information pushing method based on literature data is characterized by comprising the following steps:

crawling literature data published by a plurality of users, wherein the literature data comprises literature subjects and cited literature data;

if the authors of different literature data are homonymous, identifying the homonymous authors at least according to the topic similarity of the literature data;

extracting published document data and cited document data of at least some users in the plurality of users according to the crawled document data, wherein the published document data comprises the published document number, and the cited document data comprises the cited document number;

calculating the evaluation result of the at least part of users according to the literature data published by the at least part of users and the cited literature data;

and pushing information according to the evaluation result of each user, wherein the pushed information comprises the user and/or the literature data of the user.

2. The information pushing method based on literature data according to claim 1, wherein the calculating the evaluation result of the at least some users according to at least the literature data published by the at least some users and the cited literature data comprises:

calculating the total amount of cited documents by using the cited document data;

determining a zero cited document quantity in the published document data and calculating a weighted sum of the zero cited document quantity and a remaining document quantity;

and calculating the evaluation result of at least part of users according to the total amount of the cited documents and the weighted sum.
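The calculation recited in claim 2 can be sketched as follows. This is only an illustration under stated assumptions: the weights `w_zero` and `w_rest`, and the choice to add the total citations to the weighted sum, are hypothetical and are not specified in this section.

```python
def evaluate_user(citations, w_zero=0.5, w_rest=1.0):
    """Sketch of an evaluation result from per-document citation counts.

    citations: one citation count per published document of the user.
    w_zero, w_rest: hypothetical weights for zero-cited and remaining documents.
    """
    total_cited = sum(citations)                        # total amount of cited documents
    zero_cited = sum(1 for c in citations if c == 0)    # documents never cited
    remaining = len(citations) - zero_cited             # documents cited at least once
    weighted_sum = w_zero * zero_cited + w_rest * remaining
    # Assumption: the two parts are combined by simple addition.
    return total_cited + weighted_sum


# Four published documents, two of which were never cited.
print(evaluate_user([10, 0, 2, 0]))  # 12 + (0.5 * 2 + 1.0 * 2) = 15.0
```

Giving zero-cited documents a smaller weight than cited ones lets publication volume count toward the result without letting uncited output dominate it.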

3. The information pushing method based on literature data according to claim 1, wherein the calculating the evaluation result of the at least some users according to at least the literature data published by the at least some users and the cited literature data comprises:

calculating a total number of cited documents using the cited document data;

calculating an H index of the at least some users;

and combining the H index with the total amount of the cited documents and the weighted sum to obtain the evaluation result of at least part of users.

4. The method according to claim 1, wherein the determining the published literature data and the cited literature data of at least some users according to the crawled literature data comprises:

establishing a document sending database according to a plurality of crawled documents, wherein the document sending database comprises document data published by each user;

establishing a citation database according to a plurality of crawled documents, wherein the citation database comprises document data quoted by document data published by each user;

and determining the quantity of the document data published by at least part of users and the quantity of cited documents according to the document publishing database and the citation database.
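A minimal in-memory sketch of the two databases in claim 4 is given below. The record layout (`id`, `authors`, `cites`) and the author names are hypothetical. The citation database is interpreted here, as an assumption, as the number of times each of a user's published documents is cited by other crawled documents.

```python
from collections import defaultdict

# Hypothetical crawled records: document id, authors, and ids of cited documents.
crawled = [
    {"id": "d1", "authors": ["Li Wei"], "cites": []},
    {"id": "d2", "authors": ["Li Wei", "Zhang Min"], "cites": ["d1"]},
    {"id": "d3", "authors": ["Zhang Min"], "cites": ["d1", "d2"]},
]


def build_databases(records):
    publication_db = defaultdict(list)  # user -> ids of documents the user published
    cited_by = defaultdict(set)         # document id -> ids of documents citing it
    for rec in records:
        for author in rec["authors"]:
            publication_db[author].append(rec["id"])
        for target in rec["cites"]:
            cited_by[target].add(rec["id"])
    # user -> {document id: number of crawled documents citing it}
    citation_db = {
        user: {doc: len(cited_by[doc]) for doc in docs}
        for user, docs in publication_db.items()
    }
    return dict(publication_db), citation_db


publication_db, citation_db = build_databases(crawled)
for user, docs in publication_db.items():
    print(user, "published:", len(docs),
          "cited:", sum(citation_db[user].values()))
```

The published document quantity is then the length of a user's entry in `publication_db`, and the cited document quantity is the sum of that user's counts in `citation_db`.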

5. The method according to claim 1, wherein crawling published literature data of multiple users comprises:

and standardizing the authors and the issuing institutions in the literature data according to a preset format.
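The standardization step in claim 5 might look like the sketch below. The preset format used here (Unicode NFKC folding, single spaces, title-cased names, upper-cased institutions) is an assumption chosen for illustration; the claim does not say what the preset format is.

```python
import re
import unicodedata


def normalize_name(raw):
    """Fold full-width characters, collapse whitespace, and title-case a name."""
    text = unicodedata.normalize("NFKC", raw)
    text = re.sub(r"\s+", " ", text).strip()
    return text.title()


def normalize_institution(raw):
    """Fold full-width characters, drop trailing punctuation, and upper-case."""
    text = unicodedata.normalize("NFKC", raw).strip()
    text = re.sub(r"[.,;]+$", "", text)
    text = re.sub(r"\s+", " ", text)
    return text.upper()


print(normalize_name("  li   WEI "))                    # Li Wei
print(normalize_institution(" Shanghai  University."))  # SHANGHAI UNIVERSITY
```

Title-casing is a simplification that mishandles some names (for example, those with internal capitals), so a production format would likely need language-specific rules.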

6. The information pushing method based on literature data according to claim 1, wherein said identifying authors of the same name at least according to topic similarity of literature data comprises:

calculating topic similarity between the literature data;

if the topic similarity is smaller than a first preset threshold, determining the proportion of documents published by the same-name authors in the same time period in which the same institution appears;

if the proportion is larger than a second preset threshold, determining the proportion of shared collaborators among the documents published by the same-name authors in the same time period and at the same institution;

and if that proportion is larger than a third preset threshold, determining that the same-name authors are the same author, and otherwise determining that they are different authors.
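The threshold cascade of claim 6 can be sketched as a small decision function. The threshold values are hypothetical, and how the three input quantities are computed is left outside the sketch. Two branches are assumptions rather than statements of the claim: a topic similarity at or above the first threshold is treated as the same author, and an institution ratio at or below the second threshold is treated as different authors.

```python
def same_author(topic_similarity, institution_ratio, collaborator_ratio,
                t1=0.8, t2=0.5, t3=0.3):
    """Decide whether two same-name author records refer to one person.

    t1, t2, t3: hypothetical first, second, and third preset thresholds.
    """
    if topic_similarity >= t1:
        # Assumption: highly similar topics alone are taken as the same author.
        return True
    if institution_ratio <= t2:
        # Assumption: too little institutional overlap means different authors.
        return False
    # Final check recited in the claim: overlap of collaborators.
    return collaborator_ratio > t3


print(same_author(0.4, 0.7, 0.5))  # True: institutions and collaborators overlap
print(same_author(0.4, 0.7, 0.1))  # False: collaborators barely overlap
```

Ordering the checks this way means the institution and collaborator ratios only need to be evaluated for author pairs whose topics are not already similar.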

7. The information pushing method based on literature data according to any one of claims 1 to 6, wherein the literature data comprises title, year, source, organization, keyword and abstract; the types of documents include papers, patents, books, and conference reports.

8. An information pushing apparatus based on document data, comprising:

the document data crawling module is used for crawling document data published by a plurality of users, and the document data comprises document subjects and cited document data;

the homonymy identification module is used for identifying homonymy authors according to the topic similarity of the literature data at least if the authors of different literature data have the same name;

the literature data determining module is used for determining published literature data and cited literature data of at least part of users according to a plurality of crawled literature data, wherein the published literature data comprises published literature quantity, and the cited literature data comprises cited literature quantity;

the evaluation result calculation module is used for calculating the evaluation results of the at least part of users at least according to the literature data published by the at least part of users and the cited literature data;

and the pushing module is used for pushing information according to the evaluation result of each user, and the pushed information comprises the user and/or the literature data of the user.

9. A storage medium having a computer program stored thereon, wherein the computer program is executed by a processor to perform the steps of the document data based information pushing method according to any one of claims 1 to 7.

10. A terminal comprising a memory and a processor, wherein the memory stores a computer program operable on the processor, and the processor executes the computer program to perform the steps of the information pushing method based on literature data according to any one of claims 1 to 7.