CN101887460A - Document quality assessment method and application - Google Patents


Info

Publication number
CN101887460A
Authority
CN
China
Prior art keywords
document
author
vertex
transition probability
expression
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2010102263535A
Other languages
Chinese (zh)
Inventor
张铭
封盛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Peking University
Original Assignee
Peking University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Peking University
Priority to CN2010102263535A
Publication of CN101887460A
Legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a document quality assessment algorithm for a document sharing platform. The algorithm comprises the following steps: constructing an academic network graph from the document-document, document-journal/conference, and document-author relationships; quantifying these relationships as transitions between vertices of the graph and modeling them to obtain a transition probability matrix; building a model from users' collection (bookmarking) behavior toward documents and computing a user-analysis-based document quality value; and running an iterative random walk with restart on the graph to obtain document quality, journal/conference quality, and author academic reputation. The algorithm combines user behavior information with document quality assessment for the first time, yields author academic reputation and journal/conference academic quality alongside the document quality result, and markedly improves ranking performance compared with other methods.

Description

A document quality assessment method and its application
Technical field
The present invention relates to a method for assessing the quality of documents, in particular to a document quality assessment method for a document sharing platform, and belongs to the technical field of knowledge mining.
Background art
In recent years, with the rapid development of scientific research, the rate at which scientific literature is published has increased year by year, and its volume is now enormous. For example, the CiteSeerX digital library alone holds more than 1.5 million scientific documents in computer and information science. Researchers need to read and consult a large amount of scientific literature in the course of their work, and high-quality and low-quality documents differ greatly in their value to a researcher. Finding the more valuable documents in such a large collection of uneven quality has become very difficult. How to assess the quality of scientific literature automatically and effectively has therefore attracted a growing number of researchers.
On social document sharing websites in the academic domain, users can collect (bookmark) the scientific documents they consider valuable, tag and comment on them, and share them with other users. Users' collection behavior ought to be an important reference when analyzing the quality of scientific literature, yet very little research so far has used user behavior for this purpose. How to apply user behavior effectively in a scientific literature quality assessment system in the Web 2.0 environment therefore deserves further study.
The evaluation methods currently used in academia to assess the quality of scientific papers mainly include peer review, citation analysis, and link-analysis-based methods. Peer review is generally used for early-stage evaluation, such as when a conference or journal evaluates submitted papers; citation analysis is used for later-stage evaluation, for example to assess the academic level of a researcher's published papers.
In peer review, senior experts and scholars in the same research field give a comprehensive evaluation covering the significance and novelty of the chosen topic, the research method, the quality of the completed research, the quality of the writing, and so on. Its advantage is that the experts' evaluation of research quality is careful and accurate: relying on deep expertise in the field, they can clearly judge the level of a piece of academic work. Its disadvantages are that the current evaluation system is neither complete nor rigorous, so lapses in reviewers' self-discipline easily lead to abuse, and that peer-reviewing a large number of papers is so time-consuming and laborious that it is impractical.
Citation analysis uses the citing and cited relationships between scientific papers, together with specific methods and evaluation criteria, to assess paper quality. Researchers in citation analysis have proposed a series of quantitative quality indicators, such as citation frequency and impact factor. Compared with peer review, citation analysis is simpler and easy to automate by computer. At the same time, its results are coarser, and it depends on the citation relationships between papers: a newly published document has been cited little, so its evaluation tends to be too low. This is a significant limitation.
In 1998 Brin and Page proposed the PageRank algorithm, which ranks web pages by importance based on the link relationships between them, and founded the Google search engine on it. Kleinberg proposed another link analysis algorithm, HITS. Later, recognizing that citation relationships between scientific documents naturally form a link structure, many researchers applied the ideas behind these methods to the problem of document quality assessment.
Summary of the invention
The object of the present invention is to analyze document quality by modeling and analyzing the relationships among documents, authors, and journals/conferences, assisted by the relationship between user behavior and document quality in the Web 2.0 environment. The invention unifies the two approaches of peer review and citation analysis within the framework of a random walk with restart and produces a final analysis result.
The technical solution adopted by the present invention is as follows (the flow is shown in Fig. 1):
The present invention proposes a method for assessing document quality. The method is applied to a scientific literature sharing platform on which users can collect documents, tag them, comment on them, and share them with other users, and is characterized in that it comprises the following steps:
A. Using the citation relationships between documents, the relationships of documents to journals/conferences and to authors, and the publication times of documents, construct a weighted directed graph, called the academic network graph;
B. Quantify the citation relationships between documents and the relationships of documents to journals/conferences and to authors as transitions between vertices of the graph, and model them to obtain the transition probability matrix of the academic network graph;
C. Build a model from users' collection behavior toward documents, taking collection time into account, and use the HITS algorithm to compute a user-analysis-based document quality value;
D. Based on the models built in steps B and C, iterate a random walk with restart until the result converges, obtaining a probability value for each vertex of the academic network graph; these probability values give the document quality, the journal/conference quality, and the author academic reputation.
The method provided by the invention can be used not only on scientific literature sharing platforms but also on paper sharing platforms or websites (where a document is a paper), picture sharing platforms or websites (where a document is a picture), and the like.
Beneficial effects of the present invention:
The graph-based quality assessment method for scientific literature proposed by the present invention combines user behavior information with document quality assessment for the first time, and yields author academic reputation and journal/conference academic quality along with the document quality result. If the invention is applied to a scientific literature retrieval website and the results of a user's keyword search are ranked by quality value, it can help users find high-quality scientific documents faster and identify high-quality journals and conferences, as well as authors of high academic reputation, more quickly. Experiments show that the ranking performance of this method is markedly better than that of the other methods compared.
Description of drawings
Fig. 1 is the overall flow chart of the graph-based scientific literature quality assessment method according to the present invention;
Fig. 2 is the academic network graph constructed according to the present invention;
Fig. 3 is a diagram of the transitions between vertices of the academic network graph constructed according to the present invention;
Fig. 4 is the user-document collection graph constructed according to the present invention.
Embodiment
The present invention is described in further detail below with reference to the drawings and specific embodiments:
Step 1: using the citation relationships between documents, the relationships of documents to journals/conferences and to authors, and the publication times of documents, construct a weighted directed graph, called the academic network graph.
The academic network graph designed by the present invention consists of three parts and models the relationships among three kinds of entities: documents, authors, and journals/conferences. The three parts are:
● the document citation subgraph $G_{dd} = (V_d, E_{dd})$,
$G_{dd}$ is a directed graph representing the citation relationships between documents, where $V_d$ is the set of document vertices, $E_{dd}$ is the edge set, and a directed edge $\langle d_i, d_j \rangle \in E_{dd}$ means that document $d_i$ cites document $d_j$;
● the author-document subgraph $G_{ad} = (V_a \cup V_d, E_{ad})$,
$G_{ad}$ is a bipartite graph representing the authorship relationship between authors and documents, where $V_a$ is the set of author vertices, $E_{ad}$ is the edge set, and an undirected edge $(a_i, d_j) \in E_{ad}$ means that author $a_i$ wrote document $d_j$;
● the journal/conference-document subgraph $G_{cd} = (V_c \cup V_d, E_{cd})$,
$G_{cd}$ is a bipartite graph representing the publication relationship between journals/conferences and documents, where $V_c$ is the set of journal and conference vertices, $E_{cd}$ is the edge set, and an undirected edge $(c_i, d_j) \in E_{cd}$ means that document $d_j$ was published in journal or conference $c_i$.
The combination of these three subgraphs is the academic network graph, as shown in Fig. 2.
The academic network graph is defined as the directed graph $G = (V, E)$, where $V = V_a \cup V_d \cup V_c$ is the vertex set and $E = E_{dd} \cup E_{ad} \cup E_{cd}$ is the edge set. Because the random walk must run on a directed graph, every undirected edge in the author-document subgraph and the journal/conference-document subgraph is represented as two directed edges connecting its two vertices, for example: $(c_i, d_j) \rightarrow \langle c_i, d_j \rangle \cup \langle d_j, c_i \rangle$.
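The construction in Step 1 can be illustrated with a minimal Python sketch. The function name `build_academic_graph`, the tagged-tuple vertex encoding, and the toy data are assumptions made for illustration only; the patent does not prescribe a data structure, and edge weights are left out here because they are defined in Step 2.

```python
def build_academic_graph(citations, authorship, publication):
    """Return the directed edge set E of a toy academic network graph.

    citations:   iterable of (citing_doc, cited_doc) pairs
    authorship:  iterable of (author, doc) pairs
    publication: iterable of (venue, doc) pairs
    Vertices are tagged tuples such as ("d", "p1") so that documents,
    authors ("a") and journals/conferences ("c") cannot collide.
    """
    edges = set()
    for citing, cited in citations:
        # Citation edges are already directed: d_i cites d_j.
        edges.add((("d", citing), ("d", cited)))
    for author, doc in authorship:
        # Each undirected author-document edge becomes two directed edges.
        edges.add((("a", author), ("d", doc)))
        edges.add((("d", doc), ("a", author)))
    for venue, doc in publication:
        # Same treatment for journal/conference-document edges.
        edges.add((("c", venue), ("d", doc)))
        edges.add((("d", doc), ("c", venue)))
    return edges

# Hypothetical toy data: paper p2 cites p1; both appeared at one venue.
edges = build_academic_graph(
    citations=[("p2", "p1")],
    authorship=[("alice", "p1"), ("alice", "p2"), ("bob", "p2")],
    publication=[("KDD", "p1"), ("KDD", "p2")],
)
vertices = {v for edge in edges for v in edge}
```

With this toy input the graph has one citation edge plus two directed edges for each of the five undirected relations.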
Step 2: quantify the citation relationships between documents and the relationships of documents to journals/conferences and to authors as transitions between vertices of the graph, and model them to obtain the transition probability matrix of the academic network graph.
Each vertex of the academic network graph $G$ represents an author, a document, or a journal/conference, so $G$ is a heterogeneous graph containing three different types of entities. The present invention defines different transition probabilities $\alpha$ for transitions between vertices (entities) of different types, as shown in Fig. 3. For these transition probability parameters, define:
$$\alpha_{ad} = \alpha_{cd} = 1$$
$$\alpha_{da} + \alpha_{dc} + \alpha_{dd} = 1$$
where $\alpha_{ad}$ is the transition probability from an author vertex to a document vertex, $\alpha_{cd}$ is the transition probability from a journal/conference vertex to a document vertex, $\alpha_{da}$ is the transition probability from a document vertex to an author vertex, $\alpha_{dc}$ is the transition probability from a document vertex to a journal/conference vertex, and $\alpha_{dd}$ is the transition probability from a document vertex to a document vertex.
Let $W(G)$ be the weighted adjacency matrix of graph $G$, whose entries are the weights of the relationships between vertices in the academic network graph. Following the earlier definition of the academic network graph, $W(G)$ can be decomposed into the series of submatrices shown in the table below. First, the invention assigns initial values to each submatrix to obtain the initial weighted adjacency matrix; then a weight scoring function is applied to the initial values to obtain the final weighted adjacency matrix; finally, the transition probability matrix is computed from the weighted adjacency matrix.
[Table image BDA0000023283060000041 not reproduced in this text: decomposition of $W(G)$ into submatrices]
The initial definitions of these submatrices are given below:
● the weighted adjacency matrix from document vertices to document vertices
[Equation image BDA0000023283060000042 not reproduced in this text: definition of $W_{dd}$]
where $t(d)$ denotes the publication time of document $d$ and $\Gamma_{dd}(d_i)$ denotes the set of documents cited by document $d_i$.
● the weighted adjacency matrix from author vertices to document vertices
[Equation image BDA0000023283060000043 not reproduced in this text: definition of $W_{ad}$]
where $\Gamma_{ad}(a_i)$ denotes the set of documents published by author $a_i$,
[Equation image BDA0000023283060000044 not reproduced in this text]
and author $a$ is the $k$-th author of document $d$.
● the weighted adjacency matrix from document vertices to author vertices: $W_{da}(j, i) = |\Gamma_{da}(d_j)| - k + 1$
where $\Gamma_{da}(d_j)$ denotes the set of authors of document $d_j$, and $k$ indicates that author $a_i$ is the $k$-th author of document $d_j$.
● the weighted adjacency matrix from document vertices to journal/conference vertices
[Equation image BDA0000023283060000051 not reproduced in this text: definition of $W_{dc}$]
● the weighted adjacency matrix from journal/conference vertices to document vertices
[Equation image BDA0000023283060000052 not reproduced in this text: definition of $W_{cd}$]
where $c_{ik}$ denotes a particular session of conference $c_i$ or a particular volume of journal $c_i$, $\Gamma_{cd}(c_{im})$ denotes the set of documents published in $c_{im}$, and $t(c_{im})$ denotes the time corresponding to $c_{im}$.
$$\Gamma_{cd}(c_{ik}) = \{\, d \mid t(d) = t(c_{ik}) \wedge d \in \Gamma_{cd}(c_i) \,\}$$
Clearly, $\bigcup_k \Gamma_{cd}(c_{ik}) = \Gamma_{cd}(c_i)$, and
[Equation image BDA0000023283060000054 not reproduced in this text]
$$\forall k, l,\ t(c_{ik}) \neq t(c_{il})$$
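Two of the definitions above survive in the text and can be sketched directly: the document-to-author weight $W_{da}(j, i) = |\Gamma_{da}(d_j)| - k + 1$ and the split of a venue's documents by time, $\Gamma_{cd}(c_{ik})$. The helper names and toy data below are hypothetical, and the other submatrices are not sketched because their formulas are missing from this text.

```python
from collections import defaultdict

def doc_to_author_weights(authors_in_order):
    """Initial weights W_da(j, i) = |Gamma_da(d_j)| - k + 1 for one document,
    where k is the 1-based position of author a_i in the author list."""
    n = len(authors_in_order)
    return {author: n - k + 1
            for k, author in enumerate(authors_in_order, start=1)}

def split_venue_by_time(venue_docs, pub_time):
    """Group the documents of one venue c_i by publication time, so that
    each group plays the role of Gamma_cd(c_ik) for one session or volume."""
    groups = defaultdict(set)
    for doc in venue_docs:
        groups[pub_time[doc]].add(doc)
    return dict(groups)

# Toy example: a three-author paper and a venue with two sessions.
weights = doc_to_author_weights(["alice", "bob", "carol"])
sessions = split_venue_by_time(
    {"p1", "p2", "p3"},
    {"p1": 2008, "p2": 2008, "p3": 2009},
)
```

Under this weighting the first author of a three-author paper receives weight 3 and the last author weight 1, and the session groups are disjoint and together cover all documents of the venue.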
Next, a weight scoring function $\Phi$ is applied to the initial weights in the matrix:
$$W(i, j) = \Phi(W(i, j))$$
A suitable weight scoring function should be monotonically increasing, but with increments that shrink as the argument grows, that is, $\Phi'(x) > 0$ and $\Phi''(x) < 0$. This method uses
[Equation image BDA0000023283060000056 not reproduced in this text: the chosen $\Phi$]
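The specific $\Phi$ chosen by the method is in an equation image that this text does not reproduce. As an illustration of the stated requirements only, the sketch below uses $\log(1 + x)$ as a stand-in; this choice is an assumption and not necessarily the function in the patent. It satisfies $\Phi'(x) = 1/(1+x) > 0$ and $\Phi''(x) = -1/(1+x)^2 < 0$ for $x \ge 0$.

```python
import math

def phi(x):
    """Stand-in weight scoring function (an assumption, not the patent's
    own choice): log(1 + x), which is increasing and concave for x >= 0."""
    return math.log1p(x)

# Successive increments are positive but shrink as the raw weight grows.
raw_weights = [1, 2, 3, 4, 5]
scored = [phi(w) for w in raw_weights]
increments = [b - a for a, b in zip(scored, scored[1:])]
```

A function of this shape dampens very large raw weights, so a vertex with many relations does not dominate the transition probabilities in proportion to its raw count.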
Next, the transition probability matrices of the three subgraphs are defined first, and then the transition probability matrix of the whole academic network graph is computed.
● the document citation subgraph $G_{dd}$
The document-to-document transition probability matrix is:
$$M_{dd} = \left( M_{dd}(i, j) \right)_{i, j \in V_d}$$
where
$$M_{dd}(i, j) = P(d_j \mid d_i) = \frac{W_{dd}(i, j)}{\sum_k W_{dd}(i, k)}$$
● the author-document subgraph $G_{ad}$
The author-to-document transition probability matrix is:
$$M_{ad} = \left( M_{ad}(i, j) \right)_{i \in V_a,\ j \in V_d}$$
where
$$M_{ad}(i, j) = P(d_j \mid a_i) = \frac{W_{ad}(i, j)}{\sum_k W_{ad}(i, k)}$$
The document-to-author transition probability matrix is:
$$M_{da} = \left( M_{da}(j, i) \right)_{i \in V_a,\ j \in V_d}$$
where
$$M_{da}(j, i) = P(a_i \mid d_j) = \frac{W_{da}(j, i)}{\sum_k W_{da}(j, k)}$$
● the journal/conference-document subgraph $G_{cd}$
The document-to-journal/conference transition probability matrix is:
$$M_{dc} = \left( M_{dc}(i, j) \right)_{i \in V_d,\ j \in V_c}$$
where
$$M_{dc}(i, j) = P(c_j \mid d_i) = \frac{W_{dc}(i, j)}{\sum_k W_{dc}(i, k)}$$
The journal/conference-to-document transition probability matrix is:
$$M_{cd} = \left( M_{cd}(j, i) \right)_{i \in V_d,\ j \in V_c}$$
where
$$M_{cd}(j, i) = P(d_i \mid c_j) = \frac{W_{cd}(j, i)}{\sum_k W_{cd}(j, k)}$$
From the transition probability matrices of the subgraphs, the transition probability matrix of the academic network graph is obtained:
$$M(G) = \left( P(j \mid i) \right)_{i, j \in V} = \begin{pmatrix} \alpha_{dd} M_{dd} & \alpha_{da} M_{da} & \alpha_{dc} M_{dc} \\ M_{ad} & 0 & 0 \\ M_{cd} & 0 & 0 \end{pmatrix}$$
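The row normalization and block assembly above can be sketched in plain Python. The function names, the toy weight matrices, and the values chosen for $\alpha_{dd}$, $\alpha_{da}$ and $\alpha_{dc}$ are all assumptions for illustration; the vertex order is documents, then authors, then journals/conferences, matching the block layout of $M(G)$.

```python
def row_normalize(w):
    """Divide each row of a weight matrix by its sum, giving
    M(i, j) = W(i, j) / sum_k W(i, k). All-zero rows stay zero."""
    out = []
    for row in w:
        total = sum(row)
        out.append([x / total if total else 0.0 for x in row])
    return out

def assemble(m_dd, m_da, m_dc, m_ad, m_cd, a_dd, a_da, a_dc):
    """Build the block matrix M(G); requires a_dd + a_da + a_dc == 1."""
    n_a, n_c = len(m_ad), len(m_cd)
    rows = []
    for r_dd, r_da, r_dc in zip(m_dd, m_da, m_dc):
        rows.append([a_dd * x for x in r_dd]
                    + [a_da * x for x in r_da]
                    + [a_dc * x for x in r_dc])
    for r in m_ad:  # author rows: alpha_ad = 1, zero blocks elsewhere
        rows.append(list(r) + [0.0] * (n_a + n_c))
    for r in m_cd:  # venue rows: alpha_cd = 1, zero blocks elsewhere
        rows.append(list(r) + [0.0] * (n_a + n_c))
    return rows

# Toy weights (not derived from real data): 2 documents, 1 author, 1 venue.
M = assemble(
    m_dd=row_normalize([[0, 1], [1, 0]]),
    m_da=row_normalize([[1], [1]]),
    m_dc=row_normalize([[1], [1]]),
    m_ad=row_normalize([[2, 1]]),
    m_cd=row_normalize([[1, 3]]),
    a_dd=0.5, a_da=0.25, a_dc=0.25,
)
```

Because every block is row-normalized and the three document-side coefficients sum to 1, each row of the assembled matrix sums to 1 in this example. A document that cites nothing would have an all-zero $M_{dd}$ row; how the method handles that case is not stated in this text.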
Step 3: build a model from users' collection behavior toward documents, taking collection time into account, and use the HITS algorithm to compute a user-analysis-based document quality value.
The present invention connects documents and users through collection behavior to construct a user-document collection graph, in which users and documents are the vertices and collection actions are the edges, as shown in Fig. 4. The invention defines the user-document collection system as $B = (U, D, T, R)$, where $U$ is the set of users, $D$ is the set of documents, $T$ is a set of time points, and
[Equation image BDA0000023283060000071 not reproduced in this text: definition of $R$]
denotes the set of collection relations. $(u, d, t) \in R$ means that user $u$ collected document $d$ at time $t$.
Define the quality value vector of the document set as $q = (q_1, q_2, \ldots, q_m)$, where $m = |D|$; define the expertise vector of the user set as $e = (e_1, e_2, \ldots, e_n)$, where $n = |U|$. Define the adjacency matrix $A$ of the user-document collection graph:
[Equation image BDA0000023283060000072 not reproduced in this text: definition of $A$]
Computing the document quality values and the user expertise values amounts to repeating the following iteration until the result converges:
$$q = e \times A$$
$$e = q \times A^{T}$$
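The mutual-reinforcement iteration $q = e \times A$, $e = q \times A^{T}$ can be sketched as follows. Two points are assumptions: the entries of $A$ are taken as plain 0/1 collection indicators, because the patent's time-aware definition of $A$ is in an image missing from this text, and both vectors are rescaled to sum to 1 after each round, as is usual for HITS, although the text does not state a normalization.

```python
def hits(A, iterations=50):
    """Iterate q = e x A and e = q x A^T on an n-by-m user-document matrix.

    A[u][d] is nonzero when user u collected document d.
    Returns (q, e): document quality values and user expertise values.
    """
    n, m = len(A), len(A[0])
    e = [1.0] * n
    for _ in range(iterations):
        q = [sum(e[u] * A[u][d] for u in range(n)) for d in range(m)]
        e = [sum(q[d] * A[u][d] for d in range(m)) for u in range(n)]
        # Rescale so the values stay bounded (assumed normalization).
        q_total, e_total = sum(q) or 1.0, sum(e) or 1.0
        q = [x / q_total for x in q]
        e = [x / e_total for x in e]
    return q, e

# Toy data: 3 users, 3 documents. Document 0 is collected by every user,
# document 1 by two users, document 2 by one user.
A = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
]
q, e = hits(A)
```

In this example the most widely collected document receives the highest quality value, and the user who collected the most documents receives the highest expertise value. A fixed iteration count is used for brevity; a convergence test on successive vectors would follow the text more closely.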
Step 4: based on the models built in Step 2 and Step 3, iterate a random walk with restart until the result converges, obtaining a probability value for each vertex of the academic network graph; these probability values give the document quality, the journal/conference quality, and the author academic reputation.
Let $d$ be the document quality value vector, $a$ the author academic reputation vector, and $c$ the journal/conference quality value vector. The vectors of the three kinds of entities are concatenated into one vector: $\pi = [d^{T}, a^{T}, c^{T}]^{T}$. The random walk with restart can be expressed by the following formula:
$$\pi_{t+1} = c M^{T} \pi_{t} + (1 - c) Q, \quad 0 \le c \le 1$$
$Q$ is constructed as follows:
$Q(i)$ is normalized so that $\sum_{i \in V} Q(i) = |V|$.
To test for convergence, the $\pi$ vectors obtained from two consecutive iterations are subtracted; if the difference is less than $10^{-6}$, the iteration is judged to have converged. If the vector finally obtained is $\pi_n$, its entries are the document quality values, author academic reputation values, and journal/conference quality values.
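A minimal sketch of the iteration $\pi_{t+1} = c M^{T} \pi_t + (1 - c) Q$ follows. Several details are assumptions because the text does not fix them: the restart weight `c = 0.85`, starting the walk from $Q$, measuring the difference as the largest absolute change in any entry, and the raw restart values, which stand in for the $Q$ whose construction is missing from this text. Only the normalization $\sum_i Q(i) = |V|$ and the $10^{-6}$ threshold come from the description.

```python
def normalize_restart(raw):
    """Scale the restart vector so that its entries sum to |V|."""
    scale = len(raw) / sum(raw)
    return [x * scale for x in raw]

def random_walk_with_restart(M, Q, c=0.85, tol=1e-6, max_iter=1000):
    """Iterate pi <- c * M^T pi + (1 - c) * Q until the largest entry-wise
    change between two consecutive iterations drops below tol."""
    n = len(M)
    pi = list(Q)  # assumed starting point
    for _ in range(max_iter):
        # (M^T pi)_j = sum_i M[i][j] * pi[i]
        nxt = [c * sum(M[i][j] * pi[i] for i in range(n)) + (1 - c) * Q[j]
               for j in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi

# Toy row-stochastic matrix over [doc0, doc1, author, venue] (hypothetical).
M = [
    [0.0, 0.5, 0.25, 0.25],
    [0.5, 0.0, 0.25, 0.25],
    [2 / 3, 1 / 3, 0.0, 0.0],
    [0.25, 0.75, 0.0, 0.0],
]
# Hypothetical raw restart values, e.g. a higher user-based score for doc0.
Q = normalize_restart([2.0, 1.0, 1.0, 1.0])
pi = random_walk_with_restart(M, Q)
```

Since the rows of `M` sum to 1 and `Q` sums to $|V|$, the entries of `pi` keep summing to $|V|$ throughout the iteration. Note that the scalar restart weight `c` in the code is distinct from the journal/conference vector that the description also calls $c$.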
Performance evaluation
The scientific literature quality assessment method of the present invention assigns a quality score to each document, journal/conference, and author; the rankings obtained from these scores are evaluated experimentally.
First, the document quality assessment results are evaluated. Documents from three fields, "Opinion Mining", "Topic Model" and "Social Network", are selected for evaluation. The document evaluation experiment mainly combines manual scoring of the quality ranking results with the DCG (Discounted Cumulative Gain) measure. Evaluators assign each document a score according to its quality; the higher a document's score, the nearer the top of the ranking it should appear. DCG is then used to evaluate the results: the higher the DCG value, the better the ranking output by the algorithm matches actual needs. The DCG value is computed as:
$$DCG_p = score_1 + \sum_{i=2}^{p} \frac{score_i}{\log_2 i}$$
where $score_i$ is the score the evaluator assigned to the $i$-th item in the ranking.
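The DCG formula above translates directly into a short function. The example scores are invented for illustration and are not taken from the patent's experiments.

```python
import math

def dcg(scores):
    """DCG_p = score_1 + sum over i = 2..p of score_i / log2(i),
    where scores[0] is the evaluator's score for the top-ranked item."""
    if not scores:
        return 0.0
    return scores[0] + sum(
        s / math.log2(i) for i, s in enumerate(scores[1:], start=2)
    )

# The same five evaluator scores in a good order and in a poor order.
good_order = dcg([3, 3, 2, 1, 0])
poor_order = dcg([0, 1, 2, 3, 3])
```

Items at ranks 1 and 2 are not discounted, since $\log_2 2 = 1$, while lower ranks contribute progressively less, so a ranking that places highly scored documents first obtains the larger DCG.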
For the evaluation of document quality, the following methods are used for comparison:
● the document part of the PageRank results
● the document part of the PopRank results
● the document part of the results of the Random Walk algorithm (RW) on the academic network graph
● citation count (Citation Count): the number of times a document is cited within the paper collection used in this experiment.
The evaluation results are as follows (for convenience, the method of the present invention is denoted RW+U):

| Method | Opinion Mining | Topic Model | Social Network |
|---|---|---|---|
| PageRank | 13.66547 | 15.78191 | 12.03448 |
| PopRank | 17.16117 | 17.33343 | 16.13546 |
| CitationCount | 17.40092 | 17.21429 | 14.63348 |
| RW | 17.68033 | 17.90367 | 16.81558 |
| RW+U | 18.16559 | 18.41081 | 17.28261 |
Next, the results of the author academic reputation evaluation are assessed. The method is the same as in the document quality assessment experiment, and the following methods are used for comparison:
● the author part of the PageRank results
● the author part of the PopRank results
● the author part of the results of the Random Walk algorithm (RW) on the academic network graph
● publication count (Publication Count): the total number of documents the author published in the field collection used in the experiment
● field citation count (Citation Count): the total number of citations received by the documents the author published in the field collection used in the experiment
The evaluation results are as follows:

| Method | Opinion Mining | Topic Model | Social Network |
|---|---|---|---|
| PubNum | 12.79365 | 13.29129 | 10.64821 |
| CitationCount | 16.86091 | 14.12744 | 11.1679 |
| PageRank | 15.53911 | 14.77489 | 13.77117 |
| PopRank | 17.33779 | 15.87551 | 16.48075 |
| RW | 17.81661 | 16.5786 | 16.87568 |
| RW+U | 17.99627 | 16.61291 | 16.8852 |
Finally, the journal academic quality results are evaluated. Because the impact factor is the journal quality measure generally used in academia, the reference standard for this evaluation is the result of a modified impact factor analysis. The modified impact factor is computed as follows:
$$mIF_X = \frac{C}{D}$$
where $D$ is the total number of documents published in journal $X$, and $C$ is the total number of times those documents were cited.
The journal evaluation is measured by the precision of the top $N$ results, computed as follows:
[Equation image BDA0000023283060000092 not reproduced in this text: precision of the top $N$ results]
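As a small worked example of the modified impact factor, the sketch below computes $C / D$ from a list of per-document citation counts. The function name and the sample numbers are hypothetical, and returning 0 for a journal with no documents is an assumed convention.

```python
def modified_impact_factor(citation_counts):
    """mIF_X = C / D for one journal X, where D is the number of documents
    published in X and C is the total number of citations they received."""
    d = len(citation_counts)
    if d == 0:
        return 0.0  # assumed convention for an empty journal
    return sum(citation_counts) / d

# A hypothetical journal with three documents cited 10, 0 and 5 times.
mif = modified_impact_factor([10, 0, 5])
```

Here $C = 15$ and $D = 3$, so the modified impact factor is 5.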
The evaluation results are as follows:

| Method | P@50 | P@80 | P@100 |
|---|---|---|---|
| PageRank | 0.24 | 0.3375 | 0.4 |
| PopRank | 0.42 | 0.425 | 0.47 |
| RW | 0.42 | 0.425 | 0.47 |
| RW+U | 0.44 | 0.4375 | 0.48 |
The above shows the year-by-year distribution of the average document quality value in the results of the several algorithms. The averages listed run from 1971 to 2009; each yearly average is the sum of the quality values of the documents published that year divided by the number of documents published that year. As can be seen from the figure, the methods of the present invention, RW and RW+U, generally give new documents higher quality values than the other two methods, which indicates that the method of the present invention addresses the problem that conventional methods generally rate new documents too low.
It should be noted that the purpose of disclosing the embodiments is to aid further understanding of the present invention, and those skilled in the art will appreciate that various substitutions and modifications are possible without departing from the spirit and scope of the invention and the appended claims. For example, the invention can equally be applied to paper sharing platforms or websites (simply substituting papers for documents), picture sharing platforms or websites (simply substituting pictures for documents), and so on. Therefore, the invention should not be limited to what the embodiments disclose; the scope of protection claimed is defined by the claims.

Claims (8)

1. A method for assessing document quality, applied to a scientific literature sharing platform on which users can collect documents, tag them, comment on them, and share them with other users, characterized in that the method comprises the following steps:
A. Using the citation relationships between documents, the relationships of documents to journals/conferences and to authors, and the publication times of documents, construct a weighted directed graph, called the academic network graph;
B. Quantify the citation relationships between documents and the relationships of documents to journals/conferences and to authors as transitions between vertices of the graph, and model them to obtain the transition probability matrix of the academic network graph;
C. Build a model from users' collection behavior toward documents, taking collection time into account, and use the HITS algorithm to compute a user-analysis-based document quality value;
D. Based on the models built in steps B and C, iterate a random walk with restart until the result converges, obtaining a probability value for each vertex of the academic network graph; these probability values give the document quality, the journal/conference quality, and the author academic reputation.
2. The method of claim 1, characterized in that the academic network graph of step A consists of three subgraphs:
● the document citation subgraph $G_{dd} = (V_d, E_{dd})$,
$G_{dd}$ is a directed graph representing the citation relationships between documents, where $V_d$ is the set of document vertices, $E_{dd}$ is the edge set, and a directed edge $\langle d_i, d_j \rangle \in E_{dd}$ means that document $d_i$ cites document $d_j$;
● the author-document subgraph $G_{ad} = (V_a \cup V_d, E_{ad})$,
$G_{ad}$ is a bipartite graph representing the authorship relationship between authors and documents, where $V_a$ is the set of author vertices, $E_{ad}$ is the edge set, and an undirected edge $(a_i, d_j) \in E_{ad}$ means that author $a_i$ wrote document $d_j$;
● the journal/conference-document subgraph $G_{cd} = (V_c \cup V_d, E_{cd})$,
$G_{cd}$ is a bipartite graph representing the publication relationship between journals/conferences and documents, where $V_c$ is the set of journal and conference vertices, $E_{cd}$ is the edge set, and an undirected edge $(c_i, d_j) \in E_{cd}$ means that document $d_j$ was published in journal or conference $c_i$; the academic network graph is the directed graph $G = (V, E)$, with vertex set $V = V_a \cup V_d \cup V_c$ and edge set $E = E_{dd} \cup E_{ad} \cup E_{cd}$; every undirected edge in the author-document subgraph $G_{ad}$ and the journal/conference-document subgraph $G_{cd}$ is replaced by two directed edges connecting the two vertices of that edge.
3. The method of claim 2, characterized in that step B is implemented as follows:
B1. Define different transition probabilities $\alpha$ for transitions between vertices of different types,
$$\alpha_{ad} = \alpha_{cd} = 1$$
$$\alpha_{da} + \alpha_{dc} + \alpha_{dd} = 1$$
where $\alpha_{ad}$ is the transition probability from an author vertex to a document vertex, $\alpha_{cd}$ is the transition probability from a journal/conference vertex to a document vertex, $\alpha_{da}$ is the transition probability from a document vertex to an author vertex, $\alpha_{dc}$ is the transition probability from a document vertex to a journal/conference vertex, and $\alpha_{dd}$ is the transition probability from a document vertex to a document vertex;
B2. Define the weighted adjacency matrix $W(G)$ of graph $G$, whose entries are the weights of the relationships between vertices in the academic network graph; following the definition of the academic network graph, decompose $W(G)$ into a series of submatrices $W_{dd}$, $W_{ad}$, $W_{da}$, $W_{dc}$, $W_{cd}$, where $W_{dd}$ is the weighted adjacency matrix from document vertices to document vertices, $W_{ad}$ is the weighted adjacency matrix from author vertices to document vertices, $W_{da}$ is the weighted adjacency matrix from document vertices to author vertices, $W_{dc}$ is the weighted adjacency matrix from document vertices to journal/conference vertices, and $W_{cd}$ is the weighted adjacency matrix from journal/conference vertices to document vertices;
B3. Assign initial values to each submatrix to obtain the initial weighted adjacency matrix:
a) [Equation image FDA0000023283050000021 not reproduced in this text: definition of $W_{dd}$]
where $t(d)$ denotes the publication time of document $d$ and $\Gamma_{dd}(d_i)$ denotes the set of documents cited by document $d_i$;
b) [Equation image FDA0000023283050000022 not reproduced in this text: definition of $W_{ad}$]
where $\Gamma_{ad}(a_i)$ denotes the set of documents published by author $a_i$,
[Equation image FDA0000023283050000023 not reproduced in this text]
and author $a$ is the $k$-th author of document $d$;
c) $W_{da}(j, i) = |\Gamma_{da}(d_j)| - k + 1$
where $\Gamma_{da}(d_j)$ denotes the set of authors of document $d_j$, and $k$ indicates that author $a_i$ is the $k$-th author of document $d_j$;
d) [Equation image FDA0000023283050000024 not reproduced in this text: definition of $W_{dc}$]
e) [Equation image FDA0000023283050000025 not reproduced in this text: definition of $W_{cd}$]
where $c_{ik}$ denotes a particular session of conference $c_i$ or a particular volume of journal $c_i$, $\Gamma_{cd}(c_{im})$ denotes the set of documents published in $c_{im}$, and $t(c_{im})$ denotes the time corresponding to $c_{im}$;
B4. Apply a weight scoring function to the initial values of the matrix to obtain the final weighted adjacency matrix;
B5. Compute the transition probability matrix from the weighted adjacency matrix.
4. The method of claim 3, characterized in that the weight scoring function used in step B4 is monotonically increasing, but with increments that shrink as the argument grows, that is, $\Phi'(x) > 0$ and $\Phi''(x) < 0$; this method uses
[Equation image FDA0000023283050000031 not reproduced in this text: the chosen $\Phi$]
5. method as claimed in claim 4 is characterized in that, the implementation method of described step B5 is:
I. define the transition probability matrix of three subgraphs
-reference citation subgraph G Dd
Document is to the transition probability matrix of document
Figure FDA0000023283050000032
where
$$M_{dd}(i,j) = P(d_j \mid d_i) = \frac{W_{dd}(i,j)}{\sum_k W_{dd}(i,k)};$$
- Author-document subgraph $G_{ad}$
Transition probability matrix from authors to documents
Figure FDA0000023283050000034
where
$$M_{ad}(i,j) = P(d_j \mid a_i) = \frac{W_{ad}(i,j)}{\sum_k W_{ad}(i,k)};$$
Transition probability matrix from documents to authors
Figure FDA0000023283050000036
where
$$M_{da}(j,i) = P(a_i \mid d_j) = \frac{W_{da}(j,i)}{\sum_k W_{da}(j,k)};$$
- Journal/conference-document subgraph $G_{cd}$
Transition probability matrix from documents to journals/conferences
Figure FDA0000023283050000038
where
$$M_{dc}(i,j) = P(c_j \mid d_i) = \frac{W_{dc}(i,j)}{\sum_k W_{dc}(i,k)};$$
Transition probability matrix from journals/conferences to documents
Figure FDA00000232830500000310
where
$$M_{cd}(j,i) = P(d_i \mid c_j) = \frac{W_{cd}(j,i)}{\sum_k W_{cd}(j,k)};$$
ii. Obtain the transition probability matrix on the academic network graph from the transition probability matrices of the subgraphs:
$$M(G) = \big(P(j \mid i)\big)_{i,j \in V} = \begin{pmatrix} \alpha_{dd} M_{dd} & \alpha_{da} M_{da} & \alpha_{dc} M_{dc} \\ M_{ad} & 0 & 0 \\ M_{cd} & 0 & 0 \end{pmatrix}.$$
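The row normalization and block assembly of claim 5 can be sketched in plain Python as follows. The helper names, the tiny example graph (two documents, one author, one venue), and the values of the coefficients α are all assumptions made for illustration; the claims do not state the α values, nor how a row with zero total weight should be handled.

```python
def row_normalize(weights):
    """M(i, j) = W(i, j) / sum_k W(i, k), computed row by row."""
    result = []
    for row in weights:
        total = sum(row)
        # Assumption: a row with no outgoing weight is left as all zeros.
        result.append([w / total if total else 0.0 for w in row])
    return result


def scale(matrix, factor):
    return [[factor * value for value in row] for row in matrix]


def zeros(rows, cols):
    return [[0.0] * cols for _ in range(rows)]


def hstack(*blocks):
    """Concatenate blocks with equal row counts side by side."""
    return [sum((list(part) for part in parts), []) for parts in zip(*blocks)]


def assemble(M_dd, M_da, M_dc, M_ad, M_cd, a_dd, a_da, a_dc):
    """Build M(G) with vertex order: documents, authors, venues."""
    n_a, n_c = len(M_ad), len(M_cd)
    top = hstack(scale(M_dd, a_dd), scale(M_da, a_da), scale(M_dc, a_dc))
    middle = hstack(M_ad, zeros(n_a, n_a), zeros(n_a, n_c))
    bottom = hstack(M_cd, zeros(n_c, n_a), zeros(n_c, n_c))
    return top + middle + bottom


# Hypothetical weights: document 0 cites document 1.
M = assemble(
    M_dd=row_normalize([[0, 2], [0, 0]]),
    M_da=row_normalize([[1], [1]]),
    M_dc=row_normalize([[1], [1]]),
    M_ad=row_normalize([[1, 3]]),
    M_cd=row_normalize([[2, 2]]),
    a_dd=0.5, a_da=0.3, a_dc=0.2,  # assumed coefficients summing to 1
)
for row in M:
    print(row)
```

With coefficients that sum to 1, a document row sums to 1 whenever the document has at least one citation, author and venue; document 1 in this example cites nothing, so its row sums to only 0.5.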
6. The method of claim 5, characterized in that step C is implemented as follows:
C1. Construct the user-document collection relationship graph,
whose vertices are users and documents and whose edges are collection actions. Define the user-document collection system as $B = (U, D, T, R)$, where $U$ is the set of users, $D$ is the set of documents, $T$ is a set of time points, and
Figure FDA0000023283050000041
denotes the set of collection relations; $(u, d, t) \in R$ means that user $u$ collected document $d$ at time $t$;
C2. Define the adjacency matrix $A$ of the user-document collection relationship graph.
First define the quality value vector of the document set, $q = (q_1, q_2, \ldots, q_m)$, where $m = |D|$; then define the expertise vector of the user set, $e = (e_1, e_2, \ldots, e_n)$, where $n = |U|$. The adjacency matrix of the user-document collection relationship graph is then
Figure FDA0000023283050000042
C3. Compute the document quality values and the user expertise values by repeating the following iteration until the result converges:
$q = e \times A$
$e = q \times A^{T}$
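The mutual reinforcement in step C3 can be sketched as below. Two points are assumptions rather than statements from the claims: the matrix `A` is taken to be a binary users-by-documents matrix (the actual definition is in the figure above), and both vectors are rescaled to sum to 1 after each update so that the iteration can converge. The function names are hypothetical.

```python
def normalize(vector):
    total = sum(vector)
    return [v / total for v in vector] if total else list(vector)


def collection_scores(A, tol=1e-9, max_iter=1000):
    """Iterate q = e x A and e = q x A^T until both vectors stabilise.

    A[u][d] is assumed to be 1 if user u collected document d, else 0.
    """
    n, m = len(A), len(A[0])
    e = [1.0 / n] * n  # user expertise
    q = [1.0 / m] * m  # document quality
    for _ in range(max_iter):
        new_q = normalize([sum(e[u] * A[u][d] for u in range(n)) for d in range(m)])
        new_e = normalize([sum(new_q[d] * A[u][d] for d in range(m)) for u in range(n)])
        change = max(abs(x - y) for x, y in zip(new_q + new_e, q + e))
        q, e = new_q, new_e
        if change < tol:
            break
    return q, e


# Hypothetical data: 3 users (rows) and 3 documents (columns).
A = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
]
q, e = collection_scores(A)
print(q, e)
```

In this example document 0 is collected by every user and ends up with the highest quality value, while the user who collected all three documents ends up with the highest expertise.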
7. The method of claim 6, characterized in that step D is implemented as follows:
D1. Let $d$ be the document quality value vector, $a$ the author academic reputation vector, and $c$ the journal/conference quality value vector; concatenate the vectors of the three entity types into a single vector $\pi = [d^{T}, a^{T}, c^{T}]^{T}$;
D2. Apply the random walk with restart algorithm using the formula $\pi^{t+1} = c M^{T} \pi^{t} + (1-c) Q$, $0 \le c \le 1$, where
Figure FDA0000023283050000043
$Q(i)$ is normalized so that $\sum_{i \in V} Q(i) = |V|$;
D3. Subtract the $\pi$ vectors obtained in two consecutive iterations; if the difference is less than $10^{-6}$, the iteration is judged to have converged. If the final vector is $\pi_n$, its entries are the document quality values, the author academic reputation values, and the journal/conference quality values.
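Steps D2 and D3 can be sketched as a short loop. Several details are assumptions: the restart vector `Q` is defined in the figure above and is replaced here by a uniform vector, the difference between iterations is measured as the largest absolute component change, the walk starts from `Q`, and the coefficient `c = 0.85` and the three-vertex matrix are illustrative values only.

```python
def random_walk_with_restart(M, Q, c=0.85, tol=1e-6, max_iter=10000):
    """Iterate pi <- c * M^T * pi + (1 - c) * Q until the change is below tol."""
    size = len(M)
    total = sum(Q)
    Q = [size * value / total for value in Q]  # normalise so sum(Q) == |V|
    pi = list(Q)  # assumed starting point
    for _ in range(max_iter):
        # (M^T pi)[j] = sum_i M[i][j] * pi[i]
        nxt = [
            c * sum(M[i][j] * pi[i] for i in range(size)) + (1 - c) * Q[j]
            for j in range(size)
        ]
        diff = max(abs(x - y) for x, y in zip(nxt, pi))
        pi = nxt
        if diff < tol:
            break
    return pi


# Hypothetical row-stochastic transition matrix on three vertices.
M = [
    [0.0, 0.5, 0.5],
    [1.0, 0.0, 0.0],
    [1.0, 0.0, 0.0],
]
pi = random_walk_with_restart(M, Q=[1.0, 1.0, 1.0])
print(pi)
```

Because every row of this example matrix sums to 1 and `Q` sums to |V|, the entries of `pi` keep summing to |V| throughout the iteration; vertex 0, which receives all transitions from the other two vertices, gets the largest score.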
8. Application of the method of claim 1 to: a paper sharing platform or website, or a picture sharing platform or website.
CN2010102263535A 2010-07-14 2010-07-14 Document quality assessment method and application Pending CN101887460A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2010102263535A CN101887460A (en) 2010-07-14 2010-07-14 Document quality assessment method and application

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2010102263535A CN101887460A (en) 2010-07-14 2010-07-14 Document quality assessment method and application

Publications (1)

Publication Number Publication Date
CN101887460A true CN101887460A (en) 2010-11-17

Family

ID=43073382

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2010102263535A Pending CN101887460A (en) 2010-07-14 2010-07-14 Document quality assessment method and application

Country Status (1)

Country Link
CN (1) CN101887460A (en)

Cited By (31)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102984191A (en) * 2011-09-07 2013-03-20 百度在线网络技术(北京)有限公司 Method and device and equipment used for determining behavior related quality information
CN102984191B (en) * 2011-09-07 2017-06-09 百度在线网络技术(北京)有限公司 Method, device and equipment for determining behavior correlated quality information
CN103559407B (en) * 2013-11-14 2016-08-31 北京航空航天大学深圳研究院 A kind of commending system for measuring direct graph with weight interior joint cohesion and method
CN103559407A (en) * 2013-11-14 2014-02-05 北京航空航天大学深圳研究院 Recommendation system and method for measuring node intimacy in weighted graph with direction
CN104462215A (en) * 2014-11-05 2015-03-25 大连理工大学 Scientific and technical literature quoting number predicting method based on time sequence
CN104462215B (en) * 2014-11-05 2017-07-11 大连理工大学 A kind of scientific and technical literature based on time series is cited number Forecasting Methodology
CN104537495A (en) * 2014-12-31 2015-04-22 浙江大学 Scholar ability calculation method and system
CN104657488A (en) * 2015-03-05 2015-05-27 中南大学 Method for calculating author influence based on citation propagation network
CN105404641A (en) * 2015-10-23 2016-03-16 华建宇通科技(北京)有限责任公司 Baseline based journal evaluation method and evaluation apparatus
CN105404641B (en) * 2015-10-23 2018-10-26 华建宇通科技(北京)有限责任公司 A kind of Journal Evaluation method and evaluating apparatus based on baseline
CN105589948A (en) * 2015-12-18 2016-05-18 重庆邮电大学 Document citation network visualization and document recommendation method and system
CN105589948B (en) * 2015-12-18 2018-10-12 重庆邮电大学 A kind of reference citation network visualization and literature recommendation method and system
CN105740386A (en) * 2016-01-27 2016-07-06 北京航空航天大学 Thesis search method and device based on sorting integration
CN105843876A (en) * 2016-03-18 2016-08-10 合网络技术(北京)有限公司 Multimedia resource quality assessment method and apparatus
CN105843876B (en) * 2016-03-18 2020-07-14 阿里巴巴(中国)有限公司 Quality evaluation method and device for multimedia resources
WO2018077181A1 (en) * 2016-10-27 2018-05-03 腾讯科技(深圳)有限公司 Method and device for graph centrality calculation, and storage medium
US10936765B2 (en) 2016-10-27 2021-03-02 Tencent Technology (Shenzhen) Company Limited Graph centrality calculation method and apparatus, and storage medium
CN107391659B (en) * 2017-07-18 2020-05-22 北京工业大学 Citation network academic influence evaluation ranking method based on credibility
CN107391659A (en) * 2017-07-18 2017-11-24 北京工业大学 A kind of citation network academic evaluation sort method based on credit worthiness
CN107833142A (en) * 2017-11-08 2018-03-23 广西师范大学 Academic social networks scientific research cooperative person recommends method
CN109272228B (en) * 2018-09-12 2022-03-15 石家庄铁道大学 Scientific research influence analysis method based on scientific research team cooperation network
CN109272228A (en) * 2018-09-12 2019-01-25 石家庄铁道大学 Scientific research influence power analysis method based on Research Team's cooperative network
CN109801692A (en) * 2018-12-14 2019-05-24 平安医疗健康管理股份有限公司 A kind of Medical record database method for evaluating quality and device
US11328328B2 (en) 2019-03-28 2022-05-10 Coupang Corp. Computer-implemented method for arranging hyperlinks on a grapical user-interface
CN110457439A (en) * 2019-08-06 2019-11-15 北京如优教育科技有限公司 One-stop intelligent writes householder method, device and system
CN110825942A (en) * 2019-10-22 2020-02-21 清华大学 Method and system for calculating quality of thesis
CN110955749A (en) * 2019-10-24 2020-04-03 浙江工业大学 Paper attention prediction method
CN112286988A (en) * 2020-10-23 2021-01-29 平安科技(深圳)有限公司 Medical document sorting method and device, electronic equipment and storage medium
WO2021179687A1 (en) * 2020-10-23 2021-09-16 平安科技(深圳)有限公司 Medical literature sorting method and apparatus, electronic device and storage medium
CN112286988B (en) * 2020-10-23 2023-07-25 平安科技(深圳)有限公司 Medical document ordering method, device, electronic equipment and storage medium
CN112508461A (en) * 2021-01-27 2021-03-16 中国科学院自动化研究所 Academic influence evaluation service platform system and device for multiple elements

Similar Documents

Publication Publication Date Title
CN101887460A (en) Document quality assessment method and application
Petersen et al. Methods to account for citation inflation in research evaluation
Moosavi et al. Community detection in social networks using user frequent pattern mining
Xia et al. MVCWalker: Random walk-based most valuable collaborators recommendation exploiting academic factors
CN103399858B (en) Based on the socialization's collaborative filtering recommending method trusted
CN103559262B (en) Community-based author and scientific paper commending system thereof and recommend method
CN102262681B (en) A kind of blog information identifies the method for crucial blog collection in propagating
Zhang et al. User community discovery from multi-relational networks
CN101694652A (en) Network resource personalized recommended method based on ultrafast neural network
CN102456062B (en) Community similarity calculation method and social network cooperation mode discovery method
CN106156286A (en) Type extraction system and method towards technical literature knowledge entity
CN109670039A (en) Sentiment analysis method is commented on based on the semi-supervised electric business of tripartite graph and clustering
CN110674318A (en) Data recommendation method based on citation network community discovery
CN106250438A (en) Based on random walk model zero quotes article recommends method and system
CN104462374B (en) A kind of broad sense maximal degree random walk figure methods of sampling
CN102456064B (en) Method for realizing community discovery in social networking
Ma et al. Balancing user profile and social network structure for anchor link inferring across multiple online social networks
Xu et al. Finding overlapping community from social networks based on community forest model
CN103617259A (en) Matrix decomposition recommendation method based on Bayesian probability with social relations and project content
CN102262663A (en) Method for repairing software defect reports
CN109947987A (en) A kind of intersection collaborative filtering recommending method
Guo et al. A general method of community detection by identifying community centers with affinity propagation
CN104346408A (en) Method and equipment for labeling network user
Chang et al. Link prediction in a bipartite network using Wikipedia revision information
Nikologianni et al. Building Information Modelling (BIM) and the impact on landscape: A systematic review of evolvements, shortfalls and future opportunities

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C02 Deemed withdrawal of patent application after publication (patent law 2001)
WD01 Invention patent application deemed withdrawn after publication

Open date: 20101117