CN104281690B - Label-cloud generation method and device - Google Patents

Label-cloud generation method and device

Info

Publication number
CN104281690B
Authority
CN
China
Prior art keywords
label
article
matrix
cloud
text set
Legal status
Active
Application number
CN201410534723.XA
Other languages
Chinese (zh)
Other versions
CN104281690A (en)
Inventor
强思维
李庭赟
王望
Current Assignee
On Behalf Of Information Technology (shanghai) Co Ltd
Original Assignee
On Behalf Of Information Technology (shanghai) Co Ltd
Application filed by On Behalf Of Information Technology (shanghai) Co Ltd
Priority to CN201410534723.XA
Publication of CN104281690A
Application granted
Publication of CN104281690B
Status: Active

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31 - Indexing; Data structures therefor; Storage structures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides a label-cloud generation method and device. A label-cloud generation request carrying text set information is received. For each label in every article of the text set corresponding to the text set information, the weight of the label in the article it belongs to is calculated, and an article-label matrix is generated from these weights. Singular value decomposition is then performed on the article-label matrix to produce a first matrix, which indicates the weight of each feature vector in the text set, and a second matrix, which indicates the weight of each label in each feature vector. The label cloud is generated from the first matrix, the second matrix and a preset generation rule. Because the label cloud is generated from the matrices obtained by decomposing the article-label matrix, the application avoids a problem that arises when a prior-art label cloud is used as an index of the key content of an article set: the semantic scope indicated by each tag element is too broad, so the key content of the article set is not reflected accurately enough.

Description

Label-cloud generation method and device
Technical field
The application relates to the technical field of label clouds, and in particular to a label-cloud generation method and device.
Background
A label cloud is usually generated by indexing an article set and then using the labels that occur most frequently in that article set. Displaying the label cloud visually lets a user grasp the important information in the article set at a glance. The label cloud can also serve as an index of the key content of the article set: when the user clicks any tag element in the label cloud, the information related to that tag element can be retrieved from the article set immediately, which makes it convenient for the user to look up information.
However, a traditional label cloud is built by directly counting the frequency of each label in the article set and taking each label whose frequency meets a preset requirement as one tag element of the label cloud. The tag elements therefore have a single form (that is, each tag element consists of only one label). As a result, when the label cloud is used as an index of the key content of the article set, the semantic scope indicated by each tag element is too broad and the key content of the article set is not reflected accurately enough.
Summary of the invention
In view of this, the application provides a label-cloud generation method and device, to avoid the problem that, when a label cloud generated by the prior art is used as an index of the key content of an article set, the semantic scope indicated by each tag element is too broad and the key content of the article set is not reflected accurately enough.
To achieve the above object, the technical solutions provided by the embodiments of the present invention are as follows:
A label-cloud generation method, comprising:
receiving a label-cloud generation request that carries text set information;
for each label in every article of the text set corresponding to the text set information, calculating the weight of the label in the article it belongs to;
generating an article-label matrix from the articles corresponding to the text set information, the labels in each article and the weights of the labels;
performing singular value decomposition on the article-label matrix to generate a first matrix indicating the weight of each feature vector in the text set and a second matrix indicating the weight of each label in each feature vector;
generating a label cloud using the first matrix, the second matrix and a preset generation rule.
Preferably, for each label in every article corresponding to the text set information, the weight of the label in the article it belongs to is calculated using the following formula:
$S^{(i)}(W_k) = [S_{source}(W_k) - Pos(W_k) \cdot \lambda(S_{source}(W_k))] \cdot idf(W_k) \cdot S_{attributes}(W_k)$, where $S^{(i)}(W_k)$ is the first weight of the k-th label $W_k$ of the i-th historical article in that historical article, $S_{source}(W_k)$ is the source parameter of label $W_k$, $Pos(W_k)$ is the position parameter of label $W_k$, $\lambda(S_{source}(W_k))$ is the penalty parameter introduced by the position of label $W_k$, $idf(W_k)$ is the importance of label $W_k$ on the internet, and $S_{attributes}(W_k)$ is the part-of-speech parameter of label $W_k$.
Preferably, generating the article-label matrix from the articles corresponding to the text set information, the labels in each article and the weights of the labels comprises:
for each article, obtaining the labels in the article whose weights fall within a preset first threshold range;
obtaining the union of the obtained labels;
generating the article-label matrix from the labels in the union, where each row of the article-label matrix represents one article over the labels in the union, each column represents one label in the union across all articles, and each element of the article-label matrix is the weight of a label.
Preferably, generating the label cloud using the first matrix, the second matrix and the preset generation rule comprises:
obtaining each first element in the first matrix that falls within a preset second threshold range;
for the row of the second matrix corresponding to each first element, taking the labels corresponding to the second elements in that row that fall within a preset third threshold range as one tag element of the label cloud.
Preferably, the method further comprises: displaying the label cloud composed of the tag elements.
Preferably, the method further comprises: when the number of labels in a tag element is greater than a preset value, deleting some of the labels in the tag element according to a preset deletion rule.
A label-cloud generating device, comprising:
a receiving unit, configured to receive a label-cloud generation request that carries text set information;
a computing unit, configured to calculate, for each label in every article of the text set corresponding to the text set information, the weight of the label in the article it belongs to;
a first generation unit, configured to generate an article-label matrix from the articles corresponding to the text set information, the labels in each article and the weights of the labels;
a second generation unit, configured to perform singular value decomposition on the article-label matrix to generate a first matrix indicating the weight of each feature vector in the text set and a second matrix indicating the weight of each label in each feature vector;
a third generation unit, configured to generate a label cloud using the first matrix, the second matrix and a preset generation rule.
Preferably, the third generation unit includes:
an acquiring unit, configured to obtain each first element in the first matrix that falls within a preset second threshold range;
a third generation subunit, configured to take, for the row of the second matrix corresponding to each first element, the labels corresponding to the second elements in that row that fall within a preset third threshold range as one tag element of the label cloud.
Preferably, the device further comprises:
a display unit, configured to display the label cloud composed of the tag elements.
Preferably, the device further comprises:
a deleting unit, configured to delete, when the number of labels in a tag element is greater than a preset value, some of the labels in the tag element according to a preset deletion rule.
The application provides a label-cloud generation method and device. A label-cloud generation request carrying text set information is received; for each label in every article of the text set corresponding to the text set information, the weight of the label in the article it belongs to is calculated and an article-label matrix is generated; singular value decomposition is performed on the article-label matrix to generate a first matrix indicating the weight of each feature vector in the text set and a second matrix indicating the weight of each label in each feature vector; and the label cloud is generated using the first matrix, the second matrix and a preset generation rule. By decomposing the article-label matrix and generating the label cloud from the resulting matrices, the application avoids the problem that, when a label cloud generated by the prior art is used as an index of the key content of an article set, the semantic scope indicated by each tag element is too broad and the key content of the article set is not reflected accurately enough.
Brief description of the drawings
To describe the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow chart of a label-cloud generation method provided by embodiment one of the application;
Fig. 2 is a schematic structural diagram of a label-cloud generating device provided by embodiment two of the application.
Detailed description of the embodiments
The technical solutions of the embodiments of the present invention are described clearly and completely below with reference to the drawings of the embodiments. The described embodiments are only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Embodiment one:
Fig. 1 is a flow chart of a label-cloud generation method provided by embodiment one of the application.
As shown in Fig. 1, the method includes:
S101: receive a label-cloud generation request that carries text set information.
S102: for each label in every article of the text set corresponding to the text set information, calculate the weight of the label in the article it belongs to.
In this embodiment, preferably, the text set information corresponds to a text set containing at least one article, and every article carries at least one label. A label may be user-defined when the article is created (for example, when an article is saved, the label "banana apple" is set for it), or it may be obtained through word segmentation (for example, the user-defined label "banana apple" is segmented into the two labels "banana" and "apple"; or, when an article is saved, the words that occur most often in the article are taken as segmentation labels).
Preferably, after the label-cloud generation request carrying the text set information is received, the weight of each label in the article it belongs to is first calculated for each label in every article of the text set corresponding to the text set information, using the following formula:
$S^{(i)}(W_k) = [S_{source}(W_k) - Pos(W_k) \cdot \lambda(S_{source}(W_k))] \cdot idf(W_k) \cdot S_{attributes}(W_k)$, where $S^{(i)}(W_k)$ is the weight of the k-th label $W_k$ of the i-th article in that article, $S_{source}(W_k)$ is the source parameter of label $W_k$, $Pos(W_k)$ is the position parameter of label $W_k$, $\lambda(S_{source}(W_k))$ is the penalty parameter introduced by the position of label $W_k$, $idf(W_k)$ is the importance of label $W_k$ on the internet, and $S_{attributes}(W_k)$ is the part-of-speech parameter of label $W_k$.
In this embodiment, preferably, $S_{source}(W_k)$ is the source parameter of label $W_k$. The source of a label indicates whether the label is a user-defined label or a segmentation label. Preferably, the value of $S_{source}(W_k)$ preset for a user-defined label is 8 to 20 times the value preset for a segmentation label.
In this embodiment, preferably, $Pos(W_k)$ is the position parameter of label $W_k$. The position of a label is its position among the labels from the same source in the article it belongs to, and the value of $Pos(W_k)$ equals that rank. For example, if an article carries 5 labels, 3 of which are segmentation labels in the order "banana", "apple", "pear", then the $Pos(W_k)$ value of the label "pear" is 3.
In this embodiment, preferably, $\lambda(S_{source}(W_k))$ is the penalty parameter introduced by the position of label $W_k$. The penalty parameter differs according to the source of the label. Preferably, the preset value of $\lambda(S_{source}(W_k))$ lies between 0.08 and 0.11, and the value of $S_{source}(W_k) - Pos(W_k) \cdot \lambda(S_{source}(W_k))$ is greater than or equal to 0.5.
In this embodiment, preferably, $idf(W_k)$ is the importance of label $W_k$ on the internet. The process of calculating the importance of a label is prior art and is not described in detail here.
In this embodiment, preferably, $S_{attributes}(W_k)$ is the part-of-speech parameter of label $W_k$. Preferably, the part of speech of a label is proper noun, noun, verb, adjective or adverb, and $S_{attributes}(W_k)$ is assigned the value 10, 9, 5, 4 or 4 respectively.
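The weight formula above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the applicant's implementation: the function name `label_weight`, the dictionary `POS_SCORES` and the sample parameter values are hypothetical, and enforcing the 0.5 lower bound by clamping is an assumption, since the text only states that the bracketed term is at least 0.5.

```python
# Part-of-speech scores as listed in the embodiment (10, 9, 5, 4, 4).
POS_SCORES = {
    "proper_noun": 10,
    "noun": 9,
    "verb": 5,
    "adjective": 4,
    "adverb": 4,
}


def label_weight(s_source, pos, penalty, idf, part_of_speech):
    """Weight of one label in its article:
    [S_source - Pos * lambda] * idf * S_attributes.
    """
    base = s_source - pos * penalty
    # Assumption: the ">= 0.5" condition is enforced by clamping.
    base = max(base, 0.5)
    return base * idf * POS_SCORES[part_of_speech]


if __name__ == "__main__":
    # Hypothetical values: a user-defined label scores 10x a segmentation label.
    print(label_weight(s_source=10.0, pos=1, penalty=0.1, idf=2.0, part_of_speech="noun"))
    print(label_weight(s_source=1.0, pos=3, penalty=0.1, idf=2.0, part_of_speech="noun"))
```

With these sample values the user-defined label ends up far heavier than the third segmentation label, which matches the intent that source and position both influence the weight.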
S103: generate an article-label matrix from the articles corresponding to the text set information, the labels in each article and the weights of the labels.
In this embodiment, preferably, once the weight of each label in the article it belongs to has been calculated for every article of the text set corresponding to the text set information, the article-label matrix is generated from the articles, the labels in each article and the weights of the labels. The article-label matrix is generated as follows:
1. For each article, obtain the labels in the article whose weights fall within a preset first threshold range.
In this embodiment, preferably, a first threshold range is set in advance, and for each article the labels whose weights fall within that range are obtained.
2. Obtain the union of the obtained labels.
In this embodiment, preferably, the same label may appear in more than one article; for example, article A carries label 1 and article B also carries label 1.
Specifically, after the labels whose weights fall within the preset first threshold range have been obtained for each article, the union of the labels obtained from all articles is taken.
3. Generate the article-label matrix from the labels in the union, where each row of the article-label matrix represents one article over the labels in the union, each column represents one label in the union across all articles, and each element of the article-label matrix is the weight of a label.
In this embodiment, preferably, after the union of the labels has been obtained, the article-label matrix is generated from the labels in the union, with rows, columns and elements as defined in step 3.
Specifically, suppose the text set corresponding to the text set information contains the articles $\{D_1, D_2, \ldots, D_n\}$, and for each article $D_t$ the labels $T_m$ and their weights $w_m$ in that article form the label list $\{(T_1, w_1), \ldots, (T_m, w_m)\}$. With the preset first threshold range "greater than or equal to $\theta$", a label $T_m$ is retained if its weight satisfies $w_m \ge \theta$. The labels filtered out of each article $D_t$ in this way, that is, all labels that fall within the preset first threshold range, are $\{(T_1, w_1), \ldots, (T_p, w_p)\}$.
The article-label matrix computed from these filtered labels has n rows and p columns, where n is the number of articles and p is the number of labels after taking the union. Each row represents one article over the labels in the union, each column represents one label in the union across all articles, and each element of the article-label matrix is the weight of a label.
For example, suppose the text set corresponding to the text set information contains 2 articles: article 1, which carries label A, label B, label C and label D, and article 2, which carries label A, label B, label C, label D and label E. After the weight of each label in the article it belongs to is calculated and the preset first threshold range is applied, the labels that fall within the first threshold range are label A (weight 1A), label B (weight 1B) and label C (weight 1C) in article 1, and label A (weight 2A), label C (weight 2C) and label D (weight 2D) in article 2. With the columns ordered label A, label B, label C, label D, the generated article-label matrix is:
$\begin{pmatrix} 1A & 1B & 1C & 0 \\ 2A & 0 & 2C & 2D \end{pmatrix}$
As can be seen, each row of the article-label matrix represents one article over the labels in the union, each column represents one label in the union across all articles, and each element is the weight of a label. When a label such as label B is in the union but is not among the retained labels of article 2, the element of the article-label matrix that represents label B of article 2 is set to 0.
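The three steps of S103 can be sketched as follows. This is an illustrative sketch only: the function name `build_article_label_matrix`, the dictionary input format and the numeric weights are assumptions introduced here, and the threshold range is taken to be "greater than or equal to theta" as in the example above.

```python
def build_article_label_matrix(articles, theta):
    """Build an article-label matrix.

    articles: list of dicts mapping label -> weight, one dict per article.
    Returns (labels, matrix), where matrix[i][j] is the weight of labels[j]
    in article i, or 0.0 if that label was not retained for the article.
    """
    # Step 1: per article, keep only labels whose weight is >= theta.
    kept = [{t: w for t, w in art.items() if w >= theta} for art in articles]
    # Step 2: union of the retained labels (sorted only to fix a column order).
    labels = sorted(set().union(*kept))
    # Step 3: one row per article, one column per label in the union.
    matrix = [[art.get(t, 0.0) for t in labels] for art in kept]
    return labels, matrix


if __name__ == "__main__":
    # Hypothetical weights mirroring the two-article example.
    article_1 = {"A": 0.9, "B": 0.8, "C": 0.7, "D": 0.1}
    article_2 = {"A": 0.6, "B": 0.2, "C": 0.5, "D": 0.9, "E": 0.1}
    labels, matrix = build_article_label_matrix([article_1, article_2], theta=0.4)
    print(labels)
    for row in matrix:
        print(row)
```

Sorting the union is just one way to obtain a reproducible column order; any fixed order would serve equally well.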
In this embodiment, preferably, the order in which the labels of the union are assigned to the columns of the generated article-label matrix is not restricted; the implementer may choose any order as needed, so the same articles and labels may equally be represented by an article-label matrix whose columns are arranged in a different label order.
S104: perform singular value decomposition on the article-label matrix to generate a first matrix indicating the weight of each feature vector in the text set and a second matrix indicating the weight of each label in each feature vector.
In this embodiment, preferably, singular value decomposition is performed on the article-label matrix obtained above, generating the first matrix, which indicates the weight of each feature vector in the text set, and the second matrix, which indicates the weight of each label in each feature vector.
In this embodiment, preferably, singular value decomposition is an existing mathematical algorithm; see the prior art for details, which are not restricted here.
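For reference, the decomposition in S104 can be written in standard notation. This is a sketch under assumptions: the symbols $A$, $U$, $\Sigma$ and $V$ do not appear in the source, and reading the "first matrix" as $\Sigma$ and the "second matrix" as $V^{T}$ is one plausible interpretation rather than something the application states.

```latex
A_{n \times p} = U \, \Sigma \, V^{T}, \qquad
\Sigma = \operatorname{diag}(\sigma_1, \sigma_2, \ldots, \sigma_r), \qquad
\sigma_1 \ge \sigma_2 \ge \cdots \ge \sigma_r \ge 0, \quad r = \min(n, p)
```

Under this reading, the singular value $\sigma_j$ measures how strongly the j-th feature vector is represented in the text set, and the j-th row of $V^{T}$ gives the weight of each of the p labels in that feature vector.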
S105: generate the label cloud using the first matrix, the second matrix and a preset generation rule.
In this embodiment, preferably, after singular value decomposition of the article-label matrix has produced the first matrix and the second matrix, the label cloud is generated using the first matrix, the second matrix and the preset generation rule. Specifically, the process is:
1. Obtain each first element in the first matrix that falls within a preset second threshold range.
2. For the row of the second matrix corresponding to each first element, take the labels corresponding to the second elements in that row that fall within a preset third threshold range as one tag element of the label cloud.
In this embodiment, preferably, the first matrix indicates the weight of each feature vector in the text set, and the second matrix indicates the weight of each label in each feature vector. An element of the first matrix indicates the weight, in the text set, of the feature vector corresponding to that element; the row of the second matrix that gives the weights of the labels in that feature vector is the row of the second matrix corresponding to that element.
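The two selection steps of S105 can be sketched as below, taking the first and second matrices as already computed. Everything named here is hypothetical: `generate_tag_elements`, its parameters, and the sample numbers are illustrative. The sketch assumes the first matrix is supplied as a list of its diagonal elements, that both threshold ranges are simple lower bounds, and that label weights are compared by absolute value (a choice made here because the sign of a singular vector is arbitrary; the source does not specify it).

```python
def generate_tag_elements(first_elements, second_matrix, labels, min_first, min_second):
    """Select tag elements from the decomposed matrices.

    first_elements[r]: weight of feature vector r in the text set.
    second_matrix[r][j]: weight of labels[j] in feature vector r.
    Returns a list of tag elements, each a list of labels.
    """
    tag_elements = []
    for first, row in zip(first_elements, second_matrix):
        # Step 1: keep feature vectors whose weight meets the second threshold.
        if first < min_first:
            continue
        # Step 2: within the matching row, keep labels meeting the third threshold.
        element = [lab for lab, w in zip(labels, row) if abs(w) >= min_second]
        if element:
            tag_elements.append(element)
    return tag_elements


if __name__ == "__main__":
    labels = ["A", "B", "C", "D"]
    first_elements = [2.1, 0.9, 0.1]           # hypothetical values
    second_matrix = [
        [0.7, 0.1, 0.7, 0.05],
        [0.1, -0.8, 0.0, 0.6],
        [0.5, 0.5, 0.5, 0.5],
    ]
    print(generate_tag_elements(first_elements, second_matrix, labels, 0.5, 0.4))
```

Each resulting tag element can contain several labels, which is what distinguishes this approach from a frequency-based label cloud in which every tag element is a single label.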
Further, the label-cloud generation method provided by this embodiment also includes: displaying the label cloud composed of the tag elements.
Further, the label-cloud generation method provided by this embodiment also includes: when the number of labels in a tag element is greater than a preset value, deleting some of the labels in the tag element according to a preset deletion rule.
In this embodiment, preferably, when the number of labels in a tag element is greater than the preset value, the labels in the tag element may be deleted in ascending order of weight until the number of labels in the tag element meets the preset value.
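The deletion rule described above (remove the lowest-weight labels first) can be sketched as follows. The function name `trim_tag_element` and the (label, weight) pair format are assumptions made for illustration; the source does not say which weight is used for the ordering, so the sketch simply takes it as given per label.

```python
def trim_tag_element(element, max_labels):
    """Trim a tag element to at most max_labels labels.

    element: list of (label, weight) pairs.
    Labels are dropped in ascending order of weight; the survivors keep
    their original relative order.
    """
    if len(element) <= max_labels:
        return list(element)
    # Rank positions by weight, highest first, and keep the top max_labels.
    ranked = sorted(range(len(element)), key=lambda i: element[i][1], reverse=True)
    keep = set(ranked[:max_labels])
    return [pair for i, pair in enumerate(element) if i in keep]


if __name__ == "__main__":
    element = [("A", 0.7), ("B", 0.2), ("C", 0.5), ("D", 0.4)]
    print(trim_tag_element(element, 2))
```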
The application provides a label-cloud generation method. A label-cloud generation request carrying text set information is received; for each label in every article of the text set corresponding to the text set information, the weight of the label in the article it belongs to is calculated and an article-label matrix is generated; singular value decomposition is performed on the article-label matrix to generate a first matrix indicating the weight of each feature vector in the text set and a second matrix indicating the weight of each label in each feature vector; and the label cloud is generated using the first matrix, the second matrix and a preset generation rule. By decomposing the article-label matrix and generating the label cloud from the resulting matrices, the application avoids the problem that, when a label cloud generated by the prior art is used as an index of the key content of an article set, the semantic scope indicated by each tag element is too broad and the key content of the article set is not reflected accurately enough.
Embodiment two:
Fig. 2 is a schematic structural diagram of a label-cloud generating device provided by embodiment two of the application.
As shown in Fig. 2, the device includes a receiving unit 1, a computing unit 2, a first generation unit 3, a second generation unit 4 and a third generation unit 5, connected in sequence, where:
The receiving unit 1 is configured to receive a label-cloud generation request that carries text set information.
The computing unit 2 is configured to calculate, for each label in every article of the text set corresponding to the text set information, the weight of the label in the article it belongs to.
The first generation unit 3 is configured to generate an article-label matrix from the articles corresponding to the text set information, the labels in each article and the weights of the labels.
The second generation unit 4 is configured to perform singular value decomposition on the article-label matrix to generate a first matrix indicating the weight of each feature vector in the text set and a second matrix indicating the weight of each label in each feature vector.
The third generation unit 5 is configured to generate a label cloud using the first matrix, the second matrix and a preset generation rule.
In this embodiment, preferably, the third generation unit includes: an acquiring unit, configured to obtain each first element in the first matrix that falls within a preset second threshold range; and a third generation subunit, configured to take, for the row of the second matrix corresponding to each first element, the labels corresponding to the second elements in that row that fall within a preset third threshold range as one tag element of the label cloud.
Further, the label-cloud generating device provided by this embodiment also includes: a display unit, configured to display the label cloud composed of the tag elements.
Further, the label-cloud generating device provided by this embodiment also includes: a deleting unit, configured to delete, when the number of labels in a tag element is greater than a preset value, some of the labels in the tag element according to a preset deletion rule.
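The sequential connection of the five units can be pictured as a simple pipeline. The sketch below is purely structural and hypothetical: the class `LabelCloudDevice`, its field names and the stand-in functions in the usage example are illustrative placeholders, not the units' actual logic.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass
class LabelCloudDevice:
    """Five units connected in sequence; each is modelled as a callable."""
    receiving_unit: Callable[[Any], Any]          # request -> text set
    computing_unit: Callable[[Any], Any]          # text set -> label weights
    first_generation_unit: Callable[[Any], Any]   # weights -> article-label matrix
    second_generation_unit: Callable[[Any], Any]  # matrix -> (first, second) matrices
    third_generation_unit: Callable[[Any], Any]   # matrices -> label cloud

    def handle(self, request):
        text_set = self.receiving_unit(request)
        weights = self.computing_unit(text_set)
        matrix = self.first_generation_unit(weights)
        factors = self.second_generation_unit(matrix)
        return self.third_generation_unit(factors)


if __name__ == "__main__":
    # Stand-in units that only show how data flows between the stages.
    device = LabelCloudDevice(
        receiving_unit=lambda req: req["text_set"],
        computing_unit=lambda ts: [len(a) for a in ts],
        first_generation_unit=lambda ws: [ws],
        second_generation_unit=lambda m: m[0],
        third_generation_unit=lambda f: [str(x) for x in f],
    )
    print(device.handle({"text_set": ["ab", "cde"]}))
```

Modelling each unit as a replaceable callable keeps the stages independent, so the optional display and deleting units could be appended as further stages in the same way.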
The application provides a label-cloud generating device. The device receives a label-cloud generation request carrying text set information; for each label in every article of the text set corresponding to the text set information, it calculates the weight of the label in the article it belongs to and generates an article-label matrix; it performs singular value decomposition on the article-label matrix to generate a first matrix indicating the weight of each feature vector in the text set and a second matrix indicating the weight of each label in each feature vector; and it generates the label cloud using the first matrix, the second matrix and a preset generation rule. By decomposing the article-label matrix and generating the label cloud from the resulting matrices, the application avoids the problem that, when a label cloud generated by the prior art is used as an index of the key content of an article set, the semantic scope indicated by each tag element is too broad and the key content of the article set is not reflected accurately enough.
The embodiments in this specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and for the parts that are the same or similar the embodiments may be referred to one another. Because the device disclosed in the embodiments corresponds to the method disclosed in the embodiments, its description is relatively brief; for the relevant details, see the description of the method.
The above are only preferred embodiments of the application, provided so that those skilled in the art can understand or implement the application. Various modifications to these embodiments will be apparent to those skilled in the art, and the general principles defined herein may be implemented in other embodiments without departing from the spirit or scope of the application. Therefore, the application is not limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (9)

  1. A kind of 1. label-cloud generation method, it is characterised in that including:
    Label-cloud generation request is received, wherein carrying text set information;
    For each label in every article in text set corresponding with the text set information, the label is calculated in its institute Belong to the weighted value in article;
    It is raw using the label and the weighted value of the label in each piece article corresponding with the text set information, every article Into article-label matrix;
    Singular value decomposition is carried out to the article-label matrix, weight of the generation indicative character vector in the text set Second matrix of the weight of the first matrix and indicating label in the characteristic vector;
    Using first matrix, the second matrix and the create-rule pre-set, label-cloud is generated;
    Wherein, each label in for every article corresponding with the text set information, calculates the label belonging to it During weighted value in article, calculated using equation below:
    S(i)(Wk)=【Ssource(Wk)-Pos(Wk)*λ(Ssource(Wk))】*idf(Wk)*Sattributes(Wk), wherein, the S(i) (Wk) it is k-th of label W in i-th history articlekThe first weighted value in the history article, the Ssource(Wk) it is label WkSource parameter, the Pos (Wk) it is label WkLocation parameter, the λ (Ssource(Wk)) it is because of label WkPosition drawn The punishment parameter entered, the idf (Wk) it is the label WkSignificance level in internet, the Sattributes(Wk) for institute State label WkPart of speech parameter.
  2. 2. according to the method for claim 1, it is characterised in that described to utilize each piece text corresponding with the text set information The weighted value of chapter, the label in every article and the label, article-label matrix is generated, including:
    For every article, weighted value meets the label of the first threshold scope pre-set in acquisition this article;
    Obtain the union of each label;
    Article-label matrix is generated using each label that is described and concentrating, wherein, every row table in the article-label matrix Show an article in each label that is described and concentrating, each column represents all articles corresponding to a label that is described and concentrating, And the element in this article-label matrix is the weighted value of label.
  3. The method according to claim 1, characterized in that generating the label cloud using the first matrix, the second matrix, and the preset generation rule comprises:
    obtaining each first element in the first matrix that falls within a preset second threshold range;
    for the row in the second matrix corresponding to each first element, taking the labels that correspond to the second elements in that row falling within a preset third threshold range as one label element of the label cloud.
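The selection rule of claim 3 can be sketched with NumPy's compact SVD. The mapping of the claim's terms onto the decomposition is an interpretation: the "first matrix" is assumed to be the singular values and the "second matrix" the rows of V^T. The thresholds `min_singular` and `min_loading` are hypothetical lower bounds standing in for the second and third threshold ranges.

```python
import numpy as np


def select_label_elements(matrix, labels, min_singular, min_loading):
    """Return one list of labels per sufficiently strong feature vector."""
    # Compact SVD: matrix == U @ diag(s) @ vt, with s sorted in descending order.
    _, s, vt = np.linalg.svd(matrix, full_matrices=False)
    elements = []
    for idx in np.flatnonzero(s >= min_singular):
        # The sign of a singular vector is arbitrary, so compare magnitudes.
        loadings = np.abs(vt[idx])
        chosen = [labels[j] for j in np.flatnonzero(loadings >= min_loading)]
        if chosen:
            elements.append(chosen)
    return elements


# Two articles over three labels; the first two labels always co-occur.
demo = np.array([[2.0, 2.0, 0.0],
                 [0.0, 0.0, 1.0]])
elements = select_label_elements(demo, ["svd", "matrix", "cloud"], 0.5, 0.5)
print(elements)
```

In this toy input the two co-occurring labels load on the same feature vector and therefore end up in the same label element, while the third label forms its own element.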
  4. The method according to claim 3, characterized by further comprising: displaying the label cloud composed of the label elements.
  5. The method according to claim 3, characterized by further comprising: when the number of labels in a label element is greater than a preset value, deleting some of the labels in the label element according to a preset deletion rule.
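Claim 5 leaves the deletion rule open. One plausible rule, shown here purely as an assumption, is to keep only the highest-weighted labels once a label element exceeds the preset size; the function name and the sample weights are hypothetical.

```python
def trim_label_element(label_weights, max_labels):
    """Keep at most max_labels labels, dropping the lowest-weighted ones."""
    if len(label_weights) <= max_labels:
        return dict(label_weights)
    # Assumed deletion rule: rank by weight, descending, and keep the top ones.
    ranked = sorted(label_weights.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:max_labels])


trimmed = trim_label_element({"svd": 0.9, "matrix": 0.6, "cloud": 0.2}, max_labels=2)
print(trimmed)
```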
  6. A label cloud generating device, characterized by comprising:
    a receiving unit, configured to receive a label cloud generation request that carries text set information;
    a computing unit, configured to calculate, for each label in each article of the text set corresponding to the text set information, the weight value of the label in the article to which it belongs;
    a first generation unit, configured to generate an article-label matrix using each article corresponding to the text set information, the labels in each article, and the weight values of the labels;
    a second generation unit, configured to perform singular value decomposition on the article-label matrix to generate a first matrix indicating the weight of each feature vector in the text set and a second matrix indicating the weight of each label in the feature vectors;
    a third generation unit, configured to generate a label cloud using the first matrix, the second matrix, and a preset generation rule;
    wherein, for each label in each article corresponding to the text set information, the weight value of the label in the article to which it belongs is calculated using the following formula:
    $S^{(i)}(W_k) = \left[ S_{source}(W_k) - Pos(W_k) \cdot \lambda\left(S_{source}(W_k)\right) \right] \cdot idf(W_k) \cdot S_{attributes}(W_k)$, where $S^{(i)}(W_k)$ is the first weight value of the k-th label $W_k$ of the i-th historical article within that historical article, $S_{source}(W_k)$ is the source parameter of label $W_k$, $Pos(W_k)$ is the position parameter of label $W_k$, $\lambda(S_{source}(W_k))$ is the penalty parameter introduced by the position of label $W_k$, $idf(W_k)$ is the importance of label $W_k$ on the internet, and $S_{attributes}(W_k)$ is the part-of-speech parameter of label $W_k$.
  7. The device according to claim 6, characterized in that the third generation unit comprises:
    an acquiring unit, configured to obtain each first element in the first matrix that falls within a preset second threshold range;
    a third generation subunit, configured to take, for the row in the second matrix corresponding to each first element, the labels that correspond to the second elements in that row falling within a preset third threshold range as one label element of the label cloud.
  8. The device according to claim 7, characterized by further comprising:
    a display unit, configured to display the label cloud composed of the label elements.
  9. The device according to claim 7, characterized by further comprising:
    a deletion unit, configured to delete, when the number of labels in a label element is greater than a preset value, some of the labels in the label element according to a preset deletion rule.
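The unit structure of the device claims can be read as a simple pipeline. The sketch below only shows how the units hand results to one another; the class name, the callable signatures, and the stub units in the usage example are hypothetical and stand in for real implementations of each step.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List

Article = Dict[str, float]  # assumed representation: label -> weight


@dataclass
class LabelCloudDevice:
    compute: Callable[[List[str]], List[Article]]  # computing unit
    build_matrix: Callable[[List[Article]], Any]   # first generation unit
    decompose: Callable[[Any], tuple]              # second generation unit
    generate: Callable[[tuple], List[List[str]]]   # third generation unit

    def handle_request(self, text_set: List[str]) -> List[List[str]]:
        """Receiving unit: accept the request and run the units in order."""
        weights = self.compute(text_set)
        matrix = self.build_matrix(weights)
        factors = self.decompose(matrix)
        return self.generate(factors)


# Usage with trivial stub units, only to show the data flow.
device = LabelCloudDevice(
    compute=lambda texts: [{"a": 1.0} for _ in texts],
    build_matrix=lambda weights: weights,
    decompose=lambda matrix: (matrix,),
    generate=lambda factors: [list(article) for article in factors[0]],
)
cloud = device.handle_request(["first article", "second article"])
print(cloud)
```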
Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410534723.XA CN104281690B (en) 2014-10-11 2014-10-11 A kind of label-cloud generation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410534723.XA CN104281690B (en) 2014-10-11 2014-10-11 A kind of label-cloud generation method and device

Publications (2)

Publication Number Publication Date
CN104281690A CN104281690A (en) 2015-01-14
CN104281690B CN104281690B (en) 2018-01-05

Family

ID=52256563

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410534723.XA Active CN104281690B (en) 2014-10-11 2014-10-11 A kind of label-cloud generation method and device

Country Status (1)

Country Link
CN (1) CN104281690B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111026832A (en) * 2019-11-15 2020-04-17 贝壳技术有限公司 Method and system for generating articles

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103176961A (en) * 2013-03-05 2013-06-26 哈尔滨工程大学 Transfer learning method based on latent semantic analysis
CN103440256A (en) * 2013-07-26 2013-12-11 中国科学院深圳先进技术研究院 Method and device for automatically generating Chinese text label cloud

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8037066B2 (en) * 2008-01-16 2011-10-11 International Business Machines Corporation System and method for generating tag cloud in user collaboration websites
US20110296345A1 (en) * 2010-05-27 2011-12-01 Alcatel-Lucent Usa Inc. Technique For Determining And Indicating Strength Of An Item In A Weighted List Based On Tagging

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
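Text visualization analysis based on cloud computing technology (基于云计算技术的文本可视化分析); Zhang Linquan et al.; Journal of Chengdu Technological University (《成都工业学院学报》); 2014-03-31; pp. 90-92 *
Related keywords and related-book label clouds (相关关键词与相关图书标签云); Li Bangqun; Information Resource Construction (《信息资源建设》); 2013-08-30; pp. 11-15 *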

Also Published As

Publication number Publication date
CN104281690A (en) 2015-01-14

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant