CN103440256A - Method and device for automatically generating Chinese text label cloud - Google Patents

Info

Publication number
CN103440256A
Authority
CN
China
Prior art keywords
cloud
label
chinese
word
speech tagging
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN2013103199489A
Other languages
Chinese (zh)
Other versions
CN103440256B (en)
Inventor
汪云海
华博
丹尼尔·科恩
陈宝权
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Institute of Advanced Technology of CAS
Original Assignee
Shenzhen Institute of Advanced Technology of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Institute of Advanced Technology of CAS
Priority to CN201310319948.9A (patent CN103440256B)
Publication of CN103440256A
Application granted
Publication of CN103440256B
Legal status: Active
Anticipated expiration

Abstract

The invention belongs to the technical field of label extraction and in particular relates to a method and a device for automatically generating a Chinese text label cloud. The method comprises the following steps: a, performing word segmentation and part-of-speech tagging on the text data to be analyzed using Chinese lexical analysis; b, extracting the keywords and word frequencies of the text data to be analyzed according to the word segmentation and part-of-speech tagging results; c, taking the extracted keywords and word frequencies as input data and generating a label cloud using a label cloud generation algorithm. The disclosed method and device combine and optimize Chinese word segmentation with the label cloud algorithm, fill a gap in Chinese label cloud generation algorithms, and provide a useful tool for tasks such as extracting the key points of news and analyzing public opinion.

Description

Method and device for automatically generating a Chinese text label cloud
Technical field
The invention belongs to the technical field of label extraction, and in particular relates to a method and device for automatically generating a Chinese text label cloud.
Background
With the rapid development of science and technology, and of computer technology in particular, the human capacity to produce and acquire data has grown by orders of magnitude. News sites, the Internet and newspapers generate large amounts of new information, and collecting, analyzing and mining this Chinese text data has long been a focus of researchers' work. Text data is usually annotated with labels that mark key words so that the text is easy to search or locate. A label cloud is a visual depiction of keywords, used to summarize user-generated labels or the textual content of a website. Existing label cloud generation methods for Chinese text extract keywords with word segmentation technology and generate a label cloud free of mutual occlusion using the Wordle algorithm. Their shortcomings are as follows: word segmentation is hampered by the new words that appear every day and by non-standard text grammar, so lexical analysis cannot be performed promptly and accurately on the text data; in addition, existing label cloud generation methods mainly target English text, and the label clouds they generate do not adapt well to the structure of Chinese text.
Summary of the invention
The invention provides a method and device for automatically generating a Chinese text label cloud, aiming to solve the technical problems that existing label cloud generation methods cannot promptly and accurately perform lexical analysis on text data, and that they mainly target English text, so that the generated label clouds do not adapt well to the structure of Chinese text.
The technical solution provided by the invention is a method for automatically generating a Chinese text label cloud, comprising:
Step a: performing word segmentation and part-of-speech tagging on the text data to be analyzed using Chinese lexical analysis;
Step b: extracting the keywords and word frequencies of the text data to be analyzed according to the word segmentation and part-of-speech tagging results;
Step c: taking the extracted keywords and word frequencies as input data, and generating a label cloud using a label cloud generation algorithm.
The technical solution of the invention further comprises: in step a, the Chinese lexical analysis is based on a cascaded hidden Markov model and comprises: in the preprocessing stage, using the N-shortest-path coarse segmentation method to obtain the N best coarse segmentation results that can cover the ambiguities; on the coarse segmentation result set, using a low-level hidden Markov model combined with the dictionary and corpus to recognize common, non-nested person names and place names, and, based on the recognized results, using a high-level hidden Markov model combined with the dictionary and corpus to recognize nested person names, complex place names and organization names; adding the recognized out-of-vocabulary words, with their computed probabilities, to the class-based segmentation hidden Markov model, in which out-of-vocabulary words and ambiguities are not treated as special cases but compete with ordinary words as candidate results; and performing hidden Markov part-of-speech tagging on the globally optimal segmentation result to obtain the lexical analysis result.
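The N-shortest-path coarse segmentation named above can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `n_shortest_segmentations`, the word-to-cost `lexicon` (for example negative log probabilities) and the `unknown_cost` penalty for single characters missing from the lexicon are all assumptions made for illustration. The sketch keeps the N cheapest partial segmentations at every character position of the word lattice.

```python
from typing import Dict, List, Tuple

Candidate = Tuple[float, List[str]]  # (total cost, words so far)


def n_shortest_segmentations(text: str, lexicon: Dict[str, float], n: int = 2,
                             unknown_cost: float = 10.0) -> List[Candidate]:
    """Return up to n lowest-cost segmentations of text.

    lexicon maps a word to a non-negative cost (hypothetical values).
    A single character missing from the lexicon is charged unknown_cost,
    so every position in the text stays reachable.
    """
    max_len = max((len(w) for w in lexicon), default=1)
    # best[i] holds up to n cheapest candidates that cover text[:i].
    best: List[List[Candidate]] = [[] for _ in range(len(text) + 1)]
    best[0] = [(0.0, [])]
    for end in range(1, len(text) + 1):
        candidates: List[Candidate] = []
        for start in range(max(0, end - max_len), end):
            word = text[start:end]
            if word in lexicon:
                cost = lexicon[word]
            elif len(word) == 1:
                cost = unknown_cost
            else:
                continue  # multi-character string that is not a known word
            for prev_cost, prev_words in best[start]:
                candidates.append((prev_cost + cost, prev_words + [word]))
        candidates.sort(key=lambda c: c[0])
        best[end] = candidates[:n]  # keep only the n cheapest paths
    return best[len(text)]
```

Keeping several candidates rather than one is what lets the later stages choose between ambiguous readings such as 研究/生命 and 研究生/命.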
The technical solution of the invention further comprises: the dictionary and corpus are updated accordingly, and the update comprises: using web crawler technology to capture the new words surfaced by search engines or news websites and to collect the news related to the new words; adding the collected news to the corpus for training; tagging the new words with parts of speech; adding the tagged new words to the dictionary; and updating the dictionary and corpus.
The technical solution of the invention further comprises: in step c, the label cloud generation algorithm lays out the keywords based on geometry.
The technical solution of the invention further comprises: the layout types comprise a radial layout and a linear layout; in the radial layout all labels are placed radiating from the inside outward, and in the linear layout all labels are placed along scan lines.
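The two initial layouts can be sketched as follows. This is only an illustration under stated assumptions: the source does not say how the radial positions are spaced, so the outward spiral with a golden-angle step, the function names `radial_positions` and `linear_positions`, and the `spacing`, `row_width` and `row_height` parameters are all hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def radial_positions(count: int, spacing: float = 40.0) -> List[Point]:
    """Place label centres from the inside outward along a spiral.

    The first label sits at the origin; later labels move farther out.
    The golden-angle step is an assumed choice, not taken from the source.
    """
    golden_angle = math.pi * (3.0 - math.sqrt(5.0))
    positions = []
    for i in range(count):
        radius = spacing * math.sqrt(i)
        angle = i * golden_angle
        positions.append((radius * math.cos(angle), radius * math.sin(angle)))
    return positions


def linear_positions(widths: Sequence[float], row_width: float,
                     row_height: float) -> List[Point]:
    """Place label left edges along horizontal scan lines, wrapping rows."""
    positions = []
    x, y = 0.0, 0.0
    for width in widths:
        if x > 0 and x + width > row_width:
            x = 0.0          # start a new scan line
            y += row_height
        positions.append((x, y))
        x += width
    return positions
```

If labels are sorted by descending word frequency before calling either function, the most frequent keywords end up at the center of the radial layout or at the start of the first scan line.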
The technical solution of the invention further comprises: the label cloud generation algorithm comprises: selecting the layout type to generate; after initially placing the labels according to the selected layout type, traversing all labels; and, when labels occlude one another, using a greedy algorithm to search in small circles around the label center for a new placement position.
Another technical solution provided by the invention is a device for automatically generating a Chinese text label cloud, comprising a word segmentation and part-of-speech tagging module, a keyword and word frequency extraction module, and a label cloud generation module, connected in sequence. The word segmentation and part-of-speech tagging module performs word segmentation and part-of-speech tagging on the text data to be analyzed using Chinese lexical analysis; the keyword and word frequency extraction module extracts the keywords and word frequencies of the text data to be analyzed according to the word segmentation and part-of-speech tagging results; and the label cloud generation module takes the extracted keywords and word frequencies as input data and generates a label cloud using a label cloud generation algorithm.
The technical solution of the invention further comprises: the word segmentation and part-of-speech tagging module uses Chinese lexical analysis based on a cascaded hidden Markov model, which specifically comprises: in the preprocessing stage, using the N-shortest-path coarse segmentation method to obtain the N best coarse segmentation results that can cover the ambiguities; on the coarse segmentation result set, using a low-level hidden Markov model combined with the dictionary and corpus to recognize common, non-nested person names and place names, and, based on the recognized results, using a high-level hidden Markov model combined with the dictionary and corpus to recognize nested person names, complex place names and organization names; adding the recognized out-of-vocabulary words, with their computed probabilities, to the class-based segmentation hidden Markov model, in which out-of-vocabulary words and ambiguities are not treated as special cases but compete with ordinary words as candidate results; and performing hidden Markov part-of-speech tagging on the globally optimal segmentation result to obtain the lexical analysis result.
The technical solution of the invention further comprises: the dictionary and corpus are updated accordingly, and the update comprises: using web crawler technology to capture the new words surfaced by search engines or news websites and to collect the news related to the new words; adding the collected news to the corpus for training; tagging the new words with parts of speech; adding the tagged new words to the dictionary; and updating the dictionary and corpus.
The technical solution of the invention further comprises: the label cloud generation module lays out the keywords based on geometry; the layout types comprise a radial layout and a linear layout; in the radial layout all labels are placed radiating from the inside outward, and in the linear layout all labels are placed along scan lines; generating the label cloud comprises: selecting the layout type to generate; after initially placing the labels according to the selected layout type, traversing all labels; and, when labels occlude one another, using a greedy algorithm to search in small circles around the label center for a new placement position.
The technical solution of the invention has the following advantages or beneficial effects: the method and device of the embodiments improve the dictionary used by the word segmentation system so that it can update itself from the new corpus material produced every day, and add a sentiment analysis function; a label cloud with better spatial structure and more reasonable colors is then generated from the word frequencies of the extracted keywords and the sentiment colors of the keywords; in addition, by combining and optimizing Chinese word segmentation with the label cloud algorithm, the invention fills a gap in Chinese label cloud generation algorithms and provides a useful tool for tasks such as extracting the key points of news and analyzing public opinion.
Brief description of the drawings
Fig. 1 is a flow chart of the method for automatically generating a Chinese text label cloud according to an embodiment of the invention;
Fig. 2 is a flow chart of the Chinese lexical analysis algorithm of the method according to an embodiment of the invention;
Fig. 3 is a flow chart of the self-update of the dictionary and corpus in the method according to an embodiment of the invention;
Fig. 4 is a schematic diagram of the keyword layout types of the method according to an embodiment of the invention;
Fig. 5 is a schematic diagram of the method according to an embodiment of the invention applying the greedy algorithm to resolve label occlusion;
Fig. 6 is a schematic diagram of the method according to an embodiment of the invention applied to NetEase News;
Fig. 7 is a schematic diagram of the label cloud that the method according to an embodiment of the invention generates from replies to NetEase News;
Fig. 8 is a structural diagram of the device for automatically generating a Chinese text label cloud according to an embodiment of the invention.
Detailed description of the embodiments
To make the purpose, technical solution and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here only explain the invention and are not intended to limit it.
Refer to Fig. 1, which is a flow chart of the method for automatically generating a Chinese text label cloud according to an embodiment of the invention. The method comprises:
Step 100: performing word segmentation and part-of-speech tagging on the text data to be analyzed using Chinese lexical analysis;
In step 100, the text data to be analyzed includes data from news, the Internet and newspapers. See also Fig. 2, which is a flow chart of the Chinese lexical analysis algorithm of the method. Chinese lexical analysis cuts a continuous string of characters into individual words and correctly determines the part of speech of each word. In this embodiment, the Chinese lexical analysis is based on a cascaded hidden Markov model and specifically comprises: in the preprocessing stage, using the N-shortest-path coarse segmentation method to quickly obtain the N best coarse segmentation results that can cover the ambiguities; on the coarse segmentation result set, using a low-level hidden Markov model combined with the dictionary and corpus to recognize common, non-nested person names and place names, and, based on the recognized results, using a high-level hidden Markov model combined with the dictionary and corpus to recognize nested person names, complex place names and organization names; adding the recognized out-of-vocabulary words, with their computed probabilities, to the class-based segmentation hidden Markov model, in which out-of-vocabulary words and ambiguities are not treated as special cases but compete with ordinary words as candidate results; and performing hidden Markov part-of-speech tagging on the globally optimal segmentation result to obtain the lexical analysis result.
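The final stage of step 100, hidden Markov part-of-speech tagging, is conventionally decoded with the Viterbi algorithm. The following Python sketch shows that decoding for a single, flat hidden Markov model only; it does not reproduce the cascaded model described above. The function name `viterbi_tag`, the probability tables and the `floor` value used for unseen events are assumptions for illustration.

```python
import math
from typing import Dict, List, Sequence


def viterbi_tag(words: Sequence[str], tags: Sequence[str],
                start_p: Dict[str, float],
                trans_p: Dict[str, Dict[str, float]],
                emit_p: Dict[str, Dict[str, float]],
                floor: float = 1e-6) -> List[str]:
    """Return the most probable tag sequence for a non-empty word list."""

    def lp(p: float) -> float:
        # Log probability with a floor so unseen events stay finite.
        return math.log(max(p, floor))

    # score[i][t]: best log probability of a tag path ending in t at word i.
    score = [{t: lp(start_p.get(t, 0.0)) + lp(emit_p[t].get(words[0], 0.0))
              for t in tags}]
    back: List[Dict[str, str]] = [{}]
    for i in range(1, len(words)):
        score.append({})
        back.append({})
        for t in tags:
            prev = max(tags,
                       key=lambda p: score[i - 1][p] + lp(trans_p[p].get(t, 0.0)))
            score[i][t] = (score[i - 1][prev]
                           + lp(trans_p[prev].get(t, 0.0))
                           + lp(emit_p[t].get(words[i], 0.0)))
            back[i][t] = prev
    # Follow the back-pointers from the best final tag.
    path = [max(tags, key=lambda t: score[-1][t])]
    for i in range(len(words) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]
```

In a real system the three probability tables would be estimated from the tagged corpus; here they are simply passed in by the caller.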
In step 100, to improve the accuracy of keyword extraction, the method of this embodiment improves the dictionary used by the original Chinese lexical analysis algorithm and expands the original corpus. Specifically: web crawler technology is used to capture the new words that search engines such as Baidu and Sogou, or news websites, surface each day, and to collect the news related to those new words; the collected news is added to the corpus for training; the new words are tagged with parts of speech; the tagged new words are added to the dictionary; and the dictionary and corpus are updated. The detailed flow is shown in Fig. 3.
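The merge step of this self-update flow can be sketched in a few lines of Python. Crawling and retraining are outside the scope of the sketch, and the function name `update_lexicon` as well as the `(word, pos_tag, related_news)` tuple format are hypothetical, chosen only to show how crawled new words might be folded into the dictionary and corpus.

```python
from typing import Dict, Iterable, List, Tuple

# (new word, part-of-speech tag, news texts that mention the word)
NewWord = Tuple[str, str, List[str]]


def update_lexicon(lexicon: Dict[str, str], corpus: List[str],
                   new_words: Iterable[NewWord]) -> int:
    """Merge crawled new words into the dictionary and corpus in place.

    Returns the number of words that were actually added to the dictionary.
    """
    added = 0
    for word, pos_tag, related_news in new_words:
        # Extend the corpus with related news, skipping exact duplicates.
        for article in related_news:
            if article not in corpus:
                corpus.append(article)
        # Add the tagged word only if the dictionary does not have it yet.
        if word not in lexicon:
            lexicon[word] = pos_tag
            added += 1
    return added
```

After such a merge, the segmentation model would be retrained on the enlarged corpus, which the sketch leaves to the caller.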
Step 200: extracting the keywords and word frequencies of the text data to be analyzed according to the word segmentation and part-of-speech tagging results;
Step 300: taking the extracted keywords and word frequencies as input data, and generating a label cloud using a label cloud generation algorithm.
In step 300, the label cloud generation algorithm lays out the keywords based on geometry, preserving the orthogonal ordering between keywords. Generating the label cloud specifically comprises: selecting the layout type to generate, where the layout types comprise a radial layout, in which all labels are placed radiating from the inside outward, and a linear layout, in which all labels are placed along scan lines (see Fig. 4); after the labels are initially placed according to one of the two layouts, traversing all labels and, when two labels occlude each other, using a greedy algorithm to search in small circles around the label center for a new placement position that resolves the occlusion. Fig. 5 is a schematic diagram of the method applying the greedy algorithm to resolve label occlusion. The greedy algorithm specifically comprises: when occlusion occurs, for a merged foreground block containing multiple targets, prior features such as the number, labels, colors and shapes of the occluded targets in the foreground block can be obtained from the merge detection module and the tracking results before the occlusion; during localization, all targets not yet localized are traversed in turn, the observation probability of each target is calculated, and the position of the target with the highest observation probability is output as that target's localization result; at the same time, the pixels covered by that target are added to a set and the pixel set of the merged foreground block is updated; this process is repeated until localization results are obtained for all occluded targets in the foreground block.
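The occlusion handling in step 300 can be illustrated with a minimal Python sketch. It covers only the search in growing circles around the label center, not the observation-probability procedure described in the text, and every name in it (`Box`, `overlaps`, `resolve_overlaps`, `step`, `angles`, `max_radius`) is an assumption for illustration. Labels are modeled as axis-aligned rectangles and are placed greedily one after another.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    x: float  # centre x
    y: float  # centre y
    w: float  # width
    h: float  # height


def overlaps(a: Box, b: Box) -> bool:
    """True if two axis-aligned boxes share interior area."""
    return abs(a.x - b.x) * 2 < a.w + b.w and abs(a.y - b.y) * 2 < a.h + b.h


def resolve_overlaps(boxes: List[Box], step: float = 5.0, angles: int = 16,
                     max_radius: float = 2000.0) -> List[Box]:
    """Greedily place boxes in order, moving each one off earlier boxes.

    A colliding box is tried at `angles` points on circles of growing
    radius around its initial centre; the first free point is taken.
    If nothing is free within max_radius, the last candidate is kept.
    """
    placed: List[Box] = []
    for box in boxes:
        candidate = Box(box.x, box.y, box.w, box.h)
        radius = 0.0
        while any(overlaps(candidate, p) for p in placed) and radius < max_radius:
            radius += step
            for k in range(angles):
                theta = 2 * math.pi * k / angles
                candidate = Box(box.x + radius * math.cos(theta),
                                box.y + radius * math.sin(theta), box.w, box.h)
                if not any(overlaps(candidate, p) for p in placed):
                    break
        placed.append(candidate)
    return placed
```

Because each label is fixed as soon as a free position is found, the result depends on the input order; placing high-frequency labels first keeps them closest to their initial positions.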
In step 300, when the radial layout is selected, the geometric center of the overall layout needs to be redetermined; once all labels have been traversed, generation of the label cloud is complete.
See also Fig. 6 and Fig. 7. Fig. 6 is a schematic diagram of the method applied to NetEase News, and Fig. 7 is a schematic diagram of the label cloud that the method generates from replies to NetEase News. The method of this embodiment can perform keyword extraction and Chinese label cloud generation on large-scale Chinese text data. Here the system is used to generate a label cloud from user reply data on NetEase News, producing a label cloud with better spatial structure and more reasonable colors from the word frequencies of the extracted keywords and the sentiment colors of the keywords.
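The source does not specify how word frequency and sentiment are turned into label size and color, so the following Python sketch is purely hypothetical: a linear mapping from frequency to font size, and a blue-to-red color ramp over an assumed sentiment score in the range -1 to 1. The function names and default point sizes are illustrative only.

```python
def font_size(freq: float, min_freq: float, max_freq: float,
              min_pt: float = 12.0, max_pt: float = 48.0) -> float:
    """Linearly map a word frequency to a font size in points."""
    if max_freq == min_freq:
        return (min_pt + max_pt) / 2  # all words equally frequent
    ratio = (freq - min_freq) / (max_freq - min_freq)
    return min_pt + ratio * (max_pt - min_pt)


def sentiment_color(score: float) -> str:
    """Map a sentiment score in [-1, 1] to an RGB hex colour.

    Negative scores shade toward blue and positive scores toward red;
    scores outside the range are clamped.
    """
    clamped = max(-1.0, min(1.0, score))
    red = round(255 * (clamped + 1) / 2)
    blue = 255 - red
    return f"#{red:02x}00{blue:02x}"
```

A logarithmic frequency scale could replace the linear one when a few keywords dominate the counts.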
Refer to Fig. 8, which is a structural diagram of the device for automatically generating a Chinese text label cloud according to an embodiment of the invention. The device comprises a word segmentation and part-of-speech tagging module, a keyword and word frequency extraction module, and a label cloud generation module, which are connected in sequence.
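The three-module structure can be sketched as a simple pipeline in Python. The class name `LabelCloudDevice` and the callable signatures are assumptions for illustration; each module is passed in as a function so that any tagger, extractor or layout generator can be plugged in.

```python
from collections import Counter
from typing import Callable, List, Tuple

Tagged = List[Tuple[str, str]]  # (word, part-of-speech tag) pairs


class LabelCloudDevice:
    """Chains the three modules in the order the text describes."""

    def __init__(self,
                 tagger: Callable[[str], Tagged],
                 extractor: Callable[[Tagged], Counter],
                 generator: Callable[[Counter], list]) -> None:
        self.tagger = tagger        # segmentation and POS tagging module
        self.extractor = extractor  # keyword and word frequency module
        self.generator = generator  # label cloud generation module

    def run(self, text: str) -> list:
        """Feed the output of each module into the next one."""
        tagged = self.tagger(text)
        frequencies = self.extractor(tagged)
        return self.generator(frequencies)
```

Injecting the modules as callables mirrors the sequential connection between modules and makes each stage testable on its own.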
Word segmentation and part-of-speech tagging module: performs word segmentation and part-of-speech tagging on the text data to be analyzed using Chinese lexical analysis. The text data to be analyzed includes data from news, the Internet and newspapers. Chinese lexical analysis cuts a continuous string of characters into individual words and correctly determines the part of speech of each word. In this embodiment, the Chinese lexical analysis is based on a cascaded hidden Markov model and specifically comprises: in the preprocessing stage, using the N-shortest-path coarse segmentation method to quickly obtain the N best coarse segmentation results that can cover the ambiguities; on the coarse segmentation result set, using a low-level hidden Markov model combined with the dictionary and corpus to recognize common, non-nested person names and place names, and, based on the recognized results, using a high-level hidden Markov model combined with the dictionary and corpus to recognize nested person names, complex place names and organization names; adding the recognized out-of-vocabulary words, with their computed probabilities, to the class-based segmentation hidden Markov model, in which out-of-vocabulary words and ambiguities are not treated as special cases but compete with ordinary words as candidate results; and performing hidden Markov part-of-speech tagging on the globally optimal segmentation result to obtain the lexical analysis result.
To improve the accuracy of keyword extraction, the device of this embodiment improves the dictionary used by the original Chinese lexical analysis algorithm and expands the original corpus. Specifically: web crawler technology is used to capture the new words that search engines such as Baidu and Sogou, or news websites, surface each day, and to collect the news related to those new words; the collected news is added to the corpus for training; the new words are tagged with parts of speech; the tagged new words are added to the dictionary; and the dictionary and corpus are updated. The detailed flow is shown in Fig. 3.
Keyword and word frequency extraction module: extracts the keywords and word frequencies of the text data to be analyzed according to the word segmentation and part-of-speech tagging results.
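A minimal Python sketch of such an extraction module follows. The source does not state which parts of speech count as keywords, so the choice to keep nouns, verbs and adjectives (`keep_tags`), the stop-word set, the minimum word length and the function name `extract_keywords` are all assumptions for illustration.

```python
from collections import Counter
from typing import FrozenSet, Iterable, List, Tuple


def extract_keywords(tagged: Iterable[Tuple[str, str]],
                     keep_tags: Tuple[str, ...] = ("n", "v", "a"),
                     stopwords: FrozenSet[str] = frozenset(),
                     min_len: int = 2,
                     top_k: int = 50) -> List[Tuple[str, int]]:
    """Count content words and return the top_k (word, frequency) pairs.

    A word is kept when its tag starts with one of keep_tags, it is not
    a stop word, and it has at least min_len characters.
    """
    counts = Counter(
        word for word, tag in tagged
        if tag.startswith(keep_tags)
        and word not in stopwords
        and len(word) >= min_len
    )
    return counts.most_common(top_k)
```

The returned list is already sorted by descending frequency, which is the order a label cloud layout would typically consume.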
Label cloud generation module: takes the extracted keywords and word frequencies as input data and generates a label cloud using a label cloud generation algorithm. The module lays out the keywords based on geometry, preserving the orthogonal ordering between keywords. Generating the label cloud specifically comprises: selecting the layout type to generate, where the layout types comprise a radial layout, in which all labels are placed radiating from the inside outward, and a linear layout, in which all labels are placed along scan lines (see Fig. 4); after the labels are initially placed according to one of the two layouts, traversing all labels and, when two labels occlude each other, using a greedy algorithm to search in small circles around the label center for a new placement position that resolves the occlusion. Fig. 5 is a schematic diagram of the method applying the greedy algorithm to resolve label occlusion. The greedy algorithm specifically comprises: when occlusion occurs, for a merged foreground block containing multiple targets, prior features such as the number, labels, colors and shapes of the occluded targets in the foreground block can be obtained from the merge detection module and the tracking results before the occlusion; during localization, all targets not yet localized are traversed in turn, the observation probability of each target is calculated, and the position of the target with the highest observation probability is output as that target's localization result; at the same time, the pixels covered by that target are added to a set and the pixel set of the merged foreground block is updated; this process is repeated until localization results are obtained for all occluded targets in the foreground block.
When the label cloud generation module generates the label cloud with the radial layout selected, the geometric center of the overall layout needs to be redetermined; once all labels have been traversed, generation of the label cloud is complete.
The technical solution of the invention has the following advantages or beneficial effects: the method and device of the embodiments improve the dictionary used by the word segmentation system so that it can update itself from the new corpus material produced every day, and add a sentiment analysis function; a label cloud with better spatial structure and more reasonable colors is then generated from the word frequencies of the extracted keywords and the sentiment colors of the keywords; in addition, by combining and optimizing Chinese word segmentation with the label cloud algorithm, the invention fills a gap in Chinese label cloud generation algorithms and provides a useful tool for tasks such as extracting the key points of news and analyzing public opinion.
The above are only preferred embodiments of the invention and are not intended to limit it; any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within its scope of protection.

Claims (10)

1. A method for automatically generating a Chinese text label cloud, comprising:
Step a: performing word segmentation and part-of-speech tagging on the text data to be analyzed using Chinese lexical analysis;
Step b: extracting the keywords and word frequencies of the text data to be analyzed according to the word segmentation and part-of-speech tagging results;
Step c: taking the extracted keywords and word frequencies as input data, and generating a label cloud using a label cloud generation algorithm.
2. The method for automatically generating a Chinese text label cloud according to claim 1, characterized in that, in step a, the Chinese lexical analysis is based on a cascaded hidden Markov model and comprises: in the preprocessing stage, using the N-shortest-path coarse segmentation method to obtain the N best coarse segmentation results that can cover the ambiguities; on the coarse segmentation result set, using a low-level hidden Markov model combined with the dictionary and corpus to recognize common, non-nested person names and place names, and, based on the recognized results, using a high-level hidden Markov model combined with the dictionary and corpus to recognize nested person names, complex place names and organization names; adding the recognized out-of-vocabulary words, with their computed probabilities, to the class-based segmentation hidden Markov model, in which out-of-vocabulary words and ambiguities are not treated as special cases but compete with ordinary words as candidate results; and performing hidden Markov part-of-speech tagging on the globally optimal segmentation result to obtain the lexical analysis result.
3. The method for automatically generating a Chinese text label cloud according to claim 2, characterized in that the dictionary and corpus are updated accordingly, and the update comprises: using web crawler technology to capture the new words surfaced by search engines or news websites and to collect the news related to the new words; adding the collected news to the corpus for training; tagging the new words with parts of speech; adding the tagged new words to the dictionary; and updating the dictionary and corpus.
4. The method for automatically generating a Chinese text label cloud according to claim 1, characterized in that, in step c, the label cloud generation algorithm lays out the keywords based on geometry.
5. The method for automatically generating a Chinese text label cloud according to claim 4, characterized in that the layout types comprise a radial layout and a linear layout; in the radial layout all labels are placed radiating from the inside outward, and in the linear layout all labels are placed along scan lines.
6. The method for automatically generating a Chinese text label cloud according to claim 4, characterized in that the label cloud generation algorithm comprises: selecting the layout type to generate; after initially placing the labels according to the selected layout type, traversing all labels; and, when labels occlude one another, using a greedy algorithm to search in small circles around the label center for a new placement position.
7. a Chinese text label-cloud automatically generating device, it is characterized in that, comprise: participle and part-of-speech tagging module, keyword and word frequency extraction module and label-cloud generation module, described participle and part-of-speech tagging module, keyword is connected with the label-cloud generation module successively with the word frequency extraction module, described participle and part-of-speech tagging module are for utilizing Chinese lexical analysis to carry out participle and part-of-speech tagging to text data to be analyzed, described keyword and word frequency extraction module are for extracting keyword and the word frequency of text data to be analyzed according to participle and part-of-speech tagging result, described label-cloud generation module for the keyword extracted is usingd and word frequency as the input data, use label-cloud generating algorithm generating labels cloud.
8. Chinese text label-cloud automatically generating device according to claim 7, it is characterized in that, described participle and part-of-speech tagging module adopt the Chinese lexical analysis based on stacked hidden horse model, specifically comprise: at pretreatment stage, adopt N-shortest path rough segmentation method, obtain an optimum N rough lumber minute result of energy overlay ambiguity; On the rough segmentation result set, adopt the hidden horse models coupling of low layer dictionary corpus to identify name, the place name of common non-nesting, and according to the result identified complicated place name and the mechanism's name of name, place name that adopted high-rise hidden horse models coupling dictionary corpus to identify nested; The unregistered word identified is joined in the hidden horse model of class-based cutting with the probability calculated, and unregistered word and ambiguity, all not as special case, participate in the competition of candidate result together with generic word; The hidden horse mark that carries out part of speech on the word segmentation result of global optimization obtains the lexical analysis result.
9. The device for automatically generating a Chinese text label cloud according to claim 8, characterized in that the dictionary corpus is updated accordingly, the update comprising: using web crawler technology to capture new words that appear in updates of search engines or news websites, and collecting news related to the new words; adding the collected news to the corpus for training, tagging the new words with their parts of speech, adding the tagged new words to the dictionary, and updating the dictionary and the corpus.
10. The device for automatically generating a Chinese text label cloud according to claim 7, characterized in that the label cloud generation module lays out the keywords based on geometry, the layout modes comprising a radial layout and a linear layout, where the radial layout places all labels radiating from the inside outward and the linear layout places all labels along a scan line; generating the label cloud by the label cloud generation module comprises: selecting the layout mode to generate; after initially placing the labels according to the selected layout mode, traversing all labels; and, when labels occlude one another, using a greedy algorithm to search within a small circle around the label center for a new placement position.
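The last two stages of the device in claims 7 and 10 can be illustrated with a minimal Python sketch. It is not the patented implementation: it assumes the text has already been segmented and tagged by some lexical analyzer, and the tag set `KEEP_TAGS`, the font-size rule (proportional to the square root of the frequency), the golden-angle starting positions, and all function names are hypothetical choices made only for illustration. Labels are also placed one at a time rather than all at once followed by a traversal, which is a simplification of the claimed procedure.

```python
import math
from collections import Counter

# Assumption: tokens arrive already segmented and POS-tagged as (word, tag)
# pairs. The tag names below are illustrative, not taken from the patent.
KEEP_TAGS = {"n", "nr", "ns", "nt", "v"}


def extract_keywords(tagged_tokens, top_k=20):
    """Count content words and return the top_k (word, frequency) pairs."""
    counts = Counter(word for word, tag in tagged_tokens if tag in KEEP_TAGS)
    return counts.most_common(top_k)


def overlaps(a, b):
    """Overlap test for rectangles given as (cx, cy, width, height)."""
    return (abs(a[0] - b[0]) * 2 < a[2] + b[2]
            and abs(a[1] - b[1]) * 2 < a[3] + b[3])


def find_free_position(rect, placed, step=2.0, n_angles=36):
    """Greedy search: try circles of growing radius around the label's own
    centre and take the first candidate that overlaps nothing placed so far."""
    x0, y0, w, h = rect
    radius = 0.0
    while True:
        if radius == 0.0:
            angles = [0.0]  # radius 0 tests the initial position itself
        else:
            angles = [2 * math.pi * k / n_angles for k in range(n_angles)]
        for a in angles:
            cand = (x0 + radius * math.cos(a), y0 + radius * math.sin(a), w, h)
            if not any(overlaps(cand, other) for _, other in placed):
                return cand
        radius += step


def radial_layout(keywords, base_size=10.0, spacing=5.0):
    """Place labels from the centre outward, most frequent first."""
    golden_angle = math.pi * (3 - math.sqrt(5))
    placed = []  # list of (word, (cx, cy, width, height))
    for i, (word, freq) in enumerate(keywords):
        size = base_size * math.sqrt(freq)   # hypothetical font-size rule
        w, h = size * len(word), size        # assumes square glyphs
        r, t = spacing * math.sqrt(i), i * golden_angle
        initial = (r * math.cos(t), r * math.sin(t), w, h)
        placed.append((word, find_free_position(initial, placed)))
    return placed


if __name__ == "__main__":
    tokens = [("标签", "n"), ("云", "n"), ("的", "u"), ("标签", "n"),
              ("生成", "v"), ("标签", "n"), ("云", "n")]
    for word, rect in radial_layout(extract_keywords(tokens)):
        print(word, [round(v, 1) for v in rect])
```

The search in `find_free_position` always terminates because the already placed rectangles cover a bounded area, so a sufficiently large circle contains a free position. A linear layout could reuse the same collision handling by replacing the golden-angle starting positions with positions along a scan line.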
CN201310319948.9A 2013-07-26 Method and device for automatically generating Chinese text label cloud Active CN103440256B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310319948.9A CN103440256B (en) 2013-07-26 Method and device for automatically generating Chinese text label cloud

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310319948.9A CN103440256B (en) 2013-07-26 Method and device for automatically generating Chinese text label cloud

Publications (2)

Publication Number Publication Date
CN103440256A (en) 2013-12-11
CN103440256B (en) 2016-11-30

Cited By (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104281690A (en) * 2014-10-11 2015-01-14 时之我代信息科技(上海)有限公司 Tag cloud generating method and device
WO2015196909A1 (en) * 2014-06-27 2015-12-30 北京奇虎科技有限公司 Word segmentation method and device
CN105243130A (en) * 2015-09-29 2016-01-13 中国电子科技集团公司第三十二研究所 Text processing system and method for data mining
CN105528421A (en) * 2015-12-07 2016-04-27 中国人民大学 Search dimension excavation method of query terms in mass data
CN105740231A (en) * 2016-01-28 2016-07-06 浪潮软件股份有限公司 Data content associating method and apparatus
CN106610933A (en) * 2015-10-27 2017-05-03 北京国双科技有限公司 Configuration method and device for keyword tag
CN108197117A (en) * 2018-01-31 2018-06-22 厦门大学 Chinese text keyword extraction method based on document theme structure and semantics
CN110189393A (en) * 2019-06-05 2019-08-30 山东大学 Method and device for generating a shape word cloud
CN110532539A (en) * 2018-05-24 2019-12-03 本识科技(深圳)有限公司 Human-machine interaction information processing method and apparatus
CN110866400A (en) * 2019-11-01 2020-03-06 中电科大数据研究院有限公司 Automatic-updating lexical analysis system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101464898A (en) * 2009-01-12 2009-06-24 腾讯科技(深圳)有限公司 Method for extracting feature word of text
US20110295903A1 (en) * 2010-05-28 2011-12-01 Drexel University System and method for automatically generating systematic reviews of a scientific field
CN102289523A (en) * 2011-09-20 2011-12-21 北京金和软件股份有限公司 Method for intelligently extracting text labels
CN102654866A (en) * 2011-03-02 2012-09-05 北京百度网讯科技有限公司 Method and device for establishing example sentence index and method and device for indexing example sentences
CN103186675A (en) * 2013-04-03 2013-07-03 南京安讯科技有限责任公司 Automatic webpage classification method based on network hot word identification

Cited By (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2015196909A1 (en) * 2014-06-27 2015-12-30 北京奇虎科技有限公司 Word segmentation method and device
CN104281690A (en) * 2014-10-11 2015-01-14 时之我代信息科技(上海)有限公司 Tag cloud generating method and device
CN104281690B (en) * 2014-10-11 2018-01-05 时之我代信息科技(上海)有限公司 Tag cloud generating method and device
CN105243130A (en) * 2015-09-29 2016-01-13 中国电子科技集团公司第三十二研究所 Text processing system and method for data mining
CN106610933A (en) * 2015-10-27 2017-05-03 北京国双科技有限公司 Configuration method and device for keyword tag
CN105528421A (en) * 2015-12-07 2016-04-27 中国人民大学 Search dimension excavation method of query terms in mass data
CN105528421B (en) * 2015-12-07 2018-09-04 中国人民大学 Search dimension excavation method of query terms in mass data
CN105740231A (en) * 2016-01-28 2016-07-06 浪潮软件股份有限公司 Data content associating method and apparatus
CN108197117A (en) * 2018-01-31 2018-06-22 厦门大学 Chinese text keyword extraction method based on document theme structure and semantics
CN108197117B (en) * 2018-01-31 2020-05-26 厦门大学 Chinese text keyword extraction method based on document theme structure and semantics
CN110532539A (en) * 2018-05-24 2019-12-03 本识科技(深圳)有限公司 Human-machine interaction information processing method and apparatus
CN110189393A (en) * 2019-06-05 2019-08-30 山东大学 Method and device for generating a shape word cloud
CN110866400A (en) * 2019-11-01 2020-03-06 中电科大数据研究院有限公司 Automatic-updating lexical analysis system
CN110866400B (en) * 2019-11-01 2023-08-04 中电科大数据研究院有限公司 Automatic-updating lexical analysis system

Similar Documents

Publication Publication Date Title
CN107330011B (en) Multi-strategy fusion named entity recognition method and device
CN102831121B (en) Method and system for extracting webpage information
US11256856B2 (en) Method, device, and system, for identifying data elements in data structures
CN102419778B (en) Information searching method for discovering and clustering sub-topics of query statement
CN101241514B (en) Method for creating error-correcting database, automatic error correcting method and system
KR102464248B1 (en) Method, apparatus, electronic device, and storage medium for extracting spo triples
CN102214166B (en) Machine translation system and machine translation method based on syntactic analysis and hierarchical model
CN102253930B (en) Text translation method and device
CN107203526B (en) Query string semantic demand analysis method and device
CN104679867B (en) Graph-based address knowledge processing method and device
CN103076892A (en) Method and equipment for providing input candidate items corresponding to input character string
JP2022512269A (en) Methods for extracting POI names, devices, devices, programs and computer storage media
CN105677857B (en) method and device for accurately matching keywords with marketing landing pages
CN105589936A (en) Data query method and system
CN102750282B (en) Synonym template mining method and device as well as synonym mining method and device
CN103186522A (en) Electronic device and natural language analyzing method thereof
CN102779135A (en) Method and device for obtaining cross-linguistic search resources and corresponding search method and device
CN102339294A (en) Searching method and system for preprocessing keywords
CN102541282B (en) Method, apparatus and system for re-editing vocabulary by moving icons
CN106815215B (en) Method and apparatus for generating an annotation repository
CN105550169A (en) Method and device for identifying point of interest names based on character length
CN105159885A (en) Point-of-interest name identification method and device
CN106155998B (en) Data processing method and device
CN104462272A (en) Search requirement analysis method and device
CN105138708A (en) Method and device for identifying names of points of interest (POI)

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant