CN103440256A - Method and device for automatically generating Chinese text label cloud - Google Patents
- Publication number
- CN103440256A
- Authority
- CN
- China
- Prior art keywords
- cloud
- label
- chinese
- word
- speech tagging
- Legal status
- Granted
Abstract
The invention belongs to the technical field of label extraction and particularly relates to a method and a device for automatically generating a Chinese text label cloud. The method comprises the following steps: (a) performing word segmentation and part-of-speech tagging on the text data to be analyzed by means of Chinese lexical analysis; (b) extracting keywords and their word frequencies from the text data according to the segmentation and tagging result; (c) taking the extracted keywords and word frequencies as input data and generating a label cloud with a label cloud generation algorithm. By combining and optimizing Chinese word segmentation and the label cloud algorithm, the disclosed method and device fill a gap in label cloud generation for Chinese text and provide a useful tool for tasks such as extracting the key points of news and public opinion analysis.
Description
Technical field
The invention belongs to the technical field of label extraction, and in particular relates to a method and a device for automatically generating a Chinese text label cloud.
Background art
With the rapid development of science and technology, and of computer technology in particular, humanity's ability to produce and acquire data has grown by orders of magnitude. News sites, the web and newspapers produce large amounts of new information every day, and collecting, analyzing and mining this Chinese text data has long been a focus of researchers. Text data is usually marked with labels that identify its key words and make it easy to search or locate. A label cloud is a visual representation of keywords, used to aggregate user-generated labels or the textual content of a website. Existing label cloud generation methods for Chinese text extract keywords through word segmentation and then use the Wordle algorithm to produce a word label cloud in which no words overlap. These methods have two shortcomings. First, word segmentation is hampered by the new words that appear every day and by non-standard grammar in the text, so the text data cannot be lexically analyzed accurately. Second, existing label cloud generation methods are designed mainly for English text, and the label clouds they generate do not adapt well to the structure of Chinese text.
Summary of the invention
The invention provides a method and a device for automatically generating a Chinese text label cloud, aiming to solve the technical problems that existing label cloud generation methods cannot accurately perform lexical analysis on text data and, being designed mainly for English text, generate label clouds that do not adapt well to the structure of Chinese text.
The technical solution provided by the invention is a method for automatically generating a Chinese text label cloud, comprising:
Step a: performing word segmentation and part-of-speech tagging on the text data to be analyzed using Chinese lexical analysis;
Step b: extracting keywords and word frequencies of the text data to be analyzed according to the segmentation and tagging result;
Step c: taking the extracted keywords and word frequencies as input data and generating a label cloud using a label cloud generation algorithm.
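Step b does not state how keywords are selected. The following is a minimal Python sketch, assuming the segmenter returns (word, part-of-speech) pairs with ICTCLAS-style tag prefixes and that keywords are simply the most frequent nouns and verbs of at least two characters. The tag prefixes, the length filter and the `top_k` cutoff are all assumptions, not the patent's rule.

```python
from collections import Counter

# Assumed ICTCLAS-style tag prefixes for content words: "n" covers n, nr
# (person), ns (place), nt (organization); "v" covers verbs.
CONTENT_TAG_PREFIXES = ("n", "v")

def extract_keywords(tagged, top_k=50, min_len=2):
    """Return the top_k (word, frequency) pairs among content words.

    tagged: list of (word, pos_tag) pairs produced by the segmenter.
    Words shorter than min_len characters are ignored (assumed filter).
    """
    counts = Counter(
        word
        for word, tag in tagged
        if tag.startswith(CONTENT_TAG_PREFIXES) and len(word) >= min_len
    )
    # most_common sorts by frequency; ties keep first-seen order.
    return counts.most_common(top_k)

tagged = [("新闻", "n"), ("的", "u"), ("新闻", "n"), ("分析", "v"),
          ("北京", "ns"), ("我", "r")]
print(extract_keywords(tagged))
```

The resulting (keyword, frequency) list is the shape of input that step c expects.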
The technical solution of the invention further comprises: in step a, the Chinese lexical analysis is based on a cascaded hidden Markov model (HMM) and comprises: in the preprocessing stage, applying the N-shortest-paths rough segmentation method to obtain the N best rough segmentation results that cover the ambiguities; on the rough segmentation result set, using a low-level HMM with the dictionary and corpus to recognize ordinary, non-nested person names and place names, and then, based on these results, using a high-level HMM with the dictionary and corpus to recognize complex place names and organization names in which person names or place names are nested; adding the recognized unregistered (out-of-vocabulary) words, with their computed probabilities, to a class-based segmentation HMM, in which neither unregistered words nor ambiguities are treated as special cases but instead compete with ordinary words for the candidate result; and performing HMM part-of-speech tagging on the globally optimal segmentation result to obtain the lexical analysis result.
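The N-shortest-paths rough segmentation stage can be illustrated with a small sketch. It assumes a plain set of dictionary words, gives every dictionary word (and every single character, as a fallback) a unit cost in a left-to-right word lattice, and keeps the n best partial paths at each character position. This is a simplified k-best variant for illustration only; the cascaded-HMM analyzer described above uses probabilistic weights and further recognition layers that are not reproduced here.

```python
import heapq

def n_shortest_segmentations(sentence, dictionary, n=3):
    """Return up to n segmentations of `sentence` with the fewest words.

    Simplified N-shortest-path rough segmentation: each dictionary word,
    and each single character as a fallback, is a unit-cost edge in a
    left-to-right lattice; only the n best partial paths are kept per node.
    """
    length = len(sentence)
    # paths[i]: list of (cost, words) for segmentations of sentence[:i]
    paths = [[] for _ in range(length + 1)]
    paths[0] = [(0, [])]
    for i in range(length):
        # All edges into position i come from earlier positions, so the
        # candidate list for i is complete here and can be pruned to n.
        paths[i] = heapq.nsmallest(n, paths[i])
        for j in range(i + 1, length + 1):
            word = sentence[i:j]
            if j - i > 1 and word not in dictionary:
                continue
            for cost, words in paths[i]:
                paths[j].append((cost + 1, words + [word]))
    return [words for _, words in heapq.nsmallest(n, paths[length])]
```

For example, with a dictionary containing 研究, 研究生, 生命 and 起源, the string 研究生命起源 yields both 研究/生命/起源 and 研究生/命/起源 as three-word candidates: exactly the kind of ambiguity the later HMM layers are meant to resolve.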
The technical solution of the invention further comprises: the dictionary and corpus are updated accordingly, the update comprising: using web crawler technology to capture the new words published by search engines or news websites, and collecting news related to these new words; adding the collected news to the corpus for training, tagging the parts of speech of the new words, and adding the tagged new words to the dictionary, thereby updating the dictionary and the corpus.
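A minimal sketch of the bookkeeping in this update loop, assuming the dictionary is a mapping from word to part-of-speech tag and the corpus is a list of documents. The crawler output (`new_words`, `related_news`) and the tagging function `tag_word` are hypothetical stand-ins; retraining the segmentation model on the enlarged corpus is left out.

```python
def update_lexicon(dictionary, corpus, new_words, related_news, tag_word):
    """Add crawled new words to the dictionary and their news to the corpus.

    dictionary:   dict mapping word -> part-of-speech tag (assumed format)
    corpus:       list of training documents, extended in place
    new_words:    candidate words captured by a crawler (hypothetical input)
    related_news: dict mapping word -> list of news texts about that word
    tag_word:     hypothetical callable(word, texts) -> POS tag
    Returns the list of words that were actually added.
    """
    added = []
    for word in new_words:
        texts = related_news.get(word, [])
        if word in dictionary or not texts:
            continue  # already known, or no context to train and tag it with
        corpus.extend(texts)                      # grow the training corpus
        dictionary[word] = tag_word(word, texts)  # tag and register the word
        added.append(word)
    return added
```

Skipping words without related news is a design choice of this sketch: without context there is nothing to train on or to tag the word from.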
The technical solution of the invention further comprises: in step c, the label cloud generation algorithm lays out the keywords geometrically.
The technical solution of the invention further comprises: the layout types include a radial layout and a linear layout; the radial layout places all labels radiating from the inside outward, and the linear layout places all labels along a scan line.
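The two initial layouts can be sketched as follows. The patent does not give the placement curves, so an Archimedean spiral is assumed for the radial layout and a left-to-right, line-wrapping horizontal scan line for the linear layout; `spacing`, `angle_step` and `max_width` are illustrative parameters.

```python
import math

def radial_positions(count, spacing=2.0, angle_step=0.5):
    """Initial label centres on an Archimedean spiral r = spacing * theta.

    The first (most important) label sits at the origin and later labels
    move outward, one way to realise "radiating from the inside outward".
    """
    positions = []
    for k in range(count):
        theta = k * angle_step
        r = spacing * theta
        positions.append((r * math.cos(theta), r * math.sin(theta)))
    return positions

def linear_positions(sizes, max_width):
    """Top-left corners for (width, height) boxes placed left to right along
    a horizontal scan line that wraps to a new line when full."""
    x = y = line_height = 0.0
    positions = []
    for w, h in sizes:
        if x > 0 and x + w > max_width:  # current line full: move down
            x, y, line_height = 0.0, y + line_height, 0.0
        positions.append((x, y))
        x += w
        line_height = max(line_height, h)
    return positions
```

Labels would normally be fed in descending order of word frequency so that the most important keywords land nearest the centre or at the start of the scan line.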
The technical solution of the invention further comprises: the label cloud generation algorithm comprises: selecting the layout type to generate; after initially placing the labels according to the selected layout type, traversing all labels, and, when labels overlap, using a greedy algorithm to search for a new placement position on small circles around the label's center.
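A minimal sketch of the overlap-resolution pass, assuming labels are axis-aligned boxes given as (center x, center y, width, height) and processed in priority order. A colliding label is tried on circles of growing radius around its original center and takes the first free position found, which is the greedy choice. The ring step, number of angles and ring limit are assumptions.

```python
import math

def overlaps(a, b):
    """True if boxes a and b, given as (cx, cy, w, h), intersect."""
    return (abs(a[0] - b[0]) * 2 < a[2] + b[2]
            and abs(a[1] - b[1]) * 2 < a[3] + b[3])

def resolve_overlaps(boxes, radius_step=1.0, angles=16, max_rings=500):
    """Greedily place boxes in order, moving any that collide.

    A colliding box is tried on circles of growing radius around its
    original centre; the first free position found is accepted.
    """
    placed = []
    for cx, cy, w, h in boxes:
        candidate = (cx, cy, w, h)
        ring = 0
        while ring < max_rings and any(overlaps(candidate, p) for p in placed):
            ring += 1
            r = ring * radius_step
            for k in range(angles):
                t = 2 * math.pi * k / angles
                trial = (cx + r * math.cos(t), cy + r * math.sin(t), w, h)
                if not any(overlaps(trial, p) for p in placed):
                    candidate = trial
                    break
        placed.append(candidate)  # may still overlap if max_rings was hit
    return placed
```

Because earlier labels never move, the result depends on input order; placing high-frequency labels first keeps them closest to their intended positions.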
Another technical solution provided by the invention is a device for automatically generating a Chinese text label cloud, comprising a word segmentation and part-of-speech tagging module, a keyword and word frequency extraction module, and a label cloud generation module, connected in that order. The word segmentation and part-of-speech tagging module performs word segmentation and part-of-speech tagging on the text data to be analyzed using Chinese lexical analysis; the keyword and word frequency extraction module extracts keywords and word frequencies of the text data according to the segmentation and tagging result; and the label cloud generation module takes the extracted keywords and word frequencies as input data and generates a label cloud using a label cloud generation algorithm.
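The module chain of the device can be sketched as plain function composition. The class name, field names and type aliases below are hypothetical; each module is passed in as a callable so that any segmenter, extractor or layout routine can be plugged in.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Tagged = List[Tuple[str, str]]     # (word, POS tag) pairs
Keywords = List[Tuple[str, int]]   # (keyword, frequency) pairs

@dataclass
class LabelCloudDevice:
    """Three modules connected in sequence, as in the device description.
    Each module is passed in as a callable; the signatures are assumptions."""
    segment_and_tag: Callable[[str], Tagged]
    extract_keywords: Callable[[Tagged], Keywords]
    generate_cloud: Callable[[Keywords], object]

    def run(self, text: str) -> object:
        tagged = self.segment_and_tag(text)        # segmentation + POS module
        keywords = self.extract_keywords(tagged)   # keyword/frequency module
        return self.generate_cloud(keywords)       # label cloud module
```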
The technical solution of the invention further comprises: the word segmentation and part-of-speech tagging module uses Chinese lexical analysis based on a cascaded hidden Markov model (HMM), specifically comprising: in the preprocessing stage, applying the N-shortest-paths rough segmentation method to obtain the N best rough segmentation results that cover the ambiguities; on the rough segmentation result set, using a low-level HMM with the dictionary and corpus to recognize ordinary, non-nested person names and place names, and then, based on these results, using a high-level HMM with the dictionary and corpus to recognize complex place names and organization names in which person names or place names are nested; adding the recognized unregistered (out-of-vocabulary) words, with their computed probabilities, to a class-based segmentation HMM, in which neither unregistered words nor ambiguities are treated as special cases but instead compete with ordinary words for the candidate result; and performing HMM part-of-speech tagging on the globally optimal segmentation result to obtain the lexical analysis result.
The technical solution of the invention further comprises: the dictionary and corpus are updated accordingly, the update comprising: using web crawler technology to capture the new words published by search engines or news websites, and collecting news related to these new words; adding the collected news to the corpus for training, tagging the parts of speech of the new words, and adding the tagged new words to the dictionary, thereby updating the dictionary and the corpus.
The technical solution of the invention further comprises: the label cloud generation module lays out the keywords geometrically; the layout types include a radial layout and a linear layout, the radial layout placing all labels radiating from the inside outward and the linear layout placing all labels along a scan line; and generating the label cloud by the label cloud generation module comprises: selecting the layout type to generate; after initially placing the labels according to the selected layout type, traversing all labels, and, when labels overlap, using a greedy algorithm to search for a new placement position on small circles around the label's center.
The technical solution of the invention has the following advantages and beneficial effects: the method and device of the embodiments improve the dictionary used by the word segmentation system so that it updates itself every day from newly produced corpus material, and add a sentiment analysis function; label clouds with a more reasonable spatial structure and coloring are then generated according to the word frequencies of the extracted keywords and their sentiment colors. In addition, by combining and optimizing Chinese word segmentation and the label cloud algorithm, the invention fills a gap in Chinese label cloud generation algorithms and provides a useful tool for tasks such as extracting the key points of news and public opinion analysis.
Brief description of the drawings
Fig. 1 is a flowchart of the method for automatically generating a Chinese text label cloud according to an embodiment of the invention;
Fig. 2 is a flowchart of the Chinese lexical analysis algorithm used in the method according to an embodiment of the invention;
Fig. 3 is a flowchart of the self-updating of the dictionary and corpus in the method according to an embodiment of the invention;
Fig. 4 is a schematic diagram of the keyword layout types in the method according to an embodiment of the invention;
Fig. 5 is a schematic diagram of the method according to an embodiment of the invention applying a greedy algorithm to resolve label overlap;
Fig. 6 is a schematic diagram of the method according to an embodiment of the invention applied to NetEase News;
Fig. 7 is a schematic diagram of a label cloud generated by the method according to an embodiment of the invention from NetEase News comments;
Fig. 8 is a structural diagram of the device for automatically generating a Chinese text label cloud according to an embodiment of the invention.
Embodiment
To make the objectives, technical solutions and advantages of the invention clearer, the invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the invention, not to limit it.
Referring to Fig. 1, which is a flowchart of the method for automatically generating a Chinese text label cloud according to an embodiment of the invention, the method comprises:
Step 100: performing word segmentation and part-of-speech tagging on the text data to be analyzed using Chinese lexical analysis;
In step 100, the text data to be analyzed includes news, web and newspaper data. Please also refer to Fig. 2, which is a flowchart of the Chinese lexical analysis algorithm used in the method according to an embodiment of the invention. Chinese lexical analysis splits a string of consecutive characters into individual words and correctly determines the part of speech of each word. In the embodiment of the invention, the Chinese lexical analysis is based on a cascaded hidden Markov model (HMM) and specifically comprises: in the preprocessing stage, applying the N-shortest-paths rough segmentation method to quickly obtain the N best rough segmentation results that cover the ambiguities; on the rough segmentation result set, using a low-level HMM with the dictionary and corpus to recognize ordinary, non-nested person names and place names, and then, based on these results, using a high-level HMM with the dictionary and corpus to recognize complex place names and organization names in which person names or place names are nested; adding the recognized unregistered (out-of-vocabulary) words, with their computed probabilities, to a class-based segmentation HMM, in which neither unregistered words nor ambiguities are treated as special cases but instead compete with ordinary words for the candidate result; and performing HMM part-of-speech tagging on the globally optimal segmentation result to obtain the lexical analysis result.
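The final stage, HMM part-of-speech tagging of the chosen segmentation, is typically decoded with the Viterbi algorithm. The sketch below is a standard first-order Viterbi decoder in log space; the probability tables, the `floor` used for unseen entries and the toy tag set in the example are assumptions, not values from the patent.

```python
import math

def viterbi_tag(words, tags, start_p, trans_p, emit_p, floor=1e-8):
    """Most likely POS tag sequence for `words` under a first-order HMM.

    start_p[t], trans_p[prev][t] and emit_p[t][word] are probabilities;
    unseen entries fall back to `floor`. Scores are summed in log space.
    """
    if not words:
        return []

    def logp(table, key):
        return math.log(max(table.get(key, 0.0), floor))

    score = {t: logp(start_p, t) + logp(emit_p[t], words[0]) for t in tags}
    backpointers = []
    for word in words[1:]:
        prev_best, new_score = {}, {}
        for t in tags:
            best_prev = max(tags, key=lambda p: score[p] + logp(trans_p[p], t))
            prev_best[t] = best_prev
            new_score[t] = (score[best_prev] + logp(trans_p[best_prev], t)
                            + logp(emit_p[t], word))
        backpointers.append(prev_best)
        score = new_score
    # Trace the best path backwards from the highest-scoring final tag.
    path = [max(tags, key=lambda t: score[t])]
    for prev_best in reversed(backpointers):
        path.append(prev_best[path[-1]])
    return path[::-1]

tags = ["n", "v"]
start = {"n": 0.7, "v": 0.3}
trans = {"n": {"n": 0.3, "v": 0.7}, "v": {"n": 0.8, "v": 0.2}}
emit = {"n": {"学生": 0.5, "研究": 0.1, "语言": 0.4},
        "v": {"学生": 0.05, "研究": 0.9, "语言": 0.05}}
print(viterbi_tag(["学生", "研究", "语言"], tags, start, trans, emit))  # ['n', 'v', 'n']
```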
In step 100, to improve the accuracy of keyword extraction, the method of the embodiment improves the dictionary used by the original Chinese lexical analysis algorithm and expands the original corpus. Specifically, web crawler technology is used to capture the new words published each day by search engines such as Baidu and Sogou or by news websites, and news related to these new words is collected; the collected news is added to the corpus for training, the parts of speech of the new words are tagged, the tagged new words are added to the dictionary, and the dictionary and corpus are updated. The detailed flow is shown in Fig. 3.
Step 200: extracting keywords and word frequencies of the text data to be analyzed according to the segmentation and tagging result;
Step 300: taking the extracted keywords and word frequencies as input data and generating a label cloud using a label cloud generation algorithm.
In step 300, the label cloud generation algorithm lays out the keywords geometrically while preserving the Orthogonal Ordering property between keywords. The label cloud is generated as follows. First, a layout type is selected. The layout types include a radial layout, which places all labels radiating from the inside outward, and a linear layout, which places all labels along a scan line, as shown in Fig. 4. After the labels are initially placed according to one of the two layouts, all labels are traversed; whenever two labels overlap, a greedy algorithm searches for a new placement position on small circles around the label's center to resolve the overlap. Fig. 5 is a schematic diagram of the method according to an embodiment of the invention applying the greedy algorithm to resolve label overlap. The greedy algorithm specifically comprises: when occlusion occurs, for a merged foreground block containing several targets, the number of occluded targets in the block and prior features such as their labels, colors and shapes are obtained from the merge detection module and the tracking results before the occlusion. During localization, all targets not yet located are traversed in turn and the observation probability of each target is calculated; the position with the highest observation probability is output as the localization result of that target, and the pixels covered by that target are added to a set, which updates the pixel set of the merged foreground block. This process is repeated until all occluded targets in the foreground block have been located.
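The greedy localization loop described above can be sketched as follows. The description does not define the observation probability, so this sketch assumes it is the fraction of a candidate position's pixels that fall on foreground pixels not yet explained by an already-located target. Targets and their candidate positions are represented as hypothetical pixel sets.

```python
def observation_prob(pixels, unexplained):
    """Assumed observation model: share of a candidate's pixels that land on
    foreground pixels not yet explained by an already-located target."""
    return len(pixels & unexplained) / len(pixels) if pixels else 0.0

def greedy_localize(targets, foreground):
    """Greedily locate occluded targets inside a merged foreground block.

    targets:    dict name -> list of candidate positions, each a set of (x, y)
    foreground: set of (x, y) pixels of the merged foreground block
    Returns dict name -> index of the chosen candidate position.
    """
    unexplained = set(foreground)
    unlocated = {name: cands for name, cands in targets.items() if cands}
    result = {}
    while unlocated:
        # Score every candidate of every unlocated target; keep the best.
        best_score, best_name, best_idx = -1.0, None, None
        for name, candidates in unlocated.items():
            for idx, pixels in enumerate(candidates):
                score = observation_prob(pixels, unexplained)
                if score > best_score:
                    best_score, best_name, best_idx = score, name, idx
        result[best_name] = best_idx
        # Pixels covered by the located target leave the unexplained set.
        unexplained -= unlocated.pop(best_name)[best_idx]
    return result
```

Each pass fixes the single most confident target, so later targets are scored only against the pixels that remain unexplained.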
In step 300, if the radial layout is selected, the geometric center of the overall layout must be recomputed; once all labels have been traversed, generation of the label cloud is complete.
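Recomputing the geometric center after a radial layout can be as simple as translating all labels so that the center of their joint bounding box sits at the canvas origin. The sketch assumes (center x, center y, width, height) boxes and uses the bounding-box center as the "geometric center"; a frequency-weighted centroid would be an equally plausible reading.

```python
def recenter(boxes):
    """Shift (cx, cy, w, h) boxes so the centre of their joint bounding box
    is at the origin (assumed to be the canvas centre)."""
    if not boxes:
        return []
    left = min(cx - w / 2 for cx, cy, w, h in boxes)
    right = max(cx + w / 2 for cx, cy, w, h in boxes)
    top = min(cy - h / 2 for cx, cy, w, h in boxes)
    bottom = max(cy + h / 2 for cx, cy, w, h in boxes)
    ox, oy = (left + right) / 2, (top + bottom) / 2
    return [(cx - ox, cy - oy, w, h) for cx, cy, w, h in boxes]
```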
Please also refer to Figs. 6 and 7. Fig. 6 is a schematic diagram of the method according to an embodiment of the invention applied to NetEase News, and Fig. 7 is a schematic diagram of a label cloud generated by the method from NetEase News comments. The method of the embodiment can perform keyword extraction and Chinese label cloud generation on large-scale Chinese text data. When this system is used to generate a label cloud from users' comment data on NetEase News, the word frequencies of the extracted keywords and their sentiment colors yield a label cloud with a more reasonable spatial structure and coloring.
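The text states that label size and color follow keyword frequency and sentiment but gives no formula. A minimal sketch, assuming font size scales linearly with frequency between two bounds and a sentiment score in [-1, 1] (from a hypothetical sentiment analyser) maps to a red-grey-green palette:

```python
def style_keywords(freqs, sentiment, min_size=12, max_size=64):
    """Map each keyword to (font size, RGB colour).

    freqs:     dict word -> frequency
    sentiment: dict word -> score in [-1, 1] from a hypothetical sentiment
               analyser; missing words count as neutral (0).
    Font size grows linearly with frequency; colour runs from red
    (negative) through grey (neutral) to green (positive), an assumed palette.
    """
    if not freqs:
        return {}
    lo, hi = min(freqs.values()), max(freqs.values())
    span = (hi - lo) or 1  # avoid division by zero when all counts are equal
    styles = {}
    for word, f in freqs.items():
        size = round(min_size + (max_size - min_size) * (f - lo) / span)
        s = max(-1.0, min(1.0, sentiment.get(word, 0.0)))
        if s >= 0:
            color = (round(128 * (1 - s)), round(128 + 127 * s), round(128 * (1 - s)))
        else:
            color = (round(128 - 127 * s), round(128 * (1 + s)), round(128 * (1 + s)))
        styles[word] = (size, color)
    return styles
```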
Referring to Fig. 8, which is a structural diagram of the device for automatically generating a Chinese text label cloud according to an embodiment of the invention, the device comprises a word segmentation and part-of-speech tagging module, a keyword and word frequency extraction module and a label cloud generation module, connected in that order.
Word segmentation and part-of-speech tagging module: performs word segmentation and part-of-speech tagging on the text data to be analyzed using Chinese lexical analysis. The text data to be analyzed includes news, web and newspaper data. Chinese lexical analysis splits a string of consecutive characters into individual words and correctly determines the part of speech of each word. In the embodiment of the invention, the Chinese lexical analysis is based on a cascaded hidden Markov model (HMM) and specifically comprises: in the preprocessing stage, applying the N-shortest-paths rough segmentation method to quickly obtain the N best rough segmentation results that cover the ambiguities; on the rough segmentation result set, using a low-level HMM with the dictionary and corpus to recognize ordinary, non-nested person names and place names, and then, based on these results, using a high-level HMM with the dictionary and corpus to recognize complex place names and organization names in which person names or place names are nested; adding the recognized unregistered (out-of-vocabulary) words, with their computed probabilities, to a class-based segmentation HMM, in which neither unregistered words nor ambiguities are treated as special cases but instead compete with ordinary words for the candidate result; and performing HMM part-of-speech tagging on the globally optimal segmentation result to obtain the lexical analysis result.
To improve the accuracy of keyword extraction, the device of the embodiment improves the dictionary used by the original Chinese lexical analysis algorithm and expands the original corpus. Specifically, web crawler technology is used to capture the new words published each day by search engines such as Baidu and Sogou or by news websites, and news related to these new words is collected; the collected news is added to the corpus for training, the parts of speech of the new words are tagged, the tagged new words are added to the dictionary, and the dictionary and corpus are updated. The detailed flow is shown in Fig. 3.
Keyword and word frequency extraction module: extracts keywords and word frequencies of the text data to be analyzed according to the segmentation and tagging result.
Label cloud generation module: takes the extracted keywords and word frequencies as input data and generates a label cloud using a label cloud generation algorithm. The module lays out the keywords geometrically while preserving the Orthogonal Ordering property between keywords. It generates the label cloud as follows. First, a layout type is selected. The layout types include a radial layout, which places all labels radiating from the inside outward, and a linear layout, which places all labels along a scan line, as shown in Fig. 4. After the labels are initially placed according to one of the two layouts, all labels are traversed; whenever two labels overlap, a greedy algorithm searches for a new placement position on small circles around the label's center to resolve the overlap. Fig. 5 is a schematic diagram of the method according to an embodiment of the invention applying the greedy algorithm to resolve label overlap. The greedy algorithm specifically comprises: when occlusion occurs, for a merged foreground block containing several targets, the number of occluded targets in the block and prior features such as their labels, colors and shapes are obtained from the merge detection module and the tracking results before the occlusion. During localization, all targets not yet located are traversed in turn and the observation probability of each target is calculated; the position with the highest observation probability is output as the localization result of that target, and the pixels covered by that target are added to a set, which updates the pixel set of the merged foreground block. This process is repeated until all occluded targets in the foreground block have been located.
When the label cloud generation module generates a label cloud with the radial layout selected, the geometric center of the overall layout must be recomputed; once all labels have been traversed, generation of the label cloud is complete.
The technical solution of the invention has the following advantages and beneficial effects: the method and device of the embodiments improve the dictionary used by the word segmentation system so that it updates itself every day from newly produced corpus material, and add a sentiment analysis function; label clouds with a more reasonable spatial structure and coloring are then generated according to the word frequencies of the extracted keywords and their sentiment colors. In addition, by combining and optimizing Chinese word segmentation and the label cloud algorithm, the invention fills a gap in Chinese label cloud generation algorithms and provides a useful tool for tasks such as extracting the key points of news and public opinion analysis.
The above are only preferred embodiments of the invention and are not intended to limit it. Any modification, equivalent replacement or improvement made within the spirit and principles of the invention shall fall within its scope of protection.
Claims (10)
1. A method for automatically generating a Chinese text label cloud, comprising:
Step a: performing word segmentation and part-of-speech tagging on the text data to be analyzed using Chinese lexical analysis;
Step b: extracting keywords and word frequencies of the text data to be analyzed according to the segmentation and tagging result;
Step c: taking the extracted keywords and word frequencies as input data and generating a label cloud using a label cloud generation algorithm.
2. The method for automatically generating a Chinese text label cloud according to claim 1, characterized in that, in step a, the Chinese lexical analysis is based on a cascaded hidden Markov model (HMM) and comprises: in the preprocessing stage, applying the N-shortest-paths rough segmentation method to obtain the N best rough segmentation results that cover the ambiguities; on the rough segmentation result set, using a low-level HMM with the dictionary and corpus to recognize ordinary, non-nested person names and place names, and then, based on these results, using a high-level HMM with the dictionary and corpus to recognize complex place names and organization names in which person names or place names are nested; adding the recognized unregistered (out-of-vocabulary) words, with their computed probabilities, to a class-based segmentation HMM, in which neither unregistered words nor ambiguities are treated as special cases but instead compete with ordinary words for the candidate result; and performing HMM part-of-speech tagging on the globally optimal segmentation result to obtain the lexical analysis result.
3. The method for automatically generating a Chinese text label cloud according to claim 2, characterized in that the dictionary and corpus are updated accordingly, the update comprising: using web crawler technology to capture the new words published by search engines or news websites, and collecting news related to these new words; adding the collected news to the corpus for training, tagging the parts of speech of the new words, and adding the tagged new words to the dictionary, thereby updating the dictionary and the corpus.
4. The method for automatically generating a Chinese text label cloud according to claim 1, characterized in that, in step c, the label cloud generation algorithm lays out the keywords geometrically.
5. The method for automatically generating a Chinese text label cloud according to claim 4, characterized in that the layout types include a radial layout and a linear layout, the radial layout placing all labels radiating from the inside outward and the linear layout placing all labels along a scan line.
6. The method for automatically generating a Chinese text label cloud according to claim 4, characterized in that the label cloud generation algorithm comprises: selecting the layout type to generate; after initially placing the labels according to the selected layout type, traversing all labels, and, when labels overlap, using a greedy algorithm to search for a new placement position on small circles around the label's center.
7. A device for automatically generating a Chinese text label cloud, characterized by comprising a word segmentation and part-of-speech tagging module, a keyword and word frequency extraction module and a label cloud generation module, connected in that order, wherein the word segmentation and part-of-speech tagging module performs word segmentation and part-of-speech tagging on the text data to be analyzed using Chinese lexical analysis; the keyword and word frequency extraction module extracts keywords and word frequencies of the text data to be analyzed according to the segmentation and tagging result; and the label cloud generation module takes the extracted keywords and word frequencies as input data and generates a label cloud using a label cloud generation algorithm.
8. The device for automatically generating a Chinese text label cloud according to claim 7, characterized in that the word segmentation and part-of-speech tagging module uses Chinese lexical analysis based on a cascaded hidden Markov model (HMM), specifically comprising: in the preprocessing stage, applying the N-shortest-paths rough segmentation method to obtain the N best rough segmentation results that cover the ambiguities; on the rough segmentation result set, using a low-level HMM with the dictionary and corpus to recognize ordinary, non-nested person names and place names, and then, based on these results, using a high-level HMM with the dictionary and corpus to recognize complex place names and organization names in which person names or place names are nested; adding the recognized unregistered (out-of-vocabulary) words, with their computed probabilities, to a class-based segmentation HMM, in which neither unregistered words nor ambiguities are treated as special cases but instead compete with ordinary words for the candidate result; and performing HMM part-of-speech tagging on the globally optimal segmentation result to obtain the lexical analysis result.
9. The device for automatically generating a Chinese text label cloud according to claim 8, characterized in that the dictionary and corpus are updated accordingly, the update comprising: using web crawler technology to capture the new words published by search engines or news websites, and collecting news related to these new words; adding the collected news to the corpus for training, tagging the parts of speech of the new words, and adding the tagged new words to the dictionary, thereby updating the dictionary and the corpus.
10. The device for automatically generating a Chinese text label cloud according to claim 7, characterized in that the label cloud generation module lays out the keywords geometrically; the layout types include a radial layout and a linear layout, the radial layout placing all labels radiating from the inside outward and the linear layout placing all labels along a scan line; and generating the label cloud by the label cloud generation module comprises: selecting the layout type to generate; after initially placing the labels according to the selected layout type, traversing all labels, and, when labels overlap, using a greedy algorithm to search for a new placement position on small circles around the label's center.
Priority Applications (1)
Application Number | Priority Date | Title |
---|---|---|
CN201310319948.9A CN103440256B (en) | 2013-07-26 | Chinese text label cloud automatic generation method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN103440256A | 2013-12-11 |
CN103440256B | 2016-11-30 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101464898A (en) * | 2009-01-12 | 2009-06-24 | 腾讯科技(深圳)有限公司 | Method for extracting feature word of text |
US20110295903A1 (en) * | 2010-05-28 | 2011-12-01 | Drexel University | System and method for automatically generating systematic reviews of a scientific field |
CN102289523A (en) * | 2011-09-20 | 2011-12-21 | 北京金和软件股份有限公司 | Method for intelligently extracting text labels |
CN102654866A (en) * | 2011-03-02 | 2012-09-05 | 北京百度网讯科技有限公司 | Method and device for establishing example sentence index and method and device for indexing example sentences |
CN103186675A (en) * | 2013-04-03 | 2013-07-03 | 南京安讯科技有限责任公司 | Automatic webpage classification method based on network hot word identification |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2015196909A1 (en) * | 2014-06-27 | 2015-12-30 | 北京奇虎科技有限公司 | Word segmentation method and device |
CN104281690A (en) * | 2014-10-11 | 2015-01-14 | 时之我代信息科技(上海)有限公司 | Tag cloud generating method and device |
CN104281690B (en) * | 2014-10-11 | 2018-01-05 | 时之我代信息科技(上海)有限公司 | Label cloud generation method and device |
CN105243130A (en) * | 2015-09-29 | 2016-01-13 | 中国电子科技集团公司第三十二研究所 | Text processing system and method for data mining |
CN106610933A (en) * | 2015-10-27 | 2017-05-03 | 北京国双科技有限公司 | Configuration method and device for keyword tag |
CN105528421A (en) * | 2015-12-07 | 2016-04-27 | 中国人民大学 | Method for mining search dimensions of query terms in mass data |
CN105528421B (en) * | 2015-12-07 | 2018-09-04 | 中国人民大学 | Method for mining search dimensions of query words in mass data |
CN105740231A (en) * | 2016-01-28 | 2016-07-06 | 浪潮软件股份有限公司 | Data content associating method and apparatus |
CN108197117A (en) * | 2018-01-31 | 2018-06-22 | 厦门大学 | Chinese text keyword extraction method based on document topic structure and semantics |
CN108197117B (en) * | 2018-01-31 | 2020-05-26 | 厦门大学 | Chinese text keyword extraction method based on document theme structure and semantics |
CN110532539A (en) * | 2018-05-24 | 2019-12-03 | 本识科技(深圳)有限公司 | Human-machine interaction information processing method and apparatus |
CN110189393A (en) * | 2019-06-05 | 2019-08-30 | 山东大学 | Method and device for generating a shaped word cloud |
CN110866400A (en) * | 2019-11-01 | 2020-03-06 | 中电科大数据研究院有限公司 | Automatic-updating lexical analysis system |
CN110866400B (en) * | 2019-11-01 | 2023-08-04 | 中电科大数据研究院有限公司 | Automatically updating lexical analysis system |
Similar Documents
Publication | Title |
---|---|
CN107330011B | Multi-strategy fusion named entity recognition method and device |
CN102831121B | Method and system for extracting webpage information |
US11256856B2 | Method, device, and system, for identifying data elements in data structures |
CN102419778B | Information searching method for discovering and clustering sub-topics of query statement |
CN101241514B | Method for creating error-correcting database, automatic error correcting method and system |
KR102464248B1 | Method, apparatus, electronic device, and storage medium for extracting SPO triples |
CN102214166B | Machine translation system and machine translation method based on syntactic analysis and hierarchical model |
CN102253930B | Text translation method and device |
CN107203526B | Query string semantic demand analysis method and device |
CN104679867B | Graph-based address knowledge processing method and device |
CN103076892A | Method and equipment for providing input candidate items corresponding to input character string |
JP2022512269A | Method, apparatus, device, program and computer storage medium for extracting POI names |
CN105677857B | Method and device for accurately matching keywords with marketing landing pages |
CN105589936A | Data query method and system |
CN102750282B | Synonym template mining method and device as well as synonym mining method and device |
CN103186522A | Electronic device and natural language analyzing method thereof |
CN102779135A | Method and device for obtaining cross-linguistic search resources and corresponding search method and device |
CN102339294A | Searching method and system for preprocessing keywords |
CN102541282B | Method, apparatus and system for completing vocabulary re-editing by moving icons |
CN106815215B | Method and apparatus for generating an annotation repository |
CN105550169A | Method and device for identifying point of interest names based on character length |
CN105159885A | Point-of-interest name identification method and device |
CN106155998B | Data processing method and device |
CN104462272A | Search requirement analysis method and device |
CN105138708A | Method and device for identifying names of points of interest (POI) |
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
C14 | Grant of patent or utility model |
GR01 | Patent grant |