CN101308498A - Text collection visualized system - Google Patents

Publication number
CN101308498A
Authority
CN
China
Prior art keywords
module
text
xml file
result
word
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CNA2008100401459A
Other languages
Chinese (zh)
Other versions
CN100595762C (en)
Inventor
马颖华
苏贵洋
李建华
冯薇
李文婷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Jiaotong University
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University
Priority to CN200810040145A
Publication of CN101308498A
Application granted
Publication of CN100595762C
Expired - Fee Related
Anticipated expiration

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

A text collection visualization system in the field of computer application technology comprises a text collection module, a Chinese word segmentation module, a term weight calculation module, an XML file organization module, and a visual graphical interface module. The text collection module, the Chinese word segmentation module, the term weight calculation module, and the XML file organization module first build a local database; the visual graphical interface module then interacts with the local database to display graphically the results for the user's search keywords. The results are presented using the visualization method provided by the invention, which shows the degree of correlation between a document and several keywords, and the user can drag keywords on the graphical interface to extend the definition of the semantic relationships among keywords and obtain better results.

Description

Text collection visualized system
Technical field
The present invention relates to a system in the field of computer application technology, specifically a text collection visualization system.
Background technology
As web technology spreads ever more widely and deeply, the channels through which people obtain information widen day by day. Whenever people use a search engine, a single keyword may return a massive amount of data. How to extract from these results the information that reflects the user's needs has become an increasingly important problem, and text visualization is one good solution.
Visual presentation of information speeds up human information processing. The human eye can take in more than 5 megabits of information per second, whereas the brain can comprehend only about 500 bits per second. In ordinary keyword search, people still have to read the text and roughly understand an article before they can judge whether it is the document they need. Even if they read only the title to decide whether to read the article, the amount of content to check is still large, and valuable texts are sometimes missed. Even so, users are often swamped by a large number of articles. Text collection visualization exploits the high capacity of visual processing: it represents the distribution of information graphically so that the user can judge by sight whether a text is close to his or her needs, and can therefore process a large number of retrieval results faster.
The purpose of text collection visualization is to present the content of texts in a structured way with two- or three-dimensional images, to assist analysis, retrieval, or text mining. In general, a mathematical model is first built for the text, and the model is then shown as a two- or three-dimensional image so that the characteristics of the text are presented intuitively and visually. Building a model also helps in understanding the structure of the text.
Articles and paragraphs are flat, linear structures that are ill-suited to analysis and processing. Without a suitable structure and model, a text can only be analyzed by a person reading it word by word and sentence by sentence. Such manual inspection is extremely inefficient when a large amount of text data must be processed and analyzed. Moreover, current general-purpose search engines such as Google and Yahoo return results solely on the basis of whether an article contains the input keywords; they do not further classify or otherwise process the text collection. They often return several different web pages with the same text content, which increases the user's workload in checking the texts.
A literature search of the prior art finds many Chinese patents concerning a "visualization system", such as 200510086559.1 ("remote visualization system of computing grid") and 03121859.8 ("modular assistant visualizing system"). Although these patented technologies achieve visualization, they do not display results graphically and cannot be applied to text visualization for search engines.
Summary of the invention
To address the above shortcomings of the prior art, the present invention provides a text collection visualization system. It builds a text data model to measure keywords and texts, classifies text content according to its structure (that is, the composition of the text), and displays the results graphically. When facing massive data, the user can not only filter out the parts that may be of interest but also see them presented vividly in graphical form; the display intuitively reflects how closely each part of the search results relates to the user's search purpose, giving the user direct guidance.
The present invention is achieved through the following technical solution:
The present invention comprises a text collection module, a Chinese word segmentation module, a term weight calculation module, an XML file organization module, and a visual graphical interface module, wherein:
The text collection module collects web page text from the Internet and passes the collected text to the Chinese word segmentation module as the raw data source;
The Chinese word segmentation module segments the text content obtained by the text collection module into a corpus whose unit is the word, counts word frequencies, and saves them to a local text file from which the term weight calculation module later reads the word frequency information to calculate weights;
The term weight calculation module performs feature extraction on the segmented result, that is, it calculates the feature word weights, and passes the result, together with the corresponding feature words and the title of the text they belong to, to the XML file organization module;
The XML file organization module organizes the data passed in by the term weight calculation module into an XML file with a set data structure and keeps it on the local computer, and provides the visual graphical interface module with the results read back after the text data structure has been processed;
The visual graphical interface module takes the result data kept locally by the XML file organization module as its basic corpus, obtains user commands through interaction with the user, and displays the results.
The text collection module comprises a download submodule and a storage submodule. The download submodule starts from a set root URL and, following the links on the root page, fetches the source files of web pages down to a set number of levels, removing non-body content such as HTML tags and script code to obtain the initial text content. It then calls the storage submodule to save the text under a set local directory. Before each text is saved, the module first checks whether a text from the same source (judged by URL) already exists in that directory; if so, the text is not saved. When collection finishes, the Chinese word segmentation module is called to segment the saved initial text content.
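The collection procedure described above can be sketched as a breadth-first crawl with a depth limit and URL-based duplicate checking. This is only an illustrative sketch: the in-memory `PAGES` table, the `crawl` function, and the example URLs are assumptions standing in for real network access and local file storage, which the source does not specify in detail.

```python
from collections import deque

# Hypothetical in-memory "web": URL -> (page text, outgoing links).
# A real download submodule would fetch pages over HTTP instead.
PAGES = {
    "http://example.com/": ("root page", ["http://example.com/a", "http://example.com/b"]),
    "http://example.com/a": ("page a", ["http://example.com/b", "http://example.com/c"]),
    "http://example.com/b": ("page b", ["http://example.com/"]),
    "http://example.com/c": ("page c", []),
}


def crawl(root_url, max_depth, fetch=PAGES.get):
    """Breadth-first crawl from root_url, following links up to max_depth levels.

    Returns {url: text}. A URL that has already been saved is skipped,
    which mirrors the "same source, judged by URL" check in the text.
    """
    saved = {}
    queue = deque([(root_url, 0)])
    while queue:
        url, depth = queue.popleft()
        if url in saved:          # duplicate source: do not save again
            continue
        page = fetch(url)
        if page is None:          # unknown or unreachable page
            continue
        text, links = page
        saved[url] = text
        if depth < max_depth:     # only follow links within the set level
            queue.extend((link, depth + 1) for link in links)
    return saved
```

With a depth limit of 1 the sketch saves the root page and the two pages it links to; raising the limit to 2 also reaches the page linked only from `http://example.com/a`.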
The Chinese word segmentation module segments the large volume of collected Chinese text, removes stop words so that independent feature words remain, obtains the frequency of each word in each article, and passes the result to the term weight calculation module. Stop words are prepositions, modal particles, and other very common words such as "with" and "about". Because these words carry no concrete meaning and occur with relatively high frequency throughout a text, they would distort the content analysis, so they are normally excluded from weight calculation.
The term weight calculation module comprises a file reading submodule and a weight calculation submodule. The file reading submodule reads the segmented words and their word frequencies from the file saved by the Chinese word segmentation module; the weight calculation submodule derives the weight of each feature word and calls the XML file organization module, which organizes the feature words and weights of each document into a tree structure and saves them as the XML feature library.
The XML file organization module defines the data organization format for text titles, feature words, and their weights, and provides a write-XML-file function and a read-XML-file function. The term weight calculation module uses the former to organize its output into an XML file, and the visual graphical interface module uses the latter to read the data set from the local XML file.
The visual graphical interface module comprises a control submodule and a graphical interface submodule. The control submodule accepts user instructions, obtains the input keywords and operating parameters, retrieves from the locally saved XML library the documents that contain the keywords together with the corresponding weights, and displays the results through the graphical interface submodule. The graphical interface submodule receives the user's input parameters from the control submodule, obtains the data related to them from the XML library, converts the data into a graphical representation, and displays it on the panel; it also lets the user drag keywords to adjust the displayed results.
The operating parameters that can be set in the control submodule include fuzzy/exact matching, display scale, and viewing the retrieval results in text form.
Fuzzy matching means that a document is counted as a retrieval result as long as it contains any one of the keywords, that is, an "OR" relation.
Exact matching means that a retrieval result must contain all of the entered keywords, that is, an "AND" relation.
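The two matching modes can be illustrated with a small filter over an index of feature word weights. This is a minimal sketch: the `search` function and the `{title: {feature_word: weight}}` index layout are assumptions made for illustration, not the data structure defined by the patent.

```python
def search(index, keywords, exact=False):
    """Return the titles of documents matching the keywords.

    index: {document_title: {feature_word: weight}} (hypothetical layout).
    exact=False -> fuzzy matching: any keyword present ("OR" relation).
    exact=True  -> exact matching: all keywords present ("AND" relation).
    """
    combine = all if exact else any
    return [title for title, weights in index.items()
            if combine(keyword in weights for keyword in keywords)]
```

For an index in which one document contains both "text" and "visualization" and another contains only "text", fuzzy matching returns both documents and exact matching returns only the first.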
In exploring image presentation techniques, developers increasingly want users to take part personally, so that interaction and feedback bring the retrieval results ever closer to individual needs. The basic goal of visualizing retrieval results is to design an interface through which the user can browse and manipulate them.
In the present invention, user participation is introduced to obtain better search results. The interactive factors include the operating parameters accepted by the control submodule of the visual graphical interface module described above. In the results shown by the graphical interface submodule, the invention implements a method of representing graphically the degree of correlation between a document and several keywords. In addition, the user can select from the displayed results the texts that place more emphasis on one or several of the keywords. For example, when searching with the keyword set "text"-"visualization", a user looking for documents with more visualization content can choose text centers of gravity positioned closer to the keyword "visualization"; a user looking for visualization articles specifically about text visualization can choose the centers of gravity positioned closer to "text". From the distribution of the whole text collection, the user can also see that some articles focus on text visualization, while others mention text visualization only as one branch when discussing visualization technology.
Furthermore, the semantic relationship between keywords can be adjusted by dragging them. For example, a user who enters the keyword set "static link library"-"dynamic link library"-"difference" wants "static link library" and "dynamic link library" to appear together, so the relationship between them should be expressed as close. Dragging these two keywords close together yields the retrieval results for the combination of the two.
Compared with the prior art, the present invention has the following beneficial effects:
1. Texts with the same content but from different web pages have the same data structure (the weights of the keywords in the text), so they overlap at the same position on the graphical interface. The user can skip these identical texts and pay more attention to texts with different content, avoiding repeated reading.
2. The invention distributes texts on the graphical interface according to their data structure. This classification lets the user choose, as needed, to view texts within a collection of similar texts or within a collection of dissimilar texts.
3. The interface interaction added by the invention expands the freedom of search customization. It not only provides the "OR" and "AND" search functions of common search engines, but also lets the user drag keywords to adjust their distance on the visual interface, simulating the semantic relationship between keywords and dynamically adjusting the results.
Description of drawings
Fig. 1 is a system architecture diagram of the present invention;
Fig. 2 is a state flow diagram of the text collection module.
Embodiment
The embodiment of the invention is described in detail below with reference to the accompanying drawings. The embodiment is implemented on the premise of the technical solution of the invention, and a detailed implementation and concrete operating procedure are given, but the scope of protection of the invention is not limited to the following embodiment.
As shown in Fig. 1, the embodiment comprises a text collection module, a Chinese word segmentation module, a term weight calculation module, an XML file organization module, and a visual graphical interface module, wherein:
The text collection module collects network text from the Internet and passes the ample collected text to the Chinese word segmentation module;
The Chinese word segmentation module segments the large volume of collected Chinese text, removes stop words so that independent keywords remain, obtains the frequency of each word in each article, and passes the result to the term weight calculation module;
The term weight calculation module performs feature extraction on the segmented result, that is, it calculates the feature word weights, and passes the result, together with the corresponding feature words and document information, to the XML file organization module;
The XML file organization module organizes the data passed in by the term weight calculation module into an XML file kept locally, and provides the visual graphical interface module with the results read back after the text data structure has been processed;
The visual graphical interface module takes the result data kept locally by the XML file organization module as its basic corpus, obtains user commands through interaction with the user, and displays the results.
The text collection module first builds an initial corpus in the local database.
As shown in Fig. 2, the text collection module comprises a download submodule and a storage submodule. The download submodule starts from the set root URL, uses regular expressions to find the link information on each page and follows it level by level, fetching the source files of web pages down to the set level. Regular expressions then remove the HTML tags and script code from the page source, and the resulting Chinese text strings are inserted into a list. The storage submodule then checks whether the inserted text content is a duplicate and, if it is not, saves it as a local text file.
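The regular-expression steps above might look like the following sketch. The patterns and the function names `extract_links` and `extract_chinese_text` are assumptions chosen for illustration; the source does not give the actual expressions, and regex-based HTML handling of this kind is only approximate on real pages.

```python
import re

# Hypothetical patterns; the patent does not list the expressions it uses.
SCRIPT_RE = re.compile(r"<(script|style)\b.*?</\1\s*>", re.IGNORECASE | re.DOTALL)
TAG_RE = re.compile(r"<[^>]+>")
LINK_RE = re.compile(r'href\s*=\s*["\']([^"\']+)["\']', re.IGNORECASE)
CHINESE_RE = re.compile(r"[\u4e00-\u9fff]+")


def extract_links(html):
    """Find the link targets on a page so they can be followed level by level."""
    return LINK_RE.findall(html)


def extract_chinese_text(html):
    """Drop script/style blocks and HTML tags, then collect runs of Chinese characters."""
    without_scripts = SCRIPT_RE.sub(" ", html)
    without_tags = TAG_RE.sub(" ", without_scripts)
    return CHINESE_RE.findall(without_tags)
```

The list returned by `extract_chinese_text` corresponds to the list of Chinese text strings that the storage submodule would check for duplicates before saving.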
The Chinese word segmentation module segments the large volume of collected Chinese text, removes stop words so that independent feature words remain, obtains the frequency of each word in each article, and passes the result to the term weight calculation module. Stop words are prepositions, modal particles, and other very common words.
Chinese word segmentation differs somewhat from English tokenization: English text is a string of words over a small character set, whereas Chinese is a string of characters over a large character set. Moreover, Chinese sentences have no explicit delimiters between words; they are continuous character strings. Therefore, for a computer to segment intelligently, it must understand the meaning of each word in the sentence; only when the words are cut out correctly are errors of understanding avoided.
At present, Chinese word segmentation algorithms fall roughly into three classes: mechanical segmentation (also called dictionary-based segmentation), semantic segmentation, and artificial intelligence segmentation (also called understanding-based segmentation). Mechanical segmentation matches text against a dictionary. Semantic segmentation introduces semantic analysis and makes more use of the linguistic information of the natural language itself. Artificial intelligence segmentation processes information intelligently and has two main approaches. One is symbolic processing based on psychology, which simulates the functions of the human brain: an expert system, for example, aims to simulate brain function by building an inference network and, through symbol transformation, performing interpretive processing. The other is simulation based on physiology: a neural network aims to simulate the operating mechanism of the brain's nervous system to realize a given function.
The segmentation method adopted in the present invention is mechanical segmentation. Segmentation is performed with the FreeICTCLAS word segmentation system developed by the Chinese Academy of Sciences; the frequency with which each separated feature word occurs in the text is calculated, and the feature words of each text and their frequencies are saved to a text document in vector form. This module is also responsible for removing stop words, that is, prepositions, modal particles, and other very common words such as "with" and "about".
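To make dictionary-based segmentation concrete, the sketch below uses forward maximum matching, a simple textbook form of mechanical segmentation, followed by stop-word removal and frequency counting. It is a toy stand-in and not the FreeICTCLAS algorithm; the function names, the `max_len` parameter, and the tiny dictionary used in the usage note are assumptions for illustration.

```python
from collections import Counter


def forward_max_match(sentence, dictionary, max_len=4):
    """Segment a sentence by always taking the longest dictionary word at the current position.

    Characters that start no dictionary word are emitted as single-character words.
    """
    words, i = [], 0
    while i < len(sentence):
        for size in range(min(max_len, len(sentence) - i), 0, -1):
            candidate = sentence[i:i + size]
            if size == 1 or candidate in dictionary:
                words.append(candidate)
                i += size
                break
    return words


def word_frequencies(sentence, dictionary, stop_words):
    """Count how often each feature word occurs after stop words are removed."""
    words = forward_max_match(sentence, dictionary)
    return Counter(word for word in words if word not in stop_words)
```

With a dictionary containing 文本, 可视化, and 搜索 and the stop words 的 and 和, the sentence 文本的可视化和文本搜索 is split into six words, and the resulting frequency vector counts 文本 twice and 可视化 and 搜索 once each.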
The term weight calculation module comprises a file reading submodule and a weight calculation submodule. The file reading submodule reads the segmented words and their word frequencies from the file saved by the Chinese word segmentation module; the weight calculation submodule derives the weight of each feature word and calls the XML file organization module, which organizes the feature words and weights of each document into a tree structure and saves them as the XML feature library.
The weight calculation module reads the text file of feature words and word frequencies output by the Chinese word segmentation module and applies the TFIDF algorithm, a relatively mature method in information retrieval and a weighting algorithm frequently used in research on information retrieval and text feature extraction. For any given word w, the basic formula for its TFIDF value is:
Weight(w)=TF(w)×IDF(w)=f(w)×log(N/n)
Here f(w) is the number of times the word occurs in a given article (usually called the foreground corpus), N is the total number of texts in the experimental corpus (usually called the background corpus), and n is the number of texts in the background corpus in which the word occurs. In general, occurrences of w are counted per article: w is counted once for an article in which it occurs, no matter how many times it occurs there. TF gives the frequency of the word in the foreground corpus, which amounts to the word's absolute contribution to the foreground corpus; in general, the higher the TF value, the more representative the word is of the foreground corpus. IDF is the logarithm of the ratio of the number of background texts N to the number n of background texts in which the word occurs, and describes how widely the word is used. When a word occurs frequently in the background corpus (n is large), it is considered an everyday word in wide use that is not representative of any field, so its IDF value becomes very small because n is large. Multiplying TF by IDF gives higher weights to the words that truly represent the characteristics of the foreground corpus and suppresses common generic words.
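The basic formula can be checked with a few lines of code. This is a minimal sketch: the `tfidf` function, the dictionary-per-document corpus layout, and the choice of the natural logarithm are assumptions, since the source does not state the base of the logarithm or a data format.

```python
import math


def tfidf(term, doc_counts, corpus):
    """Weight(w) = f(w) * log(N / n) for one word in one document.

    doc_counts: {word: count} for the foreground document.
    corpus: list of such dicts forming the background corpus.
    The natural logarithm is an assumption; the source gives no base.
    """
    f = doc_counts.get(term, 0)                       # occurrences in this article
    n = sum(1 for doc in corpus if term in doc)       # texts containing the word
    if f == 0 or n == 0:
        return 0.0
    return f * math.log(len(corpus) / n)
```

In a background corpus of four texts, a word that occurs three times in one article and in no other text gets the weight 3 × log(4), while a word that occurs in all four texts gets the weight 0 however often it appears, which is the suppression of generic words described above.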
In practice, the TFIDF algorithm has many variants of the basic formula to suit different environments and needs. The present invention adopts one such variant, a normalized weighting formula:
$$W(t,d) = \frac{tf(t,d) \times \log(N/n_i + 0.01)}{\sqrt{\sum_{t \in d} \left[ tf(t,d) \times \log(N/n_i + 0.01) \right]^2}}$$
This formula adds a normalization factor to the basic formula. Here t denotes a keyword, d denotes a document, tf(t, d) is the frequency with which keyword t occurs in document d, N is the total number of texts, and n_i is the number of texts in which the keyword occurs. The formula reduces the suppressing effect of individual high-frequency words on other feature words and standardizes each component.
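A short sketch of the normalized weighting follows. It assumes the common form of this formula, in which the denominator is the square root of the sum of squared unnormalized weights over the document's words; the function name, the data layout, the natural logarithm, and the handling of words missing from the corpus are all assumptions for illustration.

```python
import math


def normalized_weights(doc_counts, corpus):
    """Normalized TFIDF weights for every word of one document.

    doc_counts: {word: tf} for the document.
    corpus: list of {word: tf} dicts, normally including the document itself.
    Assumes a square-root (Euclidean) normalization and the natural logarithm.
    """
    total_texts = len(corpus)
    raw = {}
    for term, tf in doc_counts.items():
        # Treat a word absent from the corpus as occurring in one text (assumption).
        n_i = sum(1 for doc in corpus if term in doc) or 1
        raw[term] = tf * math.log(total_texts / n_i + 0.01)
    norm = math.sqrt(sum(value * value for value in raw.values()))
    return {term: (value / norm if norm else 0.0) for term, value in raw.items()}
```

Because of the normalization, the squared weights of one document sum to 1, so every weight lies between 0 and 1 and can be used directly as a radius inside a unit circle.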
The content produced by the feature word weight calculation module (document name, the feature words in the document, and the weight corresponding to each feature word) is passed to the XML file organization module and organized into XML format with a defined data structure. The visual graphical interface module can receive input data in two ways: data submitted directly through memory in real time after the data source has been collected and analyzed, or data read from the locally saved XML library. To keep the read interface of the visual graphical interface module consistent, the content produced by the weight calculation and the content read from the XML file are organized in the same form. The output format for reading XML data is therefore the same as the input format for writing the XML file.
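One possible shape for such a round trip is sketched below with Python's standard `xml.etree.ElementTree` module. The element names `corpus`, `document`, and `feature`, their attributes, and the function names are hypothetical; the patent does not publish its XML schema. The point of the sketch is that the reader returns exactly the structure the writer accepts.

```python
import xml.etree.ElementTree as ET


def to_xml(docs):
    """Serialize {title: {feature_word: weight}} into UTF-8 XML bytes.

    Element and attribute names are hypothetical, chosen for this sketch.
    """
    root = ET.Element("corpus")
    for title, weights in docs.items():
        doc_el = ET.SubElement(root, "document", title=title)
        for word, weight in weights.items():
            # repr() keeps the full float precision so the value reads back unchanged.
            ET.SubElement(doc_el, "feature", word=word, weight=repr(weight))
    return ET.tostring(root, encoding="utf-8")


def from_xml(data):
    """Parse the XML produced by to_xml back into {title: {feature_word: weight}}."""
    root = ET.fromstring(data)
    return {
        doc.get("title"): {
            feature.get("word"): float(feature.get("weight"))
            for feature in doc.findall("feature")
        }
        for doc in root.findall("document")
    }
```

Writing the returned bytes to a local file and reading them back later gives the interface module the same dictionary it would have received directly from memory.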
The visual graphical interface module comprises a control submodule and a graphical interface submodule. The control submodule accepts user instructions, obtains the input keywords and operating parameters, retrieves from the locally saved XML library the documents that contain the keywords together with the corresponding weights, and displays the results through the graphical interface submodule. In the results shown by the graphical interface submodule, the invention implements a method of representing graphically the degree of correlation between a document and several keywords.
The graphical representation of the degree of correlation between a document and several keywords in the graphical interface submodule is implemented as follows:
1. A polar coordinate system inside the unit circle serves as the coordinate system for displaying results, and the entered keywords are distributed along the edge of the unit circle;
2. For each document that contains it, each keyword has a weight point inside the unit circle (the weights have been normalized). The angular coordinate of this point equals the angular coordinate of the keyword, and its radial coordinate is the weight of the keyword in that document. The weight points of the input keyword set within one document are connected into the polygon for that "text-selected keywords" pair;
3. The center of gravity of the polygon is calculated and displayed on the panel of the graphical interface submodule. The position of this center of gravity relative to each keyword indicates how closely the document relates to each keyword;
4. The user may drag keywords along the circumference to change their distribution, thereby adjusting the search's bias among the keywords and expressing the semantic relationship between them.
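The four steps above can be sketched numerically. The sketch assumes that the keywords start evenly spaced on the unit circle and that the "center of gravity" is the average of the polygon's vertices; the source does not say whether a vertex average or an area centroid is meant, and the function names are hypothetical.

```python
import math


def keyword_angles(keywords):
    """Step 1: spread the keywords evenly around the unit circle (angles in radians)."""
    step = 2 * math.pi / len(keywords)
    return {keyword: i * step for i, keyword in enumerate(keywords)}


def document_position(weights, angles):
    """Steps 2 and 3: place one weight point per keyword, then average the points.

    weights: {keyword: normalized weight in [0, 1]} for one document.
    angles: {keyword: angle}; editing an angle models dragging a keyword (step 4).
    The vertex average is an assumed reading of "center of gravity".
    """
    points = [
        (weights.get(keyword, 0.0) * math.cos(angle),
         weights.get(keyword, 0.0) * math.sin(angle))
        for keyword, angle in angles.items()
    ]
    x = sum(point[0] for point in points) / len(points)
    y = sum(point[1] for point in points) / len(points)
    return x, y
```

With the keywords "text" at angle 0 and "visualization" at angle π, a document weighted 0.2 and 0.8 lands at about (-0.3, 0), on the "visualization" side of the circle. Changing an entry of `angles` and recomputing the position models the effect of dragging that keyword along the circumference.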
In this way, when several target keywords have the same weight composition in several documents, that is, when the texts are structurally similar with respect to the topic, the polygon centers of gravity representing these texts cluster or coincide on the graphical interface. After viewing one of them, the user can decide, from a clustering perspective, whether to discard the whole cluster of texts or to examine it in detail. The user can select from the displayed results the texts that place more emphasis on one or several keywords, and the interactive operation of dragging keywords on the graphical interface lets the user extend the definition of the semantic relationships between keywords and obtain better results.
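The overlap of identical texts can be detected by grouping documents whose centers of gravity coincide. The following is a hedged sketch: the `group_overlapping` function, the rounding precision, and the `{title: (x, y)}` input layout are assumptions, and the patent itself relies on the user seeing the overlap on screen rather than on an explicit grouping step.

```python
from collections import defaultdict


def group_overlapping(positions, precision=3):
    """Group document titles whose display positions coincide after rounding.

    positions: {title: (x, y)} centers of gravity on the panel (hypothetical layout).
    Returns only the groups that contain more than one document.
    """
    clusters = defaultdict(list)
    for title, (x, y) in positions.items():
        clusters[(round(x, precision), round(y, precision))].append(title)
    return [titles for titles in clusters.values() if len(titles) > 1]
```

Two pages that carry the same text produce the same keyword weights and therefore the same position, so they end up in one group and the user needs to open only one of them.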

Claims (7)

1, a kind of text collection visualized system is characterized in that, comprising: text collection module, Chinese word segmentation module, term weighing computing module, XML file organization module, visualized graphs interface module, wherein:
The text collection module is collected web page text on the internet, imports the text that collects into the Chinese word segmentation module as the raw data source;
The Chinese word segmentation module is carried out word segmentation processing to the content of text that obtains in the text collection module, and obtaining with the speech is the language material of unit, and the statistics word frequency, is kept at local text, reads word frequency information calculations weight for follow-up term weighing computing module;
The term weighing computing module is used for the result behind the participle is carried out the calculating that feature extraction is a feature speech weight, and together with the characteristic of correspondence speech, and the title of place text is given XML file organization module with result of calculation;
XML file organization module is responsible for data that the term weighing computing module is imported into and is become the XML file to be retained in local computer with the data structure organization of setting, and provides the result who reads after the text data structure processing for the visualized graphs interface module;
It is basic corpus that the visualized graphs interface module is retained in local result data with above-mentioned XML file organization module, by with the user obtain user command alternately, and demonstrate the result.
2, text collection visualized system according to claim 1, it is characterized in that, described text collection module, comprise: download submodule and sub module stored, wherein: download at first root network address of submodule from setting, according to the web page interlinkage that provides on the root network address, the webpage source file of setting the number of plies grasps, reject non-body matters such as html mark and scripted code simultaneously, obtain initial content of text, this module invokes sub module stored then, text is kept under the local directory of setting, before each text is preserved, at first judges under same catalogue, whether to exist and the identical text in text source, if exist, then the text do not preserved; After collecting work finishes, call described Chinese word segmentation module the initial content of text of preserving is above carried out participle work.
3, text collection visualized system according to claim 1, it is characterized in that, described Chinese word segmentation module is carried out participle to a large amount of Chinese texts that collect, and remove stop-word, make it to become the independent feature speech, and obtain the frequency of the appearance of each word in every piece of article, give the term weighing computing module, wherein, stop-word refers to preposition, tone auxiliary word or words very commonly used.
4, text collection visualized system according to claim 1, it is characterized in that, described term weighing computing module, comprise file reading submodule and weight calculation submodule, wherein: the file reading submodule is read participle and word frequency information thereof from the file that Chinese word-dividing mode is preserved, by the weight that draws the feature speech in the weight calculation submodule, and call XML file organization module, feature speech in each piece document and weight data set thereof are made into tree structure, save as the XML feature database.
5. The text collection visualization system according to claim 1, characterized in that the XML file organization module defines the data organization format for text titles, feature words, and their weights, and provides an XML file writing function and an XML file reading function, used respectively by the term weight calculation module to organize its output into XML files and by the visual graphical interface module to read data from the local XML files.
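The write and read functions described in claim 5 could be sketched with Python's standard `xml.etree.ElementTree` as shown below. The element and attribute names (`library`, `document`, `title`, `feature`, `weight`) are assumptions; the claim only says that titles, feature words, and weights are organized as a tree.

```python
import xml.etree.ElementTree as ET


def write_feature_library(path, documents):
    """Write {title: {feature_word: weight}} as a tree-structured XML file."""
    root = ET.Element("library")
    for title, weights in documents.items():
        doc = ET.SubElement(root, "document")
        ET.SubElement(doc, "title").text = title
        for word, weight in weights.items():
            feature = ET.SubElement(doc, "feature", weight=repr(float(weight)))
            feature.text = word
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)


def read_feature_library(path):
    """Read the XML file back into {title: {feature_word: weight}}."""
    documents = {}
    for doc in ET.parse(path).getroot().iter("document"):
        title = doc.findtext("title", default="")
        documents[title] = {
            feature.text: float(feature.get("weight"))
            for feature in doc.iter("feature")
        }
    return documents
```

Keeping both functions in one place means the writer (weight calculation) and the reader (graphical interface) share a single definition of the file layout.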
6. The text collection visualization system according to claim 1, characterized in that the visual graphical interface module comprises a control submodule and a graphical interface submodule, wherein: the control submodule accepts user instructions, obtains the entered keywords and operating parameters, and retrieves from the locally stored XML library the documents containing the keywords together with the corresponding weights; the results are displayed by the graphical interface submodule.
7. The text collection visualization system according to claim 6, characterized in that the operating parameters that can be set for the control submodule include: fuzzy/exact matching, display ratio, and viewing the retrieval results in text form, wherein: fuzzy matching means that a document is counted as a retrieval result as long as it contains any one of the keywords, i.e. an "or" relation; exact matching means that a retrieval result must contain all of the entered keywords, i.e. an "and" relation.
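The two matching modes in claim 7, applied to the keyword retrieval of claim 6, can be illustrated with the short sketch below. The function name `search`, the `exact` flag, and the in-memory `{title: {feature_word: weight}}` layout are assumptions; a keyword missing from a matched document is reported with weight 0.0 as an illustrative convention.

```python
def search(library, keywords, exact=False):
    """Return {title: {keyword: weight}} for documents matching the keywords.

    exact=False is fuzzy matching ("or"): any one keyword is enough.
    exact=True is exact matching ("and"): every keyword must be present.
    """
    if not keywords:
        return {}
    combine = all if exact else any
    results = {}
    for title, weights in library.items():
        if combine(keyword in weights for keyword in keywords):
            results[title] = {k: weights.get(k, 0.0) for k in keywords}
    return results
```

For a library in which one document contains both "text" and "xml" and another contains only "text", fuzzy matching on both keywords returns both documents, while exact matching returns only the first.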
CN200810040145A 2008-07-03 2008-07-03 Text collection visualized system Expired - Fee Related CN100595762C (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN200810040145A CN100595762C (en) 2008-07-03 2008-07-03 Text collection visualized system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN200810040145A CN100595762C (en) 2008-07-03 2008-07-03 Text collection visualized system

Publications (2)

Publication Number Publication Date
CN101308498A true CN101308498A (en) 2008-11-19
CN100595762C CN100595762C (en) 2010-03-24

Family

ID=40124956

Family Applications (1)

Application Number Title Priority Date Filing Date
CN200810040145A Expired - Fee Related CN100595762C (en) 2008-07-03 2008-07-03 Text collection visualized system

Country Status (1)

Country Link
CN (1) CN100595762C (en)

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101887416A (en) * 2010-06-29 2010-11-17 魔极科技(北京)有限公司 Method and system for converting characters into graphs
CN104123291A (en) * 2013-04-25 2014-10-29 华为技术有限公司 Method and device for classifying data
CN104123291B (en) * 2013-04-25 2017-09-12 华为技术有限公司 A kind of method and device of data classification
CN103593337A (en) * 2013-11-04 2014-02-19 清华大学 Image-text set visualization method and device
CN103593337B (en) * 2013-11-04 2016-08-17 清华大学 A kind of method for visualizing of image-text set
CN103761245B (en) * 2013-12-18 2017-10-24 天脉聚源(北京)传媒科技有限公司 A kind of method, device and browser for showing hot keyword
CN103761245A (en) * 2013-12-18 2014-04-30 天脉聚源(北京)传媒科技有限公司 Hot keyword display method, device and browser thereof
CN104376038A (en) * 2014-09-12 2015-02-25 中国人民解放军信息工程大学 Position associated text information visualization method based on label cloud
CN104408083A (en) * 2014-10-27 2015-03-11 六盘水职业技术学院 Socialized media analyzing system
CN107368506B (en) * 2015-05-11 2020-11-06 斯图飞腾公司 Unstructured data analysis system and method
CN107368506A (en) * 2015-05-11 2017-11-21 斯图飞腾公司 Unstructured data analysis system and method
CN104978407A (en) * 2015-06-18 2015-10-14 上海交通大学 Visualized presentation system and method for high-dimensional data characteristic attribute change trend
CN104978407B (en) * 2015-06-18 2018-03-06 上海交通大学 System and method is presented in visualization for high dimensional data characteristic attribute variation tendency
CN105912661A (en) * 2016-04-11 2016-08-31 乐视控股(北京)有限公司 Method and apparatus for removing html tag from search engine
CN106844554A (en) * 2016-12-30 2017-06-13 全民互联科技(天津)有限公司 A kind of contract classification automatic identifying method and system
CN107657067A (en) * 2017-11-14 2018-02-02 国网山东省电力公司电力科学研究院 A kind of quick method for pushing of frontier science and technology information and system based on COS distance
CN107657067B (en) * 2017-11-14 2021-03-19 国网山东省电力公司电力科学研究院 Cosine distance-based leading-edge scientific and technological information rapid pushing method and system
CN108170761A (en) * 2017-12-23 2018-06-15 合肥弹刚信息科技有限公司 A kind of Visualized Analysis System and its method based on magnanimity documentation & info
CN108984647A (en) * 2018-06-26 2018-12-11 北京工业大学 A kind of water utilities domain knowledge map construction method based on Chinese text
CN109460457A (en) * 2018-10-25 2019-03-12 北京奥法科技有限公司 Text sentence similarity calculating method, intelligent government affairs auxiliary answer system and its working method
CN109933782A (en) * 2018-12-03 2019-06-25 阿里巴巴集团控股有限公司 User emotion prediction technique and device
CN109933782B (en) * 2018-12-03 2023-11-28 创新先进技术有限公司 User emotion prediction method and device
CN111858830A (en) * 2020-03-27 2020-10-30 北京梦天门科技股份有限公司 Health supervision law enforcement data retrieval system and method based on natural language processing
CN111858830B (en) * 2020-03-27 2023-11-14 北京梦天门科技股份有限公司 Health supervision law enforcement data retrieval system and method based on natural language processing
CN111680516A (en) * 2020-06-04 2020-09-18 宁波浙大联科科技有限公司 PDM system product design requirement information semantic analysis and extraction method and system
CN112364626A (en) * 2020-11-25 2021-02-12 广东电网有限责任公司佛山供电局 Intelligent management method and system for safety measures
CN112364626B (en) * 2020-11-25 2023-09-01 广东电网有限责任公司佛山供电局 Intelligent safety measure management method and system

Also Published As

Publication number Publication date
CN100595762C (en) 2010-03-24

Similar Documents

Publication Publication Date Title
CN100595762C (en) Text collection visualized system
Görg et al. Combining computational analyses and interactive visualization for document exploration and sensemaking in jigsaw
Over et al. DUC in context
CN101582080B (en) Web image clustering method based on image and text relevant mining
US20120047123A1 (en) System and method for document analysis, processing and information extraction
CN106339502A (en) Modeling recommendation method based on user behavior data fragmentation cluster
Kostoff Co-word analysis
CN107798622A (en) A kind of method and apparatus for identifying user view
KR20230052609A (en) Review analysis system using machine reading comprehension and method thereof
Endert et al. Typograph: Multiscale spatial exploration of text documents
Carrion et al. A taxonomy generation tool for semantic visual analysis of large corpus of documents
Atzberger et al. Software Forest: A Visualization of Semantic Similarities in Source Code using a Tree Metaphor.
Tabassum et al. Semantic analysis of Urdu english tweets empowered by machine learning
Bu et al. An FAR-SW based approach for webpage information extraction
Repke et al. Extraction and representation of financial entities from text
Musliadi et al. Twitter Social Media Conversion Topic Trending Analysis Using Latent Dirichlet Allocation Algorithm
Rayson et al. Towards interactive multidimensional visualisations for corpus linguistics
Bonnel et al. Effective organization and visualization of web search results
Hlava The Taxobook: Applications, implementation, and integration in search: Part 3 of a 3-part series
Poibeau et al. Generating navigable semantic maps from social sciences corpora
Hirokawa et al. Search and analysis of bankruptcy cause by classification network
Belerao et al. Summarization using mapreduce framework based big data and hybrid algorithm (HMM and DBSCAN)
Saleheen et al. User centric dynamic web information visualization
Sánchez-Zamora et al. Visualizing tags as a network of relatedness
Koutsoupias et al. Sage Research Methods Cases Part 2

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20100324

Termination date: 20170703
