CN108415900A - Visual text information discovery method and system based on a multi-level co-occurrence word graph - Google Patents
Visual text information discovery method and system based on a multi-level co-occurrence word graph
- Publication number
- CN108415900A
- Authority
- CN
- China
- Prior art keywords
- word
- document
- text
- keyword
- co-occurrence relation
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The present invention provides a visual text information discovery method based on a multi-level co-occurrence word graph. Its steps include: extracting the text content of documents and segmenting it into text fragments; segmenting the text fragments further, extracting keywords, and tagging them with word-class labels; building a multi-level co-occurrence word graph from the co-occurrence relations of the keywords within the text fragments, in which nodes correspond to keywords and edges correspond to keyword co-occurrences; building a word-document inverted index for each keyword in the graph so that the documents containing a keyword can be retrieved; and obtaining visual text information through the co-occurrence word graph. The present invention also provides a visual text information discovery system based on a multi-level co-occurrence word graph, comprising a document preprocessing module, a keyword extraction module, a multi-level word graph construction module, a word-document index construction module, and a visual information discovery module.
Description
Technical field
The invention belongs to the fields of text mining and natural language processing, and relates to a visual text information discovery method and system based on a multi-level co-occurrence word graph.
Background art
With the development of the Internet and electronic office work, textual information is growing explosively, and the volume of text now produced exceeds that of any previous era. On the one hand, text contains a great deal of valuable information; on the other hand, massive volumes of text significantly increase the cost of discovering useful information. In the vast majority of applications (such as publishing, research and analysis, and supervision), users cannot read every document in a collection to find the useful information, so using computers to assist in mining valuable information from massive text (text mining) has become an important problem in urgent need of a solution.
Text mining can be divided into two classes according to the characteristics of the target information. The first class is text mining in which the useful information can be clearly defined, such as classification or searches with a clear goal; existing computational matching can largely satisfy everyday needs here. The second class is text mining in which the useful information is difficult to define clearly, such as scenarios with vague search needs; existing methods generally discover information in an "exploratory" manner. Exploratory information discovery relies on search at its core: the user enters a query, manually checks the search results, forms the next query, and continues searching, repeating this process until the result is found. In exploratory information discovery, as the user's understanding of the results grows, the final query may be entirely different from the initial one.
Current exploratory information discovery methods have three shortcomings. First, manually examining search results is inefficient: browsing documents (search results) by hand is very time-consuming and cannot quickly locate the target information. Second, the whole process lacks a global view of the target document set, so users often get lost during discovery, not knowing "where they came from or where they are going", and the state of an investigation cannot be restored and reused efficiently in the next one. Third, documents that have already been examined cannot be filtered out, so repeated examination is hard to avoid.
Summary of the invention
To overcome the above shortcomings of information discovery, the present invention proposes a visual text information discovery method and system based on a multi-level co-occurrence word graph.
To solve the above technical problem, the present invention adopts the following technical solution:
A visual text information discovery method based on a multi-level co-occurrence word graph, as shown in Fig. 1, comprises the following steps:
Extracting the text content of a document and segmenting it to obtain text fragments;
Segmenting the text fragments, extracting keywords, and tagging them with word-class labels;
Analyzing the text fragments and building a multi-level co-occurrence word graph according to the co-occurrence relations of the keywords within the text fragments, where nodes in the graph correspond to keywords and edges correspond to keyword co-occurrences;
Building a word-document inverted index for each keyword in the graph, so that the documents containing a keyword can be retrieved;
Obtaining visual text information through the co-occurrence word graph.
Further, before the text content of a document is extracted, the document format is first parsed.
Further, segmentation is performed using symbols, which include punctuation marks; or it is performed using a fixed window, whose size and moving step are set: the window moves from the beginning of the text to the end, and each text fragment delimited by the window is output.
Further, the word-class labels include part-of-speech labels, entity word labels, document core word labels, semantic role labels, and custom type labels.
Further, the entity word labels include composite entity words.
Further, for the document core word label, the method of finding document core words includes computing word weights with TF-IDF or TextRank, ranking the keywords by weight, and taking the top-k highest-ranked keywords as the document core words.
Further, the co-occurrence relations of keywords include co-occurrence within the same text fragment, co-occurrence within N adjacent text fragments, and co-occurrence within the whole document.
Further, a pair of keywords may appear only in the single co-occurrence word graph whose co-occurrence relation is the closest. Ordered from closest to farthest, the co-occurrence relations are co-occurrence within the same text fragment, co-occurrence within N adjacent text fragments, and co-occurrence within the whole document.
Further, the method for visualText information is obtained as shown in Fig. 2, including by cooccurrence relation word figure:It is global
The selection of the online browse of figure and Local map, Local map browses and the switching of extension browsing, cooccurrence relation shows and shows side by side,
Word figure browsing history, word vertex ticks and document markup.
Online browsing of the global graph and local graphs means the following. The global graph displays all words, and the user can use it to browse an overview of the document set. A local graph displays the word nodes adjacent to a selected word node, and the user can use it to browse key areas of the document set. The graph content displayed differs for different co-occurrence windows. Both the global graph and local graphs are implemented by having the front end load, on demand, word graph information drawn offline.
Selection browsing and extension browsing of local graphs mean the following. Selection browsing includes full-text search over the words in the global graph: the user selects a word of interest and the local graph centered on that word is displayed; it also includes selective browsing of the nodes in the graph according to their word-type labels. Extension browsing means that the user can click a neighbor node in a local graph, and the local graph is automatically updated to the one centered on that neighbor node.
Switched display and side-by-side display of co-occurrence relations mean the following. Switched display lets the user, centered on one word, load different local graphs by selecting different co-occurrence levels (window sizes). Side-by-side display lets the user, centered on one word, display the local graphs at different co-occurrence levels side by side. Both let the user flexibly examine the context of a word and find related leads.
Word graph browsing history means the following. While the user performs extension browsing, the system records the points the user clicked and the related paths. The paths are saved as a graph structure, and the user can later load and search the history paths, which helps the user recall and restore the investigation state.
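A history of this kind can be kept as a directed graph of click transitions. The sketch below is one possible in-memory structure, assuming that restoring a state means resuming browsing from a previously visited word; the class and method names are hypothetical, and persistence is not shown.

```python
from collections import defaultdict
from typing import Dict, List, Optional, Set


class BrowsingHistory:
    """Records clicked word nodes and the transitions between them as a directed graph."""

    def __init__(self) -> None:
        self.visited: List[str] = []                         # words in click order
        self.paths: Dict[str, Set[str]] = defaultdict(set)   # from-word -> to-words
        self._current: Optional[str] = None

    def click(self, word: str) -> None:
        """Record a click, linking it to the word the user was previously on."""
        if self._current is not None and self._current != word:
            self.paths[self._current].add(word)
        self.visited.append(word)
        self._current = word

    def search(self, query: str) -> List[str]:
        """Visited words containing `query`, in first-visit order."""
        found: List[str] = []
        for word in self.visited:
            if query in word and word not in found:
                found.append(word)
        return found

    def restore(self, word: str) -> None:
        """Jump back to a visited word so that browsing resumes from there."""
        if word not in self.visited:
            raise KeyError(word)
        self._current = word


history = BrowsingHistory()
history.click("Tang Dechuan")
history.click("income-producing enterprise")
history.restore("Tang Dechuan")
history.click("South")  # a second branch starting from "Tang Dechuan"
```

Because restoring and then clicking elsewhere adds a new branch from the same word, the recorded paths form a graph (a tree when each word is reached only once).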
Word node marking and document marking mean the following. During browsing, the user can mark word nodes and their related documents. There are two kinds of marks. The first is the favorite mark: the user can later examine the marked nodes and related documents in detail. The second is the delete mark: the marked nodes and related documents are deleted from the document set, and the corresponding multi-level co-occurrence word graphs are updated accordingly.
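One way to update the graphs after a delete mark is to store, for each level, which documents support each edge, and then re-apply the "nearest level only" rule after the documents are removed. In that case a pair can move to a farther level instead of disappearing. The sketch below assumes this storage layout; it is not taken from the patent, the names are hypothetical, and favorite marks are not shown.

```python
from typing import Dict, FrozenSet, Set, Tuple

Edge = FrozenSet[str]
Raw = Dict[int, Dict[Edge, Set[str]]]  # level (0 = nearest) -> edge -> supporting doc IDs


def derive_graphs(raw: Raw) -> Dict[int, Set[Edge]]:
    """Keep each edge only at the nearest level that still has supporting documents."""
    graphs: Dict[int, Set[Edge]] = {level: set() for level in raw}
    placed: Set[Edge] = set()
    for level in sorted(raw):
        for edge, docs in raw[level].items():
            if docs and edge not in placed:
                graphs[level].add(edge)
                placed.add(edge)
    return graphs


def delete_mark(word: str, index: Dict[str, Set[str]], raw: Raw) -> Tuple[Set[str], Dict[int, Set[Edge]]]:
    """Delete every document containing `word`, then rebuild the level graphs."""
    removed = set(index.get(word, set()))
    for w in list(index):
        index[w] -= removed
        if not index[w]:
            del index[w]  # the word no longer occurs in any remaining document
    for level_edges in raw.values():
        for docs in level_edges.values():
            docs -= removed
    return removed, derive_graphs(raw)
```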
A visual text information discovery system based on a multi-level co-occurrence word graph, as shown in Fig. 3, comprises a document preprocessing module, a keyword extraction module, a multi-level word graph construction module, a word-document index construction module, and a visual information discovery module.
Document preprocessing module: its input is a set of document files and its output is a set of <document ID, text fragment list> pairs. Processing each document file includes parsing the file format, extracting its text content, and segmenting the full text according to predefined rules to obtain an ordered list of text fragments.
Keyword extraction module: this module takes the output of the document preprocessing module as input, numbers each text fragment, and further segments the text fragments to obtain a set of <word, word class> pairs. Word classes can be labeled with natural language processing tools or through user-defined processing.
Multi-level word graph construction module: this module takes the output of the keyword extraction module as input and builds the multi-level co-occurrence word graph. "Multi-level" means that word co-occurrence is examined with different window sizes, producing multiple co-occurrence word graphs, for example for co-occurrence within the same text fragment, within N adjacent text fragments, and within the same document.
Word-document index construction module: this module builds a word-document inverted index for each word in the word graph, used to retrieve the documents that contain the word.
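A word-document inverted index maps each word to the set of documents that contain it. The sketch below is a minimal in-memory version, assuming the keywords of each document are available; the function names are hypothetical, and the intersection helper is an assumed convenience (for example, listing the documents behind one edge), not a function named in the patent.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set


def build_inverted_index(doc_words: Dict[str, Iterable[str]]) -> Dict[str, Set[str]]:
    """Map each word to the set of document IDs that contain it."""
    index: Dict[str, Set[str]] = defaultdict(set)
    for doc_id, words in doc_words.items():
        for word in words:
            index[word].add(doc_id)
    return dict(index)


def docs_containing(index: Dict[str, Set[str]], *words: str) -> Set[str]:
    """Documents that contain every given word (intersection of posting sets)."""
    if not words:
        return set()
    result = set(index.get(words[0], set()))
    for word in words[1:]:
        result &= index.get(word, set())
    return result


index = build_inverted_index({
    "doc1": ["Tang Dechuan", "group commendation meeting", "South"],
    "doc2": ["South", "income-producing enterprise"],
})
```

Here `docs_containing(index, "South")` returns both documents, while adding "Tang Dechuan" narrows the result to `doc1`.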
Visual information discovery module: this module provides document browsing and discovery based on word classes and the word co-occurrence graph, a document marking function, and a function for saving the state of word graph traversal, enabling browsing and discovery of information of interest from multiple angles.
The method of the present invention performs visual information discovery on a given document set. It first uses natural language processing techniques to segment and filter the documents into a keyword set, then examines word co-occurrence with windows of different sizes to build multi-level co-occurrence word graphs (also called word graphs). The user performs visual information discovery by browsing the word graphs. Visual information discovery supports searching for words in a word graph; selecting a word as the center and viewing its related words through co-occurrence relations; examining in detail the documents that contain a selected word; deleting word nodes, which deletes the related documents and updates the co-occurrence word graphs; and saving the user's word graph traversal paths.
Using word graphs for information investigation improves the efficiency of examining documents, since a word graph amounts to a summary of the document content. Co-occurrence relations in the word graph make it easy to extend an investigation, and recording the user's traversal paths helps the user keep track of investigation progress. Marking word nodes for deletion reduces the number of documents to examine later and avoids repeated examination.
The method of the present invention is flexible and convenient. Text fragment size can be adjusted through a user-defined window size, and fragments of different sizes yield different word associations; in addition, which keywords are extracted and which word classes are used can be customized according to the discovery needs.
Description of the drawings
Fig. 1 is a flowchart of the visual text information discovery method based on a multi-level co-occurrence word graph.
Fig. 2 is a schematic diagram of the visual text information discovery functions.
Fig. 3 is a diagram of the visual text information discovery system based on a multi-level co-occurrence word graph.
Fig. 4 is a schematic diagram of document preprocessing and keyword extraction.
Fig. 5 is a schematic diagram of the co-occurrence information used by the multi-level word graph construction module.
Fig. 6 is the one-window co-occurrence graph: global graph.
Fig. 7 is the two-window co-occurrence graph: global graph.
Fig. 8 is the one-window co-occurrence graph: local graph (centered on "Tang Dechuan").
Fig. 9 is the two-window co-occurrence graph: local graph (centered on "Tang Dechuan").
Fig. 10 is a schematic diagram of extension browsing (center word changing from "Tang Dechuan" to "income-producing enterprise").
Detailed description of the embodiments
To make the above features and advantages of the present invention clearer and easier to understand, embodiments are described in detail below with reference to the accompanying drawings.
This embodiment provides a visual text information discovery method based on a multi-level co-occurrence word graph that performs information discovery on a document set containing 2 documents. As shown in Fig. 1, the method steps are as follows.
1. Document preprocessing:
For each document in the document set, output <document ID, text fragment list>. The processing is: (1) parse the document format and extract the valid text content; (2) segment the text content so that each resulting text fragment generally corresponds to a meaningful semantic unit. Segmentation can use either of two methods: (a) segmentation by symbols specified by the user, including common punctuation marks such as periods, commas, newlines, and paragraph indentation; (b) segmentation with a fixed window, with two parameters, window size and moving step: the window moves from the beginning of the document to the end, and each text fragment it delimits is output.
For this example, method (a) is used: commas are chosen as separators to segment the documents into a set of sentences. The preprocessing result is shown in Fig. 4.
2. Keyword extraction:
For each text fragment of each document, this step numbers the text fragments and segments them to obtain a <word, word class> list. The word-class labels are determined by the user according to need and can be extracted with relevant natural language processing toolkits. Common word-class labels include: (a) part-of-speech labels, such as noun and verb; (b) entity word labels, such as time, place, person name, and organization name; entities also include composite entities, i.e., new entities referred to by a combination of several words, such as "group commendation meeting", where "group" and "commendation meeting" are each entity words and their combination refers to a new entity; (c) document core word labels, implemented by computing word weights with TF-IDF or TextRank, ranking the words by weight, and taking the top-k highest-ranked words as core words; (d) semantic role labels (Semantic Role Labeling), such as beneficiary, condition, purpose, and reason; (e) custom types, obtained by post-processing the results of syntactic parsing, such as the subject, predicate, and object produced by OpenIE.
For this example, the word-class labels "noun, composite entity, person name, place name, organization name" are retained, and information discovery is performed on the documents based on words of these classes. The keyword extraction result is shown in Fig. 4. For example, for the sentence "when Tang Dechuan praised South at the group commendation meeting", extraction yields an ordered sequence of three words with their classes: "Tang Dechuan/person name", "group commendation meeting/composite entity", and "South/place name".
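Once a tagger has produced <word, word class> pairs, the filtering step of this example reduces to keeping the retained classes in order and numbering the fragments. The sketch below assumes tagging has already been done by some NLP toolkit (not shown); the tag strings for the discarded words and the function name are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Tagged = List[Tuple[str, str]]  # (word, word class) pairs from any tagger

# Word classes retained in the example embodiment.
KEPT_CLASSES = {"noun", "composite entity", "person name", "place name", "organization name"}


def extract_keywords(tagged_fragments: List[Tagged], kept: Set[str] = KEPT_CLASSES) -> Dict[int, Tagged]:
    """Number the fragments from 1 and keep, in order, only words whose class is in `kept`."""
    return {
        number: [(word, cls) for word, cls in fragment if cls in kept]
        for number, fragment in enumerate(tagged_fragments, start=1)
    }


sentence = [("Tang Dechuan", "person name"), ("at", "preposition"),
            ("group commendation meeting", "composite entity"),
            ("praised", "verb"), ("South", "place name")]
keywords = extract_keywords([sentence])
```

The result for fragment 1 is the three keywords named in the text, in their original order.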
3. Multi-level word graph (co-occurrence word graph) construction:
The nodes of the word graph are the words output by step 2, and the edges are determined by word co-occurrence. "Multi-level" means that word co-occurrence is examined with different window sizes, producing multiple co-occurrence word graphs, for example for co-occurrence within the same text fragment, within N adjacent text fragments, and within the whole document.
A given pair of words is required to appear in only one word graph: the co-occurrence word graph corresponding to the smallest window in which the pair co-occurs. The edges obtained from co-occurrence can also be filtered, with the filtering rules decided by the user as needed.
For this example, two levels of co-occurrence are used: co-occurrence within the same window and co-occurrence within two adjacent windows, where the window unit is the sentence. The corresponding word graphs are called the "one-window co-occurrence graph" and the "two-window co-occurrence graph". The resulting word co-occurrence combinations are shown in Fig. 5 and appear as edges in the word graphs. Specifically, for co-occurrence within the same window, the three words ["Tang Dechuan/person name", "group commendation meeting/composite entity", "South/place name"] appear in the same sentence, so that sentence contributes the pairwise combinations of these three words as edges: <Tang Dechuan, group commendation meeting>, <Tang Dechuan, South>, and <group commendation meeting, South>.
Taking co-occurrence within two adjacent windows as an example, the words in word list 1 ["Tang Dechuan/person name", "group commendation meeting/composite entity", "South/place name"] and the words in word list 2 ["South/place name", "group South representative/composite entity"] co-occur within a two-window range, so pairing each word in list 1 with each word in list 2 yields edges of the two-window co-occurrence graph. Note that <Tang Dechuan, South> and <group commendation meeting, South> already appear in the one-window co-occurrence graph; by the rule that a given pair of words may appear in only one word graph, these two edges are deleted from the two-window co-occurrence graph.
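The two-level construction of this example can be reproduced with set operations. This is a minimal sketch, assuming word classes are dropped and a word is never paired with itself; the variable names are illustrative.

```python
from itertools import combinations, product

# Keyword lists of two neighboring sentences from the example (word classes omitted).
sentence_1 = ["Tang Dechuan", "group commendation meeting", "South"]
sentence_2 = ["South", "group South representative"]


def pair(a: str, b: str) -> frozenset:
    """Undirected edge between two keywords."""
    return frozenset((a, b))


# One-window graph: every pair of keywords inside the same sentence.
one_window = {pair(a, b) for s in (sentence_1, sentence_2) for a, b in combinations(s, 2)}

# Two-window graph: pairs across the two neighboring sentences (self-pairs skipped) ...
two_window = {pair(a, b) for a, b in product(sentence_1, sentence_2) if a != b}
# ... minus pairs already in the one-window graph, so each pair lives in one graph only.
two_window -= one_window

for edge in sorted(tuple(sorted(e)) for e in two_window):
    print(edge)
```

Only <Tang Dechuan, group South representative> and <group commendation meeting, group South representative> remain in the two-window graph. Besides the two edges named in the text, {South, group South representative} also drops out, because those two words already share sentence 2.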
4. Word-document index construction:
For each word in the word graph, a word-document inverted index is built to retrieve the documents that contain the word.
Steps 1 to 4 produce the data structures of the multi-level co-occurrence word graph and the inverted index; subsequent visual information discovery is carried out by searching and loading these data structures on demand.
5. Visual information discovery, whose core functions include:
1) online browse of global figure and Local map.
The global graph displays the associations among all words, and the user can use it to browse an overview of the document set; Fig. 6 shows the global graph of the one-window co-occurrence graph, and Fig. 7 the global graph of the two-window co-occurrence graph. A local graph displays the word nodes adjacent to a selected word node, and the user can use it to browse key areas of the document set; Fig. 8 shows a local graph of the one-window co-occurrence graph.
The displayed graph content differs for co-occurrence windows of different sizes. Both the global graph and local graphs are implemented by having the front end load, on demand, word graph information drawn offline.
2) Selection browsing and extension browsing of local graphs.
Selection browsing includes full-text search over the words in the global graph: the user selects a word of interest and the local graph centered on that word is displayed; it also includes selective browsing of graph nodes according to their word-type labels. Extension browsing means that the user can click a neighbor node in a local graph, and the local graph is automatically updated to the one centered on that neighbor node.
Fig. 10 gives an example of extension browsing. The user clicks "Tang Dechuan" to display the local graph centered on "Tang Dechuan" (only four neighbor nodes are highlighted in the local graph), then clicks the neighbor node "income-producing enterprise" to display the local graph centered on "income-producing enterprise".
3) Switched display and side-by-side display of co-occurrence relations.
Switched display lets the user, centered on one word, load different local graphs by selecting different co-occurrence levels (window sizes), keeping the position of the center word unchanged. Side-by-side display lets the user, centered on one word, display the local graphs at different co-occurrence levels side by side. Both let the user flexibly examine the context of a word and find related leads.
Fig. 8 and Fig. 9 show the co-occurring words of the center word "Tang Dechuan": Fig. 8 is the one-window local graph and Fig. 9 the two-window local graph. Switched display keeps the position of "Tang Dechuan" fixed while switching between Fig. 8 and Fig. 9; side-by-side display shows the local graphs of multiple levels at the same time.
4) Word graph browsing history. The user examines related documents in detail by clicking words in the word graph, usually together with the extension browsing function. During browsing, the system records the points the user clicked and the related paths. The paths are saved as a tree structure, and the user can load and search the history paths, which helps the user recall and restore the investigation state.
For Fig. 10, the clicked words "Tang Dechuan" and "income-producing enterprise" are saved.
5) Word node marking and document marking.
During browsing, the user can mark word nodes and related documents. There are two kinds of marks:
First, the favorite mark: the user can later examine the marked nodes and related documents in detail;
Second, the delete mark: the marked nodes and related documents are deleted from the document set, and the corresponding multi-level co-occurrence word graphs are updated accordingly.
This embodiment also provides a visual text information discovery system based on a multi-level co-occurrence word graph for implementing the above method. As shown in Fig. 3, it comprises a document preprocessing module, a keyword extraction module, a multi-level word graph construction module, a word-document index construction module, and a visual information discovery module.
The above embodiments merely illustrate the technical solution of the present invention and do not limit it. Those of ordinary skill in the art may modify the technical solution or replace it with equivalents without departing from the spirit and scope of the present invention; the scope of protection of the present invention shall be as defined in the claims.
Claims (10)
1. A visual text information discovery method based on a multi-level co-occurrence word graph, comprising the steps of:
extracting the text content of a document and segmenting the text content to obtain text fragments;
segmenting the text fragments, extracting keywords, and tagging them with word-class labels;
building a multi-level co-occurrence word graph according to the co-occurrence relations of the keywords within the text fragments, wherein nodes in the graph correspond to keywords and edges in the graph correspond to keyword co-occurrences;
building a word-document inverted index for each keyword in the graph to retrieve the documents containing the keyword;
obtaining visual text information through the co-occurrence word graph.
2. The method according to claim 1, characterized in that, before the text content of the document is extracted, the document format is first parsed.
3. The method according to claim 1, characterized in that the text content and the text fragments are segmented using symbols or a fixed window, the symbols including punctuation marks and the fixed window moving from the beginning of the text to the end.
4. The method according to claim 1, characterized in that the word-class labels include part-of-speech labels, entity word labels, document core word labels, semantic role labels, and custom type labels.
5. The method according to claim 4, characterized in that the entity word labels include composite entity words.
6. The method according to claim 4, characterized in that, for the document core word label, the method of finding document core words includes computing word weights with TF-IDF or TextRank, ranking the keywords by weight, and taking the top-k highest-ranked keywords as the document core words.
7. The method according to claim 1, characterized in that the co-occurrence relations of keywords include co-occurrence within the same text fragment, co-occurrence within N adjacent text fragments, and co-occurrence within the whole document.
8. The method according to claim 7, characterized in that a pair of keywords can appear only in the single co-occurrence word graph with the closest co-occurrence relation, the co-occurrence relations being ordered from closest to farthest as co-occurrence within the same text fragment, co-occurrence within N adjacent text fragments, and co-occurrence within the whole document.
9. The method according to claim 1, characterized in that obtaining visual text information through the co-occurrence word graph includes: online browsing of the global graph and local graphs; selection browsing and extension browsing of local graphs; switched display and side-by-side display of co-occurrence relations; word graph browsing history; and word node marking and document marking.
10. A visual text information discovery system based on a multi-level co-occurrence word graph, comprising:
a document preprocessing module for parsing the document format, extracting the text content, and segmenting it to obtain an ordered list of text fragments;
a keyword extraction module for numbering each text fragment and further segmenting the text fragments to obtain a set of <word, word class> pairs;
a multi-level word graph construction module for building a multi-level co-occurrence word graph according to the co-occurrence relations of keywords within the text fragments;
a word-document index construction module for building a word-document inverted index to retrieve the documents containing a keyword;
a visual information discovery module for providing document browsing, marking, and state-saving functions based on the co-occurrence word graph.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810112596.2A CN108415900A (en) | 2018-02-05 | 2018-02-05 | Visual text information discovery method and system based on multi-level co-occurrence word graph |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810112596.2A CN108415900A (en) | 2018-02-05 | 2018-02-05 | Visual text information discovery method and system based on multi-level co-occurrence word graph |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108415900A | 2018-08-17 |
Family
ID=63127814
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810112596.2A Pending CN108415900A (en) | Visual text information discovery method and system based on multi-level co-occurrence word graph | 2018-02-05 | 2018-02-05 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108415900A (en) |
Cited By (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109145260A (en) * | 2018-08-24 | 2019-01-04 | 北京科技大学 | Text information extraction method |
CN109359299A (en) * | 2018-09-28 | 2019-02-19 | 中国电子科技集团公司信息科学研究院 | Self-construction method for an Internet-of-things device capability ontology based on commodity data |
CN109726289A (en) * | 2018-12-29 | 2019-05-07 | 北京百度网讯科技有限公司 | Event detecting method and device |
CN109933707A (en) * | 2018-10-31 | 2019-06-25 | 中国科学院信息工程研究所 | Topic corpus construction method and system based on a search engine |
CN110399261A (en) * | 2019-06-13 | 2019-11-01 | 中国科学院信息工程研究所 | System alarm clustering method based on a co-occurrence graph |
CN111078824A (en) * | 2019-12-18 | 2020-04-28 | 南京录信软件技术有限公司 | Method for reducing storage space occupied by Lucene dictionary-free n-gram word segmentation |
CN111145906A (en) * | 2019-12-31 | 2020-05-12 | 清华大学 | Item determination method, related device and readable storage medium |
CN111429912A (en) * | 2020-03-17 | 2020-07-17 | 厦门快商通科技股份有限公司 | Keyword detection method, system, mobile terminal and storage medium |
CN111444713A (en) * | 2019-01-16 | 2020-07-24 | 清华大学 | Method and device for extracting entity relationship in news event |
CN111666292A (en) * | 2020-04-24 | 2020-09-15 | 百度在线网络技术(北京)有限公司 | Similarity model establishing method and device for retrieving geographic positions |
CN111859962A (en) * | 2020-08-03 | 2020-10-30 | 广州威尔森信息科技有限公司 | Method and device for extracting data required by automobile public praise word cloud |
CN113486071A (en) * | 2021-07-27 | 2021-10-08 | 掌阅科技股份有限公司 | Searching method, server, client and system based on electronic book |
CN113901828A (en) * | 2020-06-22 | 2022-01-07 | 江苏税软软件科技有限公司 | Method for intelligently segmenting and labeling articles |
CN118377945A (en) * | 2024-06-25 | 2024-07-23 | 华能信息技术有限公司 | Visual page rapid construction system |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101067807A (en) * | 2007-05-24 | 2007-11-07 | 上海大学 | Text semantic visable representation and obtaining method |
CN103399901A (en) * | 2013-07-25 | 2013-11-20 | 三星电子(中国)研发中心 | Keyword extraction method |
CN104536956A (en) * | 2014-07-23 | 2015-04-22 | 中国科学院计算技术研究所 | A Microblog platform based event visualization method and system |
CN105843795A (en) * | 2016-03-21 | 2016-08-10 | 华南理工大学 | Topic model based document keyword extraction method and system |
CN106156286A (en) * | 2016-06-24 | 2016-11-23 | 广东工业大学 | Type extraction system and method towards technical literature knowledge entity |
CN106354708A (en) * | 2015-07-13 | 2017-01-25 | 中国电力科学研究院 | Client interaction information search engine system based on electricity information collection system |
US20170161702A1 (en) * | 2015-12-08 | 2017-06-08 | Rhapsody International Inc. | Graph-based music recommendation and dynamic media work micro-licensing systems and methods |
CN107016092A (en) * | 2017-04-06 | 2017-08-04 | 湘潭大学 | A kind of text search method based on flattening algorithm |
CN107480130A (en) * | 2017-07-25 | 2017-12-15 | 西北工业大学 | The property value homogeneity decision method of relation data based on WEB information |
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109145260A (en) * | 2018-08-24 | 2019-01-04 | 北京科技大学 | A kind of text information extraction method |
CN109359299A (en) * | 2018-09-28 | 2019-02-19 | 中国电子科技集团公司信息科学研究院 | A kind of internet of things equipment ability ontology based on commodity data is from construction method |
CN109933707A (en) * | 2018-10-31 | 2019-06-25 | 中国科学院信息工程研究所 | A kind of theme corpus construction method and system based on search engine |
CN109933707B (en) * | 2018-10-31 | 2022-10-14 | 中国科学院信息工程研究所 | Topic corpus construction method and system based on search engine |
CN109726289A (en) * | 2018-12-29 | 2019-05-07 | 北京百度网讯科技有限公司 | Event detecting method and device |
CN111444713A (en) * | 2019-01-16 | 2020-07-24 | 清华大学 | Method and device for extracting entity relationship in news event |
CN111444713B (en) * | 2019-01-16 | 2022-04-29 | 清华大学 | Method and device for extracting entity relationship in news event |
CN110399261A (en) * | 2019-06-13 | 2019-11-01 | 中国科学院信息工程研究所 | A kind of system alarm clustering method based on co-occurrence figure |
CN111078824A (en) * | 2019-12-18 | 2020-04-28 | 南京录信软件技术有限公司 | Method for reducing storage space occupied by Lucene dictionary-free n-gram word segmentation |
CN111145906A (en) * | 2019-12-31 | 2020-05-12 | 清华大学 | Item determination method, related device and readable storage medium |
CN111145906B (en) * | 2019-12-31 | 2024-04-30 | 清华大学 | Project judging method, related device and readable storage medium |
CN111429912A (en) * | 2020-03-17 | 2020-07-17 | 厦门快商通科技股份有限公司 | Keyword detection method, system, mobile terminal and storage medium |
CN111429912B (en) * | 2020-03-17 | 2023-02-10 | 厦门快商通科技股份有限公司 | Keyword detection method, system, mobile terminal and storage medium |
CN111666292A (en) * | 2020-04-24 | 2020-09-15 | 百度在线网络技术(北京)有限公司 | Similarity model establishing method and device for retrieving geographic positions |
CN111666292B (en) * | 2020-04-24 | 2023-05-26 | 百度在线网络技术(北京)有限公司 | Similarity model establishment method and device for retrieving geographic position |
US11836174B2 (en) | 2020-04-24 | 2023-12-05 | Baidu Online Network Technology (Beijing) Co., Ltd. | Method and apparatus of establishing similarity model for retrieving geographic location |
CN113901828A (en) * | 2020-06-22 | 2022-01-07 | 江苏税软软件科技有限公司 | Method for intelligently segmenting and labeling articles |
CN111859962A (en) * | 2020-08-03 | 2020-10-30 | 广州威尔森信息科技有限公司 | Method and device for extracting data required by automobile public praise word cloud |
CN111859962B (en) * | 2020-08-03 | 2021-06-08 | 广州威尔森信息科技有限公司 | Method and device for extracting data required by automobile public praise word cloud |
CN113486071A (en) * | 2021-07-27 | 2021-10-08 | 掌阅科技股份有限公司 | Searching method, server, client and system based on electronic book |
CN118377945A (en) * | 2024-06-25 | 2024-07-23 | 华能信息技术有限公司 | Visual page rapid construction system |
Similar Documents
Publication | Title |
---|---|
CN108415900A (en) | A kind of visualText INFORMATION DISCOVERY method and system based on multistage cooccurrence relation word figure |
CN109857917B (en) | Security knowledge graph construction method and system for threat intelligence |
US9135252B2 (en) | System and method for near and exact de-duplication of documents |
CN105843795B (en) | Document keyword abstraction method and its system based on topic model |
US10997678B2 (en) | Systems and methods for image searching of patent-related documents |
CA2783344C (en) | Resource search operations |
CN104281702B (en) | Data retrieval method and device based on electric power critical word participle |
JP2006048684A (en) | Retrieval method based on phrase in information retrieval system |
JP2006048686A (en) | Generation method for document explanation based on phrase |
JP2006048685A (en) | Indexing method based on phrase in information retrieval system |
JP2006048683A (en) | Phrase identification method in information retrieval system |
CN106294588A (en) | The method and device of fast search content to be inquired about |
US6694302B2 (en) | System, method and article of manufacture for personal catalog and knowledge management |
CN113407678A (en) | Knowledge graph construction method, device and equipment |
CN106649883B (en) | cross-language theme website automatic discovery method |
WO2009035871A1 (en) | Browsing knowledge on the basis of semantic relations |
Zeng et al. | Construction of scenic spot knowledge graph based on ontology |
CN104933192A (en) | Automatic Chinese and Filipino bilingual parallel text collection system and implementation method |
Umale et al. | Survey on document clustering approach for forensics analysis |
CN109190041A (en) | A kind of labeling formula searching method participated in based on user |
KR20150074268A (en) | Intangible Cultural Heritage Encyclopedia |
Sultan et al. | Scraping Google Scholar Data Using Cloud Computing Techniques |
Medrouk et al. | Review web pages collector tool for thematic corpus creation |
JP2004234582A (en) | Dictionary construction method, system, and screen |
Medina et al. | Document retrieval from multiple collections by using lightweight ontologies |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20180817 |