CN106599163A - Data mining method and device for big data - Google Patents
- Publication number
- CN106599163A (application number CN201611123018.6A)
- Authority
- CN
- China
- Prior art keywords
- sentence
- subdata
- base
- cap
- subdata base
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/3332—Query translation
- G06F16/3335—Syntactic pre-processing, e.g. stopword elimination, stemming
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/24—Querying
- G06F16/245—Query processing
- G06F16/2458—Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Computational Linguistics (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Fuzzy Systems (AREA)
- Mathematical Physics (AREA)
- Probability & Statistics with Applications (AREA)
- Software Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention provides a data mining method for big data. The method comprises the following steps: performing word segmentation on each statement in text database contents; identifying whether characters, words and word groups belong to entities after the word segmentation; then performing semantic annotation analysis on the characters, words and word groups after the word segmentation; performing syntactic analysis on the text database contents; generating a complete structured database according to a syntactic analysis result; dividing the complete structured database into different sub-databases; and selecting corresponding sub-databases, combination of the sub-databases or the complete structured database to perform mining analysis according to a specific mining target. By adoption of the method provided by the invention, the data mining efficiency can be improved. The invention further provides a data mining device for big data.
Description
Technical field
The present invention relates to the technical field of computer information processing, and in particular to a data mining method and device for big data.
Background art
At present, as computer and network applications become increasingly widespread and the types of services in different fields grow ever richer, it is becoming more and more important to mine different categories of objects effectively from massive data records, so that different processing schemes can be applied to different categories of objects.
However, existing technical solutions have the following problem: because the entire database must be processed during mining, the time required is long and data mining is inefficient.
Summary of the invention
The technical problem to be solved by the present invention is to provide a data mining method for big data that improves the efficiency of data mining.
To achieve the above object, according to one aspect of the present invention, a data mining method for big data is provided, comprising the following steps:
Step 101: perform word segmentation on each sentence in the text database content;
Step 102: identify whether the characters, words and phrases obtained by the word segmentation of step 101 belong to entities;
Step 103: perform semantic annotation analysis on the characters, words and phrases obtained by the word segmentation of step 101;
Step 104: perform syntactic analysis on the text database content;
Step 105: generate a complete structured database according to the syntactic analysis result;
Step 106: divide the complete structured database into different sub-databases;
Step 107: according to the specific mining target, select the corresponding sub-database, a combination of sub-databases, or the complete structured database for mining analysis.
Preferably, in step 103, after semantic annotation, the words that have undergone entity recognition are counted and classified, and the sentence is marked with the resulting classification labels.
Further, potential mining targets may be taken into account during classification labeling, and the number of classification labels per sentence is limited.
Preferably, in step 105, a complete structured database with a fixed sentence structure is generated; when the complete structured database is generated, the classification labels of each sentence are saved and the classification labels are counted.
Preferably, in step 106, the complete structured database is divided into different sub-databases according to the statistics of the sentence classification labels or according to commonly used mining targets, and each sub-database is given an index based on the sentence classification labels or the mining targets.
Further, when dividing into sub-databases, sentences with similar labels are placed in the same sub-database and the similarity between different sub-databases is kept as small as possible, wherein:
The similarity between sentences is calculated by the formula:
or:
where the former formula is suited to preliminary estimation on large-scale data, sim() is the similarity function, d1 and d2 are sentences, α is the granularity of the classification labels, L(d1) is the number of classification labels of sentence d1 in the structured database, whose value equals L(d2), L(d1 ∩ d2) is the number of identical classification labels in sentence d1 and sentence d2, and n1 and n2 are adjustable coefficients whose values are greater than 0.
The similarity between a sentence and a sub-database is calculated by the formula:
or:
where the former formula is suited to preliminary estimation on large-scale data, D is a sub-database, L(d1 ∩ D) is the number of classification labels of sentence d1 that are contained in the index of sub-database D, and n3 and n4 are adjustable coefficients whose values are greater than 0.
The similarity between sub-databases is calculated by the formula:
or:
where the former formula is suited to preliminary estimation on large-scale data, L(D1) is the number of index entries in sub-database D1, L(D1 ∩ D2) is the number of index entries that sub-databases D1 and D2 have in common, and n5 and n6 are adjustable coefficients whose values are greater than 0.
Preferably, in step 107, depending on the mining target, different sub-databases, a combination of sub-databases, or the complete structured database is selected for mining analysis.
According to another aspect of the present invention, a data mining device for big data is provided, comprising:
a word segmentation module, for performing word segmentation on each sentence in the text database content;
a word entity recognition module, for identifying whether the characters, words and phrases obtained by word segmentation belong to entities;
a semantic annotation module, for performing semantic annotation analysis on the characters, words and phrases obtained by word segmentation;
a syntactic analysis module, for performing syntactic analysis on the text database content;
a database generation module, for generating a complete structured database according to the syntactic analysis result;
a database division module, for dividing the complete structured database into different sub-databases;
a data mining module, for selecting, according to the specific mining target, the corresponding sub-database, a combination of sub-databases, or the complete structured database for mining analysis.
Preferably, the semantic annotation module is used to count and classify, after semantic annotation, the words that have undergone entity recognition, and to mark the sentence with the classification labels.
Preferably, the database generation module is used to generate a complete structured database with a fixed sentence structure and, when generating the complete structured database, to save the classification labels of each sentence and count the classification labels.
Preferably, the database division module is used to divide the complete structured database into different sub-databases according to the statistics of the sentence classification labels or according to commonly used mining targets, and to give each sub-database an index based on the sentence classification labels or the mining targets; when dividing into sub-databases, sentences with similar labels are placed in the same sub-database and the similarity between different sub-databases is kept as small as possible, wherein:
The similarity between sentences is calculated by the formula:
or:
where the former formula is suited to preliminary estimation on large-scale data, sim() is the similarity function, d1 and d2 are sentences, α is the granularity of the classification labels, L(d1) is the number of classification labels of sentence d1 in the structured database, whose value equals L(d2), L(d1 ∩ d2) is the number of identical classification labels in sentence d1 and sentence d2, and n1 and n2 are adjustable coefficients whose values are greater than 0;
The similarity between a sentence and a sub-database is calculated by the formula:
or:
where the former formula is suited to preliminary estimation on large-scale data, D is a sub-database, L(d1 ∩ D) is the number of classification labels of sentence d1 that are contained in the index of sub-database D, and n3 and n4 are adjustable coefficients whose values are greater than 0;
The similarity between sub-databases is calculated by the formula:
or:
where the former formula is suited to preliminary estimation on large-scale data, L(D1) is the number of index entries in sub-database D1, L(D1 ∩ D2) is the number of index entries that sub-databases D1 and D2 have in common, and n5 and n6 are adjustable coefficients whose values are greater than 0.
Preferably, the data mining module is used to select, depending on the mining target, different sub-databases, a combination of sub-databases, or the complete structured database for mining analysis.
Brief description of the drawings
Fig. 1 is a flowchart of a data mining method for big data according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of a data mining device for big data according to an embodiment of the present invention.
Detailed description of the embodiments
In order to make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the present invention and are not intended to limit it.
Fig. 1 is a flowchart of a data mining method for big data according to an embodiment of the present invention.
In step 101, word segmentation is performed on each sentence in the text database content.
In step 102, it is identified whether the characters, words and phrases obtained by the word segmentation of step 101 belong to entities.
In step 103, semantic annotation analysis is performed on the characters, words and phrases obtained by the word segmentation of step 101.
After semantic annotation, the words that have undergone entity recognition are counted and classified. Classification follows the physical category to which the nouns (objects, etc.) in the sentence belong, for example a vehicles category or an electronic products category, and the sentences of the text database are marked with these classification labels. In one embodiment of the present invention, the classification labels of 4 sentences are respectively:
Sentence 1:A, B, C, D;
Sentence 2:A, B, C, E;
Sentence 3:A, F, G, H;
Sentence 4:A, F, I, J.
In step 104, syntactic analysis is performed on the text database content.
In step 105, a complete structured database is generated according to the syntactic analysis result.
In one embodiment, a complete structured database with a fixed sentence structure is generated. A fixed sentence structure means that all sentences are reorganized into one fixed structure, for example arranged in the order subject, predicate, object, attribute, adverbial, complement, with any component missing from a sentence filled with empty content. When the complete structured database is generated, the classification labels of each sentence are saved and the classification labels are counted. In one embodiment of the present invention, all 4 sentences contain classification label A, and classification labels B, C and F each appear in 2 sentences.
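The fixed-structure record and the label count of step 105 can be illustrated with a minimal Python sketch. The class name `StructuredSentence`, the helper names, and the placeholder sentence components (`"s1"`, `"p1"` and so on) are assumptions introduced purely for illustration; only the slot order and the classification labels of the 4 sentences come from the description.

```python
from collections import Counter
from dataclasses import dataclass

# Slot order taken from the description; the English slot names are illustrative.
SLOTS = ("subject", "predicate", "object", "attribute", "adverbial", "complement")


@dataclass
class StructuredSentence:
    sentence_id: int
    slots: dict    # slot name -> text, "" when the component is absent
    labels: tuple  # classification labels saved with the sentence


def make_record(sentence_id, parsed, labels):
    """Arrange parsed components in the fixed slot order; missing ones become ''."""
    slots = {name: parsed.get(name, "") for name in SLOTS}
    return StructuredSentence(sentence_id, slots, tuple(labels))


def count_labels(records):
    """Count in how many sentences each classification label occurs."""
    counts = Counter()
    for record in records:
        counts.update(set(record.labels))
    return counts


# Labels are those of the embodiment; the parsed components are made-up placeholders.
records = [
    make_record(1, {"subject": "s1", "predicate": "p1", "object": "o1"}, "ABCD"),
    make_record(2, {"subject": "s2", "predicate": "p2"}, "ABCE"),
    make_record(3, {"subject": "s3", "predicate": "p3", "adverbial": "a3"}, "AFGH"),
    make_record(4, {"predicate": "p4", "object": "o4"}, "AFIJ"),
]
label_counts = count_labels(records)
print(label_counts["A"], label_counts["B"], label_counts["C"], label_counts["F"])  # 4 2 2 2
```

Storing every sentence with the same six slots keeps later comparisons uniform, and the counts match the embodiment: label A occurs in all 4 sentences, while B, C and F each occur in 2.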
In step 106, the complete structured database is divided into different sub-databases.
In one embodiment, the complete structured database is divided into different sub-databases according to the statistics of the sentence classification labels or according to commonly used mining targets, and each sub-database is given an index based on the sentence classification labels or the mining targets. When dividing into sub-databases, sentences with higher similarity are placed in the same sub-database and the similarity between different sub-databases is kept as small as possible, wherein:
The similarity between sentences is calculated by the formula:
or:
where the former formula is suited to preliminary estimation on large-scale data, sim() is the similarity function, d1 and d2 are sentences, α is the granularity of the classification labels, L(d1) is the number of classification labels of sentence d1 in the structured database, whose value equals L(d2), L(d1 ∩ d2) is the number of identical classification labels in sentence d1 and sentence d2, and n1 and n2 are adjustable coefficients whose values are greater than 0;
The similarity between a sentence and a sub-database is calculated by the formula:
or:
where the former formula is suited to preliminary estimation on large-scale data, D is a sub-database, L(d1 ∩ D) is the number of classification labels of sentence d1 that are contained in the index of sub-database D, and n3 and n4 are adjustable coefficients whose values are greater than 0;
The similarity between sub-databases is calculated by the formula:
or:
where the former formula is suited to preliminary estimation on large-scale data, L(D1) is the number of index entries in sub-database D1, L(D1 ∩ D2) is the number of index entries that sub-databases D1 and D2 have in common, and n5 and n6 are adjustable coefficients whose values are greater than 0.
In one embodiment of the present invention, the classification labels have only one granularity, which is set to 1. Granularity expresses how coarse the classification in a sentence classification label or a sub-database index is: for example, the granularity of "electronic products" is coarser than that of "household appliances", and the granularity of "household appliances" is coarser than that of "television". The coarser the granularity, the larger the coverage of a sentence classification label and the higher the similarity calculated by the formula. When sentence classification labels or sub-database indexes are divided into multiple granularities, the similarity must be calculated using classification labels or indexes of the same granularity.
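Because the similarity formulas themselves are not reproduced in this text, the following Python sketch uses a stand-in measure, the share of identical classification labels, and is not the patented formula. It assumes a single granularity (as in the embodiment) and ignores the adjustable coefficients n1 and n2; the function name `label_overlap` is hypothetical. The label sets are those of the 4 example sentences.

```python
from itertools import combinations

# Classification labels of the 4 example sentences from the embodiment.
SENTENCE_LABELS = {1: "ABCD", 2: "ABCE", 3: "AFGH", 4: "AFIJ"}


def label_overlap(labels_a, labels_b):
    """Stand-in similarity: identical labels divided by the label count.

    This is an assumed placeholder, not the formula of the patent.
    """
    a, b = set(labels_a), set(labels_b)
    if not a or not b:
        return 0.0
    return len(a & b) / max(len(a), len(b))


pair_scores = {
    (i, j): label_overlap(SENTENCE_LABELS[i], SENTENCE_LABELS[j])
    for i, j in combinations(sorted(SENTENCE_LABELS), 2)
}
# List the pairs from most to least similar.
for pair, score in sorted(pair_scores.items(), key=lambda item: -item[1]):
    print(pair, score)
```

Under this stand-in, the pair (1, 2) scores highest and the pair (3, 4) second highest, which is consistent with grouping sentences 1 and 2 together and sentences 3 and 4 together.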
Since the similarity between sentence 1 and sentence 2 is higher, and the similarity between sentence 3 and sentence 4 is higher, sentence 1 and sentence 2 are preliminarily placed in the same sub-database D1, and sentence 3 and sentence 4 are placed in another sub-database D2. The indexes of the 2 sub-databases can be determined by taking the N classification labels with the highest frequency; in one embodiment of the present invention, the top 3 classification labels are taken as the index. Therefore, the classification labels forming the index of sub-database D1 are {A, B, C}, and those of sub-database D2 are {A, F, G} (where G is selected by alphabetical order).
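The index selection described here (the N most frequent labels, with ties broken alphabetically) can be sketched in Python as follows. The function name `build_index` is an assumption for illustration; the inputs are the label sets of the example sentences.

```python
from collections import Counter


def build_index(sentence_labels, n=3):
    """Return the n most frequent labels; ties are broken alphabetically."""
    counts = Counter()
    for labels in sentence_labels:
        counts.update(set(labels))
    ranked = sorted(counts, key=lambda label: (-counts[label], label))
    return ranked[:n]


index_d1 = build_index(["ABCD", "ABCE"])  # sentences 1 and 2
index_d2 = build_index(["AFGH", "AFIJ"])  # sentences 3 and 4
shared = set(index_d1) & set(index_d2)    # index entries common to D1 and D2

print(index_d1)  # ['A', 'B', 'C']
print(index_d2)  # ['A', 'F', 'G']
print(len(shared))  # 1
```

In D2, labels G, H, I and J each occur once, so the alphabetical tie-break picks G, as stated in the embodiment. The size of `shared` corresponds to L(D1 ∩ D2) as defined above.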
At this point, the similarity between the 2 sub-databases is:
When a sentence 5 is newly added (its labels are B, C, E, F), the similarity between sentence 5 and each sub-database is calculated, and the sentence is placed in the sub-database with the higher similarity.
Here, sim(sentence 5, D1, 1) > sim(sentence 5, D2, 1), so sentence 5 and its classification labels are placed in sub-database D1 in the prescribed structure (the sentence is stored in the fixed structure used in the sub-database).
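The placement of a new sentence can be sketched as below. The sketch compares only the raw count L(d ∩ D), the number of the sentence's labels found in a sub-database index, as a stand-in for the similarity formulas, which are not reproduced in this text; the function names are hypothetical.

```python
def index_hits(labels, index):
    """L(d ∩ D): number of the sentence's labels contained in the index."""
    return len(set(labels) & set(index))


def assign(labels, indexes):
    """Pick the sub-database whose index shares the most labels with the sentence.

    On a tie, the first sub-database in dictionary order is returned.
    """
    return max(indexes, key=lambda name: index_hits(labels, indexes[name]))


indexes = {"D1": "ABC", "D2": "AFG"}  # indexes from the embodiment
sentence_5 = "BCEF"                    # labels of the newly added sentence

print(index_hits(sentence_5, indexes["D1"]))  # 2 (B and C)
print(index_hits(sentence_5, indexes["D2"]))  # 1 (F)
print(assign(sentence_5, indexes))            # D1
```

The result agrees with the embodiment, in which sentence 5 is placed in sub-database D1.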
In another embodiment of the present invention, the classification labels of the 4 sentences used are still:
Sentence 1:A, B, C, D;
Sentence 2:A, B, C, E;
Sentence 3:A, F, G, H;
Sentence 4:A, F, I, J.
One classification by commonly used mining targets is D3 {A, B, C} and D4 {E, F, G}, where D3 and D4 are the sub-databases to be generated and filled, and {A, B, C} and {E, F, G} are their respective indexes, formed from commonly used mining targets. The similarity between each sentence and each sub-database is calculated:
When the fill similarity threshold (when the similarity between a sentence and a sub-database is greater than this value, the sentence and its classification labels are added to the sub-database in the prescribed structure) is 0, sub-database D3 contains sentences 1, 2, 3 and 4, 4 sentences in total, together with their classification labels, and sub-database D4 contains sentences 2, 3 and 4, 3 sentences in total, together with their classification labels. When the fill similarity threshold is 0.5, sub-database D3 contains sentences 1 and 2, 2 sentences in total, together with their classification labels, and sub-database D4 contains sentence 3, 1 sentence in total, together with its classification labels.
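The threshold-based filling can be sketched in Python. Since the similarity formula is not reproduced in this text, the sketch assumes a stand-in score, the fraction of a sub-database's index entries that appear among the sentence's labels; this is an assumption, not the patented formula, although it happens to reproduce the memberships listed above for both thresholds. The function names are hypothetical.

```python
SENTENCE_LABELS = {1: "ABCD", 2: "ABCE", 3: "AFGH", 4: "AFIJ"}
INDEXES = {"D3": "ABC", "D4": "EFG"}  # indexes built from common mining targets


def index_coverage(labels, index):
    """Assumed stand-in score: share of index entries found in the sentence labels."""
    index = set(index)
    return len(set(labels) & index) / len(index)


def fill(sentences, indexes, threshold):
    """Add a sentence to every sub-database whose score exceeds the threshold."""
    return {
        name: [sid for sid, labels in sentences.items()
               if index_coverage(labels, index) > threshold]
        for name, index in indexes.items()
    }


print(fill(SENTENCE_LABELS, INDEXES, 0))    # {'D3': [1, 2, 3, 4], 'D4': [2, 3, 4]}
print(fill(SENTENCE_LABELS, INDEXES, 0.5))  # {'D3': [1, 2], 'D4': [3]}
```

A sentence may land in several sub-databases at once (sentences 2, 3 and 4 at threshold 0), and a higher threshold yields smaller, more focused sub-databases.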
In step 107, according to the specific mining target, the corresponding sub-database, a combination of sub-databases, or the complete structured database is selected for mining analysis.
In one embodiment of the present invention, when the mining target has characteristic B, the sentence structures and classification labels in sub-database D1 are used for mining analysis; when the mining target has characteristic A, the sentence structures and classification labels in sub-database D1 and sub-database D2 are used for mining analysis.
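The selection in step 107 can be sketched as a lookup against the sub-database indexes. The function name `select_databases` and the fallback marker `"COMPLETE"` (used when no index matches the target) are assumptions for illustration; the description only states that the complete structured database is one of the selectable options.

```python
def select_databases(target_labels, indexes):
    """Return the sub-databases whose index contains any target characteristic.

    Falling back to the complete structured database when nothing matches
    is an assumption of this sketch.
    """
    target = set(target_labels)
    selected = [name for name, index in indexes.items() if target & set(index)]
    return selected or ["COMPLETE"]


indexes = {"D1": "ABC", "D2": "AFG"}  # indexes from the embodiment

print(select_databases("B", indexes))  # ['D1']
print(select_databases("A", indexes))  # ['D1', 'D2']
print(select_databases("Z", indexes))  # ['COMPLETE']
```

Mining then runs only over the selected sub-databases, which is where the efficiency gain over processing the whole database is expected to come from.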
Fig. 2 is a schematic diagram of a data mining device for big data according to an embodiment of the present invention.
According to another aspect of the present invention, a data mining device for big data is provided, comprising:
a word segmentation module 201, for performing word segmentation on each sentence in the text database content;
a word entity recognition module 202, for identifying whether the characters, words and phrases obtained by word segmentation belong to entities;
a semantic annotation module 203, for performing semantic annotation analysis on the characters, words and phrases obtained by word segmentation;
a syntactic analysis module 204, for performing syntactic analysis on the text database content;
a database generation module 205, for generating a complete structured database according to the syntactic analysis result;
a database division module 206, for dividing the complete structured database into different sub-databases;
a data mining module 207, for selecting, according to the specific mining target, the corresponding sub-database, a combination of sub-databases, or the complete structured database for mining analysis.
Preferably, the semantic annotation module 203 is used to count and classify, after semantic annotation, the words that have undergone entity recognition, and to mark the sentence with the classification labels.
Preferably, the database generation module 205 is used to generate a complete structured database with a fixed sentence structure and, when generating the complete structured database, to save the classification labels of each sentence and count the classification labels.
Preferably, the database division module 206 is used to divide the complete structured database into different sub-databases according to the statistics of the sentence classification labels or according to commonly used mining targets, and to give each sub-database an index based on the sentence classification labels or the mining targets; when dividing into sub-databases, sentences with similar labels are placed in the same sub-database and the similarity between different sub-databases is kept as small as possible, wherein:
The similarity between sentences is calculated by the formula:
or:
where the former formula is suited to preliminary estimation on large-scale data, sim() is the similarity function, d1 and d2 are sentences, α is the granularity of the classification labels, L(d1) is the number of classification labels of sentence d1 in the structured database, whose value equals L(d2), L(d1 ∩ d2) is the number of identical classification labels in sentence d1 and sentence d2, and n1 and n2 are adjustable coefficients whose values are greater than 0;
The similarity between a sentence and a sub-database is calculated by the formula:
or:
where the former formula is suited to preliminary estimation on large-scale data, D is a sub-database, L(d1 ∩ D) is the number of classification labels of sentence d1 that are contained in the index of sub-database D, and n3 and n4 are adjustable coefficients whose values are greater than 0;
The similarity between sub-databases is calculated by the formula:
or:
where the former formula is suited to preliminary estimation on large-scale data, L(D1) is the number of index entries in sub-database D1, L(D1 ∩ D2) is the number of index entries that sub-databases D1 and D2 have in common, and n5 and n6 are adjustable coefficients whose values are greater than 0.
Preferably, the data mining module 207 is used to select, depending on the mining target, different sub-databases, a combination of sub-databases, or the complete structured database for mining analysis.
Taking the above preferred embodiments of the present invention as guidance, a person of ordinary skill in the art can, from the above description, make various changes and modifications without departing from the technical idea of this invention. The technical scope of this invention is not limited to the content of the description; its technical scope must be determined according to the claims.
Claims (10)
1. A data mining method for big data, characterized by comprising the following steps:
Step 101: perform word segmentation on each sentence in the text database content;
Step 102: identify whether the characters, words and phrases obtained by the word segmentation of step 101 belong to entities;
Step 103: perform semantic annotation analysis on the characters, words and phrases obtained by the word segmentation of step 101;
Step 104: perform syntactic analysis on the text database content;
Step 105: generate a complete structured database according to the syntactic analysis result;
Step 106: divide the complete structured database into different sub-databases;
Step 107: according to the specific mining target, select the corresponding sub-database, a combination of sub-databases, or the complete structured database for mining analysis.
2. The method according to claim 1, characterized in that, in step 103,
after semantic annotation, the words that have undergone entity recognition are counted and classified, and the sentence is marked with the classification labels.
3. The method according to claim 1, characterized in that, in step 105,
a complete structured database with a fixed sentence structure is generated, and when the complete structured database is generated, the classification labels of each sentence are saved and the classification labels are counted.
4. The method according to claim 1, characterized in that, in step 106,
the complete structured database is divided into different sub-databases according to the statistics of the sentence classification labels or according to commonly used mining targets, and each sub-database is given an index based on the sentence classification labels or the mining targets; when dividing into sub-databases, sentences with similar labels are placed in the same sub-database and the similarity between different sub-databases is kept as small as possible, wherein:
The similarity between sentences is calculated by the formula:
where sim() is the similarity function, d1 and d2 are sentences, α is the granularity of the classification labels, L(d1) is the number of classification labels of sentence d1 in the structured database, whose value equals L(d2), L(d1 ∩ d2) is the number of identical classification labels in sentence d1 and sentence d2, and n1 and n2 are adjustable coefficients whose values are greater than 0;
The similarity between a sentence and a sub-database is calculated by the formula:
where D is a sub-database, L(d1 ∩ D) is the number of classification labels of sentence d1 that are contained in the index of sub-database D, and n3 and n4 are adjustable coefficients whose values are greater than 0;
The similarity between sub-databases is calculated by the formula:
where L(D1) is the number of index entries in sub-database D1, L(D1 ∩ D2) is the number of index entries that sub-databases D1 and D2 have in common, and n5 and n6 are adjustable coefficients whose values are greater than 0.
5. The method according to claim 1, characterized in that, in step 107,
depending on the mining target, different sub-databases, a combination of sub-databases, or the complete structured database is selected for mining analysis.
6. A data mining device for big data, characterized by comprising:
a word segmentation module, for performing word segmentation on each sentence in the text database content;
a word entity recognition module, for identifying whether the characters, words and phrases obtained by word segmentation belong to entities;
a semantic annotation module, for performing semantic annotation analysis on the characters, words and phrases obtained by word segmentation;
a syntactic analysis module, for performing syntactic analysis on the text database content;
a database generation module, for generating a complete structured database according to the syntactic analysis result;
a database division module, for dividing the complete structured database into different sub-databases;
a data mining module, for selecting, according to the specific mining target, the corresponding sub-database, a combination of sub-databases, or the complete structured database for mining analysis.
7. The device according to claim 6, characterized in that:
the semantic annotation module is used to count and classify, after semantic annotation, the words that have undergone entity recognition, and to mark the sentence with the classification labels.
8. The device according to claim 6, characterized in that:
the database generation module is used to generate a complete structured database with a fixed sentence structure and, when generating the complete structured database, to save the classification labels of each sentence and count the classification labels.
9. The device according to claim 6, characterized in that:
the database division module is used to divide the complete structured database into different sub-databases according to the statistics of the sentence classification labels or according to commonly used mining targets, and to give each sub-database an index based on the sentence classification labels or the mining targets; when dividing into sub-databases, sentences with similar labels are placed in the same sub-database and the similarity between different sub-databases is kept as small as possible, wherein:
The similarity between sentences is calculated by the formula:
where sim() is the similarity function, d1 and d2 are sentences, α is the granularity of the classification labels, L(d1) is the number of classification labels of sentence d1 in the structured database, whose value equals L(d2), L(d1 ∩ d2) is the number of identical classification labels in sentence d1 and sentence d2, and n1 and n2 are adjustable coefficients whose values are greater than 0;
The similarity between a sentence and a sub-database is calculated by the formula:
where D is a sub-database, L(d1 ∩ D) is the number of classification labels of sentence d1 that are contained in the index of sub-database D, and n3 and n4 are adjustable coefficients whose values are greater than 0;
The similarity between sub-databases is calculated by the formula:
where L(D1) is the number of index entries in sub-database D1, L(D1 ∩ D2) is the number of index entries that sub-databases D1 and D2 have in common, and n5 and n6 are adjustable coefficients whose values are greater than 0.
10. The device according to claim 6, characterized in that:
the data mining module is used to select, depending on the mining target, different sub-databases, a combination of sub-databases, or the complete structured database for mining analysis.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611123018.6A CN106599163B (en) | 2016-12-08 | 2016-12-08 | Data mining method and device for big data |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611123018.6A CN106599163B (en) | 2016-12-08 | 2016-12-08 | Data mining method and device for big data |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106599163A (en) | 2017-04-26 |
CN106599163B (en) | 2019-11-22 |
Family
ID=58598579
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611123018.6A (Active, CN106599163B) | Data mining method and device for big data | 2016-12-08 | 2016-12-08 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106599163B (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102043812A (en) * | 2009-10-13 | 2011-05-04 | 北京大学 | Method and system for retrieving medical information |
US20150324349A1 (en) * | 2014-05-12 | 2015-11-12 | Google Inc. | Automated reading comprehension |
CN105117388A (en) * | 2015-09-21 | 2015-12-02 | 上海智臻智能网络科技股份有限公司 | Intelligent robot interaction system |
CN105260178A (en) * | 2015-09-21 | 2016-01-20 | 上海智臻智能网络科技股份有限公司 | Intelligent cloud service application development method and system |
CN105302859A (en) * | 2015-09-21 | 2016-02-03 | 上海智臻智能网络科技股份有限公司 | Intelligent interaction system based on Internet |
CN105528410A (en) * | 2015-12-05 | 2016-04-27 | 浙江大学 | Method for concluding and classifying online comments of hospital |
Non-Patent Citations (1)
Title |
---|
郑美玉: "基于本体的中文博客二级自动分类研究", 《情报科学》 * |
Also Published As
Publication number | Publication date |
---|---|
CN106599163B (en) | 2019-11-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103617157B (en) | Based on semantic Text similarity computing method | |
CN103049435B (en) | Text fine granularity sentiment analysis method and device | |
CN103678564B (en) | Internet product research system based on data mining | |
Sebastiani | Classification of text, automatic | |
CN101593200B (en) | Method for classifying Chinese webpages based on keyword frequency analysis | |
CN103279478B (en) | A kind of based on distributed mutual information file characteristics extracting method | |
CN104166651A (en) | Data searching method and device based on integration of data objects in same classes | |
CN104111933A (en) | Method and device for acquiring business object label and building training model | |
CN102436480B (en) | Incidence relation excavation method for text-oriented knowledge unit | |
CN106547875A (en) | A kind of online incident detection method of the microblogging based on sentiment analysis and label | |
CN102262653A (en) | Label recommendation method and system based on user motivation orientation | |
CN107239564A (en) | A kind of text label based on supervision topic model recommends method | |
CN105488029A (en) | KNN based evidence taking method for instant communication tool of intelligent mobile phone | |
CN110019820A (en) | Main suit and present illness history symptom Timing Coincidence Detection method in a kind of case history | |
CN106445914A (en) | Microblog emotion classifier establishing method and device | |
CN105868387A (en) | Method for outlier data mining based on parallel computation | |
Sara-Meshkizadeh et al. | Webpage classification based on compound of using HTML features & URL features and features of sibling pages | |
CN107368610B (en) | Full-text-based large text CRF and rule classification method and system | |
CN101470699A (en) | Information extraction model training apparatus, information extraction apparatus and information extraction system and method thereof | |
CN106372123B (en) | Tag-based related content recommendation method and system | |
CN112270189A (en) | Question type analysis node generation method, question type analysis node generation system and storage medium | |
Sun | Research on product attribute extraction and classification method for online review | |
CN106599163A (en) | Data mining method and device for big data | |
CN113268614B (en) | Label system updating method and device, electronic equipment and readable storage medium | |
CN104281695A (en) | Combination theory based quasi natural language semantic information extraction method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||