CN104239479A - Document classification method and system - Google Patents
- Publication number
- CN104239479A
- Authority
- CN
- China
- Prior art keywords
- document
- sorted
- classification
- training
- characteristic attribute
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
Abstract
The invention discloses a document classification method and a document classification system, and is applied to a Hadoop cluster comprising a Map program and a Reduce program. The method comprises the following steps that the Map program parses a training document and a document to be classified, determines a characteristic attribute according to a parsing result, and divides the characteristic attribute; the Map program generates a classifier according to the characteristic attribute of the training document and a classification result of the training document; the Reduce program classifies the document to be classified to obtain a classification result of the document to be classified by virtue of the classifier. According to the method and the system, a distributed characteristic of the Hadoop cluster is fully utilized, and the limitation of a conventional system frame is avoided; the method and the system have the characteristics of concurrency and high speed; massive documents can be rapidly classified, so that classification time is saved, and the document classification efficiency and the system performance are improved.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a document classification method and system.
Background
With the growing popularity of network technology, the volume of data on networks is increasing sharply and the types of applications are becoming very diverse. Data mining technology makes full use of existing information resources to discover hidden knowledge in massive data, and is a promising direction of development. Data mining draws on fields such as machine learning, pattern recognition, statistics, intelligent databases, data visualization and high-performance computing; its purpose is to find implicit, novel and interesting relationships and rules in massive data. Document classification is one important direction of data mining.
In the prior art, document classification is usually carried out with a traditional system framework. When massive data is processed, this leads to long classification times and low system performance.
Summary of the invention
The invention provides a document classification method and system, to remedy the low system performance of the prior art.
The invention provides a document classification method, applied to a Hadoop cluster comprising a Map program and a Reduce program, the method comprising the following steps:
The Map program parses a training document and a document to be classified, determines a characteristic attribute according to the parsing result, and divides the characteristic attribute;
The Map program generates a classifier according to the characteristic attribute of the training document and the classification result of the training document;
The Reduce program uses the classifier to classify the document to be classified, to obtain the classification result of the document to be classified.
Optionally, after the Map program determines the characteristic attribute according to the parsing result, the method further comprises:
The Map program performs format conversion on the training document and on the document to be classified, according to the characteristic attribute, to obtain a training document and a document to be classified that conform to a preset format;
The step in which the Map program generates a classifier according to the characteristic attribute of the training document and the classification result of the training document is specifically:
The Map program generates the classifier according to the characteristic attribute of the format-converted training document and the classification result of the training document;
The step in which the Reduce program uses the classifier to classify the document to be classified and obtains its classification result is specifically:
The Reduce program uses the classifier to classify the format-converted document to be classified, to obtain the classification result of the document to be classified.
Optionally, the step in which the Map program generates the classifier according to the characteristic attribute of the format-converted training document and the classification result of the training document is specifically:
According to the value range of each characteristic attribute of the format-converted training document and the classification result of the training document, the Map program calculates the frequency of occurrence of each category in the training documents and the conditional probability estimate of each value range of every characteristic attribute under each category, and records the frequencies of occurrence and the conditional probability estimates as the classifier.
Optionally, the step in which the Reduce program uses the classifier to classify the format-converted document to be classified and obtains its classification result is specifically:
The Reduce program obtains the value ranges of all characteristic attributes of the format-converted document to be classified; according to the obtained value ranges, the frequency of occurrence of each category in the training documents, and the conditional probability estimate of each value range of every characteristic attribute under each category, it calculates the conditional probability that the document to be classified belongs to each category, and takes the category with the largest conditional probability as the classification result of the document to be classified.
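The two optional steps above can be summarized in one pair of formulas. The notation is an assumption introduced here for illustration and does not appear in the source: N is the number of training documents, N_k the number labelled with category k, N_{k,i,r} the number of those whose i-th characteristic attribute falls in value range r, and r_i the value range observed for attribute i in the document to be classified. Read this way, the computation has the form of a naive Bayes decision rule, although the source text does not use that name.

```latex
\[
P(C = k) \approx \frac{N_k}{N}, \qquad
P(x_i = r \mid C = k) \approx \frac{N_{k,i,r}}{N_k}
\]
\[
\hat{k} = \arg\max_{k} \; P(C = k) \prod_{i=1}^{n} P(x_i = r_i \mid C = k)
\]
```

The product treats the characteristic attributes as independent given the category, which is what multiplying the per-attribute estimates implies.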
Optionally, the step in which the Map program parses the training document and the document to be classified, determines the characteristic attribute according to the parsing result, and divides the characteristic attribute is specifically:
The Map program parses the training document and the document to be classified to obtain the attributes they contain, selects characteristic attributes from the attributes obtained by parsing, and divides each characteristic attribute into a plurality of value ranges.
The invention also provides a document classification system, applied to a Hadoop cluster, the system comprising:
A parsing module, configured to parse a training document and a document to be classified, determine a characteristic attribute according to the parsing result, and divide the characteristic attribute;
A generation module, configured to generate a classifier according to the characteristic attribute of the training document determined by the parsing module and the classification result of the training document;
A classification module, configured to use the classifier generated by the generation module to classify the document to be classified, to obtain the classification result of the document to be classified.
Optionally, the system further comprises:
A conversion module, configured to perform format conversion on the training document and on the document to be classified, according to the characteristic attribute determined by the parsing module, to obtain a training document and a document to be classified that conform to a preset format;
The generation module is specifically configured to generate the classifier according to the characteristic attribute of the training document converted by the conversion module and the classification result of the training document;
The classification module is specifically configured to use the classifier generated by the generation module to classify the document to be classified converted by the conversion module, to obtain the classification result of the document to be classified.
Optionally, the generation module is specifically configured to calculate, according to the value range of each characteristic attribute of the training document converted by the conversion module and the classification result of the training document, the frequency of occurrence of each category in the training documents and the conditional probability estimate of each value range of every characteristic attribute under each category, and to record the frequencies of occurrence and the conditional probability estimates as the classifier.
Optionally, the classification module is specifically configured to obtain the value ranges of all characteristic attributes of the document to be classified converted by the conversion module; to calculate, according to the obtained value ranges, the frequency of occurrence of each category in the training documents, and the conditional probability estimate of each value range of every characteristic attribute under each category, the conditional probability that the document to be classified belongs to each category; and to take the category with the largest conditional probability as the classification result of the document to be classified.
Optionally, the parsing module is specifically configured to parse the training document and the document to be classified to obtain the attributes they contain, select characteristic attributes from the attributes obtained by parsing, and divide each characteristic attribute into a plurality of value ranges.
The invention makes full use of the distributed nature of the Hadoop cluster and avoids the limitations of a traditional system framework. It is parallel and fast, can classify massive numbers of documents quickly, saves classification time, and improves both the efficiency of document classification and system performance.
Brief description of the drawings
Fig. 1 is a flow chart of a document classification method in an embodiment of the invention;
Fig. 2 is a structural diagram of a document classification system in an embodiment of the invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some of the embodiments of the invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments, without creative effort, fall within the scope of protection of the invention.
It should be noted that, provided they do not conflict, the embodiments of the invention and the features within them may be combined with one another, and all such combinations fall within the scope of protection of the invention. In addition, although a logical order is shown in the flow chart, in some cases the steps shown or described may be performed in a different order.
An embodiment of the invention proposes a document classification method, applied to a Hadoop cluster comprising a Map program and a Reduce program. After Hadoop commands are used to place the training documents and the documents to be classified on HDFS (Hadoop Distributed File System), the operations shown in Fig. 1 are performed:
Step 101: the Map program parses the training documents and the documents to be classified, determines characteristic attributes according to the parsing result, and divides the characteristic attributes.
Specifically, the Map program may parse the training documents and the documents to be classified to obtain the attributes they contain, select characteristic attributes from the attributes obtained by parsing, and divide each characteristic attribute into a plurality of value ranges.
The training documents and the documents to be classified may be located in different directories of HDFS and managed in separate directories. The name of each folder is a class label, and the contents of the folder are the documents belonging to the class corresponding to that label.
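As a small illustration of the folder convention just described, the sketch below recovers a class label from a document path. The function name, the `/train` root, and the example file name are hypothetical and chosen only for this example; the source does not say how the Map program reads the label.

```python
from pathlib import PurePosixPath


def label_from_path(path: str, root: str = "/train") -> str:
    """Return the class label, taken to be the folder directly under `root`.

    Assumes the layout <root>/<class label>/<document>; raises ValueError
    if `path` is not inside a class folder under `root`.
    """
    relative = PurePosixPath(path).relative_to(root)
    if len(relative.parts) < 2:
        raise ValueError(f"{path!r} is not inside a class folder under {root!r}")
    return relative.parts[0]


# Hypothetical path: a document stored in the folder named "real".
print(label_from_path("/train/real/user_0001.txt"))
```

Only the path string is inspected, so the sketch runs without a Hadoop installation.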
For example, the training documents are located in the /train directory of HDFS, and the documents to be classified are located in the /test directory of HDFS. According to the parsing results for the training documents and the documents to be classified, the Map program selects three characteristic attributes: a, the number of log entries divided by the number of days since registration; b, the number of friends divided by the number of days since registration; c, whether a real profile picture is used. Each characteristic attribute is divided into value ranges: {a<=0.05, 0.05<a<0.2, a>=0.2}; {b<=0.1, 0.1<b<0.8, b>=0.8}; {c=0 (no), c=1 (yes)}.
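The division into value ranges in this example can be sketched as follows. The thresholds are the ones given above; the function and parameter names (`discretize`, `log_count`, `friend_count`, `days_registered`, `real_avatar`) are hypothetical and introduced only for illustration.

```python
def discretize(log_count: int, friend_count: int,
               days_registered: int, real_avatar: bool) -> dict:
    """Map raw account attributes to the value ranges of attributes a, b, c."""
    if days_registered <= 0:
        raise ValueError("days_registered must be positive")

    a = log_count / days_registered      # attribute a: log entries per day
    b = friend_count / days_registered   # attribute b: friends per day

    if a <= 0.05:
        a_range = "a<=0.05"
    elif a < 0.2:
        a_range = "0.05<a<0.2"
    else:
        a_range = "a>=0.2"

    if b <= 0.1:
        b_range = "b<=0.1"
    elif b < 0.8:
        b_range = "0.1<b<0.8"
    else:
        b_range = "b>=0.8"

    c_range = "c=1" if real_avatar else "c=0"   # attribute c: real profile picture
    return {"a": a_range, "b": b_range, "c": c_range}


# 10 log entries and 50 friends over 100 days, no real profile picture.
print(discretize(10, 50, 100, False))
```

Every possible value falls into exactly one range, so each document yields one range per characteristic attribute.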
Step 102: the Map program performs format conversion on the training documents and on the documents to be classified, according to the determined characteristic attributes, to obtain training documents and documents to be classified that conform to a preset format.
Specifically, the Map program may use the PrepareTwentyNewsgroups class of Mahout from the command line to convert the training documents and the documents to be classified into training documents and documents to be classified that conform to the preset format. The preset format may be the VectorWritable format; in a document conforming to the VectorWritable format, the first character is the class label and the remaining characters are the characteristic attributes.
Step 103: the Map program generates a classifier according to the characteristic attributes of the format-converted training documents and the classification results of the training documents.
Specifically, according to the value range of each characteristic attribute of the format-converted training documents and the classification results of the training documents, the Map program may calculate the frequency of occurrence of each category in the training documents and the conditional probability estimate of each value range of every characteristic attribute under each category, and record these frequencies of occurrence and conditional probability estimates as the classifier.
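The counting behind Step 103 can be sketched in plain Python. This is a single-process illustration under stated assumptions, not the patent's Hadoop job: `map_record` and `train` are hypothetical names, the key layout is invented for the example, and the sketch does not model how the work is distributed across the cluster.

```python
from collections import Counter


def map_record(label, ranges):
    """Emit one count for the class and one per (class, attribute, value range)."""
    yield ("class", label), 1
    for attribute, value_range in ranges.items():
        yield ("cond", label, attribute, value_range), 1


def train(records):
    """Aggregate the emitted counts into class frequencies and conditional estimates."""
    counts = Counter()
    for label, ranges in records:
        for key, one in map_record(label, ranges):
            counts[key] += one

    total = sum(n for key, n in counts.items() if key[0] == "class")
    priors = {key[1]: n / total
              for key, n in counts.items() if key[0] == "class"}
    # Conditional estimate: count of the value range within a class / class size.
    conditionals = {key[1:]: n / counts[("class", key[1])]
                    for key, n in counts.items() if key[0] == "cond"}
    return priors, conditionals


# Four hypothetical training records: (class label, value range per attribute).
records = [
    ("real", {"a": "0.05<a<0.2", "c": "c=1"}),
    ("real", {"a": "a>=0.2", "c": "c=1"}),
    ("real", {"a": "0.05<a<0.2", "c": "c=0"}),
    ("fake", {"a": "a<=0.05", "c": "c=0"}),
]
priors, conditionals = train(records)
print(priors)
print(conditionals[("real", "a", "0.05<a<0.2")])
```

The estimates are plain relative frequencies, as the text describes. A value range never seen with a class gets no entry at all, so a practical implementation would likely need smoothing; that is an addition the source does not mention.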
For example, there are 10,000 training documents, and their classification result is: 8,900 training documents belong to real accounts (that is, C=0), and 1,100 training documents belong to non-real accounts (that is, C=1).
The frequency of occurrence of each category in the training documents is:
P(C=0) = 8900/10000 = 0.89
P(C=1) = 1100/10000 = 0.11
The conditional probability estimates of each value range of every characteristic attribute under each category are:
P(a<=0.05 | C=0) = 0.3
P(0.05<a<0.2 | C=0) = 0.5
P(a>=0.2 | C=0) = 0.2
P(a<=0.05 | C=1) = 0.8
P(0.05<a<0.2 | C=1) = 0.1
P(a>=0.2 | C=1) = 0.1
P(b<=0.1 | C=0) = 0.1
P(0.1<b<0.8 | C=0) = 0.7
P(b>=0.8 | C=0) = 0.2
P(b<=0.1 | C=1) = 0.7
P(0.1<b<0.8 | C=1) = 0.2
P(b>=0.8 | C=1) = 0.1
P(c=0 | C=0) = 0.2
P(c=1 | C=0) = 0.8
P(c=0 | C=1) = 0.9
P(c=1 | C=1) = 0.1
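One way to hold the recorded classifier in memory is a pair of nested dictionaries, shown below with the figures from this example. The variable names and the `is_consistent` helper are hypothetical; the check only confirms that the class frequencies, and the estimates for each attribute within each class, sum to 1.

```python
import math

# Class frequencies from the example (C=0: real account, C=1: non-real account).
PRIORS = {0: 0.89, 1: 0.11}

# Conditional probability estimates: class -> attribute -> value range.
CONDITIONALS = {
    0: {
        "a": {"a<=0.05": 0.3, "0.05<a<0.2": 0.5, "a>=0.2": 0.2},
        "b": {"b<=0.1": 0.1, "0.1<b<0.8": 0.7, "b>=0.8": 0.2},
        "c": {"c=0": 0.2, "c=1": 0.8},
    },
    1: {
        "a": {"a<=0.05": 0.8, "0.05<a<0.2": 0.1, "a>=0.2": 0.1},
        "b": {"b<=0.1": 0.7, "0.1<b<0.8": 0.2, "b>=0.8": 0.1},
        "c": {"c=0": 0.9, "c=1": 0.1},
    },
}


def is_consistent(priors, conditionals):
    """Check that every probability distribution in the model sums to 1."""
    if not math.isclose(sum(priors.values()), 1.0):
        return False
    return all(
        math.isclose(sum(distribution.values()), 1.0)
        for per_class in conditionals.values()
        for distribution in per_class.values()
    )


print(is_consistent(PRIORS, CONDITIONALS))
```

The figures listed above pass this check, which is a quick way to catch a transcription error in a table of this kind.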
Step 104: the Reduce program uses the classifier to classify the format-converted documents to be classified, to obtain the classification results of the documents to be classified.
Specifically, the Reduce program may obtain the value ranges of all characteristic attributes of a format-converted document to be classified; according to the obtained value ranges, the frequency of occurrence of each category in the training documents, and the conditional probability estimate of each value range of every characteristic attribute under each category, it calculates the conditional probability that the document to be classified belongs to each category, and records the category with the largest conditional probability on HDFS as the classification result of the document to be classified.
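The decision rule of Step 104 can be sketched as a short function. The name `classify`, the dictionary layout, and the toy model below are assumptions made for illustration; writing the result to HDFS is left out.

```python
def classify(priors, conditionals, ranges):
    """Return the best class and the per-class scores for one document.

    priors:       class -> frequency of occurrence in the training documents
    conditionals: class -> attribute -> value range -> conditional estimate
    ranges:       attribute -> value range observed in the document
    """
    scores = {}
    for label, prior in priors.items():
        score = prior
        for attribute, value_range in ranges.items():
            score *= conditionals[label][attribute][value_range]
        scores[label] = score
    best = max(scores, key=scores.get)   # class with the largest score
    return best, scores


# Toy model with one attribute "f" and two classes, purely illustrative.
toy_priors = {"x": 0.6, "y": 0.4}
toy_conditionals = {
    "x": {"f": {"low": 0.9, "high": 0.1}},
    "y": {"f": {"low": 0.2, "high": 0.8}},
}
best, scores = classify(toy_priors, toy_conditionals, {"f": "high"})
print(best, scores)
```

With many characteristic attributes the product of small probabilities can underflow, so summing logarithms instead of multiplying is a common variant; the source describes the plain product, and the sketch follows it.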
For example, the value ranges of the three characteristic attributes of a document to be classified are 0.05<a<0.2, 0.1<b<0.8 and c=0. The conditional probability that the document to be classified belongs to a real account (that is, C=0) is then:
P(C=0) P(x | C=0)
= P(C=0) P(0.05<a<0.2 | C=0) P(0.1<b<0.8 | C=0) P(c=0 | C=0)
= 0.89 * 0.5 * 0.7 * 0.2
= 0.0623
The conditional probability that the document to be classified belongs to a non-real account (that is, C=1) is:
P(C=1) P(x | C=1)
= P(C=1) P(0.05<a<0.2 | C=1) P(0.1<b<0.8 | C=1) P(c=0 | C=1)
= 0.11 * 0.1 * 0.2 * 0.9
= 0.00198
Because the value computed for the real-account category is the larger of the two, the Reduce program determines that this document to be classified belongs to a real account.
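The arithmetic of this example can be checked directly. One interpretive note, offered as a reading of the formulas rather than a statement from the source: the two values are products of the form P(C)P(x|C), which are proportional to the probability of each category given the document but are not normalized. Dividing by their sum gives a normalized value; the variable names below are hypothetical.

```python
import math

# Scores from the worked example: P(C) * P(a|C) * P(b|C) * P(c|C).
score_real = 0.89 * 0.5 * 0.7 * 0.2   # C=0, real account
score_fake = 0.11 * 0.1 * 0.2 * 0.9   # C=1, non-real account

# Normalizing by the sum turns the scores into probabilities that add up to 1.
posterior_real = score_real / (score_real + score_fake)

print(round(score_real, 5), round(score_fake, 5), round(posterior_real, 3))
print("real account" if score_real > score_fake else "non-real account")
```

Normalizing does not change which category wins, which is why comparing the two unnormalized products is enough for the classification decision. In this example the normalized value for the real-account category comes to roughly 0.969.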
The embodiment of the invention makes full use of the distributed nature of the Hadoop cluster and avoids the limitations of a traditional system framework. It is parallel and fast, can classify massive numbers of documents quickly, saves classification time, and improves both the efficiency of document classification and system performance.
Based on the above document classification method, an embodiment of the invention proposes a document classification system, applied to a Hadoop cluster. As shown in Fig. 2, the system comprises:
A parsing module 210, configured to parse the training documents and the documents to be classified, determine characteristic attributes according to the parsing result, and divide the characteristic attributes;
Specifically, the parsing module 210 is configured to parse the training documents and the documents to be classified to obtain the attributes they contain, select characteristic attributes from the attributes obtained by parsing, and divide each characteristic attribute into a plurality of value ranges.
A generation module 220, configured to generate a classifier according to the characteristic attributes of the training documents determined by the parsing module 210 and the classification results of the training documents;
A classification module 230, configured to use the classifier generated by the generation module 220 to classify the documents to be classified, to obtain the classification results of the documents to be classified.
Further, the system also comprises:
A conversion module 240, configured to perform format conversion on the training documents and on the documents to be classified, according to the characteristic attributes determined by the parsing module 210, to obtain training documents and documents to be classified that conform to a preset format;
Correspondingly, the generation module 220 is specifically configured to generate the classifier according to the characteristic attributes of the training documents converted by the conversion module 240 and the classification results of the training documents;
The classification module 230 is specifically configured to use the classifier generated by the generation module 220 to classify the documents to be classified converted by the conversion module 240, to obtain the classification results of the documents to be classified.
Further, the generation module 220 is specifically configured to calculate, according to the value range of each characteristic attribute of the training documents converted by the conversion module 240 and the classification results of the training documents, the frequency of occurrence of each category in the training documents and the conditional probability estimate of each value range of every characteristic attribute under each category, and to record these frequencies of occurrence and conditional probability estimates as the classifier.
Correspondingly, the classification module 230 is specifically configured to obtain the value ranges of all characteristic attributes of a document to be classified converted by the conversion module 240; to calculate, according to the obtained value ranges, the frequency of occurrence of each category in the training documents, and the conditional probability estimate of each value range of every characteristic attribute under each category, the conditional probability that the document to be classified belongs to each category; and to take the category with the largest conditional probability as the classification result of the document to be classified.
The embodiment of the invention makes full use of the distributed nature of the Hadoop cluster and avoids the limitations of a traditional system framework. It is parallel and fast, can classify massive numbers of documents quickly, saves classification time, and improves both the efficiency of document classification and system performance.
The steps of the method described in connection with the embodiments disclosed herein may be implemented directly in hardware, in a software module executed by a processor, or in a combination of the two. The software module may reside in random access memory (RAM), main memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above is only a specific embodiment of the invention, but the scope of protection of the invention is not limited to it. Any change or replacement that a person skilled in the art could readily conceive within the technical scope disclosed by the invention shall fall within the scope of protection of the invention. Therefore, the scope of protection of the invention shall be determined by the scope of protection of the claims.
Claims (10)
1. A document classification method, characterized in that it is applied to a Hadoop cluster comprising a Map program and a Reduce program, the method comprising the following steps:
The Map program parses a training document and a document to be classified, determines a characteristic attribute according to the parsing result, and divides the characteristic attribute;
The Map program generates a classifier according to the characteristic attribute of the training document and the classification result of the training document;
The Reduce program uses the classifier to classify the document to be classified, to obtain the classification result of the document to be classified.
2. The method of claim 1, characterized in that, after the Map program determines the characteristic attribute according to the parsing result, the method further comprises:
The Map program performs format conversion on the training document and on the document to be classified, according to the characteristic attribute, to obtain a training document and a document to be classified that conform to a preset format;
The step in which the Map program generates a classifier according to the characteristic attribute of the training document and the classification result of the training document is specifically:
The Map program generates the classifier according to the characteristic attribute of the format-converted training document and the classification result of the training document;
The step in which the Reduce program uses the classifier to classify the document to be classified and obtains its classification result is specifically:
The Reduce program uses the classifier to classify the format-converted document to be classified, to obtain the classification result of the document to be classified.
3. The method of claim 2, characterized in that the step in which the Map program generates the classifier according to the characteristic attribute of the format-converted training document and the classification result of the training document is specifically:
According to the value range of each characteristic attribute of the format-converted training document and the classification result of the training document, the Map program calculates the frequency of occurrence of each category in the training documents and the conditional probability estimate of each value range of every characteristic attribute under each category, and records the frequencies of occurrence and the conditional probability estimates as the classifier.
4. The method of claim 3, characterized in that the step in which the Reduce program uses the classifier to classify the format-converted document to be classified and obtains its classification result is specifically:
The Reduce program obtains the value ranges of all characteristic attributes of the format-converted document to be classified; according to the obtained value ranges, the frequency of occurrence of each category in the training documents, and the conditional probability estimate of each value range of every characteristic attribute under each category, it calculates the conditional probability that the document to be classified belongs to each category, and takes the category with the largest conditional probability as the classification result of the document to be classified.
5. The method of claim 1, characterized in that the step in which the Map program parses the training document and the document to be classified, determines the characteristic attribute according to the parsing result, and divides the characteristic attribute is specifically:
The Map program parses the training document and the document to be classified to obtain the attributes they contain, selects characteristic attributes from the attributes obtained by parsing, and divides each characteristic attribute into a plurality of value ranges.
6. A document classification system, characterized in that it is applied to a Hadoop cluster, the system comprising:
A parsing module, configured to parse a training document and a document to be classified, determine a characteristic attribute according to the parsing result, and divide the characteristic attribute;
A generation module, configured to generate a classifier according to the characteristic attribute of the training document determined by the parsing module and the classification result of the training document;
A classification module, configured to use the classifier generated by the generation module to classify the document to be classified, to obtain the classification result of the document to be classified.
7. The system of claim 6, characterized in that it further comprises:
A conversion module, configured to perform format conversion on the training document and on the document to be classified, according to the characteristic attribute determined by the parsing module, to obtain a training document and a document to be classified that conform to a preset format;
The generation module is specifically configured to generate the classifier according to the characteristic attribute of the training document converted by the conversion module and the classification result of the training document;
The classification module is specifically configured to use the classifier generated by the generation module to classify the document to be classified converted by the conversion module, to obtain the classification result of the document to be classified.
8. The system of claim 7, characterized in that:
The generation module is specifically configured to calculate, according to the value range of each characteristic attribute of the training document converted by the conversion module and the classification result of the training document, the frequency of occurrence of each category in the training documents and the conditional probability estimate of each value range of every characteristic attribute under each category, and to record the frequencies of occurrence and the conditional probability estimates as the classifier.
9. The system of claim 8, characterized in that:
The classification module is specifically configured to obtain the value ranges of all characteristic attributes of the document to be classified converted by the conversion module; to calculate, according to the obtained value ranges, the frequency of occurrence of each category in the training documents, and the conditional probability estimate of each value range of every characteristic attribute under each category, the conditional probability that the document to be classified belongs to each category; and to take the category with the largest conditional probability as the classification result of the document to be classified.
10. The system of claim 6, characterized in that:
The parsing module is specifically configured to parse the training document and the document to be classified to obtain the attributes they contain, select characteristic attributes from the attributes obtained by parsing, and divide each characteristic attribute into a plurality of value ranges.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201410449140.7A CN104239479A (en) | 2014-09-04 | 2014-09-04 | Document classification method and system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN104239479A (en) | 2014-12-24 |
Family
ID=52227538
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201410449140.7A Pending CN104239479A (en) | 2014-09-04 | 2014-09-04 | Document classification method and system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104239479A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110889309A (en) * | 2018-09-07 | 2020-03-17 | 上海怀若智能科技有限公司 | Financial document classification management system and method |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20040015557A1 (en) * | 1999-07-30 | 2004-01-22 | Eric Horvitz | Methods for routing items for communications based on a measure of criticality |
CN102222092A (en) * | 2011-06-03 | 2011-10-19 | 复旦大学 | Massive high-dimension data clustering method for MapReduce platform |
CN102639205A (en) * | 2009-07-20 | 2012-08-15 | Esk陶瓷有限及两合公司 | Separation apparatus for tubular flow-through apparatuses |
CN103455842A (en) * | 2013-09-04 | 2013-12-18 | 福州大学 | Credibility measuring method combining Bayesian algorithm and MapReduce |
Non-Patent Citations (3)
Title |
---|
卫洁 等: "基于Hadoop的分布式朴素贝叶斯文本分类", 《计算机系统应用》 * |
喜歌: "贝叶斯分类", 《HTTP://WWW.CNBLOGS.COM/HEXINUAA/ARTICLES/2143483.HTML》 * |
董西成: "2.3.2 MapReduce编程实例", 《HTTP://BOOK.51CTO.COM/ART/201312/422139.HTM》 * |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | Application publication date: 20141224 |