CN110046264A - Automatic classification method for mobile-phone documents - Google Patents
- Publication number: CN110046264A · Application number: CN201910260996.2A
- Authority: CN (China)
- Prior art keywords: document, class library, text, label, image
- Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F16/35 — Information retrieval of unstructured textual data; clustering/classification
- G06F16/45 — Information retrieval of multimedia data (e.g. slideshows comprising image and additional audio data); clustering/classification
- G06F16/55 — Information retrieval of still image data; clustering/classification
Abstract
The invention discloses an automatic classification method for mobile-phone documents. The method first constructs document class libraries and divides them into a training set and a test set. Text content and image content are extracted from the training set, and a corpus class library and an image class library are built in correspondence with the labels of the original document class libraries. After data preprocessing, a text prediction label vector and an image prediction label vector are obtained by deep learning from the corpus class library and the image class library respectively. Finally, the two label vectors are combined with a weighted-fusion formula and normalized to yield a document prediction label probability, which is compared with a preset threshold to complete automatic classification of the document. Because both image content and text content serve as classification indicators, the invention achieves fast and effective classification of unstructured documents.
Description
Technical field
The present invention relates to the field of document management, and more particularly to an automatic classification method for mobile-phone documents.
Background art
With the development of the internet, digital office work has grown continuously, but some problems have gradually been exposed in the process. The most obvious is that digital office work produces large numbers of files, and people's natural inertia lets documents pile up, so personal document collections fall into disorder, reducing both office efficiency and the office experience. A survey by the State Archives Administration shows that nearly 80% of central government organs and central state-owned enterprises use office-automation or e-government systems, generating nearly 200 million electronic documents of all kinds. It is easy to foresee that, in the near future, electronic documents will become the main carrier and form of expression of the information resources of governments, enterprises and institutions. To address the heterogeneous, chaotically classified documents on mobile phones, this work is dedicated to automating document management: building an automatic document classification system that gives users a clear view of the files on their phones and makes the documents convenient to classify and retrieve. Such a system not only manages files effectively but, crucially, classifies documents automatically and intelligently, restoring order to large, heterogeneous local document collections.
So far, classification of unstructured documents (Word/PDF/PPT) has been confined to the text in the document, and most research has focused on natural language processing (NLP). The images in a document are often ignored, yet images are one of the main sources of information for humans and may carry important information about the document; they cannot be neglected. In image-heavy unstructured documents, image content is also an important factor in classification. Existing office software focuses on processing text, tables and the like, but software that automatically sorts large volumes of documents is nearly absent from the market, and existing document classification methods still have shortcomings worth studying and improving.
Summary of the invention
To solve the above technical problems, the present invention provides an automatic classification method for mobile-phone documents.
To solve the above technical problems, the technical scheme adopted by the invention is an automatic classification method for mobile-phone documents, comprising:
S1: collect and organize multiple labels most commonly used for document classification as keywords for building document class libraries, and construct multiple document class libraries under the rule that one label corresponds to one document class library. The document class libraries comprise several libraries whose labels are common words plus one library whose label is "unclassified". Each document class library is divided into a training set and a test set;
S2: extract the text content and image content from the training set of the document class libraries and, according to each document class library and its corresponding label, build a corresponding corpus class library and image class library, each divided into a training set and a test set;
S3: perform data preprocessing on the text content in the training set of the corpus class library, build a dictionary, and obtain a text prediction label vector by building a text classification model; perform data preprocessing on the image content in the training set of the image class library and obtain an image prediction label vector by building an image classification model;
S4: combine the text prediction label vector and the image prediction label vector by weighted fusion into a document prediction label vector, and normalize it to obtain a document prediction label probability;
S5: compare the document prediction label probability with a preset threshold. When the probability is greater than or equal to the threshold, the document is placed into the document class library of the common classification word corresponding to the predicted label; when it is less than the threshold, the document is placed into the class library whose label is "unclassified".
Preferably, step S1 also covers the case where one document appears in multiple document class libraries: assuming the document to be classified is Xi, Yi denotes the set of document class libraries corresponding to Xi, and J is the number of all possible document class libraries.
Preferably, in step S2 the text in the image content of each class library is recognized by OCR and added to the corresponding corpus class library as text content.
Preferably, step S3 specifically includes:
S31: segment the text content into words using a Chinese word-segmentation algorithm;
S32: remove stop words and low-frequency words from the segmentation result of step S31; specifically, reject the stop words of a common stop-word list from the segmentation result, set a minimum word frequency according to the document's text length, and filter out the low-frequency words whose frequency is below that minimum;
S33: use the Word2vec toolkit to map the text content remaining after step S32 into word-vector form;
S34: perform further feature extraction with a convolutional neural network, in which a convolutional layer extracts preliminary features from the word vectors of step S33, a pooling layer turns the extracted preliminary features into feature vectors, a fully connected layer joins all the feature vectors, and an added output layer with a sigmoid activation function computes the probability of each label and finally outputs the text prediction label vector.
Preferably, step S3 also specifically includes:
S35: rotate, scale, crop and normalize the image content;
S36: apply preliminary convolutional feature extraction to the image content processed in step S35, feed the extracted preliminary features into a pooling layer to generate feature vectors, join all the feature vectors in a fully connected layer, add an output layer with a sigmoid activation function, compute the probability of each label, and finally output the image prediction label vector.
Preferably, the text classification model measures its performance with a cross-entropy formula, and the image classification model assesses the loss during learning with the mean squared error.
In contrast to the prior art, the beneficial effects of the present invention are:
1. Unstructured documents can be classified quickly and effectively.
2. Machine-learning methods are used to build a text classification model and an image classification model; text content and image content are extracted from the full document, and corresponding corpus and image class libraries are established. Training on large amounts of data lets documents be classified automatically by machine, saving manpower and material resources and thereby improving work efficiency.
3. The classification results of the corpus class library and the image class library are used together as indicators of the document's class, which makes the classification results more accurate and widens the range of applicable document contents and formats.
Brief description of the drawings
Fig. 1 is a schematic flow chart of the automatic classification method for mobile-phone documents according to an embodiment of the present invention;
Fig. 2 is a schematic flow chart of the specific procedure of step S3 of the method shown in Fig. 1.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention will now be described clearly and completely with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those of ordinary skill in the art from the embodiments of the invention without creative effort fall within the protection scope of the present invention.
As shown in Fig. 1, the method of the present invention comprises S1: collect and organize the multiple labels most commonly used for document classification as keywords for building document class libraries, construct multiple document class libraries under the rule that one label corresponds to one library — several libraries with common words as labels plus one library labeled "unclassified" — and divide each document class library into a training set and a test set.
When collecting the labels of the document class libraries, a crawler can be used to harvest candidate label data, or words highly correlated with document classification in search engines can be combined to choose the label of each library. The class libraries themselves can be built with a crawled or open-source document collection, or by manual collection. In this embodiment, N+1 document class libraries are built in total: N libraries with common words as labels, plus one library labeled "unclassified" that holds documents whose labels do not belong to the common words used for classification. In the initial state the unclassified library contains no documents; in subsequent steps it is used only for the classification result of step S5 and is otherwise not involved.
In this embodiment, step S1 also covers the case where one document appears in multiple document class libraries: assuming the document to be classified is Xi, Yi denotes the set of document class libraries corresponding to the labels of Xi, and J is the number of document class libraries corresponding to all possible labels.
S2: extract the text content and image content from the training set of the document class libraries, build corresponding corpus and image class libraries according to each document class library and its label, and divide the corpus and image class libraries into an 80% training set and a 20% test set.
The image content of the image class library is extracted and saved in files, and the text inside the images is recognized by OCR and added to the corresponding corpus class library as text content. In this embodiment, the text in the image content of each class library is added to the corpus library after OCR recognition. For example, the Baidu OCR API can be used; it offers high-accuracy general text recognition, table-text recognition and QR-code recognition, and can extract common characters, rare characters, table text, certificate text and so on from pictures.
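As an illustration of this OCR step, the sketch below (not from the patent — the function and variable names are assumptions) injects the OCR engine as a callable, so the web API described above or any local engine can be plugged in without changing the pipeline:

```python
from typing import Callable, Dict, Iterable, List

def merge_ocr_text(corpus: Dict[str, List[str]],
                   images_by_label: Dict[str, Iterable[bytes]],
                   ocr: Callable[[bytes], str]) -> Dict[str, List[str]]:
    """Run OCR over every image of a class library and append the recognized
    text to the corpus class library under the same label (step S2)."""
    for label, images in images_by_label.items():
        texts = corpus.setdefault(label, [])
        for img in images:
            recognized = ocr(img).strip()
            if recognized:                      # skip images containing no text
                texts.append(recognized)
    return corpus
```

Because the engine is a parameter, the same function works in tests with a stub and in production with a real OCR backend.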
S3: perform data preprocessing on the text content in the training set of the corpus class library, build a dictionary, and obtain the text prediction label vector by building a text classification model; perform data preprocessing on the image content in the training set of the image class library and obtain the image prediction label vector by building an image classification model.
As shown in Fig. 2, step S3 specifically includes S31: segment the text content into words using a Chinese word-segmentation algorithm.
English uses spaces as natural separators; Chinese, because of the particularity of the language, has no separators other than punctuation, which affects subsequent processing, so word segmentation is the foundation of Chinese natural language processing. Chinese word-segmentation algorithms are by now mature, so existing algorithms or open-source tools such as jieba, SnowNLP or THULAC can be used directly to segment the text in the corpus class library.
S32: remove stop words and low-frequency words from the segmentation result of step S31; specifically, reject the stop words of a common stop-word list from the segmentation result, set a minimum word frequency according to the document's text length, and filter out the low-frequency words whose frequency is below that minimum.
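Step S32 can be sketched in a few lines of plain Python (an illustration, not the patent's implementation; the stop-word list and minimum frequency are inputs chosen by the user):

```python
from collections import Counter
from typing import Iterable, List, Set

def filter_tokens(docs: List[List[str]],
                  stopwords: Set[str],
                  min_freq: int = 2) -> List[List[str]]:
    """Drop stop words, then drop tokens whose corpus-wide frequency is
    below min_freq (the 'minimum word frequency' of step S32)."""
    kept = [[t for t in doc if t not in stopwords] for doc in docs]
    freq = Counter(t for doc in kept for t in doc)
    return [[t for t in doc if freq[t] >= min_freq] for doc in kept]
```

Note that the frequency count is taken over the whole corpus after stop-word removal, so rare words are filtered consistently across documents.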
S33: use the Word2vec toolkit to map the text content remaining after step S32 into word-vector form.
S34: perform further feature extraction with a convolutional neural network, in which a convolutional layer extracts preliminary features from the word vectors of step S33, a pooling layer turns the extracted preliminary features into feature vectors, a fully connected layer joins all the feature vectors, and an added output layer with a sigmoid activation function computes the probability of each label and finally outputs the text prediction label vector.
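Steps S33 and S34 assume each text becomes a fixed-size matrix of word vectors. A minimal sketch of that mapping, assuming a pre-trained embedding matrix (in practice the Word2vec toolkit, e.g. gensim's `Word2Vec`, would train it); all names here are hypothetical:

```python
import numpy as np

def embed_document(tokens, vocab, emb, seq_len):
    """Map a token list to a fixed seq_len x dim matrix (the A x B matrix
    described below): look up each known token's vector, truncate to
    seq_len, and zero-pad the remainder."""
    dim = emb.shape[1]
    rows = [emb[vocab[t]] for t in tokens[:seq_len] if t in vocab]
    mat = np.zeros((seq_len, dim))
    if rows:
        mat[:len(rows)] = np.stack(rows)
    return mat
```

Fixing the sequence length in this way is what allows texts of different lengths to be processed in batches by the convolutional network.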
Specifically, an A×B matrix is constructed, where A is the number of words and B is the word-vector dimension. To process vectors in batches, every text is fixed to length A. A convolution is then applied to each text with a filter W ∈ R^(h×B) of size h×B, where h is the n-gram length; the convolution target is c_i = f(W·x_{i:i+h−1} + d), where d is an offset (bias) and f is a nonlinear activation function. During convolution, one filter produces a group of features {c_1, c_2, …, c_{N−h+1}} over the N−h+1 windows. This group of features is fed into the pooling layer, which generates the feature ĉ = max{c_1, c_2, …, c_{N−h+1}}, realizing the goal of extracting a single feature from each group. All feature vectors are then joined in the fully connected layer, an output layer with a sigmoid activation function is added, the probability of each label is computed, and the text prediction label vector is finally output.
To assess the performance of the text classification model, an output layer is added on top of the original one and measured with the cross-entropy formula H(p, q) = −Σ_x p(x)·log q(x), where p(x), which can only take the value 0 or 1, indicates whether class x is the correct class, and q(x) ∈ (0, 1) is the predicted probability that x is the correct class.
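The convolution, max-pooling, sigmoid output and cross-entropy described above can be sketched in NumPy. This is a single-filter toy version under assumed shapes, not the patent's implementation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def textcnn_forward(X, W, d, V, b):
    """One-filter sketch of step S34.
    X: (N, B) word-vector matrix, W: (h, B) filter, d: scalar offset,
    V: (L,) output weights for the pooled feature, b: (L,) output biases.
    Returns per-label probabilities via a sigmoid output layer."""
    h, N = W.shape[0], X.shape[0]
    # convolution over N-h+1 windows: c_i = relu(<W, x_{i:i+h-1}> + d)
    c = np.array([np.maximum(0.0, np.sum(W * X[i:i + h]) + d)
                  for i in range(N - h + 1)])
    c_hat = c.max()                      # max-pooling: one feature per filter
    return sigmoid(V * c_hat + b)        # sigmoid output layer (multi-label)

def cross_entropy(p, q, eps=1e-12):
    """H(p, q) = -sum_x p(x) * log q(x), with p in {0,1} and q in (0,1)."""
    q = np.clip(q, eps, 1.0 - eps)
    return -np.sum(p * np.log(q))
```

A real model would use many filters of several widths h and learn W, d, V and b by gradient descent; the forward pass above only shows the data flow the text describes.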
S35: rotate, scale, crop and normalize the image content;
S36: apply preliminary convolutional feature extraction to the image content processed in step S35, feed the extracted preliminary features into the pooling layer to generate feature vectors, join all the feature vectors in the fully connected layer, add an output layer with a sigmoid activation function, compute the probability of each label, and finally output the image prediction label vector.
The concrete processing in this step is identical to that of step S34; the final output is the image prediction label vector.
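The preprocessing of step S35 can be illustrated with a small NumPy sketch (an assumption-laden toy: rotation is restricted to 90-degree steps and scaling is omitted; the names are illustrative):

```python
import numpy as np

def preprocess_image(img, size, rotations=0):
    """Step S35 sketch: rotate by 90-degree steps, center-crop to
    size x size, and normalize pixel values into [0, 1]."""
    img = np.rot90(img, rotations)
    H, W = img.shape[:2]
    top, left = (H - size) // 2, (W - size) // 2
    crop = img[top:top + size, left:left + size]
    return crop.astype(np.float64) / 255.0
```

In practice an image library would handle arbitrary rotation angles and rescaling, but the crop-then-normalize pattern is the same.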
To measure the performance of the image classification model, an output layer is added to the model and the loss during learning is assessed with the mean squared error; the cost function is e = (1/n)·Σ_i (O_cnn,i − O_real,i)², where O_cnn denotes the labels predicted by the image classification model for the data set and O_real denotes the data set's true labels. The smaller e is, the better the model's predictive performance.
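The cost function above is the standard mean squared error; as a one-liner (function name assumed):

```python
import numpy as np

def mse_loss(o_cnn, o_real):
    """e = (1/n) * sum_i (O_cnn,i - O_real,i)^2 — smaller e, better fit."""
    return float(np.mean((np.asarray(o_cnn) - np.asarray(o_real)) ** 2))
```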
S4: combine the text prediction label vector and the image prediction label vector by weighted fusion into the document prediction label vector, and normalize it to obtain the document prediction label probability.
S5: compare the document prediction label probability with the preset threshold. When the probability is greater than or equal to the threshold, the document is placed into the document class library of the common classification word corresponding to the predicted label; when it is less than the threshold, the document is placed into the class library whose label is "unclassified".
Specifically, the text prediction label vector P_text and the image prediction label vector P_img are fused by the weighted formula P = a·P_text + b·P_img, where a is the text-feature similarity weight and b is the image-feature similarity weight. The sigmoid function is then used for numerical processing, normalizing the outputs of the multiple classifiers into the final document prediction label probability P_j. This is equivalent to adding one layer of logistic-regression (LR) classifier on top of the two original models by weighted averaging, completing the fusion of the text classification model and the image classification model. Finally, when the document prediction label probability P_j (1 ≤ j ≤ N) is greater than or equal to the threshold, the document is placed into the document class library of the common classification word corresponding to the predicted label.
Further, the threshold must be neither too high nor too low: too high, and a document cannot be placed into several highly relevant classes at once; too low, and correct classification is hindered and loses its meaning. To choose the threshold, the text test set and the image test set are first divided into several equal parts and cross-validation is used to validate the models and retain the best document classification model. Hamming loss is used here to measure the accuracy of the document classification model; it expresses the proportion of wrongly predicted entries over all labels, so the smaller the value, the stronger the network's classification ability. The calculation formula is HammingLoss = 1/(|D|·|L|) · Σ_i Σ_j xor(x_ij, y_ij), where |D| is the total number of samples, |L| the total number of labels, x_i and y_i the prediction result and the true value respectively, and xor the exclusive-or operation. It is stipulated that during this process the weights a and b are fixed values satisfying a + b = 1; the threshold is obtained by repeated tests on the document class library test set.
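The Hamming loss formula above is a direct average of per-entry mismatches, e.g.:

```python
import numpy as np

def hamming_loss(X, Y):
    """HL = 1/(|D|*|L|) * sum_i sum_j xor(x_ij, y_ij); smaller is better.
    X, Y: |D| x |L| binary matrices of predicted / true label indicators."""
    X, Y = np.asarray(X, bool), np.asarray(Y, bool)
    return float(np.mean(np.logical_xor(X, Y)))
```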
To obtain the text-feature similarity weight a and the image-feature similarity weight b, precision and recall are introduced. Precision means, for a given test data set, the proportion of correctly retrieved relevant documents among the documents actually retrieved; recall means, for a given test data set, the proportion of correctly retrieved relevant documents among all the relevant documents in the document class library. In the multi-label setting the formulas become Precision = (1/|D|)·Σ_i |x_i ∩ y_i| / |x_i| and Recall = (1/|D|)·Σ_i |x_i ∩ y_i| / |y_i|, where |D| is the total number of samples and x_i and y_i are the predicted and true label sets. With the threshold fixed at its optimal value, a and b are swept over the interval [0, 1] in steps of 0.01 under the constraint a + b = 1; repeated tests on the document class library test set, weighing precision and recall together, yield the text-feature similarity weight a and the image-feature similarity weight b.
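These example-based multi-label precision and recall formulas can be sketched with label sets (illustrative code; the function name is an assumption):

```python
from typing import List, Set, Tuple

def multilabel_pr(pred: List[Set[int]],
                  true: List[Set[int]]) -> Tuple[float, float]:
    """P = (1/|D|) * sum |x_i ∩ y_i| / |x_i|,
       R = (1/|D|) * sum |x_i ∩ y_i| / |y_i|;
    empty prediction / truth sets contribute 0 to the respective sum."""
    D = len(pred)
    P = sum(len(x & y) / len(x) for x, y in zip(pred, true) if x) / D
    R = sum(len(x & y) / len(y) for x, y in zip(pred, true) if y) / D
    return P, R
```

Sweeping a from 0 to 1 (with b = 1 − a) and recomputing these two scores on the test set is all the weight search described above requires.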
When P_j is greater than or equal to the threshold, Xi is successfully classified into the document class library whose label is j, and that document class library and the current model are updated; when P_j is less than the threshold for every common-word label, Xi is classified into the document class library with the "unclassified" label rather than into any document class library with a common-word label, and the unclassified document class library is updated.
Depending on how the document prediction label probabilities compare with the threshold, a document may be placed into the several document class libraries corresponding to the labels of multiple common classification words, or into the document class library with the "unclassified" label. That is, after the document Xi to be classified has been classified, each updated document class library Z receives Xi, where Y'_i denotes the set of document class libraries of the labels assigned to Xi and I (1 ≤ I ≤ N+1) ranges over all possible labels. When I = N+1, the document Xi is classified into the "unclassified" library and into none of the N libraries corresponding to common classification words; when I = l, Xi is placed into the document class library corresponding to label l.
By the above means, the present invention achieves fast and effective classification of unstructured documents. Machine-learning methods are used to build the text classification model and the image classification model; text content and image content are extracted from the full document and corresponding corpus and image class libraries are established; training on large amounts of data makes document classification machine-automated, saving manpower and material resources and improving work efficiency. Using the classification results of the corpus class library and the image class library together as classification indicators makes the results more accurate and widens the range of applicable document contents and formats.
The above description is only an embodiment of the present invention and does not limit the scope of the invention. All equivalent structures or equivalent process transformations made using the contents of the specification and drawings of the invention, applied directly or indirectly in other related technical fields, are likewise included within the protection scope of the present invention.
Claims (6)
1. An automatic classification method for mobile-phone documents, characterized by comprising:
S1: collecting and organizing multiple labels most commonly used for document classification as keywords for building document class libraries, and constructing multiple document class libraries under the rule that one label corresponds to one document class library, the document class libraries comprising several libraries whose labels are common words and one library whose label is "unclassified", each document class library being divided into a training set and a test set;
S2: extracting the text content and image content from the training set of the document class libraries and, according to each document class library and its corresponding label, building a corresponding corpus class library and image class library, each divided into a training set and a test set;
S3: performing data preprocessing on the text content in the training set of the corpus class library, building a dictionary, and obtaining a text prediction label vector by building a text classification model; performing data preprocessing on the image content in the training set of the image class library, and obtaining an image prediction label vector by building an image classification model;
S4: combining the text prediction label vector and the image prediction label vector by weighted fusion into a document prediction label vector, and normalizing it to obtain a document prediction label probability;
S5: comparing the document prediction label probability with a preset threshold; when the probability is greater than or equal to the threshold, the document is placed into the document class library of the common classification word corresponding to the predicted label; when it is less than the threshold, the document is placed into the class library whose label is "unclassified".
2. The automatic classification method for mobile-phone documents according to claim 1, characterized in that step S1 also covers the case where one document appears in multiple document class libraries: assuming the document to be classified is Xi, Yi denotes the set of document class libraries corresponding to Xi, and J is the number of all possible document class libraries.
3. The automatic classification method for mobile-phone documents according to claim 1, characterized in that in step S2 the text in the image content of each class library is recognized by OCR and added to the corresponding corpus class library as text content.
4. The automatic classification method for mobile phone documents according to claim 1, characterised in that step S3 specifically includes:
S31: segmenting the text content into words with a Chinese word-segmentation tool;
S32: removing stop words and low-frequency words from the segmentation result of step S31; specifically, the stop words listed in a common stop-word table are rejected, a minimum word frequency is set according to the length of the document text, and words whose frequency falls below that minimum are filtered out;
S33: mapping the text content remaining after step S32 into word-vector form with the Word2vec toolkit;
S34: performing deeper feature extraction with a convolutional neural network: a convolutional layer performs preliminary feature extraction on the word vectors of step S33, a pooling layer turns the preliminary features into feature vectors, a fully connected layer concatenates all the feature vectors, and an added output layer with a sigmoid activation function computes the probability of each label and finally outputs the text prediction label vector.
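Steps S31-S32 can be sketched as below. In practice a segmenter such as jieba would produce the token lists and Word2vec (e.g. via gensim) would supply the word vectors of S33; the stop-word list and minimum frequency here are toy assumptions:

```python
from collections import Counter

# A tiny illustrative stop-word list; the claim refers to a common
# stop-word table, which would be far larger.
STOP_WORDS = {"的", "了", "是"}

def preprocess(tokenised_docs, min_freq=2):
    """S31-S32 sketch: given already-segmented documents, drop stop words
    and words whose corpus-wide frequency is below min_freq."""
    freq = Counter(w for doc in tokenised_docs for w in doc)
    return [[w for w in doc if w not in STOP_WORDS and freq[w] >= min_freq]
            for doc in tokenised_docs]
```

The surviving tokens would then be looked up in the dictionary and replaced by their Word2vec vectors before entering the convolutional layers of S34.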
5. The automatic classification method for mobile phone documents according to claim 1, characterised in that step S3 further specifically includes:
S35: rotating, scaling, cropping and normalising the image material;
S36: performing preliminary convolutional feature extraction on the image material processed in step S35 and feeding the preliminary features into a pooling layer to generate feature vectors; a fully connected layer then concatenates all the feature vectors, an output layer with a sigmoid activation function is added, the probability of each label is computed, and the image prediction label vector is finally output.
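A minimal sketch of the scaling and normalisation parts of S35, assuming a grayscale image given as a 2-D array; rotation, cropping and the convolutional layers of S36 are omitted, and the target size is an arbitrary choice:

```python
import numpy as np

def preprocess_image(img, size=(64, 64)):
    """S35 sketch: nearest-neighbour resize to a fixed size, then
    min-max normalisation into [0, 1]. The target size is illustrative."""
    img = np.asarray(img, dtype=np.float32)
    # Pick evenly spaced source rows/columns (nearest-neighbour resize).
    rows = np.linspace(0, img.shape[0] - 1, size[0]).astype(int)
    cols = np.linspace(0, img.shape[1] - 1, size[1]).astype(int)
    resized = img[np.ix_(rows, cols)]
    lo, hi = resized.min(), resized.max()
    # Scale pixel values to [0, 1]; a constant image becomes all zeros.
    return (resized - lo) / (hi - lo) if hi > lo else np.zeros(size, np.float32)
```

The normalised array would then be fed to the convolutional and pooling layers described in S36.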
6. The automatic classification method for mobile phone documents according to claim 1, characterised in that the text classification model measures its performance with the cross-entropy formula, and the image classification model evaluates the loss during learning with the mean squared error.
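The two losses named in claim 6 can be written out directly; `y_true` and `y_pred` are assumed to be per-label target values and predicted probabilities:

```python
import math

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """Per-label cross entropy, as used to measure the text
    classification model's performance (claim 6)."""
    return -sum(t * math.log(p + eps) + (1 - t) * math.log(1 - p + eps)
                for t, p in zip(y_true, y_pred)) / len(y_true)

def mean_squared_error(y_true, y_pred):
    """Mean squared error ("average variance"), as used to assess the
    image classification model's training loss (claim 6)."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)
```

A perfect prediction drives both losses to zero; the cross entropy penalises confident wrong predictions far more sharply than the squared error does.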
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910260996.2A CN110046264A (en) | 2019-04-02 | 2019-04-02 | A kind of automatic classification method towards mobile phone document |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110046264A true CN110046264A (en) | 2019-07-23 |
Family
ID=67275718
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910260996.2A Pending CN110046264A (en) | 2019-04-02 | 2019-04-02 | A kind of automatic classification method towards mobile phone document |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110046264A (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107066583A (en) * | 2017-04-14 | 2017-08-18 | 华侨大学 | A kind of picture and text cross-module state sensibility classification method merged based on compact bilinearity |
CN107832663A (en) * | 2017-09-30 | 2018-03-23 | 天津大学 | A kind of multi-modal sentiment analysis method based on quantum theory |
CN108764268A (en) * | 2018-04-02 | 2018-11-06 | 华南理工大学 | A kind of multi-modal emotion identification method of picture and text based on deep learning |
CN108960073A (en) * | 2018-06-05 | 2018-12-07 | 大连理工大学 | Cross-module state image steganalysis method towards Biomedical literature |
CN109299341A (en) * | 2018-10-29 | 2019-02-01 | 山东师范大学 | One kind confrontation cross-module state search method dictionary-based learning and system |
CN109522548A (en) * | 2018-10-26 | 2019-03-26 | 天津大学 | A kind of text emotion analysis method based on two-way interactive neural network |
CN109522942A (en) * | 2018-10-29 | 2019-03-26 | 中国科学院深圳先进技术研究院 | A kind of image classification method, device, terminal device and storage medium |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110503081A (en) * | 2019-08-30 | 2019-11-26 | 山东师范大学 | Act of violence detection method, system, equipment and medium based on inter-frame difference |
CN111614786A (en) * | 2020-06-05 | 2020-09-01 | 易盼红 | System and method for processing data at high speed by remote server based on block chain |
CN112100379A (en) * | 2020-09-15 | 2020-12-18 | 北京百度网讯科技有限公司 | Method, apparatus, device and storage medium for processing data |
CN112100379B (en) * | 2020-09-15 | 2023-07-28 | 北京百度网讯科技有限公司 | Method, apparatus, device and storage medium for processing data |
CN112329669A (en) * | 2020-11-11 | 2021-02-05 | 孙立业 | Electronic file management method |
CN113361249A (en) * | 2021-06-30 | 2021-09-07 | 北京百度网讯科技有限公司 | Document duplication judgment method and device, electronic equipment and storage medium |
CN113361249B (en) * | 2021-06-30 | 2023-11-17 | 北京百度网讯科技有限公司 | Document weight judging method, device, electronic equipment and storage medium |
CN116843377A (en) * | 2023-07-25 | 2023-10-03 | 河北鑫考科技股份有限公司 | Consumption behavior prediction method, device, equipment and medium based on big data |
Similar Documents
Publication | Title
---|---
CN111753060B (en) | Information retrieval method, apparatus, device and computer readable storage medium
CN110046264A (en) | A kind of automatic classification method towards mobile phone document
WO2020224097A1 (en) | Intelligent semantic document recommendation method and device, and computer-readable storage medium
CN112667794A (en) | Intelligent question-answer matching method and system based on twin network BERT model
EP3848797A1 (en) | Automatic parameter value resolution for api evaluation
CN109657011B (en) | Data mining system for screening terrorist attack event crime groups
CN108804595B (en) | Short text representation method based on word2vec
CN106845358B (en) | Method and system for recognizing image features of handwritten characters
CN110516074B (en) | Website theme classification method and device based on deep learning
CN108959305A (en) | A kind of event extraction method and system based on internet big data
CN112559684A (en) | Keyword extraction and information retrieval method
CN109960727A (en) | For the individual privacy information automatic testing method and system of non-structured text
WO2021190662A1 (en) | Medical text sorting method and apparatus, electronic device, and storage medium
CN111177367A (en) | Case classification method, classification model training method and related products
CN116187444A (en) | K-means++ based professional field sensitive entity knowledge base construction method
CN114092948B (en) | Bill identification method, device, equipment and storage medium
CN109582743B (en) | Data mining system for terrorist attack event
CN112489689B (en) | Cross-database voice emotion recognition method and device based on multi-scale difference countermeasure
CN113946657A (en) | Knowledge reasoning-based automatic identification method for power service intention
CN116049376B (en) | Method, device and system for retrieving and replying information and creating knowledge
CN115269816A (en) | Core personnel mining method and device based on information processing method and storage medium
CN113435213B (en) | Method and device for returning answers to user questions and knowledge base
CN112579783B (en) | Short text clustering method based on Laplace atlas
CN113761123A (en) | Keyword acquisition method and device, computing equipment and storage medium
CN111798217A (en) | Data analysis system and method
Legal Events
Date | Code | Title | Description
---|---|---|---
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | Application publication date: 20190723 |