CN113254634A - File classification method and system based on phase space - Google Patents
- Publication number: CN113254634A
- Authority: CN (China)
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F16/353: Information retrieval of unstructured textual data; clustering; classification into predefined classes
- G06F18/2411: Pattern recognition; classification techniques based on the proximity to a decision surface, e.g. support vector machines
- G06N3/044: Neural networks; recurrent networks, e.g. Hopfield networks
- G06N3/045: Neural networks; combinations of networks
- G06V30/40: Character recognition; document-oriented image-based pattern recognition
- G06V30/10: Character recognition
Abstract
The invention provides a file classification method and system based on phase space. The file classification method comprises the following steps: reading archive content using text analysis technology and OCR technology; automatically extracting archive keywords using keyword extraction technology; extracting features from the archive text with word2vec to construct a text vector that considers both the global vector weight of the text and the weight of its keywords; compressing the archive data using clustering technology; building an archive classification model from archive contents using support vector machine text classification, evaluating the model on test data, and optimizing it according to the test results; and applying the archive classification model to classify archive data of unknown category. The invention solves the technical problem that traditional archive management technology cannot comprehensively analyze the unstructured and semi-structured data in various archive texts, and greatly reduces manual labor.
Description
Technical Field
The invention belongs to the technical field of archive classification management, and in particular relates to a file classification method and system based on phase space.
Background
Archive work is an indispensable part of many social undertakings, and informatization has had a great influence on it. Managing archival documents intelligently with text analysis technology makes it possible to build an intelligent, networked service platform and a complete intelligent archive application system that provides the required archival information services quickly and conveniently to all parties in society. The aim is a platform for intelligent archive collection, management, service, protection and supervision that integrates electronic documents and supports warehouse-style management of business data.
As production and operations continue to expand, major scientific research institutions and intellectual property institutions in China hold knowledge in the form of treatises, survey reports, historical documents, academic monographs and similar materials. This knowledge has the characteristics of big data: first, its scale is large, ranging from terabytes to petabytes; second, its forms are complex, including plain text, XML files, Office documents, images, audio and video. Older archival material in particular often has no electronic version, only paper copies that are not fully intact after long-term storage, so the results of OCR recognition after scanning are unsatisfactory, which directly affects the processing of such archives.
Archives are numerous in type and content, so classifying them is very important: accurate classification makes archives easier to manage and use. Manual classification is time-consuming, and different people interpret the classification standards differently, so their results differ, which directly affects the accuracy of archive classification. Text classification technology learns classification rules from data of known categories: a machine learning method extracts key features that reflect the characteristics of each text, captures the mapping between features and categories, and applies that mapping to data of unknown category.
The core idea of archive text classification is to segment the archive text into words, vectorize them, and then build a model with a data mining method. Retaining more information requires retaining more words, which inevitably increases the number of features. The support vector machine (SVM) introduces the principle of structural risk minimization: it searches for the support vectors on the classification boundary and builds the model from those support vectors alone. This design means that an SVM can obtain a better prediction model than other methods even with fewer data samples, and that the model generalizes well.
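The support-vector argument above can be made concrete with the standard soft-margin SVM. This is a textbook formulation, not one stated in the patent, which names neither the SVM variant nor the kernel. Training pairs are $(\mathbf{x}_i, y_i)$ with $y_i \in \{-1, +1\}$, $C > 0$ is the slack penalty, $\phi$ is a feature map with kernel $K$, and $\alpha_i \in [0, C]$ are the dual coefficients. The term $\tfrac{1}{2}\lVert\mathbf{w}\rVert^2$ controls the margin, which is how structural risk is limited; after training, every sample with $\alpha_i = 0$ drops out of $f$, so the model depends only on the support vectors.

```latex
\begin{aligned}
&\min_{\mathbf{w},\,b,\,\boldsymbol{\xi}}\ \frac{1}{2}\lVert\mathbf{w}\rVert^{2} + C\sum_{i=1}^{n}\xi_i
\quad\text{s.t.}\quad y_i\left(\mathbf{w}^{\top}\phi(\mathbf{x}_i) + b\right) \ge 1 - \xi_i,\ \ \xi_i \ge 0,\\
&f(\mathbf{x}) = \operatorname{sign}\Big(\sum_{i \in \mathrm{SV}} \alpha_i y_i K(\mathbf{x}_i, \mathbf{x}) + b\Big),
\qquad \mathrm{SV} = \{\, i : \alpha_i > 0 \,\},\ \ K(\mathbf{x}_i, \mathbf{x}) = \phi(\mathbf{x}_i)^{\top}\phi(\mathbf{x}).
\end{aligned}
```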
Therefore, a file classification method and system based on phase space is urgently needed. In such a method, text analysis technology reads archive content from electronic archives, word segmentation technology splits the archive text into words, keywords are extracted automatically, word2vec vectorizes the archive text while jointly considering the archive text weights and the keyword weights, clustering technology compresses the archive text data, and a support vector machine classification method then builds an archive classification model that classifies the archives.
Disclosure of Invention
To solve the above technical problems, the present invention provides a file classification method and system based on phase space, wherein the file classification method comprises the following steps:
step S1: reading archive content using text analysis technology and OCR technology;
step S2: automatically extracting archive keywords using keyword extraction technology;
step S3: extracting features from the archive text with word2vec to construct a text vector, while considering both the global vector weight of the text and the keyword weight of the text;
step S4: compressing the archive data using clustering technology;
step S5: building an archive classification model from archive contents using support vector machine text classification technology, evaluating the model with test data, and optimizing the model according to the test results; and applying the archive classification model to classify archive data of unknown category.
Preferably, step S1 comprises the following steps:
step S11: for ordinary electronic documents, reading the archive content directly using text analysis technology;
step S12: for scanned copies of paper archives and for image files, recognizing the image content using OCR technology.
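The routing in steps S11 and S12 can be sketched as a dispatch on file type. This is a minimal sketch under assumptions: the suffix sets below are hypothetical (the patent does not list formats), and the route names only label which reader would be called; no actual text parser or OCR engine is included.

```python
from pathlib import Path

# Hypothetical format lists: the patent only distinguishes "ordinary electronic
# documents" (step S11) from scanned paper archives and image files (step S12).
ELECTRONIC_SUFFIXES = {".txt", ".xml", ".html", ".doc", ".docx", ".xls", ".xlsx"}
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".tif", ".tiff", ".bmp"}


def choose_reader(path: str) -> str:
    """Return the reading route for one archive file, based on its suffix."""
    suffix = Path(path).suffix.lower()
    if suffix in ELECTRONIC_SUFFIXES:
        return "text_analysis"  # step S11: read the content directly
    if suffix in IMAGE_SUFFIXES:
        return "ocr"  # step S12: recognize scanned or image content
    return "unsupported"  # e.g. audio/video, outside this sketch


def route_batch(paths):
    """Group file paths by reading route, preserving input order."""
    routes = {"text_analysis": [], "ocr": [], "unsupported": []}
    for p in paths:
        routes[choose_reader(p)].append(p)
    return routes
```

For example, `route_batch(["report.docx", "scan_001.TIF", "notes.txt", "audio.mp3"])` sends the two electronic documents to text analysis, the TIFF scan to OCR, and leaves the audio file unsupported.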
Preferably, step S2 comprises the following steps:
step S21: segmenting the archive text into words using word segmentation technology;
step S22: automatically extracting archive keywords using keyword extraction technology, for use in constructing the text vector.
Preferably, step S3 comprises the following steps:
step S31: for archive data of known categories, segmenting the text into words and performing 0-1 vectorization;
step S32: vectorizing words with word2vec while jointly considering the global text information and the keyword weight information.
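Step S31's 0-1 vectorization can be sketched as a binary bag-of-words over the vocabulary of the known-category texts. The sketch assumes the texts are already segmented into tokens; the English placeholder tokens and the function names are hypothetical.

```python
from typing import Iterable, List


def build_vocabulary(token_lists: Iterable[List[str]]) -> List[str]:
    """Sorted set of every word seen in the segmented known-category texts."""
    return sorted({tok for tokens in token_lists for tok in tokens})


def binary_vector(tokens: List[str], vocabulary: List[str]) -> List[int]:
    """0-1 vector: position i is 1 if vocabulary[i] occurs in the text, else 0.

    Words outside the vocabulary (e.g. in a new archive) are simply ignored.
    """
    present = set(tokens)
    return [1 if word in present else 0 for word in vocabulary]
```

With `docs = [["contract", "land", "transfer"], ["meeting", "minutes", "land"]]`, the vocabulary is `["contract", "land", "meeting", "minutes", "transfer"]` and the two texts map to `[1, 1, 0, 0, 1]` and `[0, 1, 1, 1, 0]`.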
Preferably, step S4 comprises the following steps:
step S41: constructing a clustering feature tree based on similarity;
step S42: extracting modeling data from the clustering feature tree.
Preferably, step S5 comprises the following steps:
step S51: dividing the data set into a training set and a test set;
step S52: building an archive classification model from the training set using a support vector machine method based on data compression;
step S53: testing the classification model on the test set, and optimizing the model according to the test results;
step S54: applying the archive classification model to classify archive data of unknown category.
Preferably, the file classification system comprises an archive data acquisition module, an archive data extraction module, an archive classification modeling module, an archive classification model evaluation module and an archive classification model application module. The archive data acquisition module acquires archive data and reads archive content from electronic documents; the archive data extraction module segments the archive data into words and extracts keywords; the archive classification modeling module vectorizes words with word2vec, taking into account both the weights of all words in a single archive document and the weights of its keywords, compresses the archive data using clustering, and builds a classification model with a support vector machine; the archive classification model evaluation module evaluates the model with test data and optimizes it according to the evaluation result; and the archive classification model application module uses the established model to classify data of unknown category and stores the classification result.
Compared with the prior art, the invention has the following beneficial effects: it can read ordinary electronic documents and can also read image data through OCR recognition; it weights all words of an archive while giving particular weight to keywords, so the extracted information is more comprehensive; and it compresses the data by clustering, which preserves the generality of the data while retaining the characteristics of the main data, improving the generalization ability of the model. The invention thus solves the technical problem that traditional archive management technology cannot comprehensively analyze the unstructured and semi-structured data in various archive texts, and greatly reduces manual labor.
Drawings
FIG. 1 is a schematic diagram of the system of the present invention;
FIG. 2 is an overall flow diagram of the present invention;
FIG. 3 is a data processing flow diagram of the data compression stage of the present invention.
Detailed Description
The invention is further described below with reference to the accompanying drawings:
Embodiment:
A file classification method and system based on phase space are disclosed. As shown in FIG. 1, the file classification system comprises an archive data acquisition module, an archive data extraction module, an archive classification modeling module, an archive classification model evaluation module and an archive classification model application module. The archive data acquisition module acquires archive data and reads archive content from electronic documents; the archive data extraction module segments the archive data into words and extracts keywords; the archive classification modeling module vectorizes words with word2vec, taking into account both the weights of all words in a single archive document and the weights of its keywords, compresses the archive data using clustering, and builds a classification model with a support vector machine; the archive classification model evaluation module evaluates the model with test data and optimizes it according to the evaluation result; and the archive classification model application module uses the established model to classify data of unknown category and stores the classification result.
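The five modules of FIG. 1 can be wired together as a simple pipeline. In this sketch every module is an injected function, the class and parameter names are hypothetical, and a word-overlap classifier stands in for the SVM purely to keep the example short; it is not the patent's model. The evaluation module is reduced to test-set accuracy.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Tokens = List[str]
Model = Callable[[Tokens], str]


@dataclass
class ArchiveClassificationSystem:
    """Hypothetical wiring of the five modules in FIG. 1."""
    acquire: Callable[[str], str]                            # acquisition module: source -> text
    extract: Callable[[str], Tokens]                         # extraction module: text -> words
    build_model: Callable[[List[Tokens], List[str]], Model]  # modeling module
    store: Dict[str, str] = field(default_factory=dict)      # stored classification results

    def features(self, source: str) -> Tokens:
        return self.extract(self.acquire(source))

    def train_and_evaluate(self, train: List[Tuple[str, str]], test: List[Tuple[str, str]]):
        """Modeling plus evaluation module: fit on `train`, report accuracy on `test`."""
        model = self.build_model([self.features(s) for s, _ in train], [y for _, y in train])
        hits = sum(model(self.features(s)) == y for s, y in test)
        return model, hits / len(test)

    def classify(self, model: Model, source: str) -> str:
        """Application module: label unknown data and keep the result."""
        label = model(self.features(source))
        self.store[source] = label
        return label


def overlap_model(token_lists: List[Tokens], labels: List[str]) -> Model:
    """Stand-in classifier (NOT an SVM): pick the class whose training words overlap most."""
    counts: Dict[str, Counter] = {}
    for tokens, label in zip(token_lists, labels):
        counts.setdefault(label, Counter()).update(tokens)

    def predict(tokens: Tokens) -> str:
        return max(counts, key=lambda c: sum(counts[c][t] for t in tokens))

    return predict
```

For instance, `ArchiveClassificationSystem(acquire=lambda s: s, extract=str.split, build_model=overlap_model)` treats each source string as already-read text; swapping in a real reader, segmenter and SVM trainer changes only the injected functions.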
As shown in FIG. 2, the archive classification method comprises the following steps:
step S1: collecting archive data, and reading the archive content using text analysis technology;
step S11: for ordinary electronic documents, reading the archive content directly using text analysis technology;
step S12: for scanned copies of paper archives and for image files, recognizing the image content using OCR technology;
the OCR process comprises image preprocessing, character detection and text recognition. Image preprocessing uses a CNN-based neural network for feature extraction; character detection marks the positions of all characters in the image with bounding boxes; text recognition uses the CRNN + CTC algorithm: a CNN first extracts convolutional features from the image, an LSTM then extracts sequence features from those convolutional features, and finally CTC (connectionist temporal classification) is introduced to solve the problem that the characters cannot be aligned with the image frames during training;
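The final CTC stage of the CRNN + CTC recognizer turns per-frame class scores into a character string. Below is a minimal sketch of greedy (best-path) CTC decoding. The patent does not say which decoder it uses (beam search is a common alternative), treating index 0 as the blank label is an assumption, and the CNN and LSTM stages that would produce the scores are not shown.

```python
from itertools import groupby
from typing import List, Sequence

BLANK = 0  # assumption: class index 0 is the CTC blank label


def ctc_greedy_decode(frame_scores: Sequence[Sequence[float]], alphabet: List[str]) -> str:
    """Best-path CTC decoding.

    frame_scores: one list of per-class scores per time step (e.g. LSTM outputs).
    alphabet: maps class index -> character; the entry at BLANK is never emitted.
    """
    # 1. Take the highest-scoring class at every time step.
    best = [max(range(len(scores)), key=scores.__getitem__) for scores in frame_scores]
    # 2. Merge consecutive repeats, then 3. drop blanks.
    collapsed = [label for label, _ in groupby(best)]
    return "".join(alphabet[label] for label in collapsed if label != BLANK)
```

With alphabet `["-", "a", "b"]`, per-frame best labels `1, 1, 0, 1, 2, 2` collapse to `1, 0, 1, 2` and decode to `"aab"`: the blank between the two `a` runs is what lets a genuinely repeated character survive.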
step S2: preprocessing the archive data, comprising the following steps:
step S21: segmenting the read archive text into words and removing stop words;
step S22: extracting keywords from the archive text using keyword extraction technology;
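Steps S21 and S22 can be sketched as follows. Two assumptions are important here: regex tokenization of English placeholder text stands in for a real Chinese word segmenter, and TF-IDF is swapped in as the keyword extraction technique because the patent does not name one. The stop-word list and `k` are hypothetical.

```python
import math
import re
from collections import Counter
from typing import List

# Hypothetical stop-word list; a real system would use a curated list for its language.
STOP_WORDS = {"the", "of", "and", "a", "to", "in", "on", "for"}


def segment(text: str) -> List[str]:
    """Stand-in for word segmentation: lower-case word tokens minus stop words."""
    return [w for w in re.findall(r"\w+", text.lower()) if w not in STOP_WORDS]


def tfidf_keywords(docs_tokens: List[List[str]], k: int = 3) -> List[List[str]]:
    """Top-k TF-IDF words of each document (TF-IDF is an assumed technique)."""
    n = len(docs_tokens)
    df = Counter(word for tokens in docs_tokens for word in set(tokens))  # document frequency
    keywords = []
    for tokens in docs_tokens:
        tf = Counter(tokens)
        scores = {w: (c / len(tokens)) * math.log(n / df[w]) for w, c in tf.items()}
        # Highest score first; ties broken alphabetically for a stable result.
        keywords.append(sorted(scores, key=lambda w: (-scores[w], w))[:k])
    return keywords
```

For the three texts "The transfer of the land contract", "Minutes of the land meeting" and "Meeting agenda and minutes", words shared across documents (such as "land") score lower, and with `k=2` the first text yields `["contract", "transfer"]`.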
step S3: extracting features from the archive text, using word2vec to construct the text vector, comprising the following steps:
step S31: for archive data of known categories, performing 0-1 vectorization on all words of each text;
step S32: computing a weighted average of the word2vec vectors of all words in the text, extracting the word2vec vectors of the text's keywords, and combining the two parts; this both takes into account the weights of all words in a single text, preserving the integrity of the text information, and highlights the weight information of the keywords;
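One way to read step S32 is as a linear mix of a weighted average over all word vectors and an average over keyword vectors. The sketch below assumes word vectors are already available as a dictionary (for example, exported from a trained word2vec model); the per-word weights and the mixing weight `beta` are hypothetical, since the patent does not give its weighting formula.

```python
from typing import Dict, List, Optional, Sequence

Vector = List[float]


def weighted_mean(vectors: Sequence[Vector], weights: Sequence[float]) -> Vector:
    """Component-wise weighted average (weights must not sum to zero)."""
    total = sum(weights)
    return [sum(w * v[i] for v, w in zip(vectors, weights)) / total
            for i in range(len(vectors[0]))]


def document_vector(tokens: List[str], keywords: List[str],
                    word_vectors: Dict[str, Vector],
                    word_weights: Optional[Dict[str, float]] = None,
                    beta: float = 0.5) -> Vector:
    """Mix a global (all-word) average with a keyword average.

    word_weights: hypothetical per-word weights (e.g. TF-IDF); missing words weigh 1.0.
    beta: hypothetical share given to the keyword part.
    """
    known = [t for t in tokens if t in word_vectors]  # skip out-of-vocabulary words
    if not known:
        raise ValueError("no token has a word vector")
    weights = [(word_weights or {}).get(t, 1.0) for t in known]
    global_part = weighted_mean([word_vectors[t] for t in known], weights)
    kws = [k for k in keywords if k in word_vectors]
    if not kws:
        return global_part  # nothing to highlight: fall back to the global average
    keyword_part = weighted_mean([word_vectors[k] for k in kws], [1.0] * len(kws))
    return [(1 - beta) * g + beta * q for g, q in zip(global_part, keyword_part)]
```

With toy vectors `land=[1, 0]`, `contract=[0, 1]`, `transfer=[1, 1]`, tokens `["transfer", "land", "contract"]` and keyword `"contract"`, the global average is `[2/3, 2/3]` and the result leans toward the keyword: `[1/3, 5/6]`.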
step S4: as shown in FIG. 3, compressing the archive data using clustering technology, comprising the following steps:
step S41: starting traversal from the root node of the clustering feature tree;
step S42: if the current node is a leaf node, go to step S43; otherwise, go to step S46;
step S43: find the child node in the current node that is closest to the new data record, and compute the cluster diameter that would result from merging the record with that child node; if the diameter is smaller than a threshold, go to step S44, otherwise go to step S45;
step S44: merge the record into the nearest child node;
step S45: add the record as a new child node of the current node; if the number of child nodes of the current node then exceeds a threshold, split the current node into two nodes: select the two child nodes farthest apart as the initial nodes, and assign each remaining child node to the nearer of the two;
step S46: find the child node in the current node that is closest to the record, take it as the new current node, and go to step S42.
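Steps S41 to S46 resemble the insertion procedure of a BIRCH-style clustering feature (CF) tree and can be sketched as below. Assumptions: each entry stores the CF triple (N, LS, SS); cluster diameter is the root-mean-square pairwise distance computed from the CF, which the patent does not define; distances are Euclidean between centroids; `threshold` and `branching` are hypothetical parameters; and a split of the root grows a new root, a detail the patent does not spell out.

```python
import functools
import math


class CF:
    """Clustering feature of a sub-cluster: count N, linear sum LS, square sum SS."""

    def __init__(self, n, ls, ss):
        self.n, self.ls, self.ss = n, list(ls), ss

    @classmethod
    def of_point(cls, p):
        return cls(1, p, sum(x * x for x in p))

    def __add__(self, other):
        # CF additivity: merging two sub-clusters just adds their features.
        return CF(self.n + other.n, [a + b for a, b in zip(self.ls, other.ls)], self.ss + other.ss)

    def centroid(self):
        return [x / self.n for x in self.ls]

    def diameter(self):
        # Root-mean-square pairwise distance, computed from (N, LS, SS) alone.
        if self.n < 2:
            return 0.0
        ls2 = sum(x * x for x in self.ls)
        return math.sqrt(max(0.0, (2 * self.n * self.ss - 2 * ls2) / (self.n * (self.n - 1))))


class Node:
    def __init__(self, entries=None, children=None):
        self.entries = entries if entries is not None else []  # CF summaries
        self.children = children  # None for a leaf; otherwise one child node per entry

    def nearest(self, point):
        return min(range(len(self.entries)),
                   key=lambda i: math.dist(self.entries[i].centroid(), point))


def summary(node):
    return functools.reduce(CF.__add__, node.entries)


def split(node):
    """S45 split: seed with the two farthest entries, give the rest to the nearer seed."""
    cs = [e.centroid() for e in node.entries]
    pairs = [(i, j) for i in range(len(cs)) for j in range(i + 1, len(cs))]
    a, b = max(pairs, key=lambda ij: math.dist(cs[ij[0]], cs[ij[1]]))
    groups = ([a], [b])
    for i in range(len(cs)):
        if i not in (a, b):
            groups[0 if math.dist(cs[i], cs[a]) <= math.dist(cs[i], cs[b]) else 1].append(i)
    return [Node([node.entries[i] for i in g],
                 None if node.children is None else [node.children[i] for i in g])
            for g in groups]


def insert(node, point, threshold, branching):
    """Insert one record below `node`; returns [node], or two nodes after a split."""
    if node.children is None:  # S42: leaf node, go to S43
        cf = CF.of_point(point)
        if node.entries:
            i = node.nearest(point)
            merged = node.entries[i] + cf
            if merged.diameter() < threshold:  # S44: absorb into the nearest entry
                node.entries[i] = merged
                return [node]
        node.entries.append(cf)  # S45: new entry
    else:  # S46: descend into the nearest child
        i = node.nearest(point)
        parts = insert(node.children[i], point, threshold, branching)
        node.children[i:i + 1] = parts
        node.entries[i:i + 1] = [summary(p) for p in parts]
    return split(node) if len(node.entries) > branching else [node]


class CFTree:
    def __init__(self, threshold, branching=4):
        self.root, self.threshold, self.branching = Node(), threshold, branching

    def insert(self, point):
        parts = insert(self.root, point, self.threshold, self.branching)
        if len(parts) == 2:  # root split: grow a new root above the two halves
            self.root = Node([summary(p) for p in parts], parts)

    def leaf_entries(self):
        out, stack = [], [self.root]
        while stack:
            node = stack.pop()
            if node.children is None:
                out.extend(node.entries)
            else:
                stack.extend(node.children)
        return out
```

Because CFs are additive, each record is inserted by updating a few summaries along one root-to-leaf path, which is what allows new data to be added without rebuilding the tree.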
When new data arrive, they can be inserted into the existing clustering feature tree; the tree does not need to be rebuilt from all the data.
Modeling data are then extracted from the clustering feature tree. Because the support vector machine is a modeling method based on the structural risk minimization principle, it constructs the model by finding the support vectors that define the classification hyperplane. On this basis, the boundary of each data cluster under the leaf nodes of the clustering feature tree can be computed, and the boundary points most likely to become support vectors are used as the modeling data for the support vector machine, thereby achieving data compression.
In this embodiment, the boundary calculation method is illustrated by the following example:
assume that a data cluster contains the records (-5,-4,-2), (-4,-6,-7), (-3,-2,0), (-2,-1,1), (-1,0,2), (0,1,3), (1,2,4), (2,3,5), (3,4,6), (4,5,7), (5,9,8), (6,7,9); the two largest and the two smallest records (Top-2) are then taken in each dimension:
in dimension 1, the maximum points are (6,7,9) and (5,9,8), and the minimum points are (-5,-4,-2) and (-4,-6,-7); in dimension 2, the maximum points are (5,9,8) and (6,7,9), and the minimum points are (-4,-6,-7) and (-5,-4,-2); in dimension 3, the maximum points are (6,7,9) and (5,9,8), and the minimum points are (-4,-6,-7) and (-5,-4,-2).
Finally, the selected extreme points are the union of these extreme points, which gives 4 distinct records in total: (6,7,9), (5,9,8), (-5,-4,-2) and (-4,-6,-7).
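The Top-2 boundary selection can be sketched directly. The parameter name `top` is hypothetical, and ties within a dimension would be broken by the original record order. Applied to the twelve records listed above, the function returns four distinct records: (-5,-4,-2), (-4,-6,-7), (5,9,8) and (6,7,9).

```python
from typing import List, Sequence, Tuple

Record = Tuple[float, ...]


def boundary_points(records: Sequence[Record], top: int = 2) -> List[Record]:
    """Union of the `top` smallest and `top` largest records in every dimension."""
    selected: List[Record] = []
    for d in range(len(records[0])):
        ranked = sorted(records, key=lambda r: r[d])  # stable sort on dimension d
        for r in ranked[:top] + ranked[-top:]:
            if r not in selected:  # union: keep each record once
                selected.append(r)
    return selected
```

Only these boundary records, rather than the whole cluster, would then be passed to the support vector machine as modeling data.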
step S5: building the archive classification model using the support vector machine method, evaluating the model with test data, and optimizing it according to the test results; and applying the archive classification model to classify archive data of unknown category.
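Step S5 (with the split, training and testing of steps S51 to S54) can be sketched with a linear SVM trained by the Pegasos sub-gradient method. This solver is swapped in for illustration; the patent specifies neither solver nor kernel. The sketch handles two classes labelled +1/-1 (multi-class archives would need, e.g., one-vs-rest), folds the bias into the weight vector, and uses hypothetical values for `lam`, `epochs`, `test_ratio` and `seed`. "Optimizing the model" could, for instance, mean re-training with several `lam` values and keeping the one with the best test accuracy, but that loop is not shown.

```python
import random


def train_test_split(rows, labels, test_ratio=0.25, seed=0):
    """Step S51: shuffle indices once, then cut them into training and test parts."""
    idx = list(range(len(rows)))
    random.Random(seed).shuffle(idx)
    cut = int(len(idx) * (1 - test_ratio))
    train, test = idx[:cut], idx[cut:]
    return (([rows[i] for i in train], [labels[i] for i in train]),
            ([rows[i] for i in test], [labels[i] for i in test]))


def pegasos_train(x, y, lam=0.01, epochs=50, seed=0):
    """Step S52: linear SVM (hinge loss + L2 penalty) fitted with Pegasos steps.

    Labels must be +1 or -1. A constant 1.0 feature is appended so that the last
    weight acts as the bias (Pegasos then regularizes the bias too).
    """
    rng = random.Random(seed)
    data = [(list(xi) + [1.0], yi) for xi, yi in zip(x, y)]
    w = [0.0] * len(data[0][0])
    t = 0
    for _ in range(epochs):
        rng.shuffle(data)
        for xi, yi in data:
            t += 1
            eta = 1.0 / (lam * t)  # decreasing step size
            margin = yi * sum(wj * xj for wj, xj in zip(w, xi))
            w = [(1 - eta * lam) * wj for wj in w]  # shrink: gradient of the L2 term
            if margin < 1:  # hinge loss active: move toward yi * xi
                w = [wj + eta * yi * xj for wj, xj in zip(w, xi)]
    return w


def predict(w, xi):
    """Step S54: sign of the linear decision function for one feature vector."""
    score = sum(wj * xj for wj, xj in zip(w, list(xi) + [1.0]))
    return 1 if score >= 0 else -1


def accuracy(w, x, y):
    """Step S53: fraction of test records classified correctly."""
    return sum(predict(w, xi) == yi for xi, yi in zip(x, y)) / len(y)
```

In use, `(tx, ty), (vx, vy) = train_test_split(rows, labels)` followed by `w = pegasos_train(tx, ty)` and `accuracy(w, vx, vy)` covers training and testing; `predict(w, new_row)` then labels an archive vector of unknown category.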
The technical solutions of the present invention, and similar technical solutions designed by those skilled in the art based on the teachings of these technical solutions, all fall within the protection scope of the present invention.
Claims (7)
1. A file classification method and system based on phase space, characterized in that the file classification method comprises the following steps:
step S1: reading archive content using text analysis technology and OCR technology;
step S2: automatically extracting archive keywords using keyword extraction technology;
step S3: extracting features from the archive text with word2vec to construct a text vector, while considering both the global vector weight of the text and the keyword weight of the text;
step S4: compressing the archive data using clustering technology;
step S5: building an archive classification model from archive contents using support vector machine text classification technology, evaluating the model with test data, and optimizing the model according to the test results; and applying the archive classification model to classify archive data of unknown category.
2. The file classification method and system according to claim 1, wherein step S1 comprises the following steps:
step S11: for ordinary electronic documents, reading the archive content directly using text analysis technology;
step S12: for scanned copies of paper archives and for image files, recognizing the image content using OCR technology.
3. The file classification method and system according to claim 1, wherein step S2 comprises the following steps:
step S21: segmenting the archive text into words using word segmentation technology;
step S22: automatically extracting archive keywords using keyword extraction technology, for use in constructing the text vector.
4. The file classification method and system according to claim 1, wherein step S3 comprises the following steps:
step S31: for archive data of known categories, segmenting the text into words and performing 0-1 vectorization;
step S32: vectorizing words with word2vec while jointly considering the global text information and the keyword weight information.
5. The file classification method and system according to claim 1, wherein step S4 comprises the following steps:
step S41: constructing a clustering feature tree based on similarity;
step S42: extracting modeling data from the clustering feature tree.
6. The file classification method and system according to claim 1, wherein step S5 comprises the following steps:
step S51: dividing the data set into a training set and a test set;
step S52: building an archive classification model from the training set using a support vector machine method based on data compression;
step S53: testing the classification model on the test set, and optimizing the model according to the test results;
step S54: applying the archive classification model to classify archive data of unknown category.
7. The file classification method and system based on phase space according to claim 1, wherein the file classification system comprises an archive data acquisition module, an archive data extraction module, an archive classification modeling module, an archive classification model evaluation module and an archive classification model application module; the archive data acquisition module acquires archive data and reads archive content from electronic documents; the archive data extraction module segments the archive data into words and extracts keywords; the archive classification modeling module vectorizes words with word2vec, taking into account both the weights of all words in a single archive document and the weights of its keywords, compresses the archive data using clustering, and builds a classification model with a support vector machine; the archive classification model evaluation module evaluates the model with test data and optimizes it according to the evaluation result; and the archive classification model application module uses the established model to classify data of unknown category and stores the classification result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110153675.XA CN113254634A (en) | 2021-02-04 | 2021-02-04 | File classification method and system based on phase space |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113254634A (en) | 2021-08-13 |
Family ID: 77180874
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110153675.XA Pending CN113254634A (en) | File classification method and system based on phase space | 2021-02-04 | 2021-02-04 |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113254634A (en) |
Citations (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103744835A (en) * | 2014-01-02 | 2014-04-23 | 上海大学 | Text keyword extracting method based on subject model |
CN105373583A (en) * | 2015-10-12 | 2016-03-02 | 国家计算机网络与信息安全管理中心 | Modeling method for support vector machine based on data compression |
CN106095737A (en) * | 2016-06-07 | 2016-11-09 | 杭州凡闻科技有限公司 | Documents Similarity computational methods and similar document the whole network retrieval tracking |
CN107122352A (en) * | 2017-05-18 | 2017-09-01 | 成都四方伟业软件股份有限公司 | A kind of method of the extracting keywords based on K MEANS, WORD2VEC |
CN107992633A (en) * | 2018-01-09 | 2018-05-04 | 国网福建省电力有限公司 | Electronic document automatic classification method and system based on keyword feature |
CN108563636A (en) * | 2018-04-04 | 2018-09-21 | 广州杰赛科技股份有限公司 | Extract method, apparatus, equipment and the storage medium of text key word |
CN108804641A (en) * | 2018-06-05 | 2018-11-13 | 鼎易创展咨询(北京)有限公司 | A kind of computational methods of text similarity, device, equipment and storage medium |
WO2019035765A1 (en) * | 2017-08-14 | 2019-02-21 | Dathena Science Pte. Ltd. | Methods, machine learning engines and file management platform systems for content and context aware data classification and security anomaly detection |
CN111104794A (en) * | 2019-12-25 | 2020-05-05 | 同方知网(北京)技术有限公司 | Text similarity matching method based on subject words |
CN111898384A (en) * | 2020-05-30 | 2020-11-06 | 中国兵器科学研究院 | Text emotion recognition method and device, storage medium and electronic equipment |
Non-Patent Citations (2)
Title |
---|
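Su Yulong et al., "Research on keyword-based text vectorization and classification algorithms", Journal of Guizhou University (Natural Sciences) * |
Chen Jie et al., "Word2vec-based document classification method", Computer Systems & Applications * |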
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113254659A (en) * | 2021-02-04 | 2021-08-13 | 天津德尔塔科技有限公司 | File studying and judging method and system based on knowledge graph technology |
CN115794496A (en) * | 2023-02-07 | 2023-03-14 | 中信天津金融科技服务有限公司 | Archive storage method and system based on information extraction |
CN116663549A (en) * | 2023-05-18 | 2023-08-29 | 海南科技职业大学 | Digitized management method, system and storage medium based on enterprise files |
CN116663549B (en) * | 2023-05-18 | 2024-03-19 | 海南科技职业大学 | Digitized management method, system and storage medium based on enterprise files |
Similar Documents
Publication | Title | Publication Date |
---|---|---|
CN109635171B (en) | Fusion reasoning system and method for news program intelligent tags | |
CN110597735B (en) | Software defect prediction method for open-source software defect feature deep learning | |
CN113254634A (en) | File classification method and system based on phase space | |
CN107562742B (en) | Image data processing method and device | |
CN106649490B (en) | Image retrieval method and device based on depth features | |
CN112699246B (en) | Domain knowledge pushing method based on knowledge graph | |
US9141853B1 (en) | System and method for extracting information from documents | |
CN112541490A (en) | Archive image information structured construction method and device based on deep learning | |
CN104133875A (en) | Face-based video labeling method and face-based video retrieving method | |
CN107577702B (en) | Method for distinguishing traffic information in social media | |
CN109508458A (en) | The recognition methods of legal entity and device | |
CN111026880B (en) | Joint learning-based judicial knowledge graph construction method | |
CN113449111B (en) | Social governance hot topic automatic identification method based on time-space semantic knowledge migration | |
Van Phan et al. | A nom historical document recognition system for digital archiving | |
CN113190502A (en) | Archive management method based on deep learning | |
CN114780746A (en) | Knowledge graph-based document retrieval method and related equipment thereof | |
CN115238081B (en) | Intelligent cultural relic identification method, system and readable storage medium | |
CN111860524A (en) | Intelligent classification device and method for digital files | |
CN105678244A (en) | Approximate video retrieval method based on improvement of editing distance | |
CN116186350B (en) | Power transmission line engineering searching method and device based on knowledge graph and topic text | |
CN112200212A (en) | Artificial intelligence-based enterprise material classification catalogue construction method | |
CN111460817A (en) | Method and system for recommending criminal legal document related law provision | |
CN115935042A (en) | Intelligent pledge asset duplicate checking method and system based on fusion model | |
CN114238735B (en) | Intelligent internet data acquisition method | |
CN115186138A (en) | Comparison method and terminal for power distribution network data |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20210813 |