CN115858474A - AIGC-based file arrangement system - Google Patents
AIGC-based file arrangement system
- Publication number
- CN115858474A
- Authority
- CN
- China
- Prior art keywords
- classification
- module
- hyperplane
- file
- files
- Legal status
- Granted
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Description
Technical Field
The present invention relates to the field of file system structures, and in particular to an AIGC-based file arrangement system.
Background Art
Daily work and life generate large numbers of text files, and to improve work efficiency these files need to be sorted and classified. In traditional file-sorting systems, users must manually label, sort and classify every file, which is laborious and error-prone. An intelligent file-sorting system is therefore needed to assist in organizing files, reduce the time spent on this task, and make more effective use of working time.
The foregoing discussion of the background art is intended only to facilitate an understanding of the present invention. It does not acknowledge or admit that any of the material referred to is part of the common general knowledge.
Many file-sorting systems have already been developed. After extensive searching and review, we found existing systems such as the one disclosed in publication No. CN110245119B. Such systems generally work as follows: a storage system obtains first information about a first file, where the first file is stored in a storage medium as data slices; the first information includes the N data slices of the first file and the number of unit storage areas each slice occupies in the storage medium; two of the N data slices occupy non-contiguous unit storage areas; and N is an integer greater than or equal to 2. The storage system then determines a first parameter of the first file from the first information, the first parameter characterizing how aggregated the storage locations of the first file are within the storage medium. However, that system only organizes files at the storage level; it cannot automatically classify large numbers of files and therefore does not improve work efficiency.
Summary of the Invention
In view of these deficiencies, the purpose of the present invention is to provide an AIGC-based file arrangement system.
The present invention adopts the following technical solution:
An AIGC-based file arrangement system comprises a storage module, an analysis module, a classification module, an application module and an output module;
The storage module stores the user's files; the analysis module analyzes the files using AIGC technology; the classification module classifies the files using the information generated by the analysis module; the application module provides application functions to the user; and the output module generates reports for decision-making;
The analysis module receives training files, converts them into feature information and creates hyperplanes from that feature information. The hyperplane information is sent to the classification module, which evaluates each file to be classified against the hyperplanes to obtain a classification code, each bit of which corresponds to one hyperplane. The classification module then saves the file to the corresponding area of the storage module according to its classification code;
Further, the analysis module includes a vocabulary statistics unit, a word frequency matrix processing unit, a feature processing unit and a classification analysis unit. The vocabulary statistics unit creates a vocabulary from the input text files and counts the vocabulary words in each text file; the word frequency matrix processing unit generates an m×n matrix from these statistics, where m is the number of files and n is the number of words in the vocabulary; the feature processing unit processes the word frequency matrix to obtain an n-dimensional feature number for each file; and the classification analysis unit creates hyperplanes from the feature numbers and the classification codes of the text files;
Further, the feature processing unit computes feature values from the matrix elements according to the following formula:

$$F_{xy} = \text{[expression not reproduced in the source text]}$$

where $a_{xy}$ is the element in row x and column y of the matrix, i is the row-index variable and j is the column-index variable;

$F_x = (F_{x1}, F_{x2}, \ldots, F_{xn})$ constitutes the feature number of the x-th file;
The hyperplane A set by the classification analysis unit is:

$$A:\ \sum_{y=1}^{n} w_y t_y + b = 0$$

where $w_y$ are the hyperplane coefficients, $t_y$ are the hyperplane elements, and b is the threshold;

The hyperplane A satisfies:

$$\sum_{y=1}^{n} w_y F_{py} + b > 0, \qquad \sum_{y=1}^{n} w_y F_{qy} + b < 0$$

where p denotes the index of a file whose feature number is first-class, and q denotes the index of a file whose feature number is second-class;
Further, the classification module includes a text processing unit, a calculation processing unit and an execution unit. The text processing unit counts how often specific words occur in a text file; the calculation processing unit performs the calculation tasks; and the execution unit classifies the text according to the results of the calculation processing unit and saves it to the corresponding storage area;
Further, the calculation processing unit includes a feature number calculation processor and a hyperplane calculation processor. The feature number calculation processor computes the classification feature number $G = (G_1, \ldots, G_n)$ of the file to be classified according to the following formula:

$$G_y = \text{[expression not reproduced in the source text]}$$

where $c_y$ is the frequency of the y-th vocabulary word counted by the text processing unit;
The hyperplane calculation processor evaluates the classification feature number against the hyperplanes; the calculation result of the k-th hyperplane is:

$$R_k = \sum_{y=1}^{n} w_{ky} G_y + b_k$$

where $w_{ky}$ and $b_k$ are the coefficients and threshold of the k-th hyperplane;
The execution unit determines the value of each corresponding bit of the classification code from the sign of the matching hyperplane result $R_k$, thereby obtaining the classification code, and saves the text file to the corresponding classification area according to that code.
The beneficial effects of the present invention are as follows:
The system analyzes the vocabulary, grammar and semantic information in a text to identify the document's subject, sentiment, keywords and similar information. It then computes the importance of each word, converts the document into an n-dimensional feature number, and creates hyperplanes based on the feature numbers. The number of hyperplanes determines the upper limit on the number of categories that can be distinguished. Using these hyperplanes, the system labels and classifies files automatically, greatly improving efficiency and accuracy.
For a further understanding of the features and technical content of the present invention, refer to the following detailed description and drawings. The drawings are provided for reference and illustration only and are not intended to limit the invention.
BRIEF DESCRIPTION OF THE DRAWINGS
FIG. 1 is a schematic diagram of the overall structural framework of the present invention;
FIG. 2 is a schematic diagram of the composition of the analysis module;
FIG. 3 is a schematic diagram of the composition of the classification module;
FIG. 4 is a schematic diagram of the composition of the calculation processing unit;
FIG. 5 is a schematic flow diagram of setting a hyperplane.
DETAILED DESCRIPTION
The embodiments of the present invention are described below by way of specific examples, from which those skilled in the art can understand the advantages and effects of the invention. The invention can also be implemented or applied through other, different embodiments, and the details in this specification may be modified and changed in various ways, based on different viewpoints and applications, without departing from the spirit of the invention. Note also that the drawings are simple schematic illustrations and are not drawn to scale. The following embodiments describe the relevant technical content of the invention in further detail, but the disclosed content is not intended to limit its scope of protection.
Embodiment 1:
This embodiment provides an AIGC-based file arrangement system which, referring to FIG. 1, comprises a storage module, an analysis module, a classification module, an application module and an output module;
The storage module stores the user's files; the analysis module analyzes the files using AIGC technology; the classification module classifies the files using the information generated by the analysis module; the application module provides application functions to the user; and the output module generates reports for decision-making;
The analysis module receives training files, converts them into feature information and creates hyperplanes from that feature information. The hyperplane information is sent to the classification module, which evaluates each file to be classified against the hyperplanes to obtain a classification code, each bit of which corresponds to one hyperplane. The classification module then saves the file to the corresponding area of the storage module according to its classification code;
The analysis module includes a vocabulary statistics unit, a word frequency matrix processing unit, a feature processing unit and a classification analysis unit. The vocabulary statistics unit creates a vocabulary from the input text files and counts the vocabulary words in each text file; the word frequency matrix processing unit generates an m×n matrix from these statistics, where m is the number of files and n is the number of words in the vocabulary; the feature processing unit processes the word frequency matrix to obtain an n-dimensional feature number for each file; and the classification analysis unit creates hyperplanes from the feature numbers and the classification codes of the text files;
The feature processing unit computes feature values from the matrix elements according to the following formula:

$$F_{xy} = \text{[expression not reproduced in the source text]}$$

where $a_{xy}$ is the element in row x and column y of the matrix, i is the row-index variable and j is the column-index variable;

$F_x = (F_{x1}, F_{x2}, \ldots, F_{xn})$ constitutes the feature number of the x-th file;
The hyperplane A set by the classification analysis unit is:

$$A:\ \sum_{y=1}^{n} w_y t_y + b = 0$$

where $w_y$ are the hyperplane coefficients, $t_y$ are the hyperplane elements, and b is the threshold;

The hyperplane A satisfies:

$$\sum_{y=1}^{n} w_y F_{py} + b > 0, \qquad \sum_{y=1}^{n} w_y F_{qy} + b < 0$$

where p denotes the index of a file whose feature number is first-class, and q denotes the index of a file whose feature number is second-class;
The classification module includes a text processing unit, a calculation processing unit and an execution unit. The text processing unit counts how often specific words occur in a text file; the calculation processing unit performs the calculation tasks; and the execution unit classifies the text according to the results of the calculation processing unit and saves it to the corresponding storage area;
The calculation processing unit includes a feature number calculation processor and a hyperplane calculation processor. The feature number calculation processor computes the classification feature number $G = (G_1, \ldots, G_n)$ of the file to be classified according to the following formula:

$$G_y = \text{[expression not reproduced in the source text]}$$

where $c_y$ is the frequency of the y-th vocabulary word counted by the text processing unit;
The hyperplane calculation processor evaluates the classification feature number against the hyperplanes; the calculation result of the k-th hyperplane is:

$$R_k = \sum_{y=1}^{n} w_{ky} G_y + b_k$$

where $w_{ky}$ and $b_k$ are the coefficients and threshold of the k-th hyperplane;
The execution unit determines the value of each corresponding bit of the classification code from the sign of the matching hyperplane result $R_k$, thereby obtaining the classification code, and saves the text file to the corresponding classification area according to that code.
Embodiment 2:
This embodiment includes all the content of Embodiment 1 and provides an AIGC-based file arrangement system comprising a storage module, an analysis module, a classification module, an application module and an output module;
The storage module stores the user's files. The analysis module analyzes the files using AIGC technology to identify the subject, keywords and other information about their content. The classification module classifies the files using the information generated by the analysis module. The application module provides the user with a variety of application functions. The output module generates reports for decision-making and presents them to the user in an appropriate form;
Referring to FIG. 2, the analysis module includes a vocabulary statistics unit, a word frequency matrix processing unit, a feature processing unit and a classification analysis unit. The vocabulary statistics unit processes each input text file in four steps: decomposition, word removal, extraction and counting. Decomposition splits the whole text into phrases; word removal deletes connective words; extraction extracts the word stems from the phrases; and counting tallies how many times each stem occurs. The vocabulary statistics unit builds a vocabulary from the extracted stems to record every word that has appeared;
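The four processing steps can be sketched in Python as follows. This is a minimal illustration rather than the patent's implementation: it assumes English text, treats punctuation as phrase boundaries, uses a small hypothetical connective-word list (`CONNECTIVES`), and substitutes a crude suffix-stripping function (`naive_stem`) for a real stemmer. Chinese text would need a word segmenter in place of the regex split.

```python
import re
from collections import Counter

# Hypothetical connective/function words; the patent does not list any.
CONNECTIVES = {"and", "or", "but", "the", "a", "an", "of", "to", "in", "is", "are"}


def naive_stem(word: str) -> str:
    """Crude suffix stripping, standing in for a real stemmer (e.g. Porter)."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def vocabulary_statistics(text: str) -> Counter:
    """Decomposition -> word removal -> stem extraction -> counting."""
    counts: Counter = Counter()
    for phrase in re.split(r"[.,;:!?\n]+", text.lower()):  # decomposition into phrases
        for word in re.findall(r"[a-z]+", phrase):
            if word in CONNECTIVES:  # word removal
                continue
            counts[naive_stem(word)] += 1  # stem extraction and counting
    return counts


def build_vocabulary(texts: list[str]) -> list[str]:
    """Record every stem that appears in any input file, in sorted order."""
    vocab: set = set()
    for text in texts:
        vocab.update(vocabulary_statistics(text))
    return sorted(vocab)
```

For example, `vocabulary_statistics("Files and folders. Sorting files!")` counts `file` twice and `folder` and `sort` once each, while "and" is dropped as a connective.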
The word frequency matrix processing unit generates an m×n matrix from the statistics, where m is the number of files and n is the number of words in the vocabulary. Each row represents a file, each column represents a word, and each element is the number of times that word occurs in that file;
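A sketch of the matrix construction, assuming the per-file counts are held as `collections.Counter` objects keyed by stem and the vocabulary is an ordered list (representations the source does not specify):

```python
from collections import Counter


def word_frequency_matrix(doc_counts: list[Counter], vocab: list[str]) -> list[list[int]]:
    """Build the m x n matrix: row x is file x, column y is vocabulary word y,
    and each element is how often word y occurs in file x (0 if absent)."""
    return [[counts.get(word, 0) for word in vocab] for counts in doc_counts]
```

Most entries of such a matrix are zero for realistic vocabularies, so a sparse representation (for example `scipy.sparse.csr_matrix`) would usually be preferred at scale.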
The feature processing unit computes feature values from the matrix elements according to the following formula:

$$F_{xy} = \text{[expression not reproduced in the source text]}$$

where $a_{xy}$ is the element in row x and column y of the matrix, i is the row-index variable and j is the column-index variable;

$F_x = (F_{x1}, F_{x2}, \ldots, F_{xn})$ constitutes the feature number of the x-th file;
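The source text does not reproduce the feature formula; it only says the formula uses the matrix elements together with a row-index and a column-index variable. TF-IDF is a common weighting of that shape, so the sketch below uses it purely as an assumed stand-in, not as the patent's formula: each count is normalized by the file's total count and multiplied by the logarithm of the number of files over the number of files containing that word.

```python
import math


def tfidf_features(matrix: list[list[int]]) -> list[list[float]]:
    """ASSUMED stand-in for the unreproduced feature formula (TF-IDF):
    F[x][y] = (a[x][y] / sum_j a[x][j]) * ln(m / df[y]),
    where df[y] is the number of rows i with a[i][y] > 0."""
    m = len(matrix)
    n = len(matrix[0]) if m else 0
    df = [sum(1 for i in range(m) if matrix[i][y] > 0) for y in range(n)]
    features = []
    for row in matrix:
        total = sum(row)
        features.append([
            (row[y] / total) * math.log(m / df[y]) if total and df[y] else 0.0
            for y in range(n)
        ])
    return features
```

Under this weighting a word that occurs in every training file receives a feature value of 0, because it does nothing to separate files from one another.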
The classification analysis unit sets $N_s$ hyperplanes to perform classification analysis over the file types. Each file type is represented by an $N_s$-bit binary number, called the classification code, and each hyperplane corresponds to one bit of the classification code. The classification code is sent to the analysis module together with the text file; once the text file has been processed into a feature number, the classification code is paired with that feature number and sent to the classification analysis unit;
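The classification code can be illustrated as a fixed-width binary string. This sketch assumes, hypothetically, that categories are numbered 0, 1, 2, ... and that bit k (counting from the left) belongs to hyperplane k; the source fixes neither convention. It also makes the capacity limit explicit: $N_s$ bits can distinguish at most $2^{N_s}$ codes.

```python
def to_classification_code(category: int, n_bits: int) -> str:
    """Encode a category index as an n_bits-wide binary classification code."""
    if not 0 <= category < 2 ** n_bits:
        raise ValueError(f"{n_bits} bits can encode only {2 ** n_bits} categories")
    return format(category, f"0{n_bits}b")


def code_bit(code: str, k: int) -> int:
    """Value (0 or 1) of bit k, counting from the left; bit k belongs to hyperplane k."""
    return int(code[k])
```

For instance, with three hyperplanes category 5 becomes `"101"`, and its bit for the second hyperplane is 0.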
Referring to FIG. 5, the classification analysis unit sets a hyperplane through the following steps:
S1. Determine which bit of the classification code the hyperplane corresponds to; this bit is called the target bit;
S2. Divide the classification codes into two classes by the value of the target bit: codes whose target bit is 1 are first-class codes, and codes whose target bit is 0 are second-class codes;
S3. Divide the feature numbers according to the result of step S2: feature numbers paired with first-class codes are first-class feature numbers, and feature numbers paired with second-class codes are second-class feature numbers;
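Steps S1 to S3 amount to a partition of the training pairs by one bit. A minimal sketch, assuming each training sample is a hypothetical `(feature_number, classification_code)` pair with the code stored as a bit string:

```python
Sample = tuple[list[float], str]  # (feature number, classification code)


def split_by_target_bit(
    samples: list[Sample], k: int
) -> tuple[list[list[float]], list[list[float]]]:
    """Steps S1-S3 for target bit k: feature numbers whose code has a 1 in bit k
    are first-class, those with a 0 are second-class."""
    first, second = [], []
    for features, code in samples:
        (first if code[k] == "1" else second).append(features)
    return first, second
```

Running this once per bit yields the $N_s$ two-class problems, one for each hyperplane.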
S4. Set a hyperplane A:

$$A:\ \sum_{y=1}^{n} w_y t_y + b = 0$$

where $w_y$ are the hyperplane coefficients, $t_y$ are the hyperplane elements, and b is the threshold;

The hyperplane A satisfies:

$$\sum_{y=1}^{n} w_y F_{py} + b > 0, \qquad \sum_{y=1}^{n} w_y F_{qy} + b < 0$$

where p denotes the index of a file whose feature number is first-class and q denotes the index of a file whose feature number is second-class; the form of these inequalities fixes the correspondence between the sign of the result (positive or negative) and the bit values 1 and 0;
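The source states the conditions the hyperplane must satisfy but not how its coefficients are found. When the two classes are linearly separable, the classic perceptron update rule finds coefficients and a threshold satisfying both inequalities; it is used here as an assumption, and a linear support vector machine would be another natural choice. The sign convention (first class positive) is also assumed.

```python
def dot(w: list[float], x: list[float]) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def perceptron_hyperplane(
    first: list[list[float]], second: list[list[float]], epochs: int = 1000
) -> tuple[list[float], float]:
    """ASSUMPTION: perceptron training. Returns (w, b) with w.F + b > 0 for every
    first-class feature number and w.F + b < 0 for every second-class one."""
    n = len((first or second)[0])
    w, b = [0.0] * n, 0.0
    labelled = [(f, 1) for f in first] + [(f, -1) for f in second]
    for _ in range(epochs):
        errors = 0
        for f, label in labelled:
            if label * (dot(w, f) + b) <= 0:  # wrong side of, or on, the hyperplane
                w = [wi + label * fi for wi, fi in zip(w, f)]
                b += label
                errors += 1
        if errors == 0:  # every point satisfies its inequality
            return w, b
    raise RuntimeError("no separating hyperplane found; classes may not be linearly separable")
```

The perceptron stops at the first separating hyperplane it meets, which is generally not centred between the classes; step S5 then adjusts the threshold.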
S5. Adjust the value of the threshold b so that the shortest distance from the first-class feature numbers to the hyperplane equals the shortest distance from the second-class feature numbers to the hyperplane. Denoting the threshold at this point as $b^{*}$, the determined hyperplane is:

$$A^{*}:\ \sum_{y=1}^{n} w_y t_y + b^{*} = 0$$

The distance from a feature number $F_x$ to the hyperplane is:

$$d_x = \frac{\left|\sum_{y=1}^{n} w_y F_{xy} + b\right|}{\sqrt{\sum_{y=1}^{n} w_y^{2}}}$$
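Step S5 has a closed form if first-class feature numbers are taken to give positive values (an assumed convention). Writing $s(F) = \sum_y w_y F_y$ and $\lVert w \rVert = \sqrt{\sum_y w_y^2}$, a first-class point lies at distance $(s + b)/\lVert w \rVert$ and a second-class point at $-(s + b)/\lVert w \rVert$, so equating the two minima gives $b^{*} = -\tfrac{1}{2}\left(\min_p s(F_p) + \max_q s(F_q)\right)$. The derivation and function names below are a sketch; the source describes the adjustment only in words.

```python
import math


def balanced_threshold(w: list[float], first: list[list[float]], second: list[list[float]]) -> float:
    """Step S5: the b* that puts the nearest first-class and nearest second-class
    feature numbers at equal distance from the hyperplane w.t + b* = 0."""
    s_first = min(sum(wi * fi for wi, fi in zip(w, f)) for f in first)
    s_second = max(sum(wi * fi for wi, fi in zip(w, f)) for f in second)
    if s_first <= s_second:
        raise ValueError("w does not separate the two classes")
    return -(s_first + s_second) / 2


def distance(w: list[float], b: float, f: list[float]) -> float:
    """Distance from feature number f to the hyperplane w.t + b = 0."""
    return abs(sum(wi * fi for wi, fi in zip(w, f)) + b) / math.sqrt(sum(wi * wi for wi in w))
```

Because only b moves, the direction of the hyperplane stays fixed and the margin on each side becomes half the gap between the two classes' projections onto w.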
Following the above procedure, the classification analysis unit obtains $N_s$ hyperplanes. The hyperplane information, which includes the correspondence between the sign of the result and the bit values 0 and 1, is sent to the classification module;
Referring to FIG. 3, the classification module includes a text processing unit, a calculation processing unit and an execution unit. The text processing unit counts how often specific words occur in a text file; the calculation processing unit performs the calculation tasks; and the execution unit classifies the text according to the results of the calculation processing unit and saves it to the corresponding storage area;
The text processing unit stores the vocabulary created by the vocabulary statistics unit and counts occurrences of the words recorded in that vocabulary;
Referring to FIG. 4, the calculation processing unit includes a feature number calculation processor and a hyperplane calculation processor. The feature number calculation processor computes the classification feature number $G = (G_1, \ldots, G_n)$ of the file to be classified according to the following formula:

$$G_y = \text{[expression not reproduced in the source text]}$$

where $c_y$ is the frequency of the y-th vocabulary word counted by the text processing unit;
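The formula for the classification feature number is not reproduced in the source. Assuming, for illustration only, a TF-IDF weighting whose document frequencies `df` and file count `m` come from the training matrix, the computation for a new file could look like this. Words outside the stored vocabulary are ignored, since the text processing unit counts only vocabulary words.

```python
import math


def classification_features(
    counts: dict[str, int], vocab: list[str], df: list[int], m: int
) -> list[float]:
    """ASSUMED TF-IDF weighting for a file to be classified.
    counts: word counts of the new file; vocab: stored vocabulary;
    df[y]: number of training files containing vocab[y]; m: number of training files."""
    c = [counts.get(word, 0) for word in vocab]  # c_y; out-of-vocabulary words are dropped
    total = sum(c)
    return [
        (c[y] / total) * math.log(m / df[y]) if total and df[y] else 0.0
        for y in range(len(vocab))
    ]
```

Reusing the training-time document frequencies keeps a new file's feature number in the same space as the hyperplanes it will be evaluated against.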
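The hyperplane calculation processor evaluates the classification feature number against the $N_s$ hyperplanes; the calculation result of the k-th hyperplane is:

$$R_k = \sum_{y=1}^{n} w_{ky} G_y + b_k^{*}$$

where $w_{ky}$ and $b_k^{*}$ are the coefficients and adjusted threshold of the k-th hyperplane;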
The execution unit determines the value of each corresponding bit of the classification code from the signs of the $N_s$ results $R_k$, obtains the classification code, and saves the text file to the corresponding classification area according to that code;
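A sketch of how the execution unit could turn the per-hyperplane results into a classification code. It assumes each hyperplane is stored as a hypothetical tuple `(w, b_star, positive_bit)`, where `positive_bit` records which bit value a positive result maps to (the source says this correspondence travels with the hyperplane information). A result of exactly zero is assigned to the negative side, a tie-breaking rule the source does not specify.

```python
Hyperplane = tuple[list[float], float, int]  # (coefficients w, threshold b*, bit for R > 0)


def classification_code(g: list[float], hyperplanes: list[Hyperplane]) -> str:
    """Evaluate R_k = w_k . g + b_k* for each hyperplane and map its sign to bit k."""
    bits = []
    for w, b_star, positive_bit in hyperplanes:
        r = sum(wi * gi for wi, gi in zip(w, g)) + b_star  # R_k
        bits.append(positive_bit if r > 0 else 1 - positive_bit)  # R_k == 0 -> negative side
    return "".join(str(bit) for bit in bits)
```

With two hyperplanes, for example, a file lands on the positive side of the first and the negative side of the second and receives the code `"10"`.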
The storage module includes a storage unit and a logic unit. The storage unit is divided into $N_s + 1$ storage areas: one, called the temporary area, holds text files awaiting classification, and the other $N_s$, called classification areas, hold classified text files. The logic unit records the logical relationships among the $N_s$ classification areas;
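One way to picture the storage layout is as directories on a file system: a temporary directory plus one directory per classification area. The sketch below assumes that layout and a hypothetical dictionary `area_for_code` standing in for the logic unit's record of which area each classification code maps to; the source specifies neither.

```python
import shutil
from pathlib import Path


def store_classified(file_path: Path, code: str, area_for_code: dict[str, Path]) -> Path:
    """Move a file out of the temporary area into the classification area mapped
    to its code (area_for_code is a hypothetical stand-in for the logic unit)."""
    target_dir = area_for_code[code]
    target_dir.mkdir(parents=True, exist_ok=True)
    # shutil.move returns the destination path it moved the file to.
    return Path(shutil.move(str(file_path), str(target_dir / file_path.name)))
```

Moving rather than copying keeps the temporary area limited to files that still await classification.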
The application module provides application interfaces, including a classification interface and a report interface. The classification interface is connected to the temporary area: text files input through it are saved to the temporary area, and whenever the temporary area receives a new file, the classification module is started to classify and store it. The report interface receives time information, counts the text files newly added within that period, and reports the proportion of each category among the files added during that period.
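The report interface's statistic is a simple proportion over a time window. A sketch, assuming each stored file is recorded as a hypothetical `(added_at, classification_code)` pair and that the window is half-open, `[start, end)`:

```python
from collections import Counter
from datetime import datetime


def category_proportions(
    records: list[tuple[datetime, str]], start: datetime, end: datetime
) -> dict[str, float]:
    """Share of each classification code among files added in [start, end)."""
    in_window = [code for added_at, code in records if start <= added_at < end]
    if not in_window:
        return {}  # nothing added in the period
    counts = Counter(in_window)
    return {code: n / len(in_window) for code, n in counts.items()}
```

A half-open window lets consecutive reporting periods tile the timeline without counting any file twice.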
The content disclosed above comprises only preferred feasible embodiments of the present invention and does not thereby limit its scope of protection. All equivalent technical changes made using the contents of this specification and the drawings are therefore included within the scope of protection; moreover, the elements involved may be updated as technology develops.
Claims (5)
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310165385.6A CN115858474B (en) | 2023-02-27 | 2023-02-27 | File arrangement system based on AIGC |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202310165385.6A CN115858474B (en) | 2023-02-27 | 2023-02-27 | File arrangement system based on AIGC |
Publications (2)
Publication Number | Publication Date |
---|---|
CN115858474A true CN115858474A (en) | 2023-03-28 |
CN115858474B CN115858474B (en) | 2023-05-09 |
Family
ID=85658908
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202310165385.6A Active CN115858474B (en) | 2023-02-27 | 2023-02-27 | File arrangement system based on AIGC |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN115858474B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120166366A1 (en) * | 2010-12-22 | 2012-06-28 | Microsoft Corporation | Hierarchical classification system |
CN106407406A (en) * | 2016-09-22 | 2017-02-15 | 国信优易数据有限公司 | A text processing method and system |
US20170075974A1 (en) * | 2015-09-11 | 2017-03-16 | Adobe Systems Incorporated | Categorization of forms to aid in form search |
CN107577738A (en) * | 2017-08-28 | 2018-01-12 | 电子科技大学 | A FMECA Method for Data Processing by SVM Text Mining |
US20200026798A1 (en) * | 2018-07-20 | 2020-01-23 | EMC IP Holding Company LLC | Identification and curation of application programming interface data from different sources |
CN111125359A (en) * | 2019-12-17 | 2020-05-08 | 东软集团股份有限公司 | Text information classification method, device and equipment |
CN114265937A (en) * | 2021-12-24 | 2022-04-01 | 中国电力科学研究院有限公司 | Intelligent classification and analysis method, system, storage medium and server of scientific and technological information |
CN114610884A (en) * | 2022-03-07 | 2022-06-10 | 中国人民解放军63893部队 | Classification method based on PCA combined feature extraction and approximate support vector machine |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116306317A (en) * | 2023-05-12 | 2023-06-23 | 环球数科集团有限公司 | Automatic AIGC modeling system based on artificial intelligence |
CN116821319A (en) * | 2023-08-30 | 2023-09-29 | 环球数科集团有限公司 | A rapid screening processing system based on AIGC |
CN116821319B (en) * | 2023-08-30 | 2023-10-27 | 环球数科集团有限公司 | Quick screening type processing system based on AIGC |
Also Published As
Publication number | Publication date |
---|---|
CN115858474B (en) | 2023-05-09 |
Similar Documents
Publication | Title |
---|---|
CN115858474B (en) | File arrangement system based on AIGC | |
CN107609121B (en) | News text classification method based on LDA and word2vec algorithm | |
CN110851598B (en) | Text classification method and device, terminal equipment and storage medium | |
CN104391835B (en) | Feature Words system of selection and device in text | |
TWI662425B (en) | A method of automatically generating semantic similar sentence samples | |
CN110674289A (en) | Method, device and storage medium for judging the category to which an article belongs based on word segmentation weight | |
CN112650923A (en) | Public opinion processing method and device for news events, storage medium and computer equipment | |
CN107122382A (en) | A kind of patent classification method based on specification | |
CN109934251B (en) | Method, system and storage medium for recognizing text in Chinese language | |
Dal Bianco et al. | A practical and effective sampling selection strategy for large scale deduplication | |
WO2021051864A1 (en) | Dictionary expansion method and apparatus, electronic device and storage medium | |
CN108027814A (en) | Disable word recognition method and device | |
CN110222192A (en) | Corpus method for building up and device | |
CN109993216A (en) | A text classification method based on K nearest neighbors KNN and its equipment | |
CN110134777A (en) | Problem deduplication method, device, electronic device and computer-readable storage medium | |
CN118211131B (en) | A text data preprocessing method and system suitable for large financial models | |
CN118689992A (en) | A table content RAG customer service question and answer method based on a large language model | |
CN115098690A (en) | Multi-data document classification method and system based on cluster analysis | |
CN106503146A (en) | Computer text feature selection method, classification feature selection method and system | |
CN114462049A (en) | An automatic vulnerability classification method and system based on weighted Word2vec | |
CN109871429B (en) | Short text retrieval method integrating Wikipedia classification and explicit semantic features | |
CN112784040B (en) | Text Classification Method for Vertical Industry Based on Corpus | |
CN104123291B (en) | A kind of method and device of data classification | |
WO2018100700A1 (en) | Data conversion device and data conversion method | |
CN112860897A (en) | Text classification method based on improved ClusterGCN |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
GR01 | Patent grant | |
CP03 | Change of name, title or address | Address after: 518063 No. 01-03, floor 17, block B, building 10, Shenzhen Bay science and technology ecological park, No. 10, Gaoxin South ninth Road, Yuehai street, Nanshan District, Shenzhen, Guangdong; Patentee after: Global Numerical Technology Co.,Ltd.; Country or region after: China; Address before: No. 01-03, 17th Floor, Building B, Shenzhen Bay Science and Technology Ecological Park, No. 10 Gaoxin South 9th Road, Yuehai Street, Nanshan District, Shenzhen City, Guangdong Province; Patentee before: Global Digital Group Co.,Ltd.; Country or region before: China |