CN115858474A - AIGC-based file arrangement system - Google Patents

AIGC-based file arrangement system

Info

Publication number
CN115858474A
Authority
CN
China
Prior art keywords
classification
module
hyperplane
file
files
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310165385.6A
Other languages
Chinese (zh)
Other versions
CN115858474B (en)
Inventor
张卫平
丁烨
张伟
李显阔
米小武
刘顿
王丹
郑小龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Global Numerical Technology Co ltd
Original Assignee
Global Digital Group Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Global Digital Group Co Ltd
Priority to CN202310165385.6A
Publication of CN115858474A
Application granted
Publication of CN115858474B
Legal status: Active
Anticipated expiration

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an AIGC-based file arrangement system which is characterized by comprising a storage module, an analysis module, a classification module, an application module and an output module, wherein the storage module is used for storing files of users, the analysis module analyzes the files based on AIGC technology, the classification module classifies the files by utilizing information generated by the analysis module, the application module provides application functions for the users, and the output module is used for generating reports for decision making; the system analyzes and formulates classification rules through files for training, and rapidly classifies new files based on the formulated classification rules, so that the effect of rapidly arranging a large number of files can be achieved.

Description

A file arrangement system based on AIGC

Technical Field

The present invention relates to the field of file system structures, and in particular to a file arrangement system based on AIGC.

Background Art

In daily work and life, large numbers of text files are generated. To improve work efficiency, these files need to be sorted and classified. In traditional file arrangement systems, users must manually label, sort and classify each file, which is laborious and error-prone. An intelligent file arrangement system is therefore needed to assist in arranging files, reduce the time spent on arranging them, and make more effective use of working time.

The foregoing discussion of the background art is intended only to facilitate an understanding of the present invention. This discussion does not acknowledge or admit that any of the material referred to is part of the common general knowledge.

Many file arrangement systems have already been developed. After extensive searching and reference, we found existing systems such as the one disclosed in publication No. CN110245119B. Such a system generally works as follows: a storage system obtains first information of a first file, wherein the first file is stored in a storage medium in the form of data slices, the first information includes the N data slices of the first file and the number of unit storage areas that each data slice occupies in the storage medium, two of the N data slices occupy non-contiguous unit storage areas, and N is an integer greater than or equal to 2; the storage system then determines a first parameter of the first file from the first information, wherein the first parameter characterizes how clustered the storage locations of the first file are in the storage medium. However, that system only arranges files at the storage level; it cannot automatically classify a large number of files and therefore does not improve work efficiency.

Summary of the Invention

The purpose of the present invention is to propose a file arrangement system based on AIGC that addresses the deficiencies described above.

The present invention adopts the following technical solution:

A file arrangement system based on AIGC, comprising a storage module, an analysis module, a classification module, an application module and an output module;

The storage module is responsible for storing the user's files, the analysis module analyzes the files based on AIGC technology, the classification module classifies the files using the information generated by the analysis module, the application module provides application functions for the user, and the output module is responsible for generating reports for decision-making;

The analysis module receives training files, converts them into feature information, and creates hyperplanes based on the feature information. The hyperplane information is sent to the classification module, which performs calculations on each file to be classified based on the hyperplanes to obtain a classification code, each bit of which corresponds to one hyperplane. The classification module saves the file to be classified to the corresponding area of the storage module according to the classification code;

Further, the analysis module includes a vocabulary statistics unit, a word frequency matrix processing unit, a feature processing unit and a classification analysis unit. The vocabulary statistics unit creates a vocabulary from the input text files and counts the words in each text file based on that vocabulary. The word frequency matrix processing unit generates an m*n matrix from the counts, where m is the number of files and n is the number of words in the vocabulary. The feature processing unit processes the word frequency matrix to obtain an n-dimensional feature number for each file. The classification analysis unit creates hyperplanes based on the feature numbers and the classification codes of the text files;

Further, the feature processing unit performs its calculation on the element values of the matrix according to the following formula:

[Formula: Figure SMS_1];

where [Figure SMS_2] denotes the element in row x and column y of the matrix, i is the row index variable, and j is the column index variable;

The feature number of the x-th file is composed of [Figure SMS_3];

The hyperplane A set by the classification analysis unit is:

[Formula: Figure SMS_4];

where [Figure SMS_5] are the hyperplane coefficients, [Figure SMS_6] are the hyperplane elements, and b is the threshold;

The hyperplane A satisfies:

[Formula: Figure SMS_7];

where [Figure SMS_8] denotes the file index corresponding to a class-one feature number, and [Figure SMS_9] denotes the file index corresponding to a class-two feature number;

Further, the classification module includes a text processing unit, a calculation processing unit and an execution unit. The text processing unit counts how often specific words occur in a text file, the calculation processing unit performs the calculation tasks, and the execution unit classifies the text according to the results from the calculation processing unit and saves it to the corresponding storage area;

Further, the calculation processing unit includes a feature number calculation processor and a hyperplane calculation processor. The feature number calculation processor calculates the classification feature number [Figure SMS_10] of the file to be classified according to the following formula:

[Formula: Figure SMS_11];

where [Figure SMS_12] is the frequency of the y-th word as counted by the text processing unit;

The hyperplane calculation processor processes the classification feature number according to the hyperplanes; the calculation result for each hyperplane is:

[Formula: Figure SMS_13];

The execution unit determines the value of the corresponding bit of the classification code from the sign of [Figure SMS_14], thereby obtaining the classification code, and saves the text file to the corresponding classification area according to the classification code.

The beneficial effects achieved by the present invention are:

The system analyzes the vocabulary, grammar and semantic information in the text to identify the subject, sentiment, keywords and similar information of a document, then calculates the importance of each word, converts the document into a feature number in the form of an n-dimensional vector, and creates hyperplanes based on the feature numbers. The number of hyperplanes determines the upper limit on the number of categories that can be distinguished. Using the hyperplanes, the system can automatically label and classify files, which greatly improves efficiency and accuracy.

For a further understanding of the features and technical content of the present invention, please refer to the following detailed description and drawings. The drawings are provided for reference and illustration only and are not intended to limit the present invention.

Brief Description of the Drawings

FIG. 1 is a schematic diagram of the overall structural framework of the present invention;

FIG. 2 is a schematic diagram of the composition of the analysis module of the present invention;

FIG. 3 is a schematic diagram of the composition of the classification module of the present invention;

FIG. 4 is a schematic diagram of the composition of the calculation processing unit of the present invention;

FIG. 5 is a schematic flow diagram of setting a hyperplane according to the present invention.

Detailed Description

The following describes implementations of the present invention through specific embodiments. Those skilled in the art can understand the advantages and effects of the present invention from the content disclosed in this specification. The present invention can be implemented or applied through other specific embodiments, and the details in this specification can be modified and changed in various ways, based on different viewpoints and applications, without departing from the spirit of the present invention. Note also that the drawings of the present invention are simple schematic illustrations and are not drawn to actual scale. The following embodiments describe the relevant technical content of the present invention in further detail, but the disclosed content is not intended to limit the scope of protection of the present invention.

Embodiment 1:

This embodiment provides a file arrangement system based on AIGC which, with reference to FIG. 1, includes a storage module, an analysis module, a classification module, an application module and an output module;

The storage module is responsible for storing the user's files, the analysis module analyzes the files based on AIGC technology, the classification module classifies the files using the information generated by the analysis module, the application module provides application functions for the user, and the output module is responsible for generating reports for decision-making;

The analysis module receives training files, converts them into feature information, and creates hyperplanes based on the feature information. The hyperplane information is sent to the classification module, which performs calculations on each file to be classified based on the hyperplanes to obtain a classification code, each bit of which corresponds to one hyperplane. The classification module saves the file to be classified to the corresponding area of the storage module according to the classification code;

The analysis module includes a vocabulary statistics unit, a word frequency matrix processing unit, a feature processing unit and a classification analysis unit. The vocabulary statistics unit creates a vocabulary from the input text files and counts the words in each text file based on that vocabulary. The word frequency matrix processing unit generates an m*n matrix from the counts, where m is the number of files and n is the number of words in the vocabulary. The feature processing unit processes the word frequency matrix to obtain an n-dimensional feature number for each file. The classification analysis unit creates hyperplanes based on the feature numbers and the classification codes of the text files;

The feature processing unit performs its calculation on the element values of the matrix according to the following formula:

[Formula: Figure SMS_15];

where [Figure SMS_16] denotes the element in row x and column y of the matrix, i is the row index variable, and j is the column index variable;

The feature number of the x-th file is composed of [Figure SMS_17];

The hyperplane A set by the classification analysis unit is:

[Formula: Figure SMS_18];

where [Figure SMS_19] are the hyperplane coefficients, [Figure SMS_20] are the hyperplane elements, and b is the threshold;

The hyperplane A satisfies:

[Formula: Figure SMS_21];

where [Figure SMS_22] denotes the file index corresponding to a class-one feature number, and [Figure SMS_23] denotes the file index corresponding to a class-two feature number;

The classification module includes a text processing unit, a calculation processing unit and an execution unit. The text processing unit counts how often specific words occur in a text file, the calculation processing unit performs the calculation tasks, and the execution unit classifies the text according to the results from the calculation processing unit and saves it to the corresponding storage area;

The calculation processing unit includes a feature number calculation processor and a hyperplane calculation processor. The feature number calculation processor calculates the classification feature number [Figure SMS_24] of the file to be classified according to the following formula:

[Formula: Figure SMS_25];

where [Figure SMS_26] is the frequency of the y-th word as counted by the text processing unit;

The hyperplane calculation processor processes the classification feature number according to the hyperplanes; the calculation result for each hyperplane is:

[Formula: Figure SMS_27];

The execution unit determines the value of the corresponding bit of the classification code from the sign of [Figure SMS_28], thereby obtaining the classification code, and saves the text file to the corresponding classification area according to the classification code.

Embodiment 2:

This embodiment includes all the content of Embodiment 1 and provides a file arrangement system based on AIGC, including a storage module, an analysis module, a classification module, an application module and an output module;

The storage module is responsible for storing the user's files. The analysis module analyzes the files based on AIGC technology to identify the topic, keywords and similar information of the file content. The classification module classifies the files using the information generated by the analysis module. The application module provides users with a variety of application functions. The output module is responsible for generating reports for decision-making and presenting them to the user in an appropriate form;

With reference to FIG. 2, the analysis module includes a vocabulary statistics unit, a word frequency matrix processing unit, a feature processing unit and a classification analysis unit. The vocabulary statistics unit processes each input text file in four steps: decomposition, word removal, extraction and counting. Decomposition splits the entire text into multiple phrases, word removal deletes connecting words, extraction takes the stems of the words in the phrases, and counting tallies the number of times each stem appears. The vocabulary statistics unit creates a vocabulary from the extracted stems to record the words that have appeared;

The word frequency matrix processing unit generates an m*n matrix from the counts, where m is the number of files and n is the number of words in the vocabulary. Each row represents a file, each column represents a word, and each element is the number of times that word appears in that file;
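The counting and matrix-building steps described above can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patented implementation: the stop-word list, the suffix-stripping `crude_stem` helper and all function names are hypothetical, and the text does not say how stems are actually extracted.

```python
import re
from collections import Counter

# Hypothetical stop-word list standing in for the "connecting words" that are deleted.
STOP_WORDS = {"a", "an", "and", "the", "of", "or", "to", "in"}


def crude_stem(word):
    """Rough stand-in for stem extraction: strip a few common English suffixes."""
    for suffix in ("ing", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_counts(text):
    """Decompose the text into words, drop stop words, stem, and count each stem."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return Counter(crude_stem(t) for t in tokens if t not in STOP_WORDS)


def build_term_frequency_matrix(texts):
    """Return (vocabulary, matrix): one row per file, one column per vocabulary word."""
    counts = [stem_counts(t) for t in texts]
    vocabulary = sorted(set().union(*counts))
    matrix = [[c[word] for word in vocabulary] for c in counts]
    return vocabulary, matrix


docs = [
    "The invoice lists invoices and payments",
    "Meeting notes and meeting agenda",
]
vocabulary, matrix = build_term_frequency_matrix(docs)
print(vocabulary)
for row in matrix:
    print(row)
```

Here m is `len(docs)` and n is `len(vocabulary)`. The sketch stops at the raw counts, because the formula that turns them into feature numbers is not reproduced in this text.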

The feature processing unit performs its calculation on the element values of the matrix according to the following formula:

[Formula: Figure SMS_29];

where [Figure SMS_30] denotes the element in row x and column y of the matrix, i is the row index variable, and j is the column index variable;

The feature number of the x-th file is composed of [Figure SMS_31];

The classification analysis unit sets [Figure SMS_32] hyperplanes to perform classification analysis of [Figure SMS_33] types of files. The type of a file is represented by a binary number of [Figure SMS_34] bits, called the classification code, and each hyperplane corresponds to one bit of the classification code. The classification code is sent to the analysis module together with the text file; after the text file has been processed to obtain its feature number, the classification code is paired with the corresponding feature number and sent to the classification analysis unit;

With reference to FIG. 5, the process by which the classification analysis unit sets one hyperplane includes the following steps:

S1. Determine the bit position of the classification code to which the hyperplane corresponds, called the target bit;

S2. Divide the classification codes into two classes according to the value of the target bit: codes whose target bit is 1 are class-one codes, and codes whose target bit is 0 are class-two codes;

S3. Divide the feature numbers according to the result of step S2: the feature numbers corresponding to class-one codes are class-one feature numbers, and the feature numbers corresponding to class-two codes are class-two feature numbers;

S4. Set a hyperplane A:

[Formula: Figure SMS_35];

where [Figure SMS_36] are the hyperplane coefficients, [Figure SMS_37] are the hyperplane elements, and b is the threshold;

The hyperplane A satisfies:

[Formula: Figure SMS_38];

where [Figure SMS_39] denotes the file index corresponding to a class-one feature number, and [Figure SMS_40] denotes the file index corresponding to a class-two feature number. The form of the inequality that is satisfied determines the correspondence between the sign (positive or negative) and the values 0 and 1;

S5. Adjust the value of the threshold b so that the shortest distance from the class-one feature numbers to the hyperplane equals the shortest distance from the class-two feature numbers to the hyperplane. The threshold at this point is denoted [Figure SMS_41], which gives the final hyperplane:

[Formula: Figure SMS_42];

The distance from a feature number [Figure SMS_43] to the hyperplane is given by:

[Formula: Figure SMS_44];
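Steps S1 to S5 can be sketched in Python as below. Since the formula images are not reproduced here, this is only an illustration under stated assumptions: the hyperplane is written in the common form w·x + b = 0, class-one feature numbers are assumed to lie on the positive side, the coefficient vector `w` is taken as given (the text shown here does not say how it is chosen), and the distance is the standard point-to-hyperplane distance |w·x + b| / ||w||. All names are hypothetical.

```python
import math


def split_by_target_bit(features, codes, target_bit):
    """S2/S3: split feature vectors by the value of one bit of their classification code.

    codes are bit strings such as "10"; target_bit is an index into that string.
    """
    class_one = [f for f, c in zip(features, codes) if c[target_bit] == "1"]
    class_two = [f for f, c in zip(features, codes) if c[target_bit] == "0"]
    return class_one, class_two


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


def distance_to_hyperplane(w, b, x):
    """Standard distance from point x to the hyperplane w.x + b = 0."""
    return abs(dot(w, x) + b) / math.sqrt(dot(w, w))


def balanced_threshold(w, class_one, class_two):
    """S5: pick b so the nearest point of each class is equally far from the hyperplane.

    Assumes w already separates the classes with class one on the positive side.
    """
    lowest_one = min(dot(w, x) for x in class_one)
    highest_two = max(dot(w, x) for x in class_two)
    if lowest_one <= highest_two:
        raise ValueError("w does not separate the two classes")
    return -(lowest_one + highest_two) / 2.0


features = [[2.0, 0.0], [3.0, 1.0], [0.0, 1.0], [0.0, 3.0]]
codes = ["1", "1", "0", "0"]
w = [1.0, -1.0]  # assumed coefficients, chosen by hand for the example

class_one, class_two = split_by_target_bit(features, codes, target_bit=0)
b0 = balanced_threshold(w, class_one, class_two)
d_one = min(distance_to_hyperplane(w, b0, x) for x in class_one)
d_two = min(distance_to_hyperplane(w, b0, x) for x in class_two)
print(b0, d_one, d_two)
```

For a fixed `w`, the balancing condition in S5 reduces to placing the threshold midway between the lowest class-one projection and the highest class-two projection, which is what `balanced_threshold` does.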

The classification analysis unit sets [Figure SMS_45] hyperplanes according to the above process. The hyperplane information is sent to the classification module and includes the correspondence between the sign (positive or negative) and the values 0 and 1;

With reference to FIG. 3, the classification module includes a text processing unit, a calculation processing unit and an execution unit. The text processing unit counts how often specific words occur in a text file, the calculation processing unit performs the calculation tasks, and the execution unit classifies the text according to the results from the calculation processing unit and saves it to the corresponding storage area;

The text processing unit stores the vocabulary created by the vocabulary statistics unit and counts only the words recorded in that vocabulary;

With reference to FIG. 4, the calculation processing unit includes a feature number calculation processor and a hyperplane calculation processor. The feature number calculation processor calculates the classification feature number [Figure SMS_46] of the file to be classified according to the following formula:

[Formula: Figure SMS_47];

where [Figure SMS_48] is the frequency of the y-th word as counted by the text processing unit;

The hyperplane calculation processor processes the classification feature number according to the [Figure SMS_49] hyperplanes; the calculation result for each hyperplane is:

[Formula: Figure SMS_50];

The execution unit determines the values of the corresponding bits of the classification code from the signs of the Ns values [Figure SMS_51], thereby obtaining the classification code, and saves the text file to the corresponding classification area according to the classification code;
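The sign-to-bit step can be illustrated with a short Python sketch. It assumes each hyperplane is supplied as a tuple `(w, b, positive_bit)`, where `positive_bit` records which bit value a positive result maps to (the text says this correspondence travels with the hyperplane information), and that the per-hyperplane result has the form w·x + b. A result of exactly zero is treated as non-positive, which is a choice made for this example only. All names are hypothetical.

```python
def hyperplane_value(w, b, feature):
    """Assumed form of the per-hyperplane result: w.x + b."""
    return sum(wi * xi for wi, xi in zip(w, feature)) + b


def classification_code(hyperplanes, feature):
    """Build the classification code, one bit per hyperplane, from the result signs."""
    bits = []
    for w, b, positive_bit in hyperplanes:
        value = hyperplane_value(w, b, feature)
        bit = positive_bit if value > 0 else 1 - positive_bit
        bits.append(str(bit))
    return "".join(bits)


# Two hand-written hyperplanes; the second maps a positive result to bit 0.
hyperplanes = [
    ([1.0, -1.0], -0.5, 1),
    ([0.0, 1.0], -2.0, 0),
]

code_a = classification_code(hyperplanes, [3.0, 1.0])
code_b = classification_code(hyperplanes, [0.0, 3.0])
print(code_a, code_b)
```

A code of k bits can take at most 2^k distinct values, so `int(code, 2)` yields an index in the range 0 to 2^k - 1 that could be used to select a classification area.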

The storage module includes a storage unit and a logic unit. The storage unit is divided into Ns+1 storage areas. One of these areas, called the temporary area, holds text files waiting to be classified; the other Ns areas, called classification areas, hold text files after classification. The logic unit records the logical relationships among the Ns classification areas;

The application module provides application interfaces, which include a classification interface and a report interface. The classification interface is connected to the temporary area, and input text files are saved to the temporary area through it. When the temporary area receives a new file, the classification module is started to classify and save the input text file. The report interface receives time information and compiles statistics on the text files added within the specified period, giving the proportion of each category among the files added in that period.
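The report interface's statistic can be sketched in Python as follows. The record layout (a timestamp paired with a classification code) and the function name are assumptions introduced for the example; the text only states that the proportion of each category among files added in a given period is computed.

```python
from collections import Counter
from datetime import datetime


def category_proportions(records, start, end):
    """Share of each classification code among files added between start and end.

    records is an iterable of (added_at, classification_code) pairs.
    """
    counts = Counter(code for added_at, code in records if start <= added_at <= end)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {code: n / total for code, n in counts.items()}


records = [
    (datetime(2023, 3, 2), "01"),
    (datetime(2023, 3, 10), "01"),
    (datetime(2023, 3, 20), "10"),
    (datetime(2023, 4, 5), "10"),  # outside the requested period
]
report = category_proportions(records, datetime(2023, 3, 1), datetime(2023, 3, 31))
print(report)
```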

The content disclosed above describes only preferred feasible embodiments of the present invention and does not limit its scope of protection. All equivalent technical changes made using the content of the specification and drawings of the present invention therefore fall within its scope of protection. In addition, the elements therein may be updated as technology develops.

Claims (5)

1.一种基于AIGC的文件整理系统,其特征在于,包括存储模块、分析模块、分类模块、应用模块和输出模块;1. A file arrangement system based on AIGC, characterized by comprising a storage module, an analysis module, a classification module, an application module and an output module; 所述存储模块用于负责存储用户的文件,所述分析模块基于AIGC技术对文件进行分析,所述分类模块利用分析模块生成的信息对文件进行分类,所述应用模块为用户提供应用功能,所述输出模块负责生成用于决策的报告;The storage module is responsible for storing the user's files, the analysis module analyzes the files based on the AIGC technology, the classification module classifies the files using the information generated by the analysis module, the application module provides application functions for the user, and the output module is responsible for generating reports for decision-making; 所述分析模块接收训练用的文件,将文件转换成特征信息,基于特征信息创建超平面,所述超平面信息被发送至所述分类模块,所述分类模块基于超平面对待分类的文件进行计算处理得到分类码,所述分类码的每一位对应一个超平面,所述分类模块根据分类码将待分类的文件保存至存储模块的对应区域。The analysis module receives training files, converts the files into feature information, creates a hyperplane based on the feature information, and the hyperplane information is sent to the classification module. The classification module calculates and processes the files to be classified based on the hyperplane to obtain a classification code. Each bit of the classification code corresponds to a hyperplane. The classification module saves the files to be classified to the corresponding area of the storage module according to the classification code. 2.如权利要求1所述的一种基于AIGC的文件整理系统,其特征在于,所述分析模块包括词汇统计单元、词频矩阵处理单元、特征处理单元和分类分析单元,所述词汇统计单元根据输入的文本文件创建一个词汇表并基于词汇表对每个文本文件进行词汇统计,所述词频矩阵处理单元根据统计结果生成一个m*n的矩阵,其中,m表示文件的数量,n表示词汇表中词汇的数量,所述特征处理单元根据词频矩阵处理得到每个文件的n维特征数,所述分类分析单元基于特征数以及文本文件的分类码创建超平面。2. 
A file sorting system based on AIGC as described in claim 1, characterized in that the analysis module includes a vocabulary statistics unit, a word frequency matrix processing unit, a feature processing unit and a classification analysis unit, the vocabulary statistics unit creates a vocabulary table according to the input text file and performs vocabulary statistics on each text file based on the vocabulary table, the word frequency matrix processing unit generates an m*n matrix according to the statistical results, wherein m represents the number of files and n represents the number of words in the vocabulary table, the feature processing unit obtains the n-dimensional feature number of each file according to the word frequency matrix processing, and the classification analysis unit creates a hyperplane based on the feature number and the classification code of the text file. 3.如权利要求2所述的一种基于AIGC的文件整理系统,其特征在于,所述特征处理单元基于矩阵中的元素值根据下式进行计算处理:3. A file arrangement system based on AIGC as claimed in claim 2, characterized in that the feature processing unit performs calculation processing based on the element values in the matrix according to the following formula:
Figure QLYQS_1;
where the symbol in Figure QLYQS_2 denotes the element value in row x, column y of the matrix, i is the row index variable, and j is the column index variable;
the values shown in Figure QLYQS_3 make up the feature number of the x-th file;
the hyperplane A set by the classification analysis unit is:
Figure QLYQS_4;
where the symbol in Figure QLYQS_5 is a hyperplane coefficient, the symbol in Figure QLYQS_6 is a hyperplane element, and b is the threshold;
the hyperplane A satisfies:
Figure QLYQS_7;
where the symbol in Figure QLYQS_8 denotes the file index corresponding to a first-class feature number, and the symbol in Figure QLYQS_9 denotes the file index corresponding to a second-class feature number.
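
The training side described in claims 2 and 3 can be illustrated with a short Python sketch. The feature formula and the hyperplane condition survive here only as image placeholders, so the sketch rests on stated assumptions: the feature number of a file is taken to be its row of the word frequency matrix divided by the row total, and the hyperplane (coefficients `a`, threshold `b`) is found with a plain perceptron so that first-class files score positive and second-class files score negative. All function names and the sample documents are hypothetical.

```python
from collections import Counter

def build_vocabulary(docs):
    """Sorted list of every distinct word in the training documents."""
    return sorted({w for d in docs for w in d.split()})

def frequency_matrix(docs, vocab):
    """m x n matrix: one row per document, one column per vocabulary word."""
    rows = []
    for d in docs:
        counts = Counter(d.split())
        rows.append([counts[w] for w in vocab])
    return rows

def features(matrix):
    """ASSUMPTION: normalize each count by its row total.
    The patent's own formula is an image and may differ."""
    out = []
    for row in matrix:
        total = sum(row) or 1
        out.append([c / total for c in row])
    return out

def score(a, b, k):
    """Value of the hyperplane expression sum(a_j * k_j) + b."""
    return sum(aj * kj for aj, kj in zip(a, k)) + b

def fit_hyperplane(feats, labels, epochs=100, lr=1.0):
    """ASSUMPTION: perceptron training; labels are +1 (first class)
    or -1 (second class). Stops once every file is on the correct side."""
    a, b = [0.0] * len(feats[0]), 0.0
    for _ in range(epochs):
        errors = 0
        for k, y in zip(feats, labels):
            if y * score(a, b, k) <= 0:
                a = [aj + lr * y * kj for aj, kj in zip(a, k)]
                b += lr * y
                errors += 1
        if errors == 0:
            break
    return a, b

docs = [
    "invoice payment invoice",
    "payment due invoice",
    "meeting agenda notes",
    "agenda meeting meeting",
]
labels = [1, 1, -1, -1]

vocab = build_vocabulary(docs)
matrix = frequency_matrix(docs, vocab)
feats = features(matrix)
a, b = fit_hyperplane(feats, labels)
```

A system with several classification areas would presumably train one such hyperplane per bit of the classification code, each with its own two-way split of the training files.
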
4. The AIGC-based file arrangement system according to claim 3, characterized in that the classification module comprises a text processing unit, a calculation processing unit and an execution unit; the text processing unit counts the occurrences of specific words in a text file, the calculation processing unit performs the calculation tasks, and the execution unit classifies the text according to the results of the calculation processing unit and saves it to the corresponding storage area.

5. The AIGC-based file arrangement system according to claim 4, characterized in that the calculation processing unit comprises a feature number calculation processor and a hyperplane calculation processor; the feature number calculation processor calculates the classification feature number of the file to be classified, shown in Figure QLYQS_10, according to the following formula:
Figure QLYQS_11;
where the symbol in Figure QLYQS_12 is the frequency of the y-th word as counted by the text processing unit;
the hyperplane calculation processor processes the classification feature number against the hyperplanes, and the calculation result for each hyperplane is:
Figure QLYQS_13;
the execution unit determines the value of the corresponding bit of the classification code from the sign of the result shown in Figure QLYQS_14, thereby obtaining the classification code, and saves the text file to the corresponding classification area according to the classification code.
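
The classification step of claim 5 can be sketched in Python as follows. This is a minimal illustration under assumptions, not the patented implementation: the per-hyperplane result is taken to be the linear expression sum(a_j * k_j) + b, a positive result is mapped to bit `1` and any other result to bit `0` (the claim only says the sign determines the bit), and the classification feature number `k` is assumed to be already computed. The function names, the hyperplane values and the area names are hypothetical.

```python
def hyperplane_value(a, b, k):
    """Result for one hyperplane: sum(a_j * k_j) + b (assumed linear form)."""
    return sum(aj * kj for aj, kj in zip(a, k)) + b

def classification_code(hyperplanes, k):
    """One bit per hyperplane.
    ASSUMPTION: '1' for a positive result, '0' otherwise."""
    return "".join(
        "1" if hyperplane_value(a, b, k) > 0 else "0"
        for a, b in hyperplanes
    )

def storage_area(code, areas, default="unsorted"):
    """Look up the storage area assigned to a classification code."""
    return areas.get(code, default)

# Two hypothetical hyperplanes, each given as (coefficients, threshold b).
hyperplanes = [
    ([1.0, -1.0, 0.0], 0.0),
    ([0.0, 0.5, -0.5], -0.1),
]
# Hypothetical mapping from classification code to storage area.
areas = {"10": "finance", "01": "meetings", "11": "contracts", "00": "misc"}

k = [0.6, 0.1, 0.3]                      # classification feature number of one file
code = classification_code(hyperplanes, k)
area = storage_area(code, areas)
```

With p hyperplanes the code has p bits, so up to 2^p storage areas can be addressed; the `default` argument covers codes that have no area assigned.
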
CN202310165385.6A 2023-02-27 2023-02-27 File arrangement system based on AIGC Active CN115858474B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310165385.6A CN115858474B (en) 2023-02-27 2023-02-27 File arrangement system based on AIGC

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310165385.6A CN115858474B (en) 2023-02-27 2023-02-27 File arrangement system based on AIGC

Publications (2)

Publication Number Publication Date
CN115858474A (en) 2023-03-28
CN115858474B (en) 2023-05-09

Family

ID=85658908

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310165385.6A Active CN115858474B (en) 2023-02-27 2023-02-27 File arrangement system based on AIGC

Country Status (1)

Country Link
CN (1) CN115858474B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116306317A (en) * 2023-05-12 2023-06-23 环球数科集团有限公司 Automatic AIGC modeling system based on artificial intelligence
CN116821319A (en) * 2023-08-30 2023-09-29 环球数科集团有限公司 A rapid screening processing system based on AIGC

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120166366A1 (en) * 2010-12-22 2012-06-28 Microsoft Corporation Hierarchical classification system
CN106407406A (en) * 2016-09-22 2017-02-15 国信优易数据有限公司 A text processing method and system
US20170075974A1 (en) * 2015-09-11 2017-03-16 Adobe Systems Incorporated Categorization of forms to aid in form search
CN107577738A (en) * 2017-08-28 2018-01-12 电子科技大学 A FMECA Method for Data Processing by SVM Text Mining
US20200026798A1 (en) * 2018-07-20 2020-01-23 EMC IP Holding Company LLC Identification and curation of application programming interface data from different sources
CN111125359A (en) * 2019-12-17 2020-05-08 东软集团股份有限公司 Text information classification method, device and equipment
CN114265937A (en) * 2021-12-24 2022-04-01 中国电力科学研究院有限公司 Intelligent classification and analysis method, system, storage medium and server of scientific and technological information
CN114610884A (en) * 2022-03-07 2022-06-10 中国人民解放军63893部队 Classification method based on PCA combined feature extraction and approximate support vector machine

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120166366A1 (en) * 2010-12-22 2012-06-28 Microsoft Corporation Hierarchical classification system
US20170075974A1 (en) * 2015-09-11 2017-03-16 Adobe Systems Incorporated Categorization of forms to aid in form search
CN106407406A (en) * 2016-09-22 2017-02-15 国信优易数据有限公司 A text processing method and system
CN107577738A (en) * 2017-08-28 2018-01-12 电子科技大学 A FMECA Method for Data Processing by SVM Text Mining
US20200026798A1 (en) * 2018-07-20 2020-01-23 EMC IP Holding Company LLC Identification and curation of application programming interface data from different sources
CN111125359A (en) * 2019-12-17 2020-05-08 东软集团股份有限公司 Text information classification method, device and equipment
CN114265937A (en) * 2021-12-24 2022-04-01 中国电力科学研究院有限公司 Intelligent classification and analysis method, system, storage medium and server of scientific and technological information
CN114610884A (en) * 2022-03-07 2022-06-10 中国人民解放军63893部队 Classification method based on PCA combined feature extraction and approximate support vector machine

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116306317A (en) * 2023-05-12 2023-06-23 环球数科集团有限公司 Automatic AIGC modeling system based on artificial intelligence
CN116821319A (en) * 2023-08-30 2023-09-29 环球数科集团有限公司 A rapid screening processing system based on AIGC
CN116821319B (en) * 2023-08-30 2023-10-27 环球数科集团有限公司 Quick screening type processing system based on AIGC

Also Published As

Publication number Publication date
CN115858474B (en) 2023-05-09

Similar Documents

Publication Publication Date Title
CN115858474B (en) File arrangement system based on AIGC
CN107609121B (en) News text classification method based on LDA and word2vec algorithm
CN110851598B (en) Text classification method and device, terminal equipment and storage medium
CN104391835B (en) Feature Words system of selection and device in text
TWI662425B (en) A method of automatically generating semantic similar sentence samples
CN110674289A (en) Method, device and storage medium for judging the category to which an article belongs based on word segmentation weight
CN112650923A (en) Public opinion processing method and device for news events, storage medium and computer equipment
CN107122382A (en) A kind of patent classification method based on specification
CN109934251B (en) Method, system and storage medium for recognizing text in Chinese language
Dal Bianco et al. A practical and effective sampling selection strategy for large scale deduplication
WO2021051864A1 (en) Dictionary expansion method and apparatus, electronic device and storage medium
CN108027814A (en) Disable word recognition method and device
CN110222192A (en) Corpus method for building up and device
CN109993216A (en) A text classification method based on K nearest neighbors KNN and its equipment
CN110134777A (en) Problem deduplication method, device, electronic device and computer-readable storage medium
CN118211131B (en) A text data preprocessing method and system suitable for large financial models
CN118689992A (en) A table content RAG customer service question and answer method based on a large language model
CN115098690A (en) Multi-data document classification method and system based on cluster analysis
CN106503146A (en) Computer text feature selection method, classification feature selection method and system
CN114462049A (en) An automatic vulnerability classification method and system based on weighted Word2vec
CN109871429B (en) Short text retrieval method integrating Wikipedia classification and explicit semantic features
CN112784040B (en) Text Classification Method for Vertical Industry Based on Corpus
CN104123291B (en) A kind of method and device of data classification
WO2018100700A1 (en) Data conversion device and data conversion method
CN112860897A (en) Text classification method based on improved ClusterGCN

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
CP03 Change of name, title or address

Address after: 518063 No. 01-03, floor 17, block B, building 10, Shenzhen Bay science and technology ecological park, No. 10, Gaoxin South ninth Road, Yuehai street, Nanshan District, Shenzhen, Guangdong

Patentee after: Global Numerical Technology Co.,Ltd.

Country or region after: China

Address before: No. 01-03, 17th Floor, Building B, Shenzhen Bay Science and Technology Ecological Park, No. 10 Gaoxin South 9th Road, Yuehai Street, Nanshan District, Shenzhen City, Guangdong Province

Patentee before: Global Digital Group Co.,Ltd.

Country or region before: China
