CN117709909A - Business data processing method based on large language model - Google Patents

Business data processing method based on large language model

Info

Publication number
CN117709909A
Authority
CN
China
Prior art keywords
data
file
classification
subsystem
subset
Legal status
Granted
Application number
CN202410170293.1A
Other languages
Chinese (zh)
Other versions
CN117709909B (en)
Inventor
黄余
武文斌
孙利民
佘明洪
Current Assignee
Mianyang Normal University
Original Assignee
Mianyang Normal University
Application filed by Mianyang Normal University
Priority to CN202410170293.1A
Publication of CN117709909A
Application granted
Publication of CN117709909B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/103 Workflow collaboration or project management
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/10 File systems; File servers
    • G06F16/14 Details of searching files based on file metadata
    • G06F16/148 File search processing
    • G06F16/16 File or folder operations, e.g. details of user interfaces specifically adapted to file systems
    • G06F16/162 Delete operations
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/903 Querying
    • G06F16/9032 Query formulation
    • G06F16/90332 Natural language query formulation or dialogue systems
    • G06F16/906 Clustering; Classification
    • G06F21/00 Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 Protecting data
    • G06F21/604 Tools and structures for managing or administering access control systems
    • G06F21/62 Protecting access to data via a platform, e.g. using keys or access control rules
    • G06F21/6218 Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0481 Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
    • G06F3/04817 Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance using icons
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F2221/00 Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/21 Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F2221/2141 Access rights, e.g. capability lists, access control lists, access tables, access matrices
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • G06N5/041 Abduction
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L63/00 Network architectures or network communication protocols for network security
    • H04L63/02 Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
    • H04L63/0272 Virtual private networks
    • H04L63/10 Network architectures or network communication protocols for network security for controlling access to devices or network resources
    • H04L63/101 Access control lists [ACL]
    • H04L9/00 Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
    • H04L9/40 Network security protocols

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Security & Cryptography (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Human Computer Interaction (AREA)
  • Business, Economics & Management (AREA)
  • Signal Processing (AREA)
  • Mathematical Physics (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Human Resources & Organizations (AREA)
  • Computing Systems (AREA)
  • Strategic Management (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Bioethics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Library & Information Science (AREA)
  • Evolutionary Computation (AREA)
  • Automation & Control Theory (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Operations Research (AREA)
  • Quality & Reliability (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a business data processing method based on a large language model, comprising the following steps: S1, building subsystems in a modular manner according to department work projects; S2, formulating a file subset classification table. The invention relates to the technical field of electric digital data processing. The method brings together the diverse business data files of different departments and personnel in an enterprise under unified rules and builds a subset classification table to classify them in an orderly way. Data are identified, classified, and tagged when they are uploaded. When the main system later processes the data, it can therefore quickly decide where each item should go, for example into classified storage records or to other subsystems. This makes operation easier for staff and reduces the error rate. Identification and processing also generate a log table. Combined with a reminder mechanism, the log table helps avoid repeated operations, improves work efficiency, and makes data easy to trace and business data easy to manage.

Description

A business data processing method based on a large language model

Technical field

The invention relates to the technical field of electric digital data processing, and specifically to a business data processing method based on a large language model.

Background art

In the course of their work, enterprises typically process a large number of complex data files covering many data types. Across the different departments of an enterprise, the variety of files to be processed is even greater.

CN102368311B discloses a business data processing device for application businesses that have different specific business rules at the same stage. The device can flexibly process its own data according to user-defined business rules, support subsequent processes, and share resources without repeated development. It also solves the problem of users repeating operations because multiple business rules have been introduced.

The above technique extracts unified rules for documents of the same type that follow different rules, and then manages all documents through those unified rules. Some enterprises, however, handle more diverse document types, such as drawings, financial statements, personnel information, and customer information. No single set of rules can define and manage every type of document, so it is difficult to handle different documents effectively in day-to-day work.

Moreover, some files exist in several departments at the same time, and copies of the same file may differ slightly. For example, the same file held by two departments may have slightly different folder names. The terminal devices of different departments, and of different employees in the same department, usually do not share resources either. When files are uploaded, the system only checks whether the outermost folder names match, so a name that differs by even one character is judged to be a different file. This produces duplicate files and a great deal of duplicated work. Repeated operations may also cause incorrect data to overwrite correct data, leading to a series of downstream problems.

Summary of the invention

In view of the shortcomings of the prior art, the present invention provides a business data processing method based on a large language model. It addresses the problem that, when many accounts in an enterprise process large amounts of data, operations are easily repeated. Such repetition may ultimately cause data errors and affect subsequent work.

To achieve the above objective, the present invention is implemented through the following technical solution: a business data processing method based on a large language model, which specifically includes the following steps:

S1. Build subsystems in a modular manner according to department work projects;

S2. Formulate a file subset classification table: first create subsets of the file data, expressed as:

S = {x | x ∈ A, P(x)};

where S is the created subset, A is the original data set, and P(x) is the condition or rule used for classification; that is, S is the subset of elements x of A that satisfy P(x);
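The set-builder expression S = {x | x ∈ A, P(x)} corresponds directly to filtering a collection with a predicate. Below is a minimal Python sketch. It assumes each element of A is a file record with a name and an owning department. The `FileRecord` type, the sample files, and the "finance department" predicate are hypothetical illustrations, not part of the disclosure.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass(frozen=True)
class FileRecord:
    # Hypothetical file record; the source does not define a schema.
    name: str
    department: str


def make_subset(a: Iterable[FileRecord],
                p: Callable[[FileRecord], bool]) -> set[FileRecord]:
    """Return S = {x | x in A, P(x)}: the elements of A that satisfy P."""
    return {x for x in a if p(x)}


A = {
    FileRecord("2024 investment estimate.xlsx", "finance"),
    FileRecord("design budget estimate.docx", "finance"),
    FileRecord("staff roster.xlsx", "hr"),
}

# P(x): "the file belongs to the finance department" (an assumed rule)
S = make_subset(A, lambda x: x.department == "finance")
```

Any other rule, such as a file-type check or a keyword match on the name, can be passed as `p` without changing `make_subset`.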

Then build the file subset classification table from the subsets and use it to process and classify new file names. Let the first-level classification be A, containing two categories (a larger and a smaller one), A1 and A2. Let the second-level classification be B, likewise containing two categories, B1 and B2. The classification table can then be expressed as:

A = {A1, A2}, B = {B1, B2};

where A1 and A2 are the subcategory numbers of A, and B1 and B2 are the subcategory numbers of B. Each of the categories A1, A2, B1, and B2 is further divided into n numbers, each referring to a specific keyword;

S3. Before a subsystem uploads enterprise data to the main system, it tags the data with keywords;

S4. The data management unit of the main system identifies, classifies, and processes the uploaded data. For a file uploaded for the first time, it also traverses all subsystems to look for similar files and sends a reminder to each subsystem that holds one, to eliminate duplicated work;
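The source does not say how "similar" files are detected. One possible sketch measures similarity with `difflib.SequenceMatcher` from the Python standard library. It compares whitespace-normalized, lower-cased names against an assumed threshold of 0.8. This catches a name that differs by one extra word, which is the folder-name problem described in the background. The subsystem names and file names are hypothetical.

```python
from difflib import SequenceMatcher

# Assumed threshold; the source does not define "similar".
SIMILARITY_THRESHOLD = 0.8


def normalize(name: str) -> str:
    """Lower-case and drop all whitespace so cosmetic differences are ignored."""
    return "".join(name.lower().split())


def find_similar(new_name: str, subsystems: dict[str, list[str]],
                 threshold: float = SIMILARITY_THRESHOLD) -> dict[str, list[str]]:
    """Traverse every subsystem and collect file names similar to new_name."""
    target = normalize(new_name)
    hits: dict[str, list[str]] = {}
    for subsystem, names in subsystems.items():
        for name in names:
            if SequenceMatcher(None, target, normalize(name)).ratio() >= threshold:
                hits.setdefault(subsystem, []).append(name)
    return hits


def reminders(hits: dict[str, list[str]], new_name: str) -> list[str]:
    """Build one reminder message per subsystem that holds a similar file."""
    return [f"[{sub}] '{new_name}' resembles: {', '.join(names)}"
            for sub, names in sorted(hits.items())]


subsystems = {
    "finance": ["Project X drawings v2", "Q1 report"],
    "engineering": ["Project X drawings v2 final"],
    "hr": ["staff roster"],
}
hits = find_similar("Project X drawings v2", subsystems)
```

An exact-name comparison would miss the "final" copy in the engineering subsystem. The ratio-based check flags it and produces a reminder for that subsystem.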

S5. Data identification combines data tags with an overwrite strategy to identify and process the data, while user interaction lets users take an active part in the processing;

S6. If a file in the main database has not been accessed within a set period, the project is considered finished. All associated subsystem database data are then deleted automatically, and the corresponding data in the main database are moved to the completion library;
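Step S6 amounts to an idle-time sweep. The sketch below assumes the following:

- each main-database entry records its last access time;
- the idle period is 180 days (the source only says "a set time");
- the databases and the completion library are in-memory dictionaries.

All names are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta

# Assumed idle period; the source only specifies "a set time".
IDLE_LIMIT = timedelta(days=180)


@dataclass
class Store:
    main_db: dict[str, datetime] = field(default_factory=dict)        # file -> last access
    subsystem_dbs: dict[str, set[str]] = field(default_factory=dict)  # subsystem -> files
    completion_library: dict[str, datetime] = field(default_factory=dict)


def archive_idle(store: Store, now: datetime,
                 idle_limit: timedelta = IDLE_LIMIT) -> list[str]:
    """Treat files idle for at least idle_limit as finished projects.

    Deletes every subsystem copy and moves the main-database entry to the
    completion library. Returns the archived file names.
    """
    finished = [name for name, last in store.main_db.items()
                if now - last >= idle_limit]
    for name in finished:
        for files in store.subsystem_dbs.values():
            files.discard(name)
        store.completion_library[name] = store.main_db.pop(name)
    return finished


store = Store(
    main_db={"A101 estimate": datetime(2024, 1, 5), "B201 roster": datetime(2024, 12, 1)},
    subsystem_dbs={"finance": {"A101 estimate"}, "hr": {"B201 roster", "A101 estimate"}},
)
archived = archive_idle(store, now=datetime(2024, 12, 31))
```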

S7. To search for data, the user queries the large language interaction module, which returns the matching data download items together with the data's history log.

Preferably, S2 specifically includes the following steps:

(1) File name division: split the file name at spaces and symbols into numbers, letters, and nouns to obtain a word list, and tag each word in the list with its part of speech;
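Step (1) can be sketched with a regular expression that keeps runs of digits, Latin letters, and CJK characters while discarding spaces and symbols. Real part-of-speech tagging would need a dedicated tagger. This sketch substitutes coarse token-type tags as an assumption, and the function name and tag letters are hypothetical.

```python
import re

# Match runs of digits, runs of Latin letters, or runs of CJK ideographs.
# Spaces and symbols fall between matches and are therefore dropped.
TOKEN_RE = re.compile(r"\d+|[A-Za-z]+|[\u4e00-\u9fff]+")


def divide_file_name(file_name: str) -> list[tuple[str, str]]:
    """Return (token, tag) pairs for a file name.

    The tags are coarse token types standing in for real part-of-speech
    tags: 'm' for numbers, 'x' for Latin letters, and 'n' for CJK runs
    (treated as noun candidates).
    """
    stem = file_name.rsplit(".", 1)[0]  # drop the extension, if any
    pairs = []
    for token in TOKEN_RE.findall(stem):
        if token.isdigit():
            tag = "m"
        elif token.isascii():
            tag = "x"
        else:
            tag = "n"
        pairs.append((token, tag))
    return pairs


tokens = divide_file_name("2024_A101-工程投资估算 v2.xlsx")
```

Splitting at every letter/digit boundary also breaks a code such as "A101" into "A" and "101". A production version would likely need extra rules to keep table codes intact.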

(2) Historical identification records: based on historical file classification records and the specific data in each file, extract the words and phrases of file names. Remove invalid features according to manually set rules and retain valid features to assist the subsequent automatic classification;

(3) Rule-based classification tags: manually draw up the file subset classification table according to the department that owns each business data file and the file type. Then manually set rules that replace each extracted feature with its subset code, and add the codes to file names as classification tags, thereby associating each file name with its category;

(4) Algorithm integration: integrate the valid features, the historical identification records, and the file subset classification table into an algorithm model that tags and classifies new file names.

Preferably, S4 specifically includes:

(1) Word segmentation and keyword extraction:

1. Word segmentation: replace the non-alphanumeric characters in the file name with spaces, and split the file name into words with a regular expression;

2. Keyword extraction: compare each segmented word with the keywords in the file subset classification table, discard redundant words, and use the remaining words as the file name's keywords;
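The two segmentation sub-steps map onto a regular-expression substitution followed by a vocabulary filter. The sketch below assumes the table's keywords are available as a set of lower-case English words; the vocabulary shown is hypothetical. A run of CJK characters without separators would come out as a single word here, so Chinese file names would need a proper word segmenter.

```python
import re

# Hypothetical keyword vocabulary taken from the file subset classification table.
TABLE_KEYWORDS = {"investment", "estimate", "design", "budget", "drawing"}


def segment(file_name: str) -> list[str]:
    """Replace non-alphanumeric characters with spaces, then split into words."""
    # \W is Unicode-aware; '_' counts as a word character, so it is listed explicitly.
    cleaned = re.sub(r"[\W_]+", " ", file_name)
    return re.findall(r"\w+", cleaned)


def extract_keywords(file_name: str, vocabulary: set[str] = TABLE_KEYWORDS) -> list[str]:
    """Keep only words found in the table, in order of appearance, without repeats."""
    keywords: list[str] = []
    for word in segment(file_name.lower()):
        if word in vocabulary and word not in keywords:
            keywords.append(word)
    return keywords


name = "Project_X-Investment Estimate (final).xlsx"
words = segment(name)
keywords = extract_keywords(name)
```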

(2) Use the classification algorithm model to classify the data by matching the extracted keywords against the subset classification table;
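Matching keywords against the table can be as simple as a dictionary lookup followed by deriving each code's parent categories. The sketch makes three assumptions:

- the keyword-to-code mapping shown is hypothetical;
- codes follow the layout of the embodiment: a letter, a one-digit subcategory, and a two-digit item;
- codes are written as plain strings such as "A101".

```python
# Hypothetical excerpt of a file subset classification table: keyword -> leaf code.
CLASSIFICATION_TABLE = {
    "investment": "A101",
    "estimate": "A101",
    "design": "A102",
    "budget": "A102",
    "roster": "B101",
}


def classify(keywords: list[str],
             table: dict[str, str] = CLASSIFICATION_TABLE) -> dict[str, list[str]]:
    """Map keywords to leaf codes and derive their subcategories and top-level classes."""
    codes = sorted({table[k] for k in keywords if k in table})
    return {
        "codes": codes,
        "subcategories": sorted({c[:2] for c in codes}),  # e.g. "A101" -> "A1"
        "classes": sorted({c[0] for c in codes}),         # e.g. "A101" -> "A"
        "unmatched": [k for k in keywords if k not in table],
    }


single = classify(["investment", "estimate", "final"])
mixed = classify(["design", "roster"])
```

The `unmatched` list is where a caller could trigger the reminder that S5 describes for untagged data.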

(3) Data storage, specifically including:

1. Storing the data in the main system database;

2. Sending the data to other subsystem databases;

3. Generating a log table.
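The three storage actions can be grouped in one routine. The sketch assumes a hypothetical routing rule that maps a subcategory code to the subsystems that should receive a copy. It also uses in-memory databases and a log-row format of its own choosing. The upload number it records is what a later step could use to tell first uploads from repeats.

```python
from dataclasses import dataclass, field
from datetime import datetime


@dataclass
class MainSystem:
    # Hypothetical routing rule: subcategory code -> subsystems that receive copies.
    routes: dict[str, list[str]]
    main_db: dict[str, bytes] = field(default_factory=dict)
    subsystem_dbs: dict[str, dict[str, bytes]] = field(default_factory=dict)
    log_table: list[dict] = field(default_factory=list)

    def store(self, name: str, data: bytes, codes: list[str], source: str,
              when: datetime) -> list[str]:
        """Store in the main database, forward to other subsystems, and log."""
        self.main_db[name] = data                                   # 1. main database
        targets = sorted({sub for c in codes for sub in self.routes.get(c[:2], [])
                          if sub != source})
        for sub in targets:                                         # 2. other subsystems
            self.subsystem_dbs.setdefault(sub, {})[name] = data
        upload_no = sum(1 for row in self.log_table if row["file"] == name) + 1
        self.log_table.append({                                     # 3. log-table row
            "time": when.isoformat(), "file": name, "codes": codes,
            "source": source, "sent_to": targets, "upload_no": upload_no,
        })
        return targets


ms = MainSystem(routes={"A1": ["finance", "engineering"]})
sent = ms.store("A101 estimate.xlsx", b"v1", ["A101"], "finance", datetime(2024, 3, 1))
ms.store("A101 estimate.xlsx", b"v2", ["A101"], "finance", datetime(2024, 3, 2))
```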

Preferably, S5 specifically includes the following. A reminder is issued for data in which no tag is recognized. The log table is consulted to determine whether the data are being uploaded for the first time. Tag options are prompted according to the source subsystem and the number of uploads, and the uploader chooses which tags to add. Data that are not uploaded for the first time overwrite the original data after distribution, and the original data are moved to a staging area for temporary storage.
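The S5 decision logic can be sketched as follows. The sketch assumes:

- a hypothetical table of suggested tags for each source subsystem;
- a log that keeps the tags used on each upload;
- the rule that the previous upload's tags are offered first.

Untagged data are not stored until the uploader picks a tag. Repeat uploads overwrite the main copy after the original has been moved to the staging area.

```python
from dataclasses import dataclass, field

# Hypothetical suggested tags per source subsystem.
SUGGESTED_TAGS = {"finance": ["A101", "A102"], "hr": ["B101"]}


@dataclass
class Repository:
    main_db: dict[str, bytes] = field(default_factory=dict)
    staging: dict[str, list[bytes]] = field(default_factory=dict)
    log: dict[str, list[list[str]]] = field(default_factory=dict)  # file -> tags per upload

    def upload(self, name: str, data: bytes, tags: list[str], source: str) -> dict:
        history = self.log.get(name, [])
        first_upload = not history
        if not tags:
            # No tag recognized: remind the uploader and prompt options; store nothing yet.
            previous = history[-1] if history else []
            options = previous + [t for t in SUGGESTED_TAGS.get(source, [])
                                  if t not in previous]
            return {"stored": False, "first_upload": first_upload,
                    "reminder": f"No tag recognized for '{name}'", "options": options}
        if not first_upload and name in self.main_db:
            # Repeat upload: keep the original in the staging area, then overwrite it.
            self.staging.setdefault(name, []).append(self.main_db[name])
        self.main_db[name] = data
        self.log.setdefault(name, []).append(list(tags))
        return {"stored": True, "first_upload": first_upload, "reminder": None, "options": []}


repo = Repository()
prompt = repo.upload("roster.xlsx", b"v1", [], "hr")       # reminder and options only
first = repo.upload("roster.xlsx", b"v1", ["B101"], "hr")  # first stored upload
again = repo.upload("roster.xlsx", b"v2", ["B101"], "hr")  # overwrite; v1 goes to staging
```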

Preferably, before uploading data to the main system, the subsystem in S3 performs a series of data processing and classification steps, specifically:

(1) Divide the data package uploaded by the subsystem: if the package contains data of different types, sort the data into different subsets by type;

(2) Add codes to each file as tags according to the rules in the subset classification table. During tagging, classification tags are stacked onto each file in the subset according to the identified features and the order of the tags in the file subset classification table;

(3) Upload the tagged, coded data to the main system.
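The three S3 steps can be sketched as below. The sketch substitutes simpler mechanisms for the patent's terms:

- grouping by file extension stands in for "type";
- substring matching on the file name stands in for feature identification;
- the table contents and the `A1-A101_` prefix format are hypothetical choices.

The point shown is that tags are stacked in the order the table lists them.

```python
from collections import defaultdict
from pathlib import PurePath

# Hypothetical classification table, listed in the order its tags are stacked:
# coarser codes first, then finer ones.
TABLE_IN_ORDER = [
    ("A1", "finance"),
    ("A101", "investment"),
    ("A102", "design"),
    ("B1", "personnel"),
    ("B101", "roster"),
]


def divide_packet(file_names: list[str]) -> dict[str, list[str]]:
    """Step (1): split a data packet into subsets by file type (extension)."""
    subsets: dict[str, list[str]] = defaultdict(list)
    for name in file_names:
        subsets[PurePath(name).suffix.lower() or "<none>"].append(name)
    return dict(subsets)


def stack_tags(file_name: str, table: list[tuple[str, str]] = TABLE_IN_ORDER) -> str:
    """Step (2): prefix the file name with every matching code, in table order."""
    lowered = file_name.lower()
    codes = [code for code, keyword in table if keyword in lowered]
    return ("-".join(codes) + "_" + file_name) if codes else file_name


def prepare_upload(file_names: list[str]) -> dict[str, list[str]]:
    """Steps (1) and (2); the result is what step (3) would send to the main system."""
    return {ext: [stack_tags(n) for n in names]
            for ext, names in divide_packet(file_names).items()}


packet = ["finance investment estimate.xlsx", "personnel roster.xlsx", "site photo.jpg"]
tagged = prepare_upload(packet)
```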

Preferably, the subsystems are divided into several department subsystems according to the enterprise's departments, and every employee account in a department belongs to that department's subsystem.

Preferably, the main system includes a data management unit, a main system database, a large language interaction module, and a subsystem permission management module. The data management unit and the large language interaction module upload, download, and view data by accessing the main system database. The subsystem permission management module manages the account permissions of the department subsystems.

Preferably, the subsystems are connected to the main system through the enterprise intranet. A subsystem located on an external network reaches the main system by entering the intranet through a VPN.

Preferably, each department subsystem includes a subsystem database, a system framework, a communication mechanism, an expansion module, and a modular functional unit. The modular functional unit includes several functional modules and an interface definition module that defines the interfaces of those functional modules.

Preferably, the communication mechanism connects the functional modules integrated in the modular functional unit to the system framework to form a complete system. The expansion module loads new function data and becomes a new functional module. If an existing functional module fails, the expansion module instead loads that module's data for temporary use. The subsystem database stores local data and the modular data of the functional modules.
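The modular layout can be sketched with three parts: a structural interface, a registry that mounts modules on the framework, and a fallback that loads a backup module when the mounted one fails. The class and method names are hypothetical. A plain in-process registry stands in for the communication mechanism, which the source does not specify.

```python
from typing import Callable, Protocol


class FunctionModule(Protocol):
    """Interface definition: every functional module exposes these members."""
    name: str

    def run(self, request: str) -> str: ...


class SystemFramework:
    """Framework on which modules are mounted through a simple registry."""

    def __init__(self) -> None:
        self._modules: dict[str, FunctionModule] = {}
        # Expansion module: factories that build a module from stored function data.
        self._backups: dict[str, Callable[[], FunctionModule]] = {}

    def mount(self, module: FunctionModule) -> None:
        self._modules[module.name] = module

    def register_backup(self, name: str, factory: Callable[[], FunctionModule]) -> None:
        self._backups[name] = factory

    def call(self, name: str, request: str) -> str:
        try:
            return self._modules[name].run(request)
        except Exception:
            # Missing or failing module: load the backup for temporary use, if any.
            if name not in self._backups:
                raise
            stand_in = self._backups[name]()
            self._modules[name] = stand_in
            return stand_in.run(request)


class Search:
    name = "search"

    def run(self, request: str) -> str:
        return f"results for {request}"


class BrokenExport:
    name = "export"

    def run(self, request: str) -> str:
        raise RuntimeError("module failure")


class ExportBackup:
    name = "export"

    def run(self, request: str) -> str:
        return f"exported {request} (backup)"


fw = SystemFramework()
fw.mount(Search())
fw.mount(BrokenExport())
fw.register_backup("export", ExportBackup)
```

Because modules are reached only through `call`, a department subsystem can mount just the modules it needs. This mirrors the "targeted function combination" benefit described in the text.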

Beneficial effects

The present invention provides a business data processing method based on a large language model. Compared with the prior art, it has the following beneficial effects:

1. The method integrates the diverse business data files of different departments and personnel within the enterprise. It formulates unified rules and establishes a subset classification table for orderly classification. Data are identified, classified, and tagged at upload time. When the main system processes the data, it can therefore quickly determine how each item should be handled, for example stored as a classified record or sent to other subsystems. This makes operation easier for staff and lowers the error rate. Identification and processing also generate a log table. Together with the reminder mechanism and the automatic subsystem search, the system notifies subsystems that hold common data. This avoids repeated operations, improves work efficiency, and makes business data easy to trace and manage. In addition, time rules periodically clean up stored files and subsystems, which prevents redundant subsystem data. This reduces the workload of employees who handle data and further helps avoid repeated operations.

2. The large language interaction module appears as a floating window in a corner of the operation interface. The window stays at a fixed position on the display as the interface is scrolled, so it is always at hand. Users can query it conversationally to perform operations the system has records of, such as searching data, searching history, and looking up operating rules. Enterprises that need it can also add artificial intelligence plug-ins such as ChatGPT to assist office work, which improves the efficiency of enterprise data processing.

3. The overall system is divided into several subsystems according to the nature of each department, and the accounts of each department's employees are managed separately. Each department's subsystem has different content, which makes targeted management and business data processing easier. Internal users access the system over the enterprise intranet, and external users reach the intranet through a VPN, which further protects enterprise business data.

4. Each department subsystem is built from independent functional modules, which are mounted on the system framework through the communication mechanism and the interface definition module. Functions can therefore be combined to suit the nature of each department's business, and unnecessary functions are left out. This simplifies the department subsystem interface, making it easy to operate and easy for newcomers to learn, and lowers the error rate of business data operations.

Description of the drawings

Figure 1 is an overall system block diagram of the present invention;

Figure 2 is a system block diagram of a department subsystem of the present invention;

Figure 3 is a schematic flow diagram of the present invention;

Figure 4 is a logic flow chart of long-term file processing in the present invention.

Detailed description of the embodiments

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort fall within the scope of protection of the present invention.

As shown in Figures 1-4, the present invention provides three technical solutions:

First embodiment: a business data processing method based on a large language model, specifically including the following steps:

S1、依据部门工作项目模块化制定子系统;S1. Modularly develop subsystems based on department work projects;

S2、具体包括以下步骤:S2. Specifically includes the following steps:

(1)文件名划分:将文件名按照空格、符号对数字、字母和名词进行划分,得到单词列表;对单词列表中的每个单词进行词性标注;(1) File name division: Divide the file name into numbers, letters and nouns according to spaces and symbols to obtain a word list; perform part-of-speech tagging for each word in the word list;

(2)历史识别记录:根据历史文件分类记录和文件具体数据信息,提取文件名字句并依据人为设定的规则剔除无效特征,保留有效特征,以辅助后续的自动分类过程;(2) Historical identification records: Based on historical file classification records and specific file data information, file name sentences are extracted and invalid features are eliminated based on artificially set rules, while valid features are retained to assist in the subsequent automatic classification process;

(3)设定规则分类标记:人工依据业务数据文件隶属的部门、文件类型,制定文件子集分类表,并人工设定规则对提取的特征编辑子集编码代替特征,添加编码对文件名进行分类标记,将文件名与对应的类别进行关联;(3) Set rule classification tags: Manually formulate a file subset classification table based on the department and file type to which the business data file belongs, and manually set rules to edit the extracted features, edit the subset codes to replace the features, and add codes to the file names. Category tags associate file names with corresponding categories;

制定文件子集分类表包括:先将文件数据创建子集,其公式表示为:Developing a file subset classification table includes: first creating a subset of file data, the formula is expressed as:

S={x|x∈A,P(x)};S={x|x∈A,P(x)};

其中,S表示创建的子集,A表示原始数据集,P(x)表示用来分类的条件或规则,其表示S是由A中满足条件P(x)的元素x组成的子集;Among them, S represents the created subset, A represents the original data set, and P(x) represents the condition or rule used for classification, which means that S is a subset composed of elements x in A that satisfy the condition P(x);

然后依据子集建立文件子集分类表,对新的文件名进行处理和分类识别,设一级分类为A,包含大小两类类别(依据企业发展情况可增加分类),分别为A1和A2;二级分类为B,包含大小两类类别,分别为B1和B2,则分类表的公式可以表示为:Then establish a file subset classification table based on the subset, and process and classify the new file names. Let the first-level classification be A, including two categories, large and small (classifications can be added according to the development of the enterprise), which are A1 and A2 respectively; The secondary classification is B, which includes two categories, large and small, B1 and B2 respectively. The formula of the classification table can be expressed as:

$A = \{A1, A2\},\quad B = \{B1, B2\}$;

where A1 and A2 are the sub-category numbers of A and B1 and B2 are those of B; each of the categories A1, A2, B1 and B2 is further divided into n numbered codes, each denoting a specific keyword. For example, A1 contains $\{A1_{01}, A1_{02}, \ldots, A1_{n}\}$, where A1 denotes financial engineering materials, $A1_{01}$ engineering investment estimate materials and $A1_{02}$ engineering design budget materials;
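The two-level table can be held as a nested mapping from first-level class to sub-category to numbered keyword codes. Only A1, $A1_{01}$ and $A1_{02}$ have meanings given in the text; the B entry and its keyword are placeholders, and writing $A1_{02}$ as the string "A102" follows the file-name tag format A102B201 used in the description.

```python
from typing import Optional

# Hypothetical two-level file subset classification table:
# first-level class -> sub-category -> {code: keyword}.
CLASSIFICATION_TABLE: dict[str, dict[str, dict[str, str]]] = {
    "A": {
        "A1": {  # financial engineering materials
            "A101": "工程投资估算",  # engineering investment estimate materials
            "A102": "工程设计概算",  # engineering design budget materials
        },
        "A2": {},
    },
    "B": {
        "B1": {},
        "B2": {"B201": "工程核算报表"},  # placeholder keyword, not from the source
    },
}


def code_for(keyword: str) -> Optional[str]:
    """Return the code whose keyword equals `keyword`, or None if the table has no match."""
    for subcategories in CLASSIFICATION_TABLE.values():
        for codes in subcategories.values():
            for code, kw in codes.items():
                if kw == keyword:
                    return code
    return None
```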

(4) Algorithm integration: integrate the valid features, the historical identification records and the file subset classification table into an algorithm model, and use machine learning, deep learning or other related techniques to tag and classify new file names;

S3. Before a subsystem uploads enterprise data to the main system, the data undergoes a series of processing and classification steps:

(1) Partition the data packet to be uploaded by the subsystem: if it contains data of different types, sort the data by type into separate subsets;

(2) Add codes to each file as tags according to the rules in the subset classification table: based on the identified features and following the order of the classification tags in the table, superimpose the classification tags on the files of each subset;

(3) Upload the tagged and coded data to the main system;
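A sketch of the three S3 steps, assuming that "type" means the file extension and that the codes for a file name come from a tagging function, represented here by the hypothetical stub `codes_for_name`. The upload itself is not shown; the function simply returns the tagged batch that would be sent.

```python
import os
from collections import defaultdict


def partition_by_type(packet: list[str]) -> dict[str, list[str]]:
    """Step (1): group the files of a data packet into subsets by extension."""
    subsets = defaultdict(list)
    for name in packet:
        ext = os.path.splitext(name)[1].lower() or "<none>"
        subsets[ext].append(name)
    return dict(subsets)


def codes_for_name(name: str) -> list[str]:
    """Hypothetical stub: return matching table codes in table order (A before B)."""
    codes = []
    if "设计概算" in name:
        codes.append("A102")
    if "核算报表" in name:
        codes.append("B201")
    return codes


def tag_packet(packet: list[str]) -> dict[str, list[str]]:
    """Step (2): prefix every file name with its stacked classification codes."""
    return {
        ext: ["".join(codes_for_name(n)) + n for n in names]
        for ext, names in partition_by_type(packet).items()
    }


# Step (3) would send this batch to the main system.
batch = tag_packet(["设计概算_核算报表.xlsx", "合同.docx", "设计概算.XLSX"])
```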

S4. Data in the main system is identified, classified and processed by the data management unit, as follows:

(1) Word segmentation and keyword extraction:

1. Word segmentation: replace the non-alphanumeric characters in the file name with spaces, then use a regular expression to split the file name into words. A regular expression is a notation for describing string patterns; it can be used to match, search for and replace strings in text.

2. Keyword extraction: compare each segmented word with the keywords in the file subset classification table, remove superfluous words, and use the remaining words as the keywords of the file name;

For example, for the file name "2023.11.29 张三 财务部 某企业202345286--工程核算报表-相控信息副本", the extracted words are: 2023.11.29 (date), 张三/Zhang San (name of the handler), 财务部/Finance Department (department), 某企业/"a certain enterprise" (company name), 202345286 (code), 工程核算报表/project accounting report (file category), 相控信息/phase-control information, 副本/copy, plus spaces and symbols. Of these, Zhang San, Finance Department, the enterprise name and project accounting report match features in the file subset classification table; the remaining words are treated as invalid features and deleted.
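The two sub-steps can be sketched on the example file name above. This is a minimal illustration, not the patent's implementation: the keyword vocabulary is assumed to be the four matching features named in the example, digit runs are additionally split from adjacent characters (so that 某企业202345286 separates), and keywords are found by substring matching because Chinese text has no spaces between words. A dictionary-based Chinese segmenter would be the more robust choice in practice.

```python
import re

# Hypothetical vocabulary drawn from the file subset classification table.
TABLE_KEYWORDS = ["张三", "财务部", "某企业", "工程核算报表"]

NAME = "2023.11.29 张三 财务部 某企业202345286--工程核算报表-相控信息副本"


def segment(filename: str) -> list[str]:
    """Replace non-alphanumeric characters with spaces, then split into words."""
    # In Python 3, \W is Unicode-aware, so CJK characters count as word characters
    # and are kept; "_" is a word character, so it is removed explicitly.
    cleaned = re.sub(r"[\W_]+", " ", filename)
    # Split runs of digits from runs of other characters, e.g. "某企业202345286".
    return re.findall(r"\d+|[^\d\s]+", cleaned)


def extract_keywords(filename: str, vocabulary: list[str]) -> list[str]:
    """Keep only table keywords found in the segmented words; drop everything else."""
    keywords = []
    for word in segment(filename):
        for kw in vocabulary:
            if kw in word and kw not in keywords:
                keywords.append(kw)
    return keywords
```

Here `segment(NAME)` leaves 相控信息副本 as a single word because the name has no separator between the two parts; it is discarded either way, since it matches no table keyword.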

(2) Use a classification algorithm model to classify the data by comparing the extracted keywords against the subset classification table. For example, if the identified keywords belong to $A1_{02}$ under category A1 and $B2_{01}$ under category B2, the file name is prefixed with the tag A102B201. Commonly used classification models include logistic regression, decision trees, random forests, naive Bayes, support vector machines, k-nearest neighbours and neural networks, all of which are mature, existing techniques.
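The description lists several standard classifiers without fixing one. As an illustration only, the sketch below uses a from-scratch multinomial naive Bayes (one of the listed models) trained on hypothetical historical records, with one model per classification level. It then prefixes the predicted codes to the file name. English placeholder keywords are used for readability; all training data and labels are assumptions.

```python
import math
from collections import Counter, defaultdict


class KeywordNaiveBayes:
    """Multinomial naive Bayes over keyword lists, with Laplace (add-alpha) smoothing."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for words, label in zip(docs, labels):
            self.word_counts[label].update(words)
            self.vocab.update(words)
        return self

    def predict(self, words):
        total = sum(self.class_counts.values())
        v = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n in self.class_counts.items():
            denom = sum(self.word_counts[label].values()) + self.alpha * v
            score = math.log(n / total)  # log prior
            for w in words:
                score += math.log((self.word_counts[label][w] + self.alpha) / denom)
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical historical records for each level of the classification table.
level_a = KeywordNaiveBayes().fit(
    [["finance_dept", "investment_estimate"],
     ["finance_dept", "investment_estimate", "draft"],
     ["finance_dept", "design_budget"],
     ["accounting_report", "design_budget"]],
    ["A101", "A101", "A102", "A102"],
)
level_b = KeywordNaiveBayes().fit(
    [["accounting_report"], ["accounting_report", "finance_dept"],
     ["contract"], ["contract", "legal_dept"]],
    ["B201", "B201", "B101", "B101"],
)

keywords = ["finance_dept", "accounting_report"]
tag = level_a.predict(keywords) + level_b.predict(keywords)
tagged_name = tag + "project_report.xlsx"  # prefix the tag to the file name
```

Training one small model per level keeps the predicted tag consistent with the table's structure (one A code followed by one B code). Any of the other listed classifiers could be substituted at the same point.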

(3) Data storage, specifically:

1. Store in the main system database;

2. Send to other subsystem databases;

3. Generate a log table.
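The three storage actions can be sketched with the standard-library `sqlite3` module. The schema, the routing rule `ROUTES` (which subsystems receive a copy, keyed by the first-level code of the tag) and all table names are hypothetical; the source does not specify how the receiving subsystems are chosen.

```python
import sqlite3
from datetime import datetime, timezone

# Hypothetical routing rule: subsystems that receive a copy, keyed by first-level code.
ROUTES = {"A1": ["finance", "engineering"], "B2": ["audit"]}

SCHEMA = """
CREATE TABLE main_files (name TEXT PRIMARY KEY, tag TEXT, content BLOB);
CREATE TABLE subsystem_files (subsystem TEXT, name TEXT, content BLOB);
CREATE TABLE upload_log (name TEXT, tag TEXT, source TEXT, targets TEXT, ts TEXT);
"""


def store_file(conn, name, tag, content, source):
    """Store in the main DB, copy to routed subsystems, and append a log row."""
    targets = ROUTES.get(tag[:2], [])
    with conn:  # one transaction covering all three actions
        conn.execute("INSERT OR REPLACE INTO main_files VALUES (?, ?, ?)", (name, tag, content))
        conn.executemany(
            "INSERT INTO subsystem_files VALUES (?, ?, ?)",
            [(sub, name, content) for sub in targets],
        )
        conn.execute(
            "INSERT INTO upload_log VALUES (?, ?, ?, ?, ?)",
            (name, tag, source, ",".join(targets), datetime.now(timezone.utc).isoformat()),
        )
    return targets


conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
targets = store_file(conn, "A102B201report.xlsx", "A102B201", b"data", "finance")
```

Wrapping the three writes in one transaction means a failure leaves neither a stored file without a log entry nor a log entry without a file.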

In addition, for files uploaded for the first time, all subsystems are searched for similar files, and a reminder is sent to each subsystem that holds a similar file, so as to eliminate duplicated work;
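The source does not say how "similar" is measured. This sketch assumes file-name similarity computed with `difflib.SequenceMatcher.ratio()` and an arbitrary threshold of 0.8. It returns, per subsystem, the names that should trigger a reminder; the subsystem index is hypothetical.

```python
from difflib import SequenceMatcher


def find_similar(new_name: str, subsystem_index: dict[str, list[str]],
                 threshold: float = 0.8) -> dict[str, list[str]]:
    """Return {subsystem: [similar file names]} for subsystems that should be reminded."""
    hits: dict[str, list[str]] = {}
    for subsystem, names in subsystem_index.items():
        for name in names:
            if SequenceMatcher(None, new_name, name).ratio() >= threshold:
                hits.setdefault(subsystem, []).append(name)
    return hits


index = {
    "engineering": ["A102B201_project_report_v1.xlsx", "B101_contract.docx"],
    "hr": ["B101_contract.docx"],
}
reminders = find_similar("A102B201_project_report_v2.xlsx", index)
```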

S5. Data is identified and processed by combining data tagging with an overwrite strategy, with the user taking part in the processing through interaction. Specifically, a reminder is issued for data in which no tag is recognized; the log table is consulted to determine whether the upload is a first upload; tag options are suggested according to the subsystem the data comes from and the number of uploads, and the uploader chooses which tags to add. Data that is not a first upload overwrites the original data after distribution, and the original data is moved to a staging station for temporary storage (the staging station and the earlier storage space may be a cloud platform or an in-house high-capacity storage device);
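A minimal sketch of the S5 decision flow, assuming that an empty tag means "no tag recognized", that the log is a list of (file name, source subsystem) pairs, and that the staging station is an in-memory list of superseded versions. The tag-option suggestion step is omitted, and `MainStore` and `handle_upload` are hypothetical names.

```python
from dataclasses import dataclass, field


@dataclass
class MainStore:
    files: dict[str, bytes] = field(default_factory=dict)          # current versions
    staging: dict[str, list[bytes]] = field(default_factory=dict)  # superseded versions
    log: list[tuple[str, str]] = field(default_factory=list)       # (file name, source)


def handle_upload(store: MainStore, name: str, content: bytes, source: str, tag: str) -> str:
    """Apply the reminder / first-upload / overwrite rules to one uploaded file."""
    if not tag:
        # No classification tag recognized: remind the uploader to choose one.
        return "remind"
    first_upload = all(logged != name for logged, _ in store.log)
    if not first_upload and name in store.files:
        # Overwrite strategy: move the previous version to the staging area first.
        store.staging.setdefault(name, []).append(store.files[name])
    store.files[name] = content
    store.log.append((name, source))
    return "first" if first_upload else "overwritten"


store = MainStore()
r1 = handle_upload(store, "A102report.xlsx", b"v1", "finance", "A102")
r2 = handle_upload(store, "A102report.xlsx", b"v2", "finance", "A102")
r3 = handle_upload(store, "misc.txt", b"x", "hr", "")
```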

S6. If a main-database file has not been accessed within a set time, the project is considered finished. All associated subsystem database data is then deleted automatically, and the corresponding main-database data is transferred to a completion archive (preferably a cloud platform, which can keep more data for longer to support traceability). The steps are:

(1) Set the threshold time, e.g. 30 days;

(2) Traverse all files in the file system;

(3) For each file, check its last access time, which can be obtained through the API provided by the operating system;

(4) If the file was last accessed more than 30 days ago, mark it as "to be deleted";

(5) After the traversal is complete, go through all files marked for deletion and delete them.

As shown in Figure 4, let the date of the last access be $T_f$, the current date $T_d$ and the idle (unaccessed) time $T_w$, so that $T_w = T_d - T_f$. If $T_w > 30$ days, the file in the main system is transferred and the corresponding files in the subsystems are deleted; otherwise the file is kept.
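Steps (1) to (5) and the rule $T_w = T_d - T_f > 30$ days can be sketched with `os.walk`, `os.stat().st_atime` and `shutil.move`. Two caveats apply. First, many file systems are mounted with `noatime` or `relatime`, so `st_atime` may not reflect every read; an access time recorded by the application database would be more reliable. Second, deleting the matching subsystem copies is not shown. Function names are hypothetical, and the comparison uses ">" to follow step (4).

```python
import os
import shutil
import tempfile
import time

SECONDS_PER_DAY = 86400


def find_stale(root, threshold_days=30, now=None):
    """Steps (2)-(4): walk the tree and mark files whose idle time T_w exceeds the threshold."""
    now = time.time() if now is None else now  # T_d
    marked = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for fname in filenames:
            path = os.path.join(dirpath, fname)
            t_f = os.stat(path).st_atime           # last access time T_f
            t_w = (now - t_f) / SECONDS_PER_DAY    # T_w = T_d - T_f, in days
            if t_w > threshold_days:
                marked.append(path)
    return marked


def archive(marked, root, archive_root):
    """Step (5): second pass that moves every marked file into the completion archive."""
    for path in marked:
        dest = os.path.join(archive_root, os.path.relpath(path, root))
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        shutil.move(path, dest)


def demo():
    """Create one stale and one fresh file in temporary directories and archive the stale one."""
    with tempfile.TemporaryDirectory() as root, tempfile.TemporaryDirectory() as arch:
        old = os.path.join(root, "old.xlsx")
        new = os.path.join(root, "new.xlsx")
        for p in (old, new):
            with open(p, "w"):
                pass
        forty_days_ago = time.time() - 40 * SECONDS_PER_DAY
        os.utime(old, (forty_days_ago, forty_days_ago))  # set atime and mtime
        marked = find_stale(root)
        archive(marked, root, arch)
        return ([os.path.basename(p) for p in marked],
                sorted(os.listdir(root)), sorted(os.listdir(arch)))
```

Marking first and moving afterwards mirrors the two-pass structure of steps (4) and (5), and it avoids changing the directory tree while `os.walk` is still iterating over it.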

S7. To search for data, the user queries the large language interaction module, which returns the matching data download items together with the data's history log.
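The retrieval half of S7 can be sketched as follows, assuming that the language model has already turned the user's question into search terms. The model call itself is not shown, since the source names no specific model or API. `DownloadItem` and `find_download_items` are hypothetical names, and matching is plain substring search over file names.

```python
from dataclasses import dataclass


@dataclass
class DownloadItem:
    name: str
    history: list[str]  # log entries recorded for this file


def find_download_items(terms: list[str], files: list[str],
                        log: list[tuple[str, str]]) -> list[DownloadItem]:
    """Return files whose name contains any search term, each with its history log."""
    items = []
    for name in files:
        if any(t in name for t in terms):
            history = [entry for fname, entry in log if fname == name]
            items.append(DownloadItem(name, history))
    return items


files = ["A102B201核算报表.xlsx", "合同.docx"]
log = [
    ("A102B201核算报表.xlsx", "2024-01-05 uploaded by finance"),
    ("合同.docx", "2024-01-06 uploaded by hr"),
    ("A102B201核算报表.xlsx", "2024-01-09 overwritten by finance"),
]
items = find_download_items(["核算报表"], files, log)
```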

The enterprise's departments and staff produce many kinds of business data files. Consolidating them under unified rules and classifying them through a subset classification table means that data can be identified, classified and tagged when it is uploaded. When the main system processes the data, it can then quickly determine how each file should be handled, for example by storing it by category with a record or by forwarding it to other subsystems. This makes the work easier for staff and reduces the error rate. A log table is generated during identification and processing, and together with the reminder mechanism and the automatic search across subsystems it alerts subsystems that hold the same data. This avoids duplicated work, improves efficiency, prevents incorrect data from overwriting correct data, and makes business data easier to trace and manage. In addition, time-based rules periodically clean up stored files and the subsystems, so redundant data does not accumulate in the subsystems. This further reduces the staff's workload in handling data and avoids repeated operations.

CSS and HTML can keep the icon of the large language interaction module floating in a corner of the interface at all times:

First, create a div element containing the icon in the HTML file, for example:

<div class="floating-icon">
  <img src="icon.png" alt="Floating Icon">
</div>

Then style the div element in the CSS file so that it always floats in a corner of the interface, for example:

.floating-icon {
  position: fixed;  /* positioned relative to the viewport, so it does not scroll with the page */
  bottom: 20px;     /* distance from the bottom */
  right: 20px;      /* distance from the right */
  z-index: 1000;    /* keep the icon above other page content; the value is arbitrary */
}

With this approach the icon stays in the corner of the interface however the user scrolls the page or resizes the window.

The large language interaction module appears as a floating window in a corner of the operation interface. It stays at a fixed position on screen as the interface is scrolled, which makes it convenient to use. Users query the module conversationally, and it can perform any operation recorded by the system, such as finding data, looking up history and consulting operating rules. Enterprises that need it can also attach AI plug-ins such as ChatGPT to assist with office work, improving the efficiency of enterprise data processing.

The second embodiment differs from the first mainly in that the subsystems are divided into several department subsystems according to the enterprise's departments, and every employee account in a department belongs to the same department subsystem.

The main system includes a data management unit, a main system database, a large language interaction module and a subsystem permission management module. The data management unit and the large language interaction module upload, download and view data by accessing the main system database, and the subsystem permission management module manages the account permissions of the department subsystems.
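A minimal sketch of the subsystem permission management module. It assumes that each account belongs to exactly one department subsystem and that permissions are granted per department for the three actions named above (upload, download, view). The class and account names are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PermissionManager:
    """Hypothetical subsystem permission management module."""
    members: dict[str, str] = field(default_factory=dict)      # account -> department subsystem
    grants: dict[str, set[str]] = field(default_factory=dict)  # department -> allowed actions

    def add_account(self, account: str, department: str) -> None:
        self.members[account] = department

    def grant(self, department: str, *actions: str) -> None:
        self.grants.setdefault(department, set()).update(actions)

    def allowed(self, account: str, action: str) -> bool:
        """An action is allowed only if the account's department has been granted it."""
        dept = self.members.get(account)
        return dept is not None and action in self.grants.get(dept, set())


pm = PermissionManager()
pm.add_account("zhangsan", "finance")
pm.add_account("lisi", "hr")
pm.grant("finance", "upload", "download", "view")
pm.grant("hr", "view")
```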

The subsystems are connected to the main system through the enterprise intranet. A subsystem on an external network accesses the main system by entering the intranet through a VPN.

The overall system is divided into subsystems according to the nature of each department. Each subsystem manages the accounts of its own employees and has its own content, which supports targeted management and business data processing. Access is restricted to the internal intranet, with external access only through a VPN, which further protects the security of the enterprise's business data.

The third embodiment differs from the second mainly in that each department subsystem includes a subsystem database, a system framework, a communication mechanism, an expansion module and a modular functional unit; the modular functional unit further includes several functional modules and an interface definition module that defines the interfaces of those functional modules;

The communication mechanism connects the functional modules integrated in the modular functional unit to the system framework to form a complete system. The expansion module loads new function data and extends it into a new functional module; when an existing functional module fails, it loads that module's data for temporary use. The subsystem database stores local data and the modular data of the functions.
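The modular structure can be sketched under these assumptions: a `typing.Protocol` plays the role of the interface definition module, a framework class that mounts modules and dispatches calls stands in for the system framework plus communication mechanism, and registered loaders act as the expansion module that substitutes a backup when a module fails. All class names are hypothetical.

```python
from typing import Callable, Protocol


class FunctionModule(Protocol):
    """Interface definition: every functional module exposes a name and a run method."""
    name: str

    def run(self, payload: dict) -> dict: ...


class SystemFramework:
    """Mounts functional modules and dispatches calls to them by name."""

    def __init__(self) -> None:
        self._modules: dict[str, FunctionModule] = {}
        self._fallbacks: dict[str, Callable[[], FunctionModule]] = {}

    def mount(self, module: FunctionModule) -> None:
        """Attach a new or replacement functional module."""
        self._modules[module.name] = module

    def register_fallback(self, name: str, loader: Callable[[], FunctionModule]) -> None:
        """Expansion module: a loader that can rebuild a module from stored data."""
        self._fallbacks[name] = loader

    def call(self, name: str, payload: dict) -> dict:
        try:
            return self._modules[name].run(payload)
        except Exception:  # module missing (KeyError) or failing at run time
            if name not in self._fallbacks:
                raise
            # Temporarily replace the failed module with the fallback copy.
            self._modules[name] = self._fallbacks[name]()
            return self._modules[name].run(payload)


class BrokenReport:
    name = "report"

    def run(self, payload: dict) -> dict:
        raise RuntimeError("module failure")


class BackupReport:
    name = "report"

    def run(self, payload: dict) -> dict:
        return {"rows": len(payload.get("rows", []))}


fw = SystemFramework()
fw.mount(BrokenReport())
fw.register_fallback("report", BackupReport)
result = fw.call("report", {"rows": [1, 2, 3]})
```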

Building each department subsystem from functional modules keeps the functions independent, and the communication mechanism and interface definition module mount them on the system framework. Functions can therefore be combined to suit each department's business, and unnecessary ones can be left out. This keeps the department subsystem interface simple and easy to operate, helps new staff get started, and reduces the error rate of business data operations.

Matters not described in detail in this specification belong to the prior art known to those skilled in the art.

It should be noted that relational terms such as "first" and "second" are used herein only to distinguish one entity or operation from another. They do not necessarily require or imply any actual relationship or order between these entities or operations. Moreover, the terms "comprise", "include" and any variants thereof are intended to cover a non-exclusive inclusion. A process, method, article or device comprising a series of elements therefore includes not only those elements but also other elements not expressly listed, or elements inherent to such a process, method, article or device.

Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions and variations may be made to these embodiments without departing from the principles and spirit of the invention. The scope of the invention is defined by the appended claims and their equivalents.

Claims (10)

1. A business data processing method based on a large language model, characterized in that it comprises the following steps:
S1. establishing subsystems modularly according to departmental work projects;
S2. drawing up a file subset classification table: first creating subsets of the file data, expressed as:
$S = \{x \mid x \in A,\ P(x)\}$;
where S is the subset created, A is the original data set, and P(x) is the condition or rule used for classification, which is set with human participation; S is thus the subset consisting of the elements x of A that satisfy P(x);
then building the file subset classification table from the subsets and using it to process and classify new file names, the first-level classification A containing two categories (a major and a minor one), A1 and A2, and the second-level classification B containing two categories (a major and a minor one), B1 and B2, so that the classification table can be expressed as:
$A = \{A1, A2\},\quad B = \{B1, B2\}$;
where A1 and A2 are the sub-category numbers of A, B1 and B2 are the sub-category numbers of B, and each of the categories A1, A2, B1 and B2 is further divided into n numbered codes denoting specific keywords;
S3. tagging the enterprise data with keywords in a subsystem before it is uploaded to the main system;
S4. identifying, classifying and processing the data of the main system through a data management unit, and, for files uploaded for the first time, traversing all subsystems to find similar files and sending a reminder to each subsystem that holds a similar file, so as to eliminate duplicated work;
S5. identifying and processing the data by combining data tagging with an overwrite strategy, with the user taking part in the processing through interaction;
S6. if a main-database file has not been accessed within a set time, treating the project as finished, automatically deleting all associated subsystem database data, and transferring the corresponding main-database data to a completion archive;
S7. loading the large language interaction module onto the system operation interface as a floating element using CSS and HTML, wherein, when data is searched for, the user queries the large language interaction module and the module returns the corresponding data.

2. The business data processing method based on a large language model according to claim 1, characterized in that S2 specifically comprises the following steps:
(1) file-name segmentation: splitting the file name at spaces and symbols into numbers, letters and nouns to obtain a word list, and tagging each word in the word list with its part of speech;
(2) historical identification records: using historical file-classification records and the specific data of each file, extracting the words of file names, removing invalid features according to manually set rules, and keeping the valid features to support the subsequent automatic classification;
(3) rule-based classification tagging: manually drawing up the file subset classification table according to the department and file type to which each business data file belongs, manually setting rules under which the extracted features are replaced by subset codes, and adding the codes to the file names as classification tags so as to associate each file name with its category;
(4) algorithm integration: integrating the valid features, the historical identification records and the file subset classification table into an algorithm model, and tagging and classifying new file names.

3. The business data processing method based on a large language model according to claim 1, characterized in that S4 specifically comprises:
(1) word segmentation and keyword extraction:
1. word segmentation: replacing the non-alphanumeric characters in the file name with spaces and splitting the file name into words using a regular expression;
2. keyword extraction: comparing each segmented word with the keywords in the file subset classification table, removing superfluous words, and using the remaining words as the keywords of the file name;
(2) classifying the data by comparing the extracted keywords against the subset classification table using a classification algorithm model;
(3) data storage, specifically comprising:
1. storing in the main system database;
2. sending to other subsystem databases;
3. generating a log table.

4. The business data processing method based on a large language model according to claim 1, characterized in that S5 specifically comprises: issuing a reminder for data in which no tag is recognized; extracting the log table to determine whether the upload is a first upload; suggesting tag options according to the subsystem the data comes from and the number of uploads, the uploader choosing which tags to add; and, for data that is not a first upload, overwriting the original data after distribution and transferring the original data to a staging station for temporary storage.

5. The business data processing method based on a large language model according to claim 1, characterized in that, in S3, the subsystem performs a series of data processing and classification steps before uploading data to the main system, specifically comprising:
(1) partitioning the data packet uploaded by the subsystem and, if the packet contains data of different types, sorting the data by type into different subsets;
(2) adding codes to each file as tags according to the rules in the subset classification table, the classification tags being superimposed on the files of each subset according to the identified features and the order of the classification tags in the file subset classification table;
(3) uploading the tagged and coded data to the main system.

6. The business data processing method based on a large language model according to claim 1, characterized in that the subsystems are divided into several department subsystems according to the enterprise's departments, and every employee account in a department belongs to the same department subsystem.

7. The business data processing method based on a large language model according to claim 6, characterized in that the main system comprises a data management unit, a main system database, a large language interaction module and a subsystem permission management module; the data management unit and the large language interaction module upload, download and view data by accessing the main system database, and the subsystem permission management module manages the account permissions of the department subsystems.

8. The business data processing method based on a large language model according to claim 6, characterized in that the subsystems are connected to the main system through the enterprise intranet, and a subsystem located on an external network accesses the main system by entering the intranet through a VPN.

9. The business data processing method based on a large language model according to claim 6, characterized in that each department subsystem comprises a subsystem database, a system framework, a communication mechanism, an expansion module and a modular functional unit, the modular functional unit further comprising several functional modules and an interface definition module that defines the interfaces of the functional modules.

10. The business data processing method based on a large language model according to claim 9, characterized in that the communication mechanism connects the functional modules integrated in the modular functional unit to the system framework to form a complete system; the expansion module loads new function data and extends it into a new functional module, or, when an existing functional module fails, loads that module's data for temporary use; and the subsystem database stores local data and the modular data of the functions.
CN202410170293.1A 2024-02-06 2024-02-06 Business data processing method based on large language model Active CN117709909B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410170293.1A CN117709909B (en) 2024-02-06 2024-02-06 Business data processing method based on large language model

Publications (2)

Publication Number Publication Date
CN117709909A true CN117709909A (en) 2024-03-15
CN117709909B CN117709909B (en) 2024-04-09

Family

ID=90150208

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410170293.1A Active CN117709909B (en) 2024-02-06 2024-02-06 Business data processing method based on large language model

Country Status (1)

Country Link
CN (1) CN117709909B (en)

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180075138A1 (en) * 2016-09-14 2018-03-15 FileFacets Corp. Electronic document management using classification taxonomy
CN108629646A (en) * 2017-03-23 2018-10-09 长沙海登网络科技有限公司 The Online Bookstore System of new A SP technological development
CN112016608A (en) * 2020-08-21 2020-12-01 四川大学 Garment perceptual intention classification method based on convolutional neural network, classification model and construction method thereof
CN113961523A (en) * 2018-01-26 2022-01-21 创新先进技术有限公司 Business file splitting and summarizing method, device and equipment
US20220391583A1 (en) * 2021-06-03 2022-12-08 Capital One Services, Llc Systems and methods for natural language processing
CN116029273A (en) * 2022-12-28 2023-04-28 上海浦东发展银行股份有限公司 Text processing method, device, computer equipment and storage medium
CN116483660A (en) * 2023-04-27 2023-07-25 北京新能源汽车股份有限公司 Method, device and equipment for acquiring vehicle-end log and readable storage medium
CN116579339A (en) * 2023-07-12 2023-08-11 阿里巴巴(中国)有限公司 Task execution method and optimization task execution method
CN117112776A (en) * 2023-09-23 2023-11-24 宏景科技股份有限公司 Enterprise knowledge base management and retrieval platform and method based on large language model
CN117235220A (en) * 2023-09-15 2023-12-15 之江实验室 Extensible large language model calling method and device based on graph database knowledge enhancement
CN117235243A (en) * 2023-11-16 2023-12-15 青岛民航凯亚系统集成有限公司 Training optimization method for large language model of civil airport and comprehensive service platform

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
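YUN WAN et al.: "An Ensemble Sentiment Classification System of Twitter Data for Airline Services Analysis", 2015 IEEE International Conference on Data Mining Workshop (ICDMW), 4 February 2016 (2016-02-04), pages 1318-1325 *
LI Shiyu et al.: "古籍数字化国内外研究现状分析与路径构建研究" [Analysis of the current state of domestic and international research on the digitization of ancient books, and research on path construction], Journal of Modern Information (现代情报), vol. 43, no. 11, 30 November 2023 (2023-11-30), pages 4-20 *
LI Yang: "基于云平台的电子病案系统研究与实" [Research and implementation of a cloud-platform-based electronic medical record system], China Master's Theses Full-text Database, Medicine and Health Sciences (中国优秀硕士学位论文全文数据库 医药卫生科技辑), no. 8, 15 August 2019 (2019-08-15), pages 053-152 *
ZHAO Shu et al.: "不完整数据集的信息熵集成分类算法" [Information-entropy-based ensemble classification algorithm for incomplete data sets], Pattern Recognition and Artificial Intelligence (模式识别与人工智能), vol. 27, no. 3, 31 March 2014 (2014-03-31), pages 193-198 *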

Also Published As

Publication number Publication date
CN117709909B (en) 2024-04-09

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant