CN117709909A - Business data processing method based on large language model - Google Patents
Business data processing method based on large language model
- Publication number: CN117709909A
- Application number: CN202410170293.1A
- Authority: CN (China)
- Prior art keywords: data, file, subsystem, classification, subset
- Prior art date
- Legal status: Granted (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/10—Office automation; Time management
- G06Q10/103—Workflow collaboration or project management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/14—Details of searching files based on file metadata
- G06F16/148—File search processing
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/16—File or folder operations, e.g. details of user interfaces specifically adapted to file systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/10—File systems; File servers
- G06F16/16—File or folder operations, e.g. details of user interfaces specifically adapted to file systems
- G06F16/162—Delete operations
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/903—Querying
- G06F16/9032—Query formulation
- G06F16/90332—Natural language query formulation or dialogue systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/906—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/604—Tools and structures for managing or administering access control systems
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F21/00—Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F21/60—Protecting data
- G06F21/62—Protecting access to data via a platform, e.g. using keys or access control rules
- G06F21/6218—Protecting access to data via a platform, e.g. using keys or access control rules to a system of files or objects, e.g. local or distributed file system or database
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0481—Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0481—Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance
- G06F3/04817—Interaction techniques based on graphical user interfaces [GUI] based on specific properties of the displayed interaction object or a metaphor-based environment, e.g. interaction with desktop elements like windows or icons, or assisted by a cursor's changing behaviour or appearance using icons
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F9/00—Arrangements for program control, e.g. control units
- G06F9/06—Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
- G06F9/46—Multiprogramming arrangements
- G06F9/54—Interprogram communication
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
- G06N5/041—Abduction
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/02—Network architectures or network communication protocols for network security for separating internal from external traffic, e.g. firewalls
- H04L63/0272—Virtual private networks
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L63/00—Network architectures or network communication protocols for network security
- H04L63/10—Network architectures or network communication protocols for network security for controlling access to devices or network resources
- H04L63/101—Access control lists [ACL]
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L9/00—Cryptographic mechanisms or cryptographic arrangements for secret or secure communications; Network security protocols
- H04L9/40—Network security protocols
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F2221/00—Indexing scheme relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/21—Indexing scheme relating to G06F21/00 and subgroups addressing additional information or applications relating to security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
- G06F2221/2141—Access rights, e.g. capability lists, access control lists, access tables, access matrices
Abstract
The invention discloses a business data processing method based on a large language model, and relates to the technical field of electric digital data processing. The method comprises the following steps: S1, modularizing the subsystems according to department work items; S2, making a file subset classification table. The method uniformly integrates the business data files of different departments and personnel in an enterprise under unified rules, and establishes a subset classification table for ordered classification, so that data can be identified, classified and marked when uploaded. When the main system processes the data, it can therefore quickly determine how the data should be handled, for example stored and recorded by class or sent to other subsystems, which eases staff operation and reduces the error rate. A log table is generated during identification and processing and, together with a reminding mechanism, effectively avoids repeated operations, improves working efficiency, and facilitates tracing and business data management.
Description
Technical Field
The invention relates to the technical field of electric digital data processing, in particular to a business data processing method based on a large language model.
Background
In the course of their work, enterprises typically process large numbers of complex data files. Because the data come in many types, the files that the various departments of an enterprise must handle are more varied still.
CN102368311B discloses a service data processing device that can flexibly process its own data according to user-defined service rules and support subsequent flows, and that solves the problem of repeated user operations caused by introducing multiple service rules, without repeated development.
That technology extracts unified rules for files of the same type that follow different rules, and manages all files through those unified rules. In some enterprises, however, the file types are more diverse, such as drawing files, financial reports, personnel information and client data, and no unified rule can be found to define and manage all of them, so different files are difficult to process effectively in the course of work.
Moreover, even when the same file exists in several departments at once, the copies may differ slightly; for example, the folder names may deviate between departments. Terminal devices of different departments, and of different staff within one department, usually do not share resources. When such files are uploaded to a system, the system can only check whether the outermost folder names are identical, and judges them inconsistent even if a single word has been added. Duplicate files can therefore arise, a great deal of work is repeated, and repeated operations can cause wrong data to overwrite correct data, leading to a series of subsequent problems.
Disclosure of Invention
Aiming at the defects of the prior art, the invention provides a business data processing method based on a large language model, which solves the problem that, when many accounts in an enterprise process large amounts of data, repeated operations readily occur and may even leave the final data wrong, affecting subsequent work.
To achieve the above purpose, the invention adopts the following technical scheme: a business data processing method based on a large language model, comprising the following steps:
S1, modularizing the subsystems according to department work items;
S2, making a file subset classification table: first, creating a subset of the file data, expressed as:
S={x|x∈A,P(x)};
where S denotes the created subset, A denotes the original dataset, and P(x) denotes the condition or rule used for classification; that is, S is the subset of elements x of A that satisfy P(x);
then establishing the file subset classification table from the subsets, to be used for processing, classifying and identifying new file names. Let the primary classification be A, comprising a major class and a minor class, denoted A1 and A2 respectively, and the secondary classification be B, likewise comprising a major class and a minor class, denoted B1 and B2; the classification table can then be expressed as:
A={A1,A2},B={B1,B2};
where A1 and A2 are the sub-category numbers of A, B1 and B2 are the sub-category numbers of B, and each of the categories A1, A2, B1 and B2 is further divided into n numbers that refer to specific keywords;
S3, marking keywords in the subsystem before the enterprise data are uploaded to the main system;
S4, identifying, classifying and processing the data in the main system through the data management unit; in addition, for a file uploaded for the first time, searching all subsystems for similar files and sending a reminder to each subsystem that holds one, so as to eliminate repeated work;
S5, combining data identification with data marking and an overwrite strategy to identify and process the data, with the user participating in the processing through in-depth interaction;
S6, if a file in the main database has not been accessed by the time a set period elapses, treating the project as ended, automatically deleting all related data in the subsystem databases, and transferring the corresponding data in the main database to a finishing database;
S7, when searching for data, querying through the large language interaction module, which returns the data download item together with the data's history log.
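The subset definition S={x|x∈A,P(x)} and the two-level classification table of step S2 can be illustrated with a minimal Python sketch. The `FileRecord` fields, the `make_subset` helper and the codes in `CLASSIFICATION_TABLE` are hypothetical names chosen for illustration; the patent does not prescribe any particular data structure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical description of one business data file."""
    name: str
    department: str
    file_type: str


def make_subset(dataset, predicate):
    """Return S = {x | x in A, P(x)} for a dataset A and a rule P."""
    return {x for x in dataset if predicate(x)}


# Hypothetical classification table: primary classification A = {A1, A2} and
# secondary classification B = {B1, B2}, each class divided into numbered codes.
CLASSIFICATION_TABLE = {
    "A": {"A1": ["A1_01", "A1_02"], "A2": ["A2_01"]},
    "B": {"B1": ["B1_01"], "B2": ["B2_01"]},
}

dataset = {
    FileRecord("budget.xlsx", "finance", "report"),
    FileRecord("layout.dwg", "engineering", "drawing"),
    FileRecord("payroll.xlsx", "finance", "report"),
}

# P(x): the file belongs to the finance department.
finance_files = make_subset(dataset, lambda f: f.department == "finance")
```

Here the predicate plays the role of P(x); any rule that returns true or false for a file record could be substituted.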
Preferably, step S2 specifically comprises the following steps:
(1) Dividing file names: splitting each file name at spaces and symbols into numbers, letters and nouns to obtain a word list, and tagging the part of speech of each word in the list;
(2) History identification record: extracting file-name sentences from the historical file classification records and the files' specific data information, removing invalid features according to manually set rules, and retaining valid features to assist the subsequent automatic classification;
(3) Setting rule classification marks: manually making the file subset classification table according to the department and file type to which each business data file belongs, manually setting rules that assign subset codes to the extracted features in place of the features themselves, and adding the codes to file names as classification marks so that each file name is associated with its category;
(4) Algorithm integration: integrating the valid features, the history identification records and the file subset classification table into an algorithm model, which marks and classifies new file names.
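Sub-step (1) above, splitting a file name and tagging each word, might look like the following sketch. The tagging rule is a deliberate simplification and an assumption of this example: digit runs are tagged as numbers, upper-case alphabetic runs as letters, and everything else as a noun. A real part-of-speech tagger would replace it.

```python
import re


def split_file_name(file_name: str) -> list[tuple[str, str]]:
    """Split a file name at spaces and symbols and tag each token."""
    # Any run of non-word characters (or underscores) acts as a separator.
    tokens = [t for t in re.split(r"[\W_]+", file_name) if t]
    tagged = []
    for token in tokens:
        if token.isdigit():
            tag = "number"
        elif token.isalpha() and token.isupper():
            tag = "letter"  # assumption: upper-case runs are letter codes
        else:
            tag = "noun"    # assumption: everything else is treated as a noun
        tagged.append((token, tag))
    return tagged


word_list = split_file_name("2023.11.29 FIN engineering-accounting report")
```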
Preferably, step S4 specifically comprises:
(1) Word segmentation and keyword extraction:
1. Word segmentation: replacing the non-alphanumeric characters in the file name with spaces, and splitting the file name into words using a regular expression;
2. Extracting keywords: comparing each segmented word with the keywords in the file subset classification table, removing redundant words, and taking the remaining words as the keywords of the file name;
(2) Using a classification algorithm model to classify the extracted keywords against the subset classification table;
(3) Data storage, specifically comprising:
1. storing the data in the main system database;
2. sending the data to other subsystem databases;
3. generating a log table.
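The segmentation and keyword-extraction part of step S4 can be sketched as follows. `KEYWORD_CODES` is a hypothetical excerpt of the file subset classification table, the ASCII-only pattern is an assumption that would not suit Chinese file names, and a plain table lookup stands in for the classification algorithm model, which the patent leaves open.

```python
import re

# Hypothetical excerpt of the file subset classification table: keyword -> code.
KEYWORD_CODES = {"finance": "A1_02", "report": "B2_01"}


def segment(file_name):
    """Replace non-alphanumeric characters with spaces, then split into words."""
    cleaned = re.sub(r"[^0-9A-Za-z]", " ", file_name)
    return re.findall(r"[0-9A-Za-z]+", cleaned)


def extract_keywords(file_name, table=KEYWORD_CODES):
    """Keep only the words that appear in the classification table."""
    return [w.lower() for w in segment(file_name) if w.lower() in table]


def classify(file_name, table=KEYWORD_CODES):
    """Map each extracted keyword to its code (stand-in for a trained model)."""
    return [table[k] for k in extract_keywords(file_name, table)]


codes = classify("2023.11.29 Finance--report copy")
```

Words such as the date fragments and "copy" match nothing in the table and are dropped as redundant, which mirrors sub-step 2 above.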
Preferably, step S5 specifically comprises: issuing a reminder for data in which no mark is identified; consulting the log table to judge whether the data are being uploaded for the first time; prompting mark options according to the number of uploads from the subsystem, from which the uploading user selects the mark to add; and, for data that are not being uploaded for the first time, distributing the new data to overwrite the original data and transferring the original data to a temporary storage station.
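One possible reading of the overwrite strategy in step S5 is sketched below. The `MainStore` class, its field names and the action strings are hypothetical; prompting mark options by upload count is not modelled, and the caller is assumed to supply the mark the user selected.

```python
from dataclasses import dataclass, field


@dataclass
class MainStore:
    files: dict = field(default_factory=dict)    # main system database
    staging: dict = field(default_factory=dict)  # temporary storage station
    log: list = field(default_factory=list)      # log table: (file, subsystem, action)

    def upload(self, name, data, subsystem, mark=None):
        # Data without an identified mark only trigger a reminder.
        if mark is None:
            self.log.append((name, subsystem, "reminder"))
            return "reminder"
        key = f"{mark} {name}"
        # The log table decides whether this is the first upload of the file.
        first_upload = all(
            logged != key or action == "reminder"
            for logged, _, action in self.log
        )
        if first_upload:
            action = "stored"
        else:
            # Move the original to temporary storage before overwriting it.
            self.staging.setdefault(key, []).append(self.files[key])
            action = "overwritten"
        self.files[key] = data
        self.log.append((key, subsystem, action))
        return action


store = MainStore()
results = [
    store.upload("report.xlsx", b"v1", "finance"),
    store.upload("report.xlsx", b"v1", "finance", mark="A1_02"),
    store.upload("report.xlsx", b"v2", "finance", mark="A1_02"),
]
```

Keeping the superseded version in `staging` rather than discarding it is what allows a wrong overwrite to be traced and undone later.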
Preferably, in step S3, before uploading data to the main system, the subsystem performs a series of data processing and classification steps, specifically:
(1) Dividing the data package uploaded by the subsystem and, if the package contains data of different types, sorting the data of each type into a separate subset;
(2) Marking each file by adding codes according to the rules in the subset classification table; during marking, the classification marks are stacked onto the files of each subset according to the features recognized in the files and in the order of the classification marks in the subset classification table;
(3) Uploading the marked, coded data to the main system.
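The three sub-steps of S3 can be sketched as below, assuming for illustration that the data type is taken from the file extension and that the stacked mark is a department code followed by a type code. `DEPARTMENT_CODES` and `TYPE_CODES` are hypothetical table excerpts, not values from the patent.

```python
from collections import defaultdict
from pathlib import PurePath

# Hypothetical table excerpts, listed in the order the marks are stacked.
DEPARTMENT_CODES = {"finance": "A1_02"}            # primary classification A
TYPE_CODES = {".xlsx": "B2_01", ".dwg": "B1_01"}   # secondary classification B


def partition_package(file_names):
    """Sort the files of one data package into subsets by type (extension)."""
    subsets = defaultdict(list)
    for name in file_names:
        subsets[PurePath(name).suffix.lower()].append(name)
    return dict(subsets)


def mark_package(file_names, department):
    """Prefix every file with its stacked classification mark, per subset."""
    marked = {}
    for suffix, names in partition_package(file_names).items():
        mark = DEPARTMENT_CODES.get(department, "") + TYPE_CODES.get(suffix, "")
        marked[suffix] = [f"{mark} {name}".strip() for name in names]
    return marked


marked = mark_package(["q3.xlsx", "plan.dwg", "q4.XLSX"], "finance")
```

The resulting marked names are what the subsystem would then upload to the main system.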
Preferably, the subsystems are divided into a plurality of department subsystems according to the enterprise's departments, and all employee accounts within one department are assigned to the same department subsystem.
Preferably, the main system comprises a data management unit, a main system database, a large language interaction module and a subsystem authority management module; the data management unit and the large language interaction module upload, download and view data by accessing the main system database, and the subsystem authority management module manages the account authorities of the department subsystems.
Preferably, the subsystems are connected to the main system through the enterprise intranet, and a subsystem located on an external network accesses the main system by entering the intranet through a VPN.
Preferably, each department subsystem comprises a subsystem database, a system framework, a communication mechanism, an expansion module and a modularized functional unit, the modularized functional unit further comprising a plurality of functional modules and an interface definition module that defines the interfaces of the functional modules.
Preferably, the communication mechanism connects the functional modules integrated by the modularized functional unit to the system framework to form a complete system; the expansion module loads new function data to add new functional modules, or loads functional module data for temporary use when an original functional module fails; and the subsystem database stores the local data and the modularized function data.
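The relationship between the interface definition module, the functional modules and the system framework can be pictured with a small sketch. All class and method names here (`FunctionModule`, `SystemFramework`, `register`, `dispatch`, `UploadModule`) are hypothetical; the patent describes the roles but not an API.

```python
from typing import Protocol


class FunctionModule(Protocol):
    """Interface definition: every functional module exposes a name and a handler."""
    name: str

    def handle(self, payload: dict) -> dict: ...


class SystemFramework:
    """Holds the functional modules that have been attached to the subsystem."""

    def __init__(self):
        self._modules = {}

    def register(self, module: FunctionModule) -> None:
        # Stand-in for the communication mechanism that mounts a module.
        # Registering under an existing name replaces that module, which is
        # how an expansion module could swap in a temporary replacement.
        self._modules[module.name] = module

    def dispatch(self, name: str, payload: dict) -> dict:
        return self._modules[name].handle(payload)


class UploadModule:
    name = "upload"

    def handle(self, payload: dict) -> dict:
        return {"status": "uploaded", "file": payload["file"]}


framework = SystemFramework()
framework.register(UploadModule())
response = framework.dispatch("upload", {"file": "q3.xlsx"})
```

Because modules only need to satisfy the interface, a department can mount just the functions it needs and leave the rest out.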
Advantageous effects
The invention provides a business data processing method based on a large language model. Compared with the prior art, the method has the following beneficial effects:
1. The method uniformly integrates the business data files of different departments and personnel in an enterprise under unified rules, and establishes a subset classification table for ordered classification, so that data can be identified, classified and marked when uploaded. When the main system processes the data, it can therefore quickly determine how the data should be handled, for example stored and recorded by class or sent to other subsystems, which eases staff operation and reduces the error rate. A log table is generated during identification and processing and, together with the reminding mechanism and the automatic subsystem search function, allows subsystems holding the same data to be reminded, which effectively avoids repeated operations, improves working efficiency, and facilitates tracing and business data management. In addition, time rules are set so that stored files and the subsystems are cleaned regularly, which prevents redundant data from accumulating in the subsystems, further reduces the workload of staff processing data, and likewise avoids repeated operations.
2. A large language interaction module is provided, which can create a floating window at one corner of the operation interface; the window stays at a fixed position on the display as the operation interface is dragged, so that it is always convenient for the operator to use. The large language interaction module can be queried in dialogue form to search for data, look up history, consult operating rules and perform other operations recorded by the system. Enterprises that need it can also integrate artificial intelligence plug-ins such as ChatGPT to assist office work, which effectively improves the efficiency of enterprise data processing.
3. The overall system is divided into a plurality of subsystems according to the nature of the departments, and the accounts of the staff in each department are managed separately. Because the content of each department's subsystem differs, management and business data processing can be targeted. In addition, access is through the enterprise intranet from inside and through a VPN into the intranet from outside, which further safeguards the enterprise's business data.
4. The department subsystems are built in a function-modularized form in which each function is independent and is mounted on the system framework through the communication mechanism in cooperation with the interface definition module. Functions can therefore be combined to suit the business of each department and unnecessary functions omitted, which streamlines the interface of the department subsystem, keeps it concise and easy to operate, makes it easy for new staff to learn, and reduces the error rate of business data operations.
Drawings
FIG. 1 is a block diagram of the overall system of the present invention;
FIG. 2 is a system block diagram of a department subsystem of the present invention;
FIG. 3 is a schematic flow chart of the present invention;
FIG. 4 is a logic flow diagram of long-term file processing in accordance with the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive effort fall within the scope of protection of the invention.
Referring to FIGS. 1-4, the present invention provides three technical solutions:
First embodiment: a business data processing method based on a large language model, comprising the following steps:
S1, modularizing the subsystems according to department work items;
S2, making a file subset classification table, specifically comprising the following steps:
(1) Dividing file names: splitting each file name at spaces and symbols into numbers, letters and nouns to obtain a word list, and tagging the part of speech of each word in the list;
(2) History identification record: extracting file-name sentences from the historical file classification records and the files' specific data information, removing invalid features according to manually set rules, and retaining valid features to assist the subsequent automatic classification;
(3) Setting rule classification marks: manually making the file subset classification table according to the department and file type to which each business data file belongs, manually setting rules that assign subset codes to the extracted features in place of the features themselves, and adding the codes to file names as classification marks so that each file name is associated with its category;
making the file subset classification table includes: first, creating a subset of the file data, expressed as:
S={x|x∈A,P(x)};
where S denotes the created subset, A denotes the original dataset, and P(x) denotes the condition or rule used for classification; that is, S is the subset of elements x of A that satisfy P(x);
then establishing the file subset classification table from the subsets, to be used for processing, classifying and identifying new file names. Let the primary classification be A, comprising a major class and a minor class (further classes can be added as the enterprise develops), denoted A1 and A2 respectively, and the secondary classification be B, likewise comprising a major class and a minor class, denoted B1 and B2; the classification table can then be expressed as:
A={A1,A2},B={B1,B2};
where A1 and A2 are the sub-category numbers of A, B1 and B2 are the sub-category numbers of B, and each of the categories A1, A2, B1 and B2 is further divided into n numbers that refer to specific keywords; for example, A1 comprises {A1_01, A1_02, ......, A1_n}, where A1 represents financial engineering materials, A1_01 represents engineering investment estimation materials, and A1_02 represents engineering design calculation materials;
(4) Algorithm integration: integrating the valid features, the history identification records and the file subset classification table into an algorithm model, which marks and classifies new file names using machine learning, deep learning or other related techniques;
S3, before uploading enterprise data to the main system, the subsystem performs a series of data processing and classification steps, specifically:
(1) Dividing the data package uploaded by the subsystem and, if the package contains data of different types, sorting the data of each type into a separate subset;
(2) Marking each file by adding codes according to the rules in the subset classification table; during marking, the classification marks are stacked onto the files of each subset according to the features recognized in the files and in the order of the classification marks in the subset classification table;
(3) Uploading the marked, coded data to the main system;
s4, the data of the main system are identified, classified and processed through a data management unit, and the method specifically comprises the following steps:
1. word segmentation: replacing non-alphanumeric characters in the file names with spaces, and dividing the file names according to words by using a regular expression; regular expressions are a method used to describe string patterns that can be used to match, search, and replace strings in text.
2. Extracting keywords: for each segmented word, the keywords in the file subset classification table are compared, redundant words are removed, and the processed words are used as keywords of file names;
for example: 2023.11.29 Zhang San financial department certain enterprise 202345286- -engineering accounting report- -copy of phase control information, the extracted keywords include: 2023.11.29 (date), zhang San (person name processing), financial department (department), certain enterprise (enterprise name), 202345286 (code), engineering accounting report (file category), phase control information, copy and space and symbol, wherein Zhang San, financial department, certain enterprise, engineering accounting report can be matched with the characteristics in the file subset classification table, and then the rest keywords are invalid characteristics and deleted.
(2) Classifying the extracted keywords against the subset classification table using a classification algorithm model; for example, if the identified keywords belong to $A1_{02}$ under category A1 and $B2_{01}$ under category B2, the mark $A1_{02}$ $B2_{01}$ is placed before the file name. Common classification algorithm models include logistic regression, decision trees, random forests, naive Bayes, support vector machines, the K-nearest-neighbor algorithm, and neural networks, all of which are mature classification algorithm models.
(3) Data storage, specifically comprising:
1. storing the data in the main system database;
2. sending the data to other subsystem databases;
3. generating a log table.
Meanwhile, for a file uploaded for the first time, all subsystems are traversed to search for similar files, and a reminder is sent to each subsystem that holds a similar file, so as to eliminate duplicated work;
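The S4 pipeline can be sketched in Python as follows, under stated assumptions. A plain table lookup stands in for the classification algorithm model, file-name similarity is measured with the standard library's `difflib.SequenceMatcher`, and the table contents, the code format (`A1-02`), the similarity threshold, and all function names are hypothetical.

```python
import re
from difflib import SequenceMatcher

# Hypothetical file subset classification table: keyword -> classification code.
CLASSIFICATION_TABLE = {
    "zhang san": "A1-02",
    "financial department": "A2-01",
    "engineering accounting report": "B2-01",
}


def segment(file_name):
    """Replace non-alphanumeric characters with spaces and split into words."""
    cleaned = re.sub(r"[^0-9A-Za-z]+", " ", file_name)
    return cleaned.split()


def extract_keywords(file_name, table=CLASSIFICATION_TABLE):
    """Keep only the words or phrases that appear in the classification table."""
    normalized = " " + " ".join(segment(file_name)).lower() + " "
    return [kw for kw in table if f" {kw} " in normalized]


def classify(file_name, table=CLASSIFICATION_TABLE):
    """Prefix the file name with the codes of its matched keywords."""
    codes = [table[kw] for kw in extract_keywords(file_name, table)]
    return " ".join(codes + [file_name])


def find_similar(file_name, subsystem_files, threshold=0.8):
    """Return the subsystems holding a file whose name resembles file_name."""
    hits = {}
    for subsystem, names in subsystem_files.items():
        matches = [
            other for other in names
            if SequenceMatcher(None, file_name.lower(), other.lower()).ratio() >= threshold
        ]
        if matches:
            hits[subsystem] = matches
    return hits
```

Words that match nothing in the table (the date, the numeric code, "copy") are simply dropped by `extract_keywords`, which mirrors the removal of invalid features in the example above. Comparing file names only is a simplification; a real system might also compare file contents.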
S5. Data identification is combined with data marking and an overwrite strategy to identify and process the data, with in-depth user interaction in the processing, specifically: a reminder is issued for data for which no mark is recognized; the log table is consulted to determine whether the data is being uploaded for the first time; mark options are prompted according to the subsystem's upload count, and the uploading user selects the mark to add; data that is not a first upload is distributed and overwrites the original data, and the original data is moved to a temporary storage station for temporary storage (the temporary storage station and the aforementioned storage space may be a cloud platform or a large-capacity storage device within the company);
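The overwrite strategy of step S5 can be sketched as follows. This is a minimal in-memory illustration, not the patented implementation; the `MainStore` class, its fields, and the returned status strings are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class MainStore:
    """In-memory stand-in for the main database, temporary storage, and log table."""
    files: dict = field(default_factory=dict)    # file name -> current content
    staging: dict = field(default_factory=dict)  # file name -> replaced versions
    log: list = field(default_factory=list)      # (file name, subsystem, mark, action)

    def upload_count(self, name):
        """Number of times the file has been uploaded, according to the log."""
        return sum(1 for entry in self.log if entry[0] == name)

    def upload(self, name, content, subsystem, mark=None):
        """Store an upload; a repeat upload overwrites and stages the original."""
        if mark is None:
            # No mark recognized: the uploading user must choose one first.
            return "mark required"
        if self.upload_count(name) == 0:
            action = "first upload"
        else:
            # Not a first upload: keep the replaced version in temporary storage.
            self.staging.setdefault(name, []).append(self.files[name])
            action = "overwrite"
        self.files[name] = content
        self.log.append((name, subsystem, mark, action))
        return action
```

Because every replaced version is kept in `staging`, an erroneous overwrite can be traced and reverted from the log.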
S6. If a main database file has not been accessed by the time the set period elapses, the project is regarded as ended; all related subsystem database data are automatically deleted, and the corresponding main database data are transferred to a finishing database (preferably a cloud platform, which can store more data for longer and makes tracing easier). The steps are as follows:
(1) Setting a threshold time, for example 30 days;
(2) Traversing all files in the file system;
(3) Checking the last access time of each file, which can be obtained through an API provided by the operating system;
(4) Marking the file as "to be deleted" if more than 30 days have passed since its last access;
(5) After the traversal is complete, traversing all files marked for deletion once and deleting them.
As shown in Fig. 4, let the last access date be $T_f$, the current date be $T_d$, and the non-access time be $T_w$. Then $T_w = T_d - T_f$. If $T_w > 30$, the file in the main system is transferred and the corresponding file in the subsystem is deleted; otherwise, the file is retained.
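The two-pass cleanup of step S6 can be sketched in Python as follows. The sketch assumes that the last access time reported by the operating system (`st_atime` from `os.stat`) is reliable; on file systems mounted with options such as `noatime` it may not be updated. The function names are hypothetical, and the transfer to the finishing database is omitted.

```python
import os
import time

SECONDS_PER_DAY = 86_400


def days_unaccessed(path, now=None):
    """Return T_w = T_d - T_f in days, using the file's last access time."""
    t_d = time.time() if now is None else now
    t_f = os.stat(path).st_atime
    return (t_d - t_f) / SECONDS_PER_DAY


def find_stale_files(root, threshold_days=30, now=None):
    """First pass: collect files not accessed for more than threshold_days."""
    stale = []
    for folder, _dirs, names in os.walk(root):
        for name in names:
            path = os.path.join(folder, name)
            if days_unaccessed(path, now) > threshold_days:
                stale.append(path)
    return stale


def delete_files(paths):
    """Second pass: delete every file that was marked during traversal."""
    for path in paths:
        os.remove(path)
```

Separating marking from deletion, as in steps (4) and (5), means the directory tree is never modified while it is being traversed.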
S7. When data is searched for, the query is made through the large language interaction module, which returns the data download item with the data's history log attached.
The method has the following advantages. Business data files from different departments and different personnel in an enterprise are integrated under uniform rules, and a subset classification table is established for ordered classification. Data can be identified, classified, and marked at upload time, so that when the main system processes the data it can quickly determine how the data should be handled, for example stored and recorded by category or sent to other subsystems; this reduces the difficulty of staff operations and lowers the error rate. In addition, a log table is generated during identification and processing and, together with the reminder mechanism and the automatic subsystem search function, prompts the subsystems that hold the same data, which effectively avoids duplicated work and improves efficiency. Erroneous data is prevented from overwriting correct data, and tracing is made easy. Finally, time rules are set so that stored files are cleaned up periodically, which keeps redundant data out of the subsystems and further reduces the workload of staff who process data.
The icon of the large language interaction module can be made to float permanently at one corner of the interface using CSS and HTML:
First, a div element containing the icon is created in the HTML file, for example:
<div class="floating-icon">
<img src="icon.png" alt="Floating Icon">
</div>
Then, the style of the div element is set in the CSS file so that it always floats at one corner of the interface, for example:
.floating-icon {
  position: fixed;
  bottom: 20px; /* distance from bottom */
  right: 20px; /* distance from right */
}
In this way, the icon always floats at one corner of the interface, regardless of whether the user scrolls the page or resizes the window.
By providing the large language interaction module, a floating window can be built at one corner of the operation interface; the window stays at a fixed position on the display as the operation interface is scrolled, which makes it convenient for operators to use. The large language interaction module supports dialogue-style queries for searching data, looking up history, consulting operating rules, and retrieving other content recorded by the system. Enterprises that need it can also integrate ChatGPT or other artificial-intelligence plug-ins to assist office work, which effectively improves the efficiency of enterprise data processing.
The second embodiment differs from the first embodiment mainly in that the subsystem is divided into a plurality of department subsystems according to enterprise departments, and all employee accounts within a department are assigned to the same department subsystem.
The main system comprises a data management unit, a main system database, a large language interaction module and a subsystem authority management module. The data management unit and the large language interaction module upload, download and view data by accessing the main system database, and the subsystem authority management module manages the account permissions of the department subsystems.
The subsystem is connected to the main system through the enterprise intranet; when located on an external network, the subsystem accesses the main system through a VPN (virtual private network).
The overall system is divided into several subsystems according to the nature of each department, and the accounts of the employees in each department are managed separately. Because the content of each department subsystem is different, management and business data processing can be targeted. In addition, access over the enterprise intranet, with VPN access to the intranet from external networks, further protects the security of the enterprise's business data.
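The account and access rules of the second embodiment can be sketched as follows. This is a minimal illustration under stated assumptions: the intranet address range, the `Account` fields, and the function names are hypothetical, and whether a connection arrives through the VPN is passed in as a flag rather than detected.

```python
import ipaddress
from dataclasses import dataclass

# Hypothetical intranet address range; a real deployment would use its own.
INTRANET = ipaddress.ip_network("10.0.0.0/8")


@dataclass(frozen=True)
class Account:
    user: str
    department: str         # decides which department subsystem the account joins
    permissions: frozenset  # for example {"upload", "download", "view"}


def subsystem_for(account):
    """Every account in a department maps to that department's subsystem."""
    return f"{account.department}-subsystem"


def may_access(account, action, client_ip, via_vpn=False):
    """Allow an action only with permission and an intranet or VPN connection."""
    on_intranet = ipaddress.ip_address(client_ip) in INTRANET
    return action in account.permissions and (on_intranet or via_vpn)
```

Keeping the permission check and the network check in one function makes it explicit that both conditions must hold before the main system database is reached.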
The third embodiment differs from the second embodiment mainly in that the department subsystem comprises a subsystem database, a system framework, a communication mechanism, an expansion module and a modular functional unit, wherein the modular functional unit further comprises a plurality of function modules and an interface definition module that defines the interfaces of the function modules.
The communication mechanism connects the function modules integrated by the modular functional unit to the system framework to form a complete system. The expansion module loads new function data and expands it into a new function module, or loads function module data for temporary use when an original function module fails. The subsystem database stores the local data and the modular data of the function data.
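The modular structure of the third embodiment can be sketched as follows. This is only one possible reading, not the patented design: the interface definition is modeled as a Python `Protocol`, the communication mechanism as registration by name, and the expansion module as a factory that supplies a replacement when a module fails. All class and method names are hypothetical.

```python
from typing import Callable, Dict, Protocol


class FunctionModule(Protocol):
    """Interface that every function module must satisfy."""
    name: str

    def run(self, payload: str) -> str: ...


class Framework:
    """System framework that function modules are connected to."""

    def __init__(self) -> None:
        self._modules: Dict[str, FunctionModule] = {}
        self._spares: Dict[str, Callable[[], FunctionModule]] = {}

    def register(self, module: FunctionModule) -> None:
        """Attach a function module to the framework under its name."""
        self._modules[module.name] = module

    def add_spare(self, name: str, factory: Callable[[], FunctionModule]) -> None:
        """Remember how to load a replacement for the named module."""
        self._spares[name] = factory

    def call(self, name: str, payload: str) -> str:
        """Run a module; if it fails, load its spare and retry once."""
        try:
            return self._modules[name].run(payload)
        except Exception:
            if name not in self._spares:
                raise
            self._modules[name] = self._spares[name]()
            return self._modules[name].run(payload)
```

Because modules are addressed only by name and interface, a department subsystem can register just the functions it needs and leave the rest out.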
Because the department subsystem is built in a modular fashion, each function is independent, and the communication mechanism together with the interface definition module mounts the functions on the system framework. Functions can therefore be combined to suit the business nature of each department, and unnecessary functions can be omitted. This keeps the interface of the department subsystem concise and easy to operate, helps new staff get started quickly, and reduces the error rate of business data operations.
Anything not described in detail in this specification is well known to those skilled in the art.
It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus.
Although embodiments of the present invention have been shown and described, it will be understood by those skilled in the art that various changes, modifications, substitutions and alterations can be made therein without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (10)
1. A business data processing method based on a large language model, characterized in that the method specifically comprises the following steps:
S1. modularizing the subsystem according to department work items;
S2. making a file subset classification table: first, creating a subset of the file data, wherein the subset is expressed as:
S={x|x∈A,P(x)};
wherein S represents the created subset, A represents the original dataset, and P(x) represents the condition or rule for classification, which is set with human participation; that is, S is the subset composed of the elements x in A that satisfy the condition P(x);
then establishing the file subset classification table according to the subset, so that new file names can be processed, classified and identified; the first-level classification is set as A and comprises two subcategories, A1 and A2; the second-level classification is B and comprises two subcategories, B1 and B2; the classification table can then be expressed as:
A={A1,A2},B={B1,B2};
wherein A1 and A2 are the subcategory numbers of A, B1 and B2 are the subcategory numbers of B, and each of the categories A1, A2, B1 and B2 is further divided into n numbered entries that refer to specific keywords;
S3. marking keywords in the subsystem before enterprise data is uploaded to the main system;
S4. identifying, classifying and processing the data of the main system through the data management unit, and meanwhile, for a file uploaded for the first time, traversing all subsystems to search for similar files and sending a reminder to each subsystem that holds a similar file, so as to eliminate duplicated work;
S5. combining data identification with data marking and an overwrite strategy to identify and process the data, with in-depth user interaction in the processing;
S6. if a main database file has not been accessed by the time the set period elapses, regarding the project as ended, automatically deleting all related subsystem database data, and transferring the corresponding main database data to a finishing database;
S7. loading the large language interaction module as a floating element on the system operation interface using CSS and HTML; when data is searched for, the query is made through the large language interaction module, which returns the corresponding data.
2. The business data processing method based on the large language model according to claim 1, wherein: the step S2 specifically comprises the following steps:
(1) Dividing file names: splitting the file name into numbers, letters and nouns at spaces and symbols to obtain a word list, and tagging the part of speech of each word in the word list;
(2) History identification record: extracting file name statements according to the historical file classification records and the file-specific data information, removing invalid features according to manually set rules, and retaining valid features to assist the subsequent automatic classification process;
(3) Setting rule classification marks: manually making the file subset classification table according to the departments and file types to which the business data files belong, manually setting rules that encode the extracted features into subset codes which replace the features, adding the codes to file names as classification marks, and associating the file names with the corresponding categories;
(4) Algorithm integration: integrating the valid features, the history identification record and the file subset classification table into an algorithm model, which marks and classifies new file names.
3. The business data processing method based on the large language model according to claim 1, wherein: the step S4 specifically comprises the following steps:
(1) Word segmentation and keyword extraction:
1. word segmentation: replacing non-alphanumeric characters in the file name with spaces, and splitting the file name into words using a regular expression;
2. keyword extraction: comparing each segmented word against the keywords in the file subset classification table, removing redundant words, and using the remaining words as the keywords of the file name;
(2) Using a classification algorithm model to classify the data of the extracted keywords against a subset classification table;
(3) The data storage specifically comprises:
1. storing the data in the main system database;
2. sending the data to other subsystem databases;
3. generating a log table.
4. The business data processing method based on the large language model according to claim 1, wherein: the step S5 specifically comprises the following steps: issuing a reminder for data for which no mark is recognized; consulting the log table to determine whether the data is being uploaded for the first time; prompting mark options according to the subsystem's upload count, the uploading user selecting the mark to add; and distributing data that is not a first upload so that it overwrites the original data, the original data being moved to a temporary storage station for temporary storage.
5. The business data processing method based on the large language model according to claim 1, wherein: in the step S3, the subsystem performs a series of data processing and classification steps before uploading data to the main system, specifically comprising the following steps:
(1) Splitting the data packet to be uploaded by the subsystem; if the packet contains data of different types, the data of each type is placed into a different subset;
(2) Marking each file by adding codes according to the rules in the subset classification table; during marking, classification marks are superimposed on each file of the subset according to the file's recognized features and in the order of the classification marks in the subset classification table;
(3) Uploading the marked, coded data to the main system.
6. The business data processing method based on the large language model according to claim 1, wherein: the subsystem is divided into a plurality of department subsystems according to enterprise departments, and all employee accounts within a department are assigned to the same department subsystem.
7. The business data processing method based on the large language model according to claim 6, wherein: the main system comprises a data management unit, a main system database, a large language interaction module and a subsystem authority management module, wherein the data management unit and the large language interaction module upload, download and view data by accessing the main system database, and the subsystem authority management module manages the account permissions of the department subsystems.
8. The business data processing method based on the large language model according to claim 6, wherein: the subsystem is connected to the main system through the enterprise intranet, and the subsystem accesses the main system through a VPN (virtual private network) when located on an external network.
9. The business data processing method based on the large language model according to claim 6, wherein: the department subsystem comprises a subsystem database, a system framework, a communication mechanism, an expansion module and a modular functional unit, wherein the modular functional unit further comprises a plurality of function modules and an interface definition module that defines the interfaces of the function modules.
10. The business data processing method based on the large language model according to claim 9, wherein: the communication mechanism connects the function modules integrated by the modular functional unit to the system framework to form a complete system; the expansion module loads new function data and expands it into a new function module, or loads function module data for temporary use when an original function module fails; and the subsystem database stores the local data and the modular data of the function data.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410170293.1A CN117709909B (en) | 2024-02-06 | 2024-02-06 | Business data processing method based on large language model |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202410170293.1A CN117709909B (en) | 2024-02-06 | 2024-02-06 | Business data processing method based on large language model |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117709909A (en) | 2024-03-15 |
CN117709909B (en) | 2024-04-09 |
Family
ID=90150208
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN202410170293.1A | Active | CN117709909B (en) | 2024-02-06 | 2024-02-06 | Business data processing method based on large language model |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117709909B (en) |
Citations (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20180075138A1 (en) * | 2016-09-14 | 2018-03-15 | FileFacets Corp. | Electronic document management using classification taxonomy |
CN108629646A (en) * | 2017-03-23 | 2018-10-09 | 长沙海登网络科技有限公司 | The Online Bookstore System of new ASP technological development |
CN112016608A (en) * | 2020-08-21 | 2020-12-01 | 四川大学 | Garment perceptual intention classification method based on convolutional neural network, classification model and construction method thereof |
CN113961523A (en) * | 2018-01-26 | 2022-01-21 | 创新先进技术有限公司 | Business file splitting and summarizing method, device and equipment |
US20220391583A1 (en) * | 2021-06-03 | 2022-12-08 | Capital One Services, Llc | Systems and methods for natural language processing |
CN116029273A (en) * | 2022-12-28 | 2023-04-28 | 上海浦东发展银行股份有限公司 | Text processing method, device, computer equipment and storage medium |
CN116483660A (en) * | 2023-04-27 | 2023-07-25 | 北京新能源汽车股份有限公司 | Method, device and equipment for acquiring vehicle-end log and readable storage medium |
CN116579339A (en) * | 2023-07-12 | 2023-08-11 | 阿里巴巴(中国)有限公司 | Task execution method and optimization task execution method |
CN117112776A (en) * | 2023-09-23 | 2023-11-24 | 宏景科技股份有限公司 | Enterprise knowledge base management and retrieval platform and method based on large language model |
CN117235243A (en) * | 2023-11-16 | 2023-12-15 | 青岛民航凯亚系统集成有限公司 | Training optimization method for large language model of civil airport and comprehensive service platform |
CN117235220A (en) * | 2023-09-15 | 2023-12-15 | 之江实验室 | Extensible large language model calling method and device based on graph database knowledge enhancement |
Non-Patent Citations (4)
Title |
---|
YUN WAN 等: "An Ensemble Sentiment Classification System of Twitter Data for Airline Services Analysis", 《2015 IEEE INTERNATIONAL CONFERENCE ON DATA MINING WORKSHOP (ICDMW)》, 4 February 2016 (2016-02-04), pages 1318 - 1325 * |
李世钰 等: "古籍数字化国内外研究现状分析与路径构建研究", 《现代情报》, vol. 43, no. 11, 30 November 2023 (2023-11-30), pages 4 - 20 * |
李阳: "基于云平台的电子病案系统研究与实", 《中国优秀硕士学位论文全文数据库 医药卫生科技辑》, no. 8, 15 August 2019 (2019-08-15), pages 053 - 152 * |
赵姝 等: "不完整数据集的信息熵集成分类算法", 《模式识别与人工智能》, vol. 27, no. 3, 31 March 2014 (2014-03-31), pages 193 - 198 * |
Also Published As
Publication number | Publication date |
---|---|
CN117709909B (en) | 2024-04-09 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||