CN117951211A - Large language model privatization deployment device and method for cloud service industry - Google Patents


Info

Publication number
CN117951211A
CN117951211A (application CN202410348300.2A)
Authority
CN
China
Prior art keywords
language model
data
cloud service
large language
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410348300.2A
Other languages
Chinese (zh)
Inventor
冯偲
李红雁
薛寒
周树亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tibet Ningsuan Technology Group Co ltd
Nanjing Computing Nanjing Technology Co ltd
Original Assignee
Tibet Ningsuan Technology Group Co ltd
Nanjing Computing Nanjing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tibet Ningsuan Technology Group Co ltd and Nanjing Computing Nanjing Technology Co ltd
Priority to CN202410348300.2A
Publication of CN117951211A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 - Information retrieval of structured data, e.g. relational data
    • G06F16/25 - Integrating or interfacing systems involving database management systems
    • G06F16/258 - Data format conversion from or to a database
    • G06F16/21 - Design, administration or maintenance of databases
    • G06F16/215 - Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G06F21/00 - Security arrangements for protecting computers, components thereof, programs or data against unauthorised activity
    • G06F21/60 - Protecting data
    • G06F21/64 - Protecting data integrity, e.g. using checksums, certificates or signatures

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computer Security & Cryptography (AREA)
  • Data Mining & Analysis (AREA)
  • Bioethics (AREA)
  • Software Systems (AREA)
  • Computer Hardware Design (AREA)
  • Quality & Reliability (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a device and method for the privatized deployment of a large language model in the cloud service industry. The device comprises: a data collection module for collecting raw data; a first module for preprocessing the collected raw data; a second module for vectorizing the preprocessed data and generating corresponding indexes; a cloud service database for vectorized storage; a large language model deployed on a private server; a third module for parameter optimization; and more. Based on a cloud service knowledge base, the method and device deploy the large language model privately so that it can fully meet the requirements of actual projects using the enterprise's own data, improving the model's domain expertise in the cloud service industry while keeping enterprise data secure.

Description

Large language model privatization deployment device and method for cloud service industry
Technical Field
The invention belongs to the technical field of artificial intelligence, relates to large language model technology, and in particular to a device and method for the privatized deployment of large language models in the cloud service industry.
Background
Currently, large language models perform well across NLP tasks and can carry out tasks set by humans. However, these general-domain models struggle to produce satisfactory results in vertical domains. For example, work in the cloud service industry today requires experienced research and development personnel, and a general-purpose large language model cannot be used for it directly. Furthermore, data security matters even more to enterprise users whose data is sensitive. In the cloud service industry, enterprise data is critically important: it includes user data, server interface information, and user account numbers and passwords. The data provided by different enterprises varies widely, as do their demands, and there is no scheme for quickly adapting a large model to a specific enterprise. In applying large models to the cloud service industry, a privatized deployment scheme that protects sensitive enterprise data is lacking.
Disclosure of Invention
Technical purpose: in view of the above technical problems, the invention provides a device and method for the privatized deployment of a large language model in the cloud service industry. The large language model is deployed privately on the basis of a cloud service knowledge base, improving the model's domain expertise in the cloud service industry while ensuring the security of enterprise data.
Technical scheme: to achieve the above purpose, the invention adopts the following technical scheme:
A large language model privatization deployment device for the cloud service industry, characterized in that a large language model, a first module, a second module and a third module are deployed on a private server at the user side; wherein,
the first module is used for preprocessing collected raw data, the raw data being the user's own private data, including cloud service help documents, specifications, user manuals and server operation records;
the second module is used for vectorizing the preprocessed data and generating corresponding indexes; the raw data, the preprocessed data and the vectorized data are all stored in the cloud service database;
the third module is used for optimizing key parameters and applying the optimization result to the large language model; the key parameters include the GPU memory loading parameter of the inference server, the inference batch size, the number of retrieved document chunks, and the quantization level of the large language model;
the large language model is used to perform computation, analysis and inference on the data input into it; this data includes questions posed by a user and received by the device, together with descriptions related to those questions retrieved from the cloud service database.
Preferably, the first module comprises:
a blank-area normalization unit, which uses regular-expression matching to find runs of more than two spaces and blank lines and replaces each with two spaces;
a special-symbol processing unit, which cleans special symbols using regular-expression matching;
a stop-word filtering unit, which traverses the raw data using a traversal algorithm together with regular-expression matching and replaces or deletes stop words, the stop words having been added to a stop-word list in advance by analyzing the stop words commonly found in the documents;
and a perplexity-based word and sentence filtering unit, which uses an N-gram model to compute the perplexity of words or sentences in the raw data and deletes a word or sentence when its perplexity is below 0.5.
Preferably, the second module comprises:
a data loading unit for loading the data to be vectorized;
a splitting unit for splitting the loaded data into document fragments;
a vector extraction unit for loading a HuggingFace language model and extracting a vector for each split document fragment;
and an output unit for writing the split documents and their corresponding vectors to the cloud service database as a DuckDB file.
Preferably, the third module comprises:
a search-interval determination unit for taking the parameters to be optimized as input and determining the search interval of each parameter;
a grid search unit for finding the optimal value of each parameter within its search interval using grid search;
and an optimization unit for optimizing the large language model with the optimal value of each parameter.
Preferably, the device further comprises:
a data collection module for collecting the user's private data, including cloud service help documents, specifications, user manuals and server operation records;
a user question module for receiving questions posed by the user;
an answer module for outputting the answer to each question;
wherein, after receiving a question posed by the user, the user question module calls a communication interface to access the cloud service database, retrieves the descriptions related to the question from the cloud service database, and submits the question and the descriptions to the large language model.
A method for the privatized deployment of a large language model in the cloud service industry, comprising the following steps:
preprocessing collected raw data, the raw data being the user's own private data, including cloud service help documents, specifications, user manuals and server operation records;
vectorizing the preprocessed data and generating corresponding indexes, wherein the raw data, the preprocessed data and the vectorized data are stored in a cloud service database;
optimizing key parameters and applying the optimization result to the large language model, wherein the key parameters include the GPU memory loading parameter of the inference server, the inference batch size, the number of retrieved document chunks, and the quantization level of the large language model; the large language model performs computation, analysis and inference on its input data, which includes questions posed by users and descriptions related to those questions retrieved from the cloud service database.
Preferably, the method comprises the following steps:
receiving a question posed by a user;
accessing the cloud service database, and after retrieving the descriptions related to the question from the database, submitting the question and the descriptions to the large language model;
using the large language model to perform computation, analysis and inference on the question and descriptions input into it;
and outputting the processing result of the large language model as the answer to the question.
Preferably, the collected raw data is preprocessed as follows:
matching runs of more than two spaces and blank lines using regular expressions and replacing each with two spaces;
cleaning special symbols using regular-expression matching;
traversing the raw data with a traversal algorithm and regular-expression matching, and replacing or deleting stop words, the stop words having been added to a stop-word list in advance by analyzing the stop words commonly found in the documents;
and computing the perplexity of words or sentences in the raw data with an N-gram model, deleting a word or sentence when its perplexity is below 0.5.
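The perplexity filter in the last step can be illustrated with a minimal add-one-smoothed bigram model. This is a sketch under assumptions: the patent does not specify the N-gram order, the smoothing, or the normalization, and standard per-token perplexity is always at least 1, so the 0.5 threshold implies a rescaling the patent leaves unstated.

```python
import math
from collections import Counter

def train_bigram(tokens):
    """Count unigrams and bigrams from a training token stream."""
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    return unigrams, bigrams, len(unigrams)

def perplexity(sentence, unigrams, bigrams, vocab):
    """Per-token perplexity of `sentence` under an add-one-smoothed bigram model."""
    if len(sentence) < 2:
        return float("inf")
    log_p = 0.0
    for prev, cur in zip(sentence, sentence[1:]):
        p = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab)
        log_p += math.log(p)
    return math.exp(-log_p / (len(sentence) - 1))

# A word pair seen in training scores lower perplexity than an unseen one,
# so thresholding on perplexity keeps fluent text and drops garbled text.
corpus = ["the", "server", "log", "shows", "the", "server", "status"]
uni, bi, v = train_bigram(corpus)
ppl_seen = perplexity(["the", "server"], uni, bi, v)
ppl_unseen = perplexity(["server", "shows"], uni, bi, v)
```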
Preferably, the preprocessed data is converted into vectors and stored as follows:
loading the data to be vectorized;
splitting the loaded data into document fragments;
loading a HuggingFace language model and extracting a vector for each split document fragment;
and storing the split documents and their corresponding vectors in the cloud service database as a DuckDB file.
Preferably, the key parameters are optimized, and the optimization result is applied to the large language model, as follows:
taking the parameters to be optimized as input and determining the search interval of each parameter;
using grid search with a set step size to find the optimal value of each parameter within its search interval;
and optimizing the large language model with the optimal value of each parameter.
Beneficial effects: by adopting the above technical scheme, the invention achieves the following:
the invention establishes a privatized large-model deployment scheme for sensitive cloud service enterprise data. A proprietary expert knowledge base of the enterprise, namely the cloud service database D, is built, and a large language model with access to this knowledge base is deployed on a private server. Without any model training, the scheme completes the conversion from a general-purpose model to a vertical-domain model while guaranteeing the security of enterprise data.
Drawings
FIG. 1 is a schematic structural diagram of the large language model privatization deployment device for the cloud service industry;
FIG. 2 is a flowchart of the large language model privatization deployment method for the cloud service industry according to embodiment two;
Fig. 3 shows example stop words used in the method of the present invention.
Detailed Description
Embodiments of the present invention will be described in detail below with reference to the accompanying drawings.
Example 1
As shown in fig. 1, the present invention proposes a large language model privatization deployment device for the cloud service industry, comprising: a data collection module, a first module, a cloud service database, a large language model, a second module, a third module, a user question module and an answer module. The details are as follows.
1. Module one: cloud service knowledge base data processing
This module processes text data from the cloud service industry; the raw data includes cloud service user manuals, cloud server operation records, and other cloud server specifications. The processing flow comprises blank-area normalization, stop-word filtering, perplexity-based word and sentence filtering, and special-symbol removal.
2. And a second module: data vectorized storage
This module converts text data into vectors and stores them. In the subsequent privatized deployment, the model accesses the vector store directly rather than the original documents, which speeds up retrieval.
The cloud service database stores the enterprise's own data, including the raw data and the preprocessed and vectorized data. It is named the cloud service database because the enterprise's data comes from the cloud service industry; in fact, the data in the cloud service database is not stored in the cloud and cannot be accessed from the external network. The user question module and the answer module are both deployed on the private server and can be accessed through pages within the local area network to enable conversational interaction.
3. And a third module: parameter optimization of proprietary deployment model
During model deployment, many parameters influence the inference speed and performance of the algorithm, and tuning them manually takes considerable time. The optimization algorithm used here is grid search, and the parameters optimized include the GPU memory loading parameter of the inference server (n_layer), the inference batch size (batch), the number of retrieved document chunks (chunks), and the quantization level of the large language model (q).
Example two
This embodiment provides a large language model privatization deployment method for the cloud service industry, which, as shown in fig. 2, mainly comprises the following steps:
Step 1: collect user manuals, description documents and other related information, and preprocess the data through module one.
Step 2: vectorize the data preprocessed in step 1 through module two, store it and generate indexes, obtaining a retrieval database for the cloud service industry, namely the cloud service database.
Step 3: quantize the large language model using the parameter optimization result from module three, and deploy it on the private server.
Step 4: after receiving a user question, compare the question against the cloud service database from step 2, retrieve the related description s, and feed the question together with the document description s into the large language model from step 3 to obtain the final answer. After the user poses a question, the system matches it against the cloud service database, retrieves the relevant document fragments, and submits them to the large language model. The document description s is the document corresponding to the fragment and index found by the system's comparison and retrieval.
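The question-answering flow of step 4 can be sketched as follows. This is a minimal illustration in which a toy bag-of-words similarity stands in for the real vector model; the function names, sample fragments, and prompt wording are assumptions, not part of the patent.

```python
import math
import re
from collections import Counter

def embed(text):
    """Toy bag-of-words 'embedding'; a real deployment uses a sentence-embedding model."""
    return Counter(re.findall(r"[a-z]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse token-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(question, store, chunks=2):
    """Return the `chunks` document fragments most similar to the question (the description s)."""
    q = embed(question)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:chunks]]

def build_prompt(question, descriptions):
    """Feed the question and the retrieved description s into the large language model together."""
    context = "\n".join(f"- {d}" for d in descriptions)
    return (f"Context from the cloud service knowledge base:\n{context}\n\n"
            f"Question: {question}\nAnswer:")

# Fragments produced by module two's splitter, paired with their vectors.
fragments = [
    "To reset a user password, open the account page of the cloud console.",
    "Server operation records are retained for ninety days.",
    "Billing invoices can be downloaded from the cost center.",
]
store = [(f, embed(f)) for f in fragments]
description_s = retrieve("How do I reset my password?", store, chunks=1)
prompt = build_prompt("How do I reset my password?", description_s)
```

In the actual device the prompt would then be sent to the locally deployed large language model, whose output becomes the final answer.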
The main steps executed by module one are as follows:
Step 1.1: match runs of more than two spaces and blank lines using regular expressions and replace each with two spaces.
Step 1.2: clean special symbols using regular-expression matching. Special symbols include various graphical symbols, such as ✑ ✒ ✉ ✁ ✂ ✃ ✄ ✆ ☎ ☏ ☑ ✓ ✔ ☐ ☒ ✗ ✘ ㄨ ✕ ✖ ☢ ☠ ☣ ✈ and the like.
Step 1.3: analyze the stop words commonly found in the documents and add them to a stop-word list; then traverse all cloud service knowledge base documents, such as user manuals, using a traversal algorithm and regular-expression matching, and delete the stop words from them. Example stop words are shown in fig. 3.
Step 1.4: compute the perplexity of each sentence with an N-gram model, and delete a word when its perplexity is below 0.5.
Because a domain-specific large model emphasizes objectivity, words with strong emotional color are removed in addition to stop words; these are mostly adjectives or adverbs, such as "very" and "extremely", which would otherwise influence the output of the large model.
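Steps 1.1 through 1.3 can be sketched with the following regular-expression cleaning routines. This is a minimal sketch: the exact patterns, the symbol set, and the stop-word entries here are illustrative assumptions (the real stop-word list is built from document analysis, and CJK text would need word segmentation before stop-word matching).

```python
import re

# Illustrative stop words only; the patent builds the real list by analyzing the documents (fig. 3).
STOP_WORDS = {"hereby", "aforesaid", "namely"}

def normalize_blank(text):
    """Step 1.1: collapse runs of blank lines, and runs of more than two spaces, into two."""
    text = re.sub(r"\n\s*\n+", "\n\n", text)  # squeeze consecutive blank lines
    return re.sub(r" {3,}", "  ", text)       # squeeze runs of 3+ spaces to 2

def clean_special_symbols(text):
    """Step 1.2: delete decorative graphical symbols by regular-expression matching."""
    return re.sub(r"[✑✒✉☎☏☑✓✔☐☒✗✘✕✖☢☠☣✈]", "", text)

def remove_stop_words(text):
    """Step 1.3: delete listed stop words (word-boundary matching for English-style tokens)."""
    pattern = r"\b(?:" + "|".join(map(re.escape, STOP_WORDS)) + r")\b ?"
    return re.sub(pattern, "", text)

cleaned = remove_stop_words(clean_special_symbols(normalize_blank(
    "restart the node ✔   hereby as described")))
```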
The main steps executed by module two are as follows:
Step 2.1: load the cloud service help documents, user manuals and server operation records to be vectorized.
Step 2.2: create a splitter for the documents from step 2.1 and split the document data, so that each text fragment corresponds to an index.
Step 2.3: load a HuggingFace language model and perform vectorization extraction on the split result from step 2.2, i.e. convert each text fragment into a vector.
Step 2.4: store the split documents and vectors using a DuckDB file structure.
The N-gram model used above for sentence perplexity computation and the HuggingFace language model used to vectorize the split result can both be obtained by training in the prior art, and are not described further here.
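Steps 2.2 through 2.4 can be sketched as follows. The fixed-size overlapping splitter and the two-dimensional placeholder vector are assumptions (the patent specifies neither a splitting strategy nor an embedding model), and the DuckDB write is shown only as a comment sketch.

```python
def split_document(text, chunk_size=100, overlap=20):
    """Step 2.2: split a document into fragments, pairing each text fragment with its index."""
    assert chunk_size > overlap >= 0
    fragments = []
    start, idx = 0, 0
    while start < len(text):
        fragments.append((idx, text[start:start + chunk_size]))
        start += chunk_size - overlap
        idx += 1
    return fragments

def embed_fragment(fragment):
    """Step 2.3 placeholder: a real deployment loads a HuggingFace model to produce the vector."""
    return [float(len(fragment)), float(fragment.count(" "))]

# Step 2.4: rows of (index, fragment, vector) ready to be written out.
doc = "cloud service help document " * 10  # 280 characters of sample text
rows = [(idx, frag, embed_fragment(frag)) for idx, frag in split_document(doc)]

# In the actual device these rows are stored as a DuckDB file, e.g. (sketch):
#   import duckdb
#   con = duckdb.connect("cloud_service.duckdb")
#   con.execute("CREATE TABLE chunks(idx INTEGER, text VARCHAR, vec DOUBLE[])")
```

The overlap between adjacent fragments keeps sentences that straddle a split point retrievable from at least one fragment.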
The main steps executed by module three are as follows:
Step 3.1: first, confirm the parameters to be optimized and determine the approximate search interval for each parameter.
Step 3.2: then, apply grid search, i.e. repeatedly try values within each interval at a fixed step size. For example, if parameter a lies in [1, 5] with a step size of 1, the optimal value of a is sought among {1, 2, 3, 4, 5}. The optimization performed by module three is deployment optimization of the model, mainly determining the quantization level to be used for deployment, the input token length, and so on.
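Steps 3.1 and 3.2 amount to an exhaustive grid search over the deployment parameters. The sketch below uses the four parameter names the patent gives (n_layer, batch, chunks, q); the candidate values and the scoring function are illustrative assumptions, since in practice the score would benchmark inference speed and answer quality on a validation set.

```python
from itertools import product

def grid_search(param_grid, score):
    """Try every combination in the grid and keep the best-scoring parameter set."""
    best_params, best_score = None, float("-inf")
    for combo in product(*param_grid.values()):
        params = dict(zip(param_grid.keys(), combo))
        s = score(params)
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score

# Search intervals from step 3.1 (illustrative candidate values).
grid = {
    "n_layer": [20, 30, 40],  # model layers loaded into GPU memory
    "batch": [1, 4, 8],       # inference batch size
    "chunks": [2, 4],         # number of retrieved document chunks
    "q": [4, 8],              # quantization level (bits)
}

def toy_score(p):
    """Stand-in objective; a real one would measure the deployed model's speed and accuracy."""
    return p["n_layer"] * 0.1 + p["batch"] * 0.5 - p["chunks"] * 0.2 + p["q"] * 0.3

best, best_val = grid_search(grid, toy_score)
```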
The privatized cloud service deployment scheme of the invention requires that private enterprise data be deployed on the enterprise's own servers. Because the answers retrieved through the model and the cloud service data draw on the enterprise's own data, they are more specialized and meet actual project requirements; retrieval time is greatly shortened, answer accuracy is improved, and the data security problem is resolved.
The foregoing has shown and described the basic principles, principal features and advantages of the invention. Persons skilled in the art will appreciate that the above embodiments do not limit the invention in any way, and all technical solutions obtained by equivalent substitution or equivalent transformation fall within the scope of the invention.

Claims (10)

1. A large language model privatization deployment device for the cloud service industry, characterized in that a large language model, a first module, a second module and a third module are deployed on a private server at the user side; wherein,
the first module is used for preprocessing collected raw data, the raw data being the user's own private data, including cloud service help documents, specifications, user manuals and server operation records;
the second module is used for vectorizing the preprocessed data and generating corresponding indexes; the raw data, the preprocessed data and the vectorized data are all stored in the cloud service database;
the third module is used for optimizing key parameters and applying the optimization result to the large language model; the key parameters include the GPU memory loading parameter of the inference server, the inference batch size, the number of retrieved document chunks, and the quantization level of the large language model;
the large language model is used to perform computation, analysis and inference on the data input into it; this data includes questions posed by a user and received by the device, together with descriptions related to those questions retrieved from the cloud service database.
2. The large language model privatization deployment device for the cloud service industry of claim 1, wherein the first module comprises:
a blank-area normalization unit, which uses regular-expression matching to find runs of more than two spaces and blank lines and replaces each with two spaces;
a special-symbol processing unit, which cleans special symbols using regular-expression matching;
a stop-word filtering unit, which traverses the raw data using a traversal algorithm together with regular-expression matching and replaces or deletes stop words, the stop words having been added to a stop-word list in advance by analyzing the stop words commonly found in the documents;
and a perplexity-based word and sentence filtering unit, which uses an N-gram model to compute the perplexity of words or sentences in the raw data and deletes a word or sentence when its perplexity is below 0.5.
3. The large language model privatization deployment device for the cloud service industry of claim 1, wherein the second module comprises:
a data loading unit for loading the data to be vectorized;
a splitting unit for splitting the loaded data into document fragments;
a vector extraction unit for loading a HuggingFace language model and extracting a vector for each split document fragment;
and an output unit for writing the split documents and their corresponding vectors to the cloud service database as a DuckDB file.
4. The large language model privatization deployment device for the cloud service industry of claim 1, wherein the third module comprises:
a search-interval determination unit for taking the parameters to be optimized as input and determining the search interval of each parameter;
a grid search unit for finding the optimal value of each parameter within its search interval using grid search;
and an optimization unit for optimizing the large language model with the optimal value of each parameter.
5. The large language model privatization deployment device for the cloud service industry of claim 1, further comprising:
a data collection module for collecting the user's private data, including cloud service help documents, specifications, user manuals and server operation records;
a user question module for receiving questions posed by the user;
an answer module for outputting the answer to each question;
wherein, after receiving a question posed by the user, the user question module calls a communication interface to access the cloud service database, retrieves the descriptions related to the question from the cloud service database, and submits the question and the descriptions to the large language model.
6. A large language model privatization deployment method for the cloud service industry, characterized by comprising the following steps:
preprocessing collected raw data, the raw data being the user's own private data, including cloud service help documents, specifications, user manuals and server operation records;
vectorizing the preprocessed data and generating corresponding indexes, wherein the raw data, the preprocessed data and the vectorized data are stored in a cloud service database;
optimizing key parameters and applying the optimization result to the large language model, wherein the key parameters include the GPU memory loading parameter of the inference server, the inference batch size, the number of retrieved document chunks, and the quantization level of the large language model; the large language model performs computation, analysis and inference on its input data, which includes questions posed by users and descriptions related to those questions retrieved from the cloud service database.
7. The large language model privatization deployment method for the cloud service industry of claim 6, wherein the method comprises the following steps:
receiving a question posed by a user;
accessing the cloud service database, and after retrieving the descriptions related to the question from the database, submitting the question and the descriptions to the large language model;
using the large language model to perform computation, analysis and inference on the question and descriptions input into it;
and outputting the processing result of the large language model as the answer to the question.
8. The large language model privatization deployment method for the cloud service industry of claim 6, wherein the collected raw data is preprocessed as follows:
matching runs of more than two spaces and blank lines using regular expressions and replacing each with two spaces;
cleaning special symbols using regular-expression matching;
traversing the raw data with a traversal algorithm and regular-expression matching, and replacing or deleting stop words, the stop words having been added to a stop-word list in advance by analyzing the stop words commonly found in the documents;
and computing the perplexity of words or sentences in the raw data with an N-gram model, deleting a word or sentence when its perplexity is below 0.5.
9. The large language model privatization deployment method for the cloud service industry of claim 6, wherein the preprocessed data is converted into vectors and stored as follows:
loading the data to be vectorized;
splitting the loaded data into document fragments;
loading a HuggingFace language model and extracting a vector for each split document fragment;
and storing the split documents and their corresponding vectors in the cloud service database as a DuckDB file.
10. The large language model privatization deployment method for the cloud service industry of claim 6, wherein the key parameters are optimized and the optimization result is applied to the large language model as follows:
taking the parameters to be optimized as input and determining the search interval of each parameter;
using grid search with a set step size to find the optimal value of each parameter within its search interval;
and optimizing the large language model with the optimal value of each parameter.
CN202410348300.2A (priority and filing date 2024-03-26): Large language model privatization deployment device and method for cloud service industry. Published as CN117951211A (pending).

Priority Applications (1)

CN202410348300.2A (priority and filing date 2024-03-26): Large language model privatization deployment device and method for cloud service industry

Applications Claiming Priority (1)

CN202410348300.2A (priority and filing date 2024-03-26): Large language model privatization deployment device and method for cloud service industry

Publications (1)

CN117951211A, published 2024-04-30

Family

ID=90796517

Family Applications (1)

CN202410348300.2A (priority and filing date 2024-03-26, pending): Large language model privatization deployment device and method for cloud service industry

Country Status (1)

Country: CN. Publication: CN117951211A (en).

Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117056465A (en) * 2023-08-22 2023-11-14 上海极目银河数字科技有限公司 Vector searching method, system, electronic device and storage medium
CN117056481A (en) * 2023-08-09 2023-11-14 西藏宁算科技集团有限公司 Cloud service industry dialogue help system based on large model technology and implementation method
CN117112727A (en) * 2023-08-08 2023-11-24 西藏宁算科技集团有限公司 Large language model fine tuning instruction set construction method suitable for cloud computing service
CN117131179A (en) * 2023-08-29 2023-11-28 支付宝(杭州)信息技术有限公司 Dialogue processing method and device, storage medium and electronic equipment
CN117422139A (en) * 2023-10-26 2024-01-19 上海艾麒信息科技股份有限公司 Large language model privatization training and deployment method and system
CN117454954A (en) * 2023-10-19 2024-01-26 北京声智科技有限公司 Model training method, device, computer equipment and storage medium
CN117493586A (en) * 2023-11-09 2024-02-02 北京林业大学 Planning design method, device and storage medium
CN117493513A (en) * 2023-11-08 2024-02-02 北京远问智能科技有限公司 Question-answering system and method based on vector and large language model
CN117591738A (en) * 2023-11-29 2024-02-23 上海研途标准化技术服务有限公司 Information retrieval system and method based on cloud service
CN117633185A (en) * 2023-12-06 2024-03-01 浪潮云信息技术股份公司 Method for realizing vertical field application based on general large language model
CN117648422A (en) * 2023-12-08 2024-03-05 北京百度网讯科技有限公司 Question-answer prompt system, question-answer prompt, library construction and model training method and device
CN117743548A (en) * 2023-12-21 2024-03-22 北京新数科技有限公司 Large-model-based local knowledge base intelligent question-answering method, system, equipment and readable storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Wu Maogui (吴茂贵): "Intelligent Systems and Technology Series: Python Deep Learning Based on PyTorch, 2nd Edition", 31 January 2023, 机械工业出版社 (China Machine Press), pages: 92 *

Similar Documents

Publication Publication Date Title
CN111967761A (en) Monitoring and early warning method and device based on knowledge graph and electronic equipment
WO2020010834A1 (en) Faq question and answer library generalization method, apparatus, and device
CN112100397A (en) Electric power plan knowledge graph construction method and system based on bidirectional gating circulation unit
CN116932717A (en) Knowledge base retrieval method and device based on large language model self-evaluation and self-feedback
CN114265937A (en) Intelligent classification analysis method and system of scientific and technological information, storage medium and server
CN114282498A (en) Data knowledge processing system applied to electric power transaction
CN117852636A (en) Equipment operation and maintenance knowledge updating method based on large model
CN117909466A (en) Domain question-answering system, construction method, electronic device and storage medium
CN117592470A (en) Low-cost gazette data extraction method driven by large language model
CN117350271A (en) AI content generation method and service cloud platform based on large language model
CN117194628A (en) Compression technology-based prompt word optimization method, device, equipment and storage medium
CN117951211A (en) Large language model privatization deployment device and method for cloud service industry
CN115827885A (en) Operation and maintenance knowledge graph construction method and device and electronic equipment
CN114997154A (en) Automatic construction method and system for speaker-to-speaker robot corpus
CN114372478A (en) Knowledge distillation-based question and answer method, terminal equipment and storage medium
CN114417010A (en) Knowledge graph construction method and device for real-time workflow and storage medium
CN114398905A (en) Crowd-sourcing-oriented problem and solution automatic extraction method, corresponding storage medium and electronic device
CN118210818B (en) SQL sentence generation method, device, electronic equipment and storage medium
CN112835852B (en) Character duplicate name disambiguation method, system and equipment for improving filing-by-filing efficiency
CN117349425B (en) Knowledge item generation method, device, equipment and storage medium
CN117828007B (en) Construction sign land immigration archive management method and system based on natural language processing
CN116956919A (en) Knowledge extraction method and system for backbone optical communication system
CN111737412A (en) Citizen visiting guiding method based on natural language processing and knowledge graph
CN114756739A (en) Knowledge recommendation method for related content in power field
CN117763099A (en) Interaction method and device of intelligent customer service system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination