CN117271567A - Tax text conversion method, device and equipment based on large model and storage medium - Google Patents

Tax text conversion method, device and equipment based on large model and storage medium

Info

Publication number
CN117271567A
CN117271567A
Authority
CN
China
Prior art keywords
financial
tax
data
model
language model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311313291.5A
Other languages
Chinese (zh)
Inventor
陈鹏飞
丁乐
刘子星
徐煌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Servyou Software Group Co ltd
Original Assignee
Servyou Software Group Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Servyou Software Group Co ltd
Priority to CN202311313291.5A
Publication of CN117271567A
Legal status: Pending

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 - Information retrieval of structured data, e.g. relational data
    • G06F 16/24 - Querying
    • G06F 16/245 - Query processing
    • G06F 16/2452 - Query translation
    • G06F 16/24522 - Translation of natural language queries to structured queries
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20 - Information retrieval of structured data, e.g. relational data
    • G06F 16/24 - Querying
    • G06F 16/242 - Query formulation
    • G06F 16/243 - Natural language query formulation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G06N 3/0455 - Auto-encoder networks; Encoder-decoder networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 5/00 - Computing arrangements using knowledge-based models
    • G06N 5/02 - Knowledge representation; Symbolic representation
    • G06N 5/022 - Knowledge engineering; Knowledge acquisition
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Databases & Information Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a large-model-based tax text conversion method, device, equipment, and storage medium, relating to the field of natural language processing. The method comprises the following steps: preprocessing acquired financial and tax data, and vectorizing the preprocessed financial and tax data; constructing a financial and tax training knowledge base based on the vectorized data, extracting a plurality of vectorized financial and tax data entries from the knowledge base, and converting them into a plurality of corresponding identification data; taking the plurality of identification data as input to each model layer of a preset language model so as to fine-tune the preset language model and obtain a target language model; and receiving a financial and tax question input by a user terminal, and processing the question based on the target language model to obtain a target database statement matching the question. In this way, when a financial and tax question is received from the user terminal, the corresponding database statement can be generated directly by the trained language model.

Description

Tax text conversion method, device and equipment based on large model and storage medium
Technical Field
The invention relates to the field of natural language processing, in particular to a tax text conversion method, device and equipment based on a large model and a storage medium.
Background
Text2SQL (Text to SQL) is a technology that converts a user's natural-language question into executable SQL (Structured Query Language) statements for a given database. Most prior-art approaches either use traditional models such as RATSQL (Relation-Aware Transformer Text-to-SQL) and LGESQL (Line Graph Enhanced Text-to-SQL), or combine a traditional model with a ChatGPT (Chat Generative Pre-trained Transformer) interface. The traditional models suffer from low accuracy, while combining them with the ChatGPT interface is costly and carries a risk of data leakage.
Disclosure of Invention
Accordingly, the present invention aims to provide a large-model-based tax text conversion method, device, equipment, and storage medium that construct an SQL knowledge base from historical data, so that a text2SQL question-answering system can automatically generate the SQL query statement corresponding to a user's input question and perform the data query through the constructed knowledge base. This improves accuracy, ensures data security, and reduces development cost. The specific scheme is as follows:
in a first aspect, the present application discloses a tax text conversion method based on a large model, applied to a text conversion system, including:
preprocessing the acquired financial and tax data, and vectorizing the preprocessed financial and tax data to obtain vectorized financial and tax data;
constructing a financial and tax training knowledge base based on the vectorized financial and tax data, extracting a plurality of vectorized financial and tax data entries from the knowledge base, and converting them into a plurality of corresponding identification data;
taking the plurality of identification data as input to each model layer of a preset language model, so as to fine-tune the preset language model and obtain a target language model;
and receiving a financial and tax question input by a user terminal, processing the question based on the target language model to obtain a target database statement matching the question, and returning the target database statement to the user terminal.
Optionally, preprocessing the acquired financial and tax data and vectorizing the preprocessed data to obtain vectorized financial and tax data includes:
acquiring historical financial and tax questions and historical database statements, and performing a data cleaning operation on them to obtain preprocessed financial and tax data;
and processing the preprocessed financial and tax data based on a BERT model and a SimCSE model to obtain vectorized financial and tax data.
Optionally, extracting a plurality of vectorized financial and tax data entries from the financial and tax training knowledge base and converting them into a corresponding plurality of identification data includes:
randomly extracting a plurality of vectorized financial and tax data entries from the financial and tax training knowledge base, and segmenting them into a plurality of identification data.
Optionally, receiving the financial and tax question input by the user terminal, processing it based on the target language model to obtain a matching target database statement, and returning the statement to the user terminal includes:
receiving the financial and tax question input by the user terminal, and determining whether a database statement corresponding to the question exists in the financial and tax training knowledge base;
if it does not exist, inputting the question into the target language model, generating the corresponding target database statement with the target language model, and returning the statement to the user terminal.
Optionally, if no corresponding statement exists, inputting the financial and tax question into the target language model to generate the corresponding target database statement includes:
if no database statement corresponding to the financial and tax question exists in the financial and tax training knowledge base, inputting the question into the target language model;
and optimizing the question with the target language model so as to generate the corresponding target database statement from the optimized question.
Optionally, after receiving the financial and tax question input by the user terminal and determining whether a corresponding database statement exists in the financial and tax training knowledge base, the method further includes:
if it exists, directly taking the matched database statement corresponding to the question as the target database statement, and returning the statement to the user terminal.
Optionally, the large-model-based tax text conversion method further includes:
if a feedback text returned by the user terminal is received, storing the feedback text in the financial and tax training knowledge base and inputting it into the target language model to optimize the model; the feedback text is a text sent by the user terminal that evaluates the correctness of the target database statement.
In a second aspect, the present application discloses a tax text conversion device based on a large model, applied to a text conversion system, including:
the text vectorization module is used for preprocessing the acquired financial and tax data and vectorizing the preprocessed data to obtain vectorized financial and tax data;
the data conversion module is used for constructing a financial and tax training knowledge base based on the vectorized financial and tax data, extracting a plurality of vectorized financial and tax data entries from the knowledge base, and converting them into a plurality of corresponding identification data;
the model fine-tuning module is used for taking the plurality of identification data as input to each model layer of a preset language model, so as to fine-tune the preset language model and obtain a target language model;
the data return module is used for receiving a financial and tax question input by a user terminal, processing the question based on the target language model to obtain a target database statement matching the question, and returning the statement to the user terminal.
In a third aspect, the present application discloses an electronic device comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the tax text conversion method based on the big model as described above.
In a fourth aspect, the present application discloses a computer-readable storage medium for storing a computer program which, when executed by a processor, implements the large-model-based tax text conversion method described above.
In this application, the acquired financial and tax data is first preprocessed, and a vectorization operation is performed on the preprocessed data to obtain vectorized financial and tax data. A financial and tax training knowledge base is then built from the vectorized data; a plurality of vectorized entries are extracted from the knowledge base and converted into a plurality of corresponding identification data, which are fed as input to each layer of a preset language model to fine-tune it and obtain a target language model. Finally, a financial and tax question input by the user terminal is received and processed by the target language model to obtain a matching target database statement, which is returned to the user terminal. Through this large-model-based tax text conversion method, an SQL knowledge base can be constructed from historical data, and a text2SQL question-answering system that automatically generates the corresponding SQL query statements and performs the data query can be realized through the constructed knowledge base, which improves accuracy, ensures data security, and reduces development cost.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings that are required to be used in the embodiments or the description of the prior art will be briefly described below, and it is obvious that the drawings in the following description are only embodiments of the present invention, and that other drawings can be obtained according to the provided drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a tax text conversion method based on a large model disclosed in the application;
FIG. 2 is a timing diagram of a tax text conversion method based on a large model disclosed in the present application;
FIG. 3 is a schematic structural diagram of a tax text conversion device based on a large model disclosed in the present application;
fig. 4 is a block diagram of an electronic device disclosed in the present application.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
In the prior art, Text2SQL is mostly implemented with traditional models such as RATSQL and LGESQL, or by combining a traditional model with the ChatGPT interface. However, the traditional models suffer from low accuracy, and combining them with the ChatGPT interface is costly and carries a risk of data leakage.
To overcome these problems, the application discloses a tax text conversion method, device, equipment, and storage medium based on a large model. An SQL knowledge base can be constructed from historical data, so that a text2SQL question-answering system can automatically generate the SQL query statement corresponding to a user's input question and perform the data query through the constructed knowledge base. This improves accuracy, ensures data security, and reduces development cost.
Referring to fig. 1, the embodiment of the invention discloses a tax text conversion method based on a large model, which is applied to a text conversion system and comprises the following steps:
and S11, preprocessing the acquired financial data, and vectorizing the obtained preprocessed financial data to obtain vectorized financial data.
In this embodiment, preprocessing the acquired financial and tax data and vectorizing the preprocessed data to obtain vectorized financial and tax data includes: acquiring historical financial and tax questions and historical database statements, performing a data cleaning operation on them to obtain preprocessed financial and tax data, and processing the preprocessed data based on a BERT model and a SimCSE model to obtain vectorized financial and tax data. That is, the training data for constructing the text conversion system is derived from historical financial and tax questions and historical database statements. It should be noted that the historical questions are financial and tax questions actually posed by users, and the historical database statements are SQL statements written by engineers. After the financial and tax data is acquired, it must be preprocessed, i.e. a data cleaning operation is performed, to verify the data's consistency and to handle invalid and missing values. Further, after preprocessing is complete, the question sentences in the preprocessed data are converted into embeddings: each question is vectorized with the pre-trained BERT (Bidirectional Encoder Representations from Transformers) model, and text similarity comparison is then performed with a SimCSE model to remove data whose similarity is too high. By pulling similar data closer together and pushing dissimilar data farther apart, a better representation of the data can be learned.
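The embed, compare, and deduplicate flow described above can be sketched as follows. This is a minimal, self-contained illustration only: the patent uses BERT embeddings refined with SimCSE, while here a toy character-frequency vector stands in for the encoder, and the `embed`, `cosine`, and `deduplicate` names and the 0.95 threshold are illustrative choices, not from the patent.

```python
import math

# Toy stand-in for BERT/SimCSE sentence embeddings: a character-frequency
# vector. It only illustrates the embed -> compare -> deduplicate flow.
def embed(sentence: str) -> list[float]:
    vec = [0.0] * 26
    for ch in sentence.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    return vec

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb) if na and nb else 0.0

def deduplicate(questions: list[str], threshold: float = 0.95) -> list[str]:
    """Drop questions whose embedding is too similar to one already kept."""
    kept: list[str] = []
    kept_vecs: list[list[float]] = []
    for q in questions:
        v = embed(q)
        if all(cosine(v, kv) < threshold for kv in kept_vecs):
            kept.append(q)
            kept_vecs.append(v)
    return kept
```

For example, `deduplicate(["what is my vat rate", "what is my vat rate?", "list invoices for march"])` keeps only the first and third questions, since the first two produce identical letter counts.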
Step S12: constructing a financial and tax training knowledge base based on the vectorized financial and tax data, extracting a plurality of vectorized financial and tax data entries from the knowledge base, and converting them into a plurality of corresponding identification data.
In this embodiment, after vectorization of the preprocessed financial and tax data is complete, a financial and tax training knowledge base is built from the vectorized data, and a plurality of vectorized entries are randomly extracted from the constructed knowledge base as training data for fine-tuning a preset language model. This specifically includes: randomly extracting a plurality of vectorized financial and tax data entries from the knowledge base, and segmenting them into a plurality of identification data. That is, to realize the specific application scenario of converting tax text into SQL, the parameter-efficient fine-tuning method based on P-Tuning v2 may be used to segment the extracted vectorized data into a plurality of identification data, namely prompt tokens, and to use these prompt tokens as input at every layer of the preset language model rather than only at the input layer. This yields more learnable parameters, and because the prompt tokens are injected into every layer of the model, they influence model prediction more directly, which improves processing efficiency.
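The key idea of the P-Tuning v2 style fine-tuning described above, prepending learnable prompt tokens at every layer rather than only at the input layer, can be sketched with plain lists. All dimensions below are tiny illustrative values (the real Vicuna-13B is far larger), and the function names are hypothetical; a real implementation would use trainable tensors in a deep-learning framework.

```python
import random

# Illustrative dimensions only; a real large model has many more layers
# and a much larger hidden size.
NUM_LAYERS = 4
HIDDEN = 8
PROMPT_LEN = 3

random.seed(0)

def init_prompt_tokens(num_layers: int, prompt_len: int, hidden: int):
    """One independent, trainable prefix per layer (here: random floats)."""
    return [
        [[random.uniform(-0.1, 0.1) for _ in range(hidden)]
         for _ in range(prompt_len)]
        for _ in range(num_layers)
    ]

def prepend_prefix(layer_input, layer_prefix):
    """Each layer sees its own prefix tokens concatenated before the real tokens."""
    return layer_prefix + layer_input

prompts = init_prompt_tokens(NUM_LAYERS, PROMPT_LEN, HIDDEN)
tokens = [[0.0] * HIDDEN for _ in range(5)]   # 5 stand-in input token vectors
layer0_seq = prepend_prefix(tokens, prompts[0])
```

Because every layer gets its own `PROMPT_LEN` trainable vectors, the number of learnable parameters grows with depth, which is the per-layer advantage the patent points to over input-layer-only prompt tuning.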
Step S13: taking the plurality of identification data as input to each model layer of a preset language model, so as to fine-tune the preset language model and obtain a target language model.
In this embodiment, the plurality of identification data is used as input to each model layer of the preset language model, so as to fine-tune it and obtain the target language model. That is, the obtained prompt tokens may be used as input to each layer of the preset language model; it should be noted that the preset language model adopted here is the Vicuna-13B model, and it is the Vicuna-13B model that is fine-tuned. It should further be noted that before the prompts are input to the preset language model, prompt engineering must be applied to them. A prompt generally comprises an instruction, context information, an input, and an output, and prompt engineering enables the model to produce better results. An effective prompt first assigns a role to the model and then issues the instruction. Next, the output format must be described clearly so that the output is structured. Some examples are also given to the model, and the intent is clarified with longer, more specific prompts to obtain more relevant and more detailed outputs.
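The prompt structure just described (a role, an instruction, an explicit output format, and few-shot examples) can be sketched as a simple template builder. The `build_prompt` name, the schema string, and the example question/SQL pairs are illustrative, not taken from the patent.

```python
# Minimal sketch of the prompt layout described above: role, context,
# output-format instruction, few-shot examples, then the actual question.
def build_prompt(question: str, schema: str,
                 examples: list[tuple[str, str]]) -> str:
    parts = [
        "You are a financial and tax data analyst who writes SQL.",  # role
        f"Database schema:\n{schema}",                               # context
        "Answer with a single SQL statement and nothing else.",      # format
    ]
    for q, sql in examples:                                          # few-shot
        parts.append(f"Question: {q}\nSQL: {sql}")
    parts.append(f"Question: {question}\nSQL:")
    return "\n\n".join(parts)

prompt = build_prompt(
    "total VAT paid last quarter",
    "invoices(id, vat_amount, paid_at)",
    [("count all invoices", "SELECT COUNT(*) FROM invoices;")],
)
```

The final `SQL:` suffix steers the model toward emitting only the structured statement, matching the "structured output" requirement in the text.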
Step S14: receiving a financial and tax question input by a user terminal, processing the question based on the target language model to obtain a target database statement matching the question, and returning the target database statement to the user terminal.
In this embodiment, receiving a financial and tax question input by the user terminal, processing it based on the target language model to obtain a matching target database statement, and returning that statement to the user terminal includes: receiving the financial and tax question input by the user terminal, and determining whether a database statement corresponding to the question exists in the financial and tax training knowledge base; if it does not exist, inputting the question into the target language model, generating the corresponding target database statement with the target language model, and returning it to the user terminal. That is, after the preset language model has been fine-tuned, the resulting target language model can be used to process the financial and tax question input by the user terminal and generate the corresponding database statement. Specifically: if no database statement corresponding to the question exists in the knowledge base, the question is input into the target language model; the target language model then optimizes the question and generates the corresponding target database statement from the optimized question.
That is, after a financial and tax question input by the user terminal is received, the financial and tax training knowledge base is first matched against the question. If the knowledge base contains no SQL statement corresponding to the question, a database statement must be generated by the fine-tuned language model. It should be explained that the question input by the user may be expressed unclearly; in that case the question is first optimized, and the corresponding SQL statement is generated from the optimized question.
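The serving flow above, match the knowledge base first and only fall back to the model on a miss, can be sketched as follows. The `generate_sql` stub stands in for the fine-tuned large-model call; the knowledge-base contents and function names are illustrative.

```python
# Knowledge base of previously verified question -> SQL pairs (illustrative).
knowledge_base = {
    "count all invoices": "SELECT COUNT(*) FROM invoices;",
}

def generate_sql(question: str) -> str:
    # Placeholder for the fine-tuned large-model call described in the patent.
    return f"-- SQL generated by the model for: {question}"

def answer(question: str) -> str:
    key = question.strip().lower()
    if key in knowledge_base:        # hit: return the stored statement directly
        return knowledge_base[key]
    return generate_sql(key)         # miss: fall back to model generation
```

Serving stored statements first avoids a model call for repeated questions, which is also what keeps latency and cost down in this design.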
It should be noted that, after receiving the financial and tax question input by the user terminal and determining whether a corresponding database statement exists in the financial and tax training knowledge base, the method further includes: if it exists, directly taking the matched database statement corresponding to the question as the target database statement and returning it to the user terminal. In other words, if the knowledge base already contains the database statement corresponding to the question, that statement can be fed back to the user terminal directly.
The large-model-based tax text conversion method further includes: if a feedback text returned by the user terminal is received, storing the feedback text in the financial and tax training knowledge base and inputting it into the target language model to optimize the model; the feedback text is a text sent by the user terminal that evaluates the correctness of the target database statement. That is, after receiving the returned SQL statement, the user can evaluate its accuracy. By collecting users' corrections and feedback comments, problems the model may have in practical applications can be discovered, understood, and promptly remedied, so the model can be continuously optimized with the correction data and feedback provided during use. This steadily improves the model's accuracy and forms a closed optimization loop that provides more accurate and efficient service.
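The closed feedback loop just described can be sketched as follows: correct pairs go back into the knowledge base for direct reuse, and corrected pairs are queued as future fine-tuning data. The data structures and the `record_feedback` name are illustrative assumptions, not specified by the patent.

```python
# Stores verified question -> SQL pairs for direct reuse (illustrative).
knowledge_base: dict[str, str] = {}
# Corrections queued as additional fine-tuning data (illustrative).
finetune_queue: list[tuple[str, str]] = []

def record_feedback(question: str, sql: str, correct: bool,
                    fixed_sql: str = "") -> None:
    if correct:
        knowledge_base[question] = sql                    # reuse on next hit
    elif fixed_sql:
        finetune_queue.append((question, fixed_sql))      # correction -> training data

record_feedback("count all invoices",
                "SELECT COUNT(*) FROM invoices;", correct=True)
record_feedback("vat last quarter", "SELECT 1;", correct=False,
                fixed_sql="SELECT SUM(vat_amount) FROM invoices;")
```

Replaying `finetune_queue` through the fine-tuning step of S13 is what closes the optimization loop the text describes.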
Therefore, through this large-model-based tax text conversion method, the acquired financial and tax data can be vectorized, a financial and tax training knowledge base can be built from the vectorized data, a plurality of vectorized entries can be extracted from the knowledge base and used as input to each layer of a preset language model to fine-tune it, and the resulting language model can generate the database statement corresponding to a financial and tax question input by the user terminal. An SQL knowledge base can thus be constructed from historical data, and a text2SQL question-answering system that automatically generates the corresponding SQL query statements and performs the data query can be realized through the constructed knowledge base. On one hand, vectorizing the data pulls similar data closer together and pushes dissimilar data apart, so a better representation of the data can be learned. On the other hand, the per-layer prompt processing yields more learnable parameters; because the prompts are input to every layer of the model, they influence model prediction more directly, which improves processing efficiency, and generating training data reduces the cost of model training. In yet another aspect, the model can be continuously optimized with the correction data and feedback users provide during use, steadily improving its accuracy and forming a closed optimization loop that provides more accurate and efficient service.
Referring to fig. 2, the embodiment of the invention discloses a tax text conversion method based on a large model, which comprises the following steps:
As shown in fig. 2, after the user terminal inputs a financial and tax question, the financial and tax knowledge base is first matched against the question. If a matching SQL statement is found, the matched SQL and its data are fed back to the user terminal directly. If no corresponding SQL statement can be matched in the knowledge base, prompt engineering is applied to the user's question to optimize it; the optimized question is then processed by the fine-tuned Vicuna-13B model to obtain the corresponding SQL statement, and that statement is returned to the user terminal. If a feedback text evaluating the correctness of the target database statement is received from the user terminal, it can be stored in the financial and tax training knowledge base and input into the target language model to optimize the Vicuna-13B model, continuously improving the model's accuracy and forming a closed optimization loop that provides more accurate and efficient service.
Referring to fig. 3, an embodiment of the invention discloses a tax text conversion device based on a large model, which is applied to a text conversion system and comprises:
the text vectorization module 11 is configured to perform preprocessing on the acquired financial tax data, and perform vectorization operation on the obtained preprocessed financial tax data to obtain vectorized financial tax data;
the data conversion module 12 is configured to construct a financial tax training knowledge base based on the vectorized financial tax data, extract a plurality of vectorized financial tax data from the financial tax training knowledge base, and convert the plurality of vectorized financial tax data into a corresponding plurality of identification data;
the model fine-tuning module 13 is configured to use the plurality of identification data as input of each model level of a preset language model, so as to perform model fine-tuning on the preset language model, and obtain a target language model;
the data return module 14 is configured to receive a financial tax question input by a user, process the financial tax question based on the target language model, obtain a target database statement that matches the financial tax question, and return the target database statement to the user.
Therefore, through the tax text conversion device based on the large model, vectorization can be performed on the acquired financial tax data, a financial tax training knowledge base can be constructed from the vectorized financial tax data, a plurality of vectorized financial tax data can be extracted from that knowledge base and used as the input at each model level of a preset language model to fine-tune it, and finally the resulting language model generates the database statements corresponding to the financial tax questions input by the user side. In this way, an SQL knowledge base can be built from historical data, and a text2SQL question-answering system that automatically generates the corresponding SQL query statements and performs the data query can be realized on top of it, which improves accuracy, ensures the safety of the data, and reduces development cost.
In some embodiments, the text vectorization module 11 may specifically include:
the data preprocessing unit is used for acquiring historical financial tax questions and historical database sentences, and performing a data cleaning operation on the historical financial tax questions and the historical database sentences to obtain preprocessed financial tax data;
and the data vectorization unit is used for processing the preprocessed financial tax data based on the BERT model and the SimCSE model to obtain vectorized financial tax data.
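The cleaning and vectorization steps above can be sketched as a small pipeline. The encoder is assumed, not implemented: in the patent it would be a BERT model fine-tuned with SimCSE, so `encode` is taken as a pluggable parameter and the cleaning rules (tag stripping, whitespace collapsing) are illustrative guesses at what "data cleaning" covers.

```python
import re

def clean(record):
    """Illustrative data-cleaning step: strip stray markup, collapse
    whitespace, and drop records that end up empty."""
    text = re.sub(r"<[^>]+>", " ", record)   # remove stray HTML tags
    text = re.sub(r"\s+", " ", text).strip()
    return text or None

def vectorize_corpus(questions, sqls, encode):
    """Pair each cleaned historical question with its SQL statement and
    encode it. `encode` is assumed to be a BERT encoder fine-tuned with
    SimCSE so that similar questions map to nearby vectors."""
    corpus = []
    for q, sql in zip(questions, sqls):
        q = clean(q)
        if q is None:
            continue                         # skip records emptied by cleaning
        corpus.append({"question": q, "sql": sql, "vector": encode(q)})
    return corpus
```

The resulting list of question/SQL/vector records is what the description calls the financial tax training knowledge base.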
In some embodiments, the data conversion module 12 may specifically include:
the data conversion unit is used for randomly extracting a plurality of vectorized financial tax data from the financial tax training knowledge base and dividing the plurality of vectorized financial tax data into a plurality of identification data.
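The division into identification data can be read as tokenization into fixed-length ID sequences. The patent does not specify the tokenizer, so the whitespace vocabulary below is a hypothetical stand-in for the Vicuna tokenizer; padding with ID 0 and the `max_len` of 16 are likewise illustrative choices.

```python
import random

def build_vocab(texts):
    """Assign a token ID to every whitespace token; ID 0 is reserved
    for unknown tokens (a stand-in for a real subword vocabulary)."""
    vocab = {"<unk>": 0}
    for t in texts:
        for tok in t.split():
            vocab.setdefault(tok, len(vocab))
    return vocab

def to_identification_data(text, vocab, max_len=16):
    """Divide one record into a fixed-length ID sequence - the
    'identification data' fed to the model levels - padded with 0."""
    ids = [vocab.get(tok, 0) for tok in text.split()][:max_len]
    return ids + [0] * (max_len - len(ids))

def sample_identification_data(corpus_texts, vocab, k, max_len=16):
    """Randomly extract k records, as the data conversion unit does."""
    return [to_identification_data(t, vocab, max_len)
            for t in random.sample(corpus_texts, k)]
```

These ID sequences are what the fine-tuning module then feeds as input at each model level of the preset language model.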
In some embodiments, the data return module 14 may specifically include:
the matching determination submodule is used for receiving financial tax questions input by a user side and determining whether database sentences corresponding to the financial tax questions exist in the financial tax training knowledge base or not;
and the data generation sub-module is used for, if no such database statement exists, inputting the financial tax question into the target language model, generating a target database statement corresponding to the financial tax question through the target language model, and returning the target database statement to the user side.
In some embodiments, the data generating sub-module may specifically include:
the data input unit is used for inputting the financial tax question into the target language model if no database statement corresponding to the financial tax question exists in the financial tax training knowledge base;
and the data generation unit is used for optimizing the financial tax question through the target language model so as to generate the corresponding target database statement based on the obtained optimized financial tax question.
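The question-optimization step corresponds to the prompt engineering mentioned in the description of fig. 2: before the model sees the raw question, it is wrapped with an instruction, optional schema context, and few-shot examples. The template wording below is entirely illustrative, not the patented prompt.

```python
def optimize_question(question, schema_hint=None, examples=()):
    """Prompt-engineer a raw financial tax question: prepend an
    instruction, optional table-schema context, and few-shot
    question/SQL pairs, ending at the point where the model is
    expected to continue with SQL."""
    parts = ["You translate financial tax questions into SQL."]
    if schema_hint:
        parts.append(f"Tables available: {schema_hint}")
    for q, sql in examples:                  # few-shot demonstrations
        parts.append(f"Q: {q}\nSQL: {sql}")
    parts.append(f"Q: {question}\nSQL:")     # the question to answer
    return "\n\n".join(parts)
```

The optimized prompt, rather than the bare question, is what would be passed to the fine-tuned model to generate the target database statement.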
In some embodiments, the tax text conversion apparatus based on the large model may further include:
and the data return unit is used for, if a matched database statement exists, directly taking the matched database statement corresponding to the financial tax question as the target database statement and returning the target database statement to the user side.
In some embodiments, the tax text conversion apparatus based on the large model may further include:
the data optimization unit is used for storing the feedback text into the financial tax training knowledge base and inputting the feedback text into the target language model to optimize the target language model if a feedback text fed back by the user side is received; the feedback text is a text sent by the user side for evaluating the correctness of the target database statement.
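The closed loop formed by this unit can be sketched in a few lines. `fine_tune` is a placeholder for the actual Vicuna-13B optimization routine, and the record layout is an assumption; the point illustrated is only that every feedback event both enriches the knowledge base and reaches the model.

```python
def apply_feedback(knowledge_base, question, sql, feedback_text, fine_tune):
    """Closed-loop step: persist the user's correctness feedback in the
    knowledge base, then hand the same record to the fine-tuning
    routine so the target language model keeps improving."""
    entry = {"question": question, "sql": sql, "feedback": feedback_text}
    knowledge_base.append(entry)   # store feedback in the knowledge base
    fine_tune([entry])             # optimize the model on the new example
    return knowledge_base
```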
Further, the embodiment of the present application further discloses an electronic device, and fig. 4 is a block diagram of an electronic device 20 according to an exemplary embodiment, where the content of the figure is not to be considered as any limitation on the scope of use of the present application.
Fig. 4 is a schematic structural diagram of an electronic device 20 according to an embodiment of the present application. The electronic device 20 may specifically include: at least one processor 21, at least one memory 22, a power supply 23, a communication interface 24, an input/output interface 25, and a communication bus 26. The memory 22 is used for storing a computer program, which is loaded and executed by the processor 21 to implement the relevant steps in the tax text conversion method based on the large model disclosed in any of the foregoing embodiments. In addition, the electronic device 20 in this embodiment may specifically be an electronic computer.
In this embodiment, the power supply 23 is configured to provide an operating voltage for each hardware device on the electronic device 20; the communication interface 24 can create a data transmission channel between the electronic device 20 and an external device, and the communication protocol to be followed is any communication protocol applicable to the technical solution of the present application, which is not specifically limited herein; the input/output interface 25 is used for acquiring external input data or outputting external output data, and the specific interface type thereof may be selected according to the specific application requirement, which is not limited herein.
The memory 22 may be a carrier for storing resources, such as a read-only memory, a random access memory, a magnetic disk, or an optical disk, and the resources stored thereon may include an operating system 221, a computer program 222, and the like, and the storage may be temporary storage or permanent storage.
The operating system 221 is used for managing and controlling the hardware devices on the electronic device 20 and the computer programs 222, and may be Windows Server, NetWare, Unix, Linux, etc. In addition to the computer program for performing the large-model-based tax text conversion method performed by the electronic device 20 as disclosed in any of the previous embodiments, the computer programs 222 may further include computer programs for performing other specific tasks.
Further, the application also discloses a computer-readable storage medium for storing a computer program; when executed by a processor, the computer program implements the tax text conversion method based on the large model disclosed above. For the specific steps of the method, reference may be made to the corresponding content disclosed in the foregoing embodiments, which is not repeated here.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different point from other embodiments, so that the same or similar parts between the embodiments are referred to each other. For the device disclosed in the embodiment, since it corresponds to the method disclosed in the embodiment, the description is relatively simple, and the relevant points refer to the description of the method section.
Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative elements and steps are described above generally in terms of functionality in order to clearly illustrate the interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. The software modules may reside in random access memory (RAM), flash memory, read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
Finally, it is further noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.
The principles and embodiments of the present application have been described above through specific examples, which are provided only to facilitate understanding of the method and core concepts of the application. Meanwhile, those skilled in the art may make modifications to the specific embodiments and the application scope in accordance with the ideas of the present application; in view of the above, this description should not be construed as limiting the present application.

Claims (10)

1. A tax text conversion method based on a large model, which is applied to a text conversion system, comprising:
preprocessing the acquired financial tax data, and vectorizing the obtained preprocessed financial tax data to obtain vectorized financial tax data;
constructing a financial tax training knowledge base based on the vectorized financial tax data, extracting a plurality of vectorized financial tax data from the financial tax training knowledge base, and converting the plurality of vectorized financial tax data into a corresponding plurality of identification data;
taking the plurality of identification data as the input of each model level of a preset language model to carry out model fine adjustment on the preset language model so as to obtain a target language model;
and receiving financial tax questions input by a user terminal, processing the financial tax questions based on the target language model to obtain target database sentences matched with the financial tax questions, and returning the target database sentences to the user terminal.
2. The method for converting tax text based on large model according to claim 1, wherein the preprocessing the acquired financial tax data and vectorizing the obtained preprocessed financial tax data to obtain vectorized financial tax data comprises:
acquiring historical financial tax questions and historical database sentences, and performing a data cleaning operation on the historical financial tax questions and the historical database sentences to obtain preprocessed financial tax data;
and processing the pre-processed financial tax data based on the BERT model and the SimCSE model to obtain vectorized financial tax data.
3. The method of claim 1, wherein extracting a plurality of vectorized financial tax data from the financial tax training knowledge base and converting the plurality of vectorized financial tax data into a corresponding plurality of identification data comprises:
randomly extracting a plurality of vectorized financial tax data from the financial tax training knowledge base, and dividing the plurality of vectorized financial tax data into a plurality of identification data.
4. The method for converting tax text based on large model according to claim 1, wherein said receiving the financial tax question inputted by the user terminal, processing the financial tax question based on the target language model, obtaining a target database sentence matched with the financial tax question, and returning the target database sentence to the user terminal, comprises:
receiving financial tax questions input by a user side, and determining whether database sentences corresponding to the financial tax questions exist in the financial tax training knowledge base;
if not, inputting the financial tax question into the target language model, generating a target database statement corresponding to the financial tax question through the target language model, and returning the target database statement to the user side.
5. The method of claim 4, wherein the inputting the financial tax question into the target language model to generate a target database statement corresponding to the financial tax question via the target language model if not present comprises:
if the financial tax training knowledge base does not have the database statement corresponding to the financial tax question, inputting the financial tax question into the target language model;
and optimizing the financial tax question through the target language model to generate the corresponding target database statement based on the obtained optimized financial tax question.
6. The method for converting tax text based on large model according to claim 4, wherein after receiving the financial tax question inputted by the user side and determining whether there is a database sentence corresponding to the financial tax question in the financial tax training knowledge base, further comprising:
if so, directly taking the matched database statement corresponding to the financial tax question as the target database statement, and returning the target database statement to the user side.
7. The large model-based tax text conversion method according to any one of claims 1 to 6, further comprising:
if a feedback text fed back by the user side is received, storing the feedback text in the financial tax training knowledge base, and inputting the feedback text into the target language model to optimize the target language model; the feedback text is a text sent by the user side for evaluating the correctness of the target database statement.
8. A tax text conversion device based on a large model, applied to a text conversion system, comprising:
the text vectorization module is used for preprocessing the acquired financial tax data and vectorizing the obtained preprocessed financial tax data to obtain vectorized financial tax data;
the data conversion module is used for constructing a financial tax training knowledge base based on the vectorized financial tax data, extracting a plurality of vectorized financial tax data from the financial tax training knowledge base, and converting the plurality of vectorized financial tax data into a corresponding plurality of identification data;
the model fine-tuning module is used for taking the plurality of identification data as the input of each model level of a preset language model so as to conduct model fine-tuning on the preset language model and obtain a target language model;
the data return module is used for receiving financial and tax problems input by a user terminal, processing the financial and tax problems based on the target language model, obtaining target database sentences matched with the financial and tax problems, and returning the target database sentences to the user terminal.
9. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program to implement the tax text conversion method based on a large model as claimed in any one of claims 1 to 7.
10. A computer readable storage medium for storing a computer program which when executed by a processor implements the tax text conversion method based on a large model of any one of claims 1 to 7.
CN202311313291.5A 2023-10-10 2023-10-10 Tax text conversion method, device and equipment based on large model and storage medium Pending CN117271567A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311313291.5A CN117271567A (en) 2023-10-10 2023-10-10 Tax text conversion method, device and equipment based on large model and storage medium


Publications (1)

Publication Number Publication Date
CN117271567A true CN117271567A (en) 2023-12-22

Family

ID=89215812

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311313291.5A Pending CN117271567A (en) 2023-10-10 2023-10-10 Tax text conversion method, device and equipment based on large model and storage medium

Country Status (1)

Country Link
CN (1) CN117271567A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117764054A (en) * 2024-02-06 2024-03-26 佛山科学技术学院 Natural language understanding method and system based on automatic construction prompt engineering


Similar Documents

Publication Publication Date Title
KR102486348B1 (en) Attention-based sequence transduction neural networks
US10332507B2 (en) Method and device for waking up via speech based on artificial intelligence
CN116340584B (en) Implementation method for automatically generating complex graph database query statement service
US11354594B2 (en) Black-box optimization using neural networks
US10540585B2 (en) Training sequence generation neural networks using quality scores
CN117271567A (en) Tax text conversion method, device and equipment based on large model and storage medium
US20210357587A1 (en) An intelligent response method and device
CN111858913A (en) Method and system for automatically generating text abstract
CN116737910B (en) Intelligent dialogue processing method, device, equipment and storage medium
CN117077791A (en) Model reasoning method, device, equipment and medium based on graph data structure
CN112860873B (en) Intelligent response method, device and storage medium
CN117669717A (en) Knowledge enhancement-based large model question-answering method, device, equipment and medium
CN111324712A (en) Dialogue reply method and server
CN111381935A (en) DSL configuration expression-based function implementation method and system
Rohit et al. System for Enhancing Accuracy of Noisy Text using Deep Network Language Models
CN115905490A (en) Man-machine interaction dialogue method, device and equipment
CN117350264B (en) PPT file generation method, device, equipment and storage medium
CN110704623A (en) Method, device, system and storage medium for improving entity identification rate based on Rasa _ Nlu framework
CN116991985B (en) Real-time information response method and system based on generated pre-training model
CN111930921B (en) Intention prediction method and device
CN114691699B (en) Intelligent settlement method and system
CN111858877B (en) Multi-type intelligent question answering method, system, equipment and readable storage medium
CN117668620A (en) Method and device for identifying dialogue by using out-of-domain intention
CN117635045A (en) Intelligent receipt contract management method, device and system
CN116701514A (en) Database adaptation method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination