CN117668563B - Text recognition method, text recognition device, electronic equipment and readable storage medium - Google Patents

Text recognition method, text recognition device, electronic equipment and readable storage medium

Info

Publication number
CN117668563B
CN117668563B (application number CN202410131516.3A)
Authority
CN
China
Prior art keywords
text
model
text type
parameter
type recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202410131516.3A
Other languages
Chinese (zh)
Other versions
CN117668563A (en)
Inventor
李兵兵
王彦伟
朱克峰
黄伟
李仁刚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Suzhou Metabrain Intelligent Technology Co Ltd
Original Assignee
Suzhou Metabrain Intelligent Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Suzhou Metabrain Intelligent Technology Co Ltd filed Critical Suzhou Metabrain Intelligent Technology Co Ltd
Priority to CN202410131516.3A priority Critical patent/CN117668563B/en
Publication of CN117668563A publication Critical patent/CN117668563A/en
Application granted granted Critical
Publication of CN117668563B publication Critical patent/CN117668563B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical



Landscapes

  • Character Discrimination (AREA)

Abstract

The invention discloses a text recognition method, a text recognition device, electronic equipment and a readable storage medium, applied to the technical field of computer vision. The method obtains a text type recognition model that has been processed by a pre-training process and, according to the structural characteristics of the model, determines a parameter selection network capable of automatically selecting the parameters to be adjusted in the fine-tuning process. A loss function is obtained from the predicted value of the text type recognition model for each text sample in a text data training sample set, the corresponding text type labels, and the sparsity loss of the text type recognition model during training; each parameter to be updated of the text type recognition model is then fine-tuned in combination with the text data training sample set and the text recognition task, so as to obtain a text type recognition model capable of recognizing the text type of an input text to be processed. The method and the device can solve the problem in the related art that fine-tuning of a text type recognition model is poor in effectiveness and flexibility, and can improve text type recognition accuracy and recognition efficiency.

Description

Text recognition method, text recognition device, electronic equipment and readable storage medium
Technical Field
The present invention relates to the field of computer vision, and in particular, to a text recognition method, apparatus, electronic device, and readable storage medium.
Background
With the rapid popularization and application of artificial intelligence technology, pre-training models that provide a generalized basic model for the execution of different tasks are widely used. In order to adapt to an actual downstream task while taking the fine-tuning efficiency and resource occupation of the model into account, most of the pre-trained parameters are usually frozen, and only a small number of parameters, or additional parameters added to the pre-trained model, are fine-tuned.
The text recognition task, a natural language processing task within computer vision technology, generally uses a traditional large pre-trained language model as the text type recognition model to recognize text types. When the large pre-trained language model is fine-tuned, a sparse matrix is generally used to represent parameter changes, and the parameters of the text type recognition model are adjusted by introducing fixed auxiliary parameters. However, because the selection of the ranks of the auxiliary parameters of different layers and the positions of the auxiliary parameters all depend on manual trial and error, the effectiveness of the auxiliary parameters on different text type recognition models cannot be guaranteed; moreover, the number of auxiliary parameters during training of the text type recognition model is fixed, so flexibility is poor.
In view of this, adjusting the parameters of the text type recognition model effectively and flexibly, and thereby improving text type recognition accuracy and recognition efficiency, is a technical problem that the skilled person needs to solve.
Disclosure of Invention
The invention provides a text recognition method, a text recognition device, electronic equipment and a readable storage medium, which can adjust the parameters of a text type recognition model effectively and flexibly and effectively improve text type recognition accuracy and recognition efficiency.
In order to solve the technical problems, the invention provides the following technical scheme:
In one aspect, the present invention provides a text recognition method, including:
acquiring a text type recognition model processed by a pre-training process, and acquiring a text data training sample set carrying text type labels;
Determining a parameter selection network according to the model structure characteristics of the text type recognition model; the model structure characteristics are used for determining the minimum model structure unit corresponding to the parameters to be updated of the text type recognition model, and the parameter selection network is used for automatically selecting the parameters to be updated, which are readjusted in the fine tuning process of the text type recognition model;
Determining a loss function jointly according to the predicted value of each text sample of the text data training sample set by the text type recognition model, the corresponding text type label and the sparsity loss of the text type recognition model in the training process;
Utilizing the text data training sample set to adjust each parameter to be updated of the text type recognition model based on a text recognition task and the loss function;
and carrying out text type recognition on the input text to be processed by utilizing the adjusted text type recognition model to obtain a corresponding text type recognition result.
In a first exemplary embodiment, determining the loss function jointly according to the predicted value of the text type recognition model for each text sample in the text data training sample set, the corresponding text type label, and the sparsity loss of the text type recognition model during training includes:
acquiring precision loss information of the text type recognition model in prediction; the precision loss information is determined according to the difference between the predicted text type of the text type recognition model and the text type label;
Acquiring sparsity loss of the text type recognition model in the training process; the sparseness loss is determined according to the numerical value of the auxiliary parameter of the parameter selection network;
And determining a loss function of the text type recognition model according to the precision loss information and the sparsity loss.
In a second exemplary embodiment, the obtaining accuracy loss information of the text type recognition model in prediction includes:
invoking a precision loss determination relation, and calculating precision loss information of the text type recognition model in prediction; the accuracy loss determination relation is as follows:
$$loss_{jing} = -\frac{1}{N}\sum_{n=1}^{N}\sum_{j=1}^{C} y_{nj}\log\left(p_{nj}\right)$$
In the formula, $loss_{jing}$ is the precision loss information, N is the total number of text samples contained in the text data training sample set, C is the total number of text types contained in the text data training sample set, $y_{nj}$ is the label indicating that the n-th text sample belongs to the j-th text type, and $p_{nj}$ is the predicted probability that the n-th text sample belongs to the j-th text type.
In a third exemplary embodiment, the obtaining the sparsity loss of the text type recognition model during training includes:
invoking a sparsity loss determination relation, and calculating sparsity loss of the text type recognition model in the training process; the sparsity loss determination relation is as follows:
$$loss_{s} = \sum_{i} g\left(a_{i}\right)$$
where $loss_{s}$ is the sparsity loss and $g(a_{i})$ is the output value of the gating function of the i-th auxiliary parameter.
In a fourth exemplary embodiment, using the text data training sample set to adjust each parameter to be updated of the text type recognition model based on a text recognition task and the loss function includes:
Based on the loss function, calculating the gradient of each parameter to be trained by using a back propagation algorithm, and updating each parameter to be trained by using an optimizer so as to finish fine adjustment of the model parameters of the text type recognition model.
In a fifth exemplary embodiment, using the text data training sample set to adjust each parameter to be updated of the text type recognition model based on a text recognition task and the loss function includes:
invoking a parameter updating relation, and updating each auxiliary parameter of the parameter selection network; the parameter updating relation is as follows:
$$a_{i} \leftarrow a_{i} - \eta\,\frac{\partial\, loss}{\partial a_{i}}$$
where $a_{i}$ is the i-th auxiliary parameter, η is the learning rate, and loss represents the loss function.
In a sixth exemplary embodiment, the determining a parameter selection network according to the model structure characteristics of the text type recognition model includes:
Determining a mapping relation between auxiliary parameters of a parameter selection network and a target model structure of the text type recognition model based on model structure characteristics of the text type recognition model; the auxiliary parameters participate in the training process of the text type recognition model and are used for controlling whether the corresponding target model structure is updated or not; the target model structure is a minimum model structure unit corresponding to the parameters to be updated of the text type recognition model;
and automatically determining parameters to be updated of the text type recognition model in a training process suitable for a text recognition task according to the parameter selection network.
In a seventh exemplary embodiment, the model structural feature is determined according to the model scale of the text type recognition model, and determining the mapping relationship between the auxiliary parameters of the parameter selection network and the target model structure of the text type recognition model based on the model structural features of the text type recognition model includes:
Acquiring a preset scale threshold; if the model scale of the text type recognition model is smaller than or equal to the preset scale threshold, the target model structure of the text type recognition model is a neuron, and the mapping relation between the auxiliary parameters of the parameter selection network and the neuron of the text type recognition model is determined;
Wherein at least one neuron of the text type recognition model has a corresponding auxiliary parameter to be trained.
In an eighth exemplary embodiment, the determining a mapping relationship between the auxiliary parameter of the parameter selection network and the neuron of the text type recognition model includes:
and establishing a corresponding relation between the structural parameter corresponding to at least one neuron structure and the auxiliary parameter to be trained corresponding to the structural parameter in the parameter selection network.
In a ninth exemplary embodiment, the model structure feature is determined according to a model scale and a network model structure of the text type recognition model, and determining, based on the model structure feature of the text type recognition model, a mapping relationship between an auxiliary parameter of a parameter selection network and a target model structure of the text type recognition model includes:
Acquiring a preset scale threshold;
If the model scale of the text type recognition model is larger than a preset scale threshold value, and a network model structure corresponding to the text type recognition model comprises a target model substructure meeting a preset fixed configuration condition, the target model structure of the text type recognition model is the target model substructure, and a mapping relation between auxiliary parameters of a parameter selection network and the target model substructure of the text type recognition model is determined;
The target model substructure is a combined network structure of a plurality of neuron groups, and at least one target model substructure of the text type recognition model has a corresponding auxiliary parameter to be trained.
In a tenth exemplary embodiment, the determining the mapping relationship between the auxiliary parameter of the parameter selection network and the target model substructure of the text type recognition model includes:
Constructing a target parameter matrix for each target model substructure according to parameters corresponding to at least one neuron in the current target model substructure;
And establishing a corresponding relation between the target parameter matrix corresponding to each target model substructure and the auxiliary parameters to be trained corresponding to the target parameter matrix in the parameter selection network.
In an eleventh exemplary embodiment, the text type recognition model employs a Bert (Bidirectional Encoder Representations from Transformers) model, and the target model structure is the attention module and the intermediate layer module of at least one encoder layer; determining the mapping relationship between the auxiliary parameters of the parameter selection network and the target model substructure of the text type recognition model includes:
Corresponding auxiliary parameters to be trained are set for the attention module and the middle layer module of the Bert model.
In a twelfth exemplary embodiment, using the text data training sample set to adjust each parameter to be updated of the text type recognition model based on a text recognition task and the loss function includes:
Determining a target attention module and a target middle layer module to be updated in the Bert model according to the values of the auxiliary parameters of the parameter selection network;
Determining a loss function according to sparsity loss of each auxiliary parameter of the parameter selection network in the training process, a predicted output value of the Bert model and a corresponding text type label;
training the Bert model based on the loss function by using a text mask and the text data training sample set to update model parameters of each target attention module and each target middle layer module of the Bert model.
In a thirteenth exemplary embodiment, the automatically selecting parameters to be updated for readjusting the text type recognition model during the fine tuning process includes:
Invoking the auxiliary parameter values to determine a calculation relation to calculate the value of at least one auxiliary parameter in the parameter selection network;
automatically selecting parameters to be updated, which are readjusted in the fine tuning process, of the text type recognition model according to the numerical values of the auxiliary parameters;
Wherein the auxiliary parameter value determination relation is as follows:
$$g\left(a_{i}\right) = \frac{1}{1 + e^{-\alpha a_{i}}}$$
where $a_{i}$ is an auxiliary parameter in the parameter selection network, $g(a_{i})$ is its gated value, e is the base of the natural logarithm, and α is a hyper-parameter controlling the slope.
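For illustration, the following is a minimal sketch of such a gating function, assuming the sigmoid-style relation reconstructed above (the value of the slope hyper-parameter α and the number of auxiliary parameters are assumptions):

```python
import torch

ALPHA = 10.0  # assumed slope hyper-parameter alpha

def gate(a: torch.Tensor, alpha: float = ALPHA) -> torch.Tensor:
    """Gated value g(a_i) = 1 / (1 + exp(-alpha * a_i)) of auxiliary parameters a."""
    return 1.0 / (1.0 + torch.exp(-alpha * a))

aux_params = torch.nn.Parameter(torch.zeros(12))  # one auxiliary parameter per target structure
gate_values = gate(aux_params)  # values in (0, 1), differentiable w.r.t. aux_params
```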
In a fourteenth exemplary embodiment, the automatically selecting parameters to be updated for readjusting the text type recognition model during fine tuning includes:
Acquiring a preset gating threshold;
Judging whether each auxiliary parameter is larger than the preset gating threshold value or not for at least one auxiliary parameter in the parameter selection network;
If the value of the current auxiliary parameter is larger than a preset gating threshold, the structural parameter of the current target model structure corresponding to the current auxiliary parameter is not updated;
And if the value of the current auxiliary parameter is smaller than or equal to the preset gating threshold, taking the structural parameter of the current target model structure corresponding to the current auxiliary parameter as the parameter to be updated.
In a fifteenth exemplary embodiment, the automatically determining, according to the parameter selection network, parameters to be updated of the text type recognition model in a training process applicable to a text recognition task includes:
Determining whether the corresponding target model structure is updated or not according to the current value of at least one auxiliary parameter in the parameter selection network in the updating process;
And if the corresponding current target model structure is determined to be updated according to the current value of the current auxiliary parameter, taking the structural parameter of the current target model structure as the parameter to be updated.
In a sixteenth exemplary embodiment, before determining whether the corresponding target model structure is updated according to the current value of the at least one auxiliary parameter in the parameter selection network during the update process, the method further includes:
and calling a pre-constructed gating function, and calculating the value of at least one auxiliary parameter in the parameter selection network.
In a seventeenth exemplary embodiment, determining whether the corresponding target model structure is updated according to the current value of at least one auxiliary parameter in the parameter selection network during the updating process includes:
Judging whether the output value of a gating function of the current auxiliary parameter is larger than a preset gating threshold value or not for at least one auxiliary parameter of the parameter selection network;
if the output value of the gating function of the current auxiliary parameter is larger than a preset gating threshold, setting a mask corresponding to the current auxiliary parameter as a first identification value;
If the output value of the gating function of the current auxiliary parameter is smaller than or equal to the preset gating threshold value, setting a mask corresponding to the current auxiliary parameter as a second identification value;
generating mask information of the parameter selection network according to the identification value corresponding to the mask of each auxiliary parameter;
Determining whether to update the corresponding target model structure based on the mask information;
The first identification value is used for indicating that the target model structure does not perform gradient update, and the second identification value is used for indicating that the target model structure performs gradient update.
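A minimal sketch of this mask-generation logic (the threshold and identification values are assumptions; per the description, a gate output above the threshold means the structure does not receive gradient updates):

```python
import torch

GATING_THRESHOLD = 0.5      # assumed preset gating threshold
FIRST_ID, SECOND_ID = 0, 1  # first id: no gradient update; second id: gradient update

def build_mask_info(gate_values: torch.Tensor) -> torch.Tensor:
    """Generate mask information for the parameter selection network from the
    gating-function outputs of all auxiliary parameters."""
    return torch.where(gate_values > GATING_THRESHOLD,
                       torch.full_like(gate_values, FIRST_ID),
                       torch.full_like(gate_values, SECOND_ID)).long()

mask_info = build_mask_info(torch.tensor([0.9, 0.2, 0.7, 0.4]))
# tensor([0, 1, 0, 1]): the 2nd and 4th target model structures will be updated
```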
In an eighteenth exemplary embodiment, using the text data training sample set to adjust each parameter to be updated of the text type recognition model based on a text recognition task and the loss function includes:
Based on the parameters to be updated selected by the parameter selection network, adjusting a calculation chart in the training process of the text type recognition model in real time;
and performing fine adjustment processing on the text type recognition model according to the adjusted calculation graph.
In a nineteenth exemplary embodiment, the real-time adjustment of the calculation map in the training process of the text type recognition model based on the parameters to be updated selected by the parameter selection network includes:
According to the mask information of the parameter selection network, correspondingly adjusting the gradient identification parameters of each node of the computational graph of the text type recognition model in forward propagation;
the gradient identification parameter is used for indicating whether the corresponding node performs gradient calculation in the back propagation process.
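A sketch of how the gradient identification parameters can be adjusted from the mask information, assuming a PyTorch-style computation graph in which `requires_grad` plays the role of the gradient identification parameter (the module list here is illustrative):

```python
import torch

def apply_mask_to_graph(target_modules, mask_info):
    """Set the gradient identification parameter (requires_grad) of each target
    model structure's parameters according to the mask information of the
    parameter selection network (1 = perform gradient update, 0 = frozen)."""
    for module, flag in zip(target_modules, mask_info.tolist()):
        for p in module.parameters():
            p.requires_grad = bool(flag)

# Only structures flagged 1 take part in gradient calculation during back propagation.
layers = [torch.nn.Linear(8, 8) for _ in range(4)]
apply_mask_to_graph(layers, torch.tensor([0, 1, 0, 1]))
```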
In a twentieth exemplary embodiment, determining the loss function jointly according to the predicted value of the text type recognition model for each text sample in the text data training sample set, the corresponding text type label, and the sparsity loss of the text type recognition model during training includes:
Calling an integral loss function relation, and calculating a loss function of the text type recognition model; the overall loss function relation is as follows:
$$loss = -\frac{1}{N}\sum_{n=1}^{N}\sum_{j=1}^{C} y_{nj}\log\left(p_{nj}\right) + \sum_{i} g\left(a_{i}\right)$$
where loss represents the loss function, $g(a_{i})$ is the output value of the gating function of the i-th auxiliary parameter, N is the total number of text samples contained in the text data training sample set, C is the total number of text types contained in the text data training sample set, $y_{nj}$ is the label indicating that the n-th text sample belongs to the j-th text type, and $p_{nj}$ is the predicted probability that the n-th text sample belongs to the j-th text type.
Another aspect of the present invention provides a text recognition apparatus, including:
the data acquisition module is used for acquiring a text type recognition model processed in a pre-training process and acquiring a text data training sample set carrying text type labels;
The parameter selection module is used for determining a parameter selection network according to the model structure characteristics of the text type recognition model; the model structure characteristics are used for determining a minimum model structure unit corresponding to parameters to be updated of the text type recognition model; the parameter selection network is used for automatically selecting parameters to be updated, which are readjusted in the fine tuning process of the text type recognition model;
The loss function determining module is used for determining a loss function together according to the predicted value of the text type recognition model on each text sample of the text data training sample set, the corresponding text type label and the sparsity loss of the text type recognition model in the training process;
the fine tuning module is used for training a sample set by utilizing the text data and adjusting each parameter to be updated of the text type recognition model based on a text recognition task and the loss function;
and the text recognition module is used for recognizing the text type of the input text to be processed by utilizing the adjusted text type recognition model, so as to obtain a corresponding text type recognition result.
The invention also provides an electronic device comprising a processor for implementing the steps of the text recognition method according to any of the preceding claims when executing a computer program stored in a memory.
The invention finally provides a readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the text recognition method according to any of the preceding claims.
The technical scheme provided by the invention has the advantage that the optimal positions of the parameters to be updated in the fine-tuning stage can be found automatically from the text type recognition model through the parameter selection network, without relying on manual experience for selection; only the selected parameters are updated in the fine-tuning process of the text type recognition model, which effectively reduces the amount of computation of the fine-tuning task, so that fine-tuning of the text type recognition model can be carried out with more flexibility, higher optimization efficiency and better optimization performance. In addition, the text type recognition model is trained with a loss function determined jointly by the parameter selection network and the text type recognition model, so that the number of parameters that need to be fine-tuned is as small as possible while the model accuracy of the text type recognition model is guaranteed, achieving an effective balance between model accuracy and the number of parameters to be updated; this further reduces the amount of computation of the text type recognition model in the fine-tuning stage, improves the parameter adjustment efficiency and model accuracy of the text type recognition model, and can further improve the text type recognition accuracy and recognition efficiency.
In addition, the invention also provides a corresponding implementation device, electronic equipment and a readable storage medium for the text recognition method, so that the method is more practical, and the device, the electronic equipment and the readable storage medium have corresponding advantages.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention as claimed.
Drawings
For a clearer description of the present invention or of the technical solutions related thereto, the following brief description will be given of the drawings used in the description of the embodiments or of the related art, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained from these drawings without the inventive effort of a person skilled in the art.
FIG. 1 is a schematic flow chart of a text recognition method provided by the invention;
FIG. 2 is a schematic diagram of a loss function determination process according to the present invention;
FIG. 3 is a schematic flow chart of a parameter selection network for selecting parameters to be updated according to the present invention;
FIG. 4 is a schematic diagram of a structural framework of an exemplary application scenario provided by the present invention;
FIG. 5 is a schematic diagram of a fine tuning process of a text type recognition model of an exemplary application scenario provided by the present invention;
FIG. 6 is a schematic diagram of a text type recognition model of an exemplary application scenario provided by the present invention;
FIG. 7 is a schematic diagram of the internal structure of the encoder layer in the text type recognition model of FIG. 6 according to the present invention;
FIG. 8 is a block diagram of a text recognition device according to one embodiment of the present invention;
Fig. 9 is a block diagram of an embodiment of an electronic device according to the present invention.
Detailed Description
In order to better understand the aspects of the present invention, the present invention will be described in further detail with reference to the accompanying drawings and detailed description. It will be apparent that the described embodiments are only some, but not all, embodiments of the invention. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention. The terms "first," "second," "third," "fourth," and the like in the description of the invention and in the above figures, are used for distinguishing between different objects and not necessarily for describing a particular sequential or chronological order. Furthermore, the terms "comprise" and "have," as well as any variations of the two, are intended to cover a non-exclusive inclusion. The term "exemplary" means "serving as an example, embodiment, or illustration. Any embodiment described herein as "exemplary" is not necessarily to be construed as preferred or advantageous over other embodiments.
In natural language processing tasks, a pre-training model involves a pre-training process and a fine-tuning process, where the fine-tuning process applies the pre-trained model to the data set of the current downstream application task and adapts the model parameters to that data set. This embodiment fine-tunes a pre-trained language model capable of recognizing text types to obtain a text type recognition model for executing a text type recognition task. Currently, the number of parameters of pre-trained text type recognition models shows an increasing trend, and the demands for computing power and storage keep growing, so fine-tuning still requires substantial computing power and storage support. The traditional fine-tuning approach for a text type recognition model is full fine-tuning: the model parameters are initialized with the pre-trained weights and repeatedly updated, following the gradients that maximize the objective function of the text type recognition model, until reaching specified values. For each downstream task, a brand-new parameter set θ with the same dimension as the pre-trained weights has to be learned, namely θ = {θ1, θ2, …, θn}, where θn is the parameter set, with the same dimension as the pre-trained weights, for the n-th downstream task. As the size of the text type recognition model increases, the cost rises, and consumer-grade hardware cannot supply the required computing and storage resources. To solve this problem, PEFT (Parameter-Efficient Fine-Tuning) has been developed.
The PEFT retrains by selecting part of model parameters, and can enable the LLM (Large Language Model ) to be efficiently adapted to a text recognition task without fine tuning all parameters of a pre-trained model. The parameter adjustment mode for fixing most of the pre-training parameters and only trimming a small amount or additional model parameters greatly reduces the calculation cost and the storage cost, and can realize the model performance equivalent to the full-scale trimming. However, in the related art, when the pre-trained text type recognition model is subjected to fine tuning, the parameter change is represented by using a sparse matrix, and the parameter of the pre-trained model is adjusted by introducing fixed auxiliary parameters. The method can not ensure that the method can be effective for different text type recognition models, and the number of auxiliary parameters in the training process is fixed, so that the flexibility is poor.
In view of the above, the invention can automatically find the optimal position of the parameter to be updated in the fine tuning stage from the text type recognition model through the parameter selection network, does not need to rely on manual experience to select, only updates the selected parameter in the fine tuning process of the text type recognition model, can effectively reduce the calculated amount of the fine tuning task of the text type recognition model, and improves the recognition precision and recognition efficiency of the text type. Various non-limiting embodiments of the present invention are described in detail below. Numerous specific details are set forth in the following description in order to provide a better understanding of the invention. It will be understood by those skilled in the art that the present invention may be practiced without these specific details. In other instances, well known methods, procedures, components, and circuits have not been described in detail so as not to obscure the present invention.
Referring first to fig. 1, fig. 1 is a schematic flow chart of a text recognition method provided by the present invention, and the present invention may include the following:
s101: and acquiring a text type recognition model processed by a pre-training process, and acquiring a text data training sample set carrying text type labels.
In this embodiment, a pre-trained language model is applied to a text recognition task in natural language processing, and the corresponding text type recognition model is a network model that has gone through a pre-training process. A text type recognition model processed by the pre-training process refers to a model obtained by training the pre-trained language model on a large-scale dataset, which has strong generalization ability; this pre-trained language model is defined as the text type recognition model. The pre-trained language model includes, but is not limited to, a CNN (Convolutional Neural Network), an RNN (Recurrent Neural Network), an LSTM (Long Short-Term Memory network), an LLM (Large Language Model), a Transformer, Bert (Bidirectional Encoder Representations from Transformers), GPT (Generative Pre-trained Transformer) and CLIP (Contrastive Language-Image Pre-training). The text data training sample set is the training sample set used when fine-tuning the pre-trained language model for the text recognition task; it may comprise a plurality of text samples of different types, each text sample being labeled, manually or automatically, with a corresponding text type label in advance, and the number of text samples can be chosen flexibly according to actual requirements without affecting the implementation.
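As a minimal illustration of this step, the following sketch (using the Hugging Face transformers library; the model checkpoint, label count and sample texts are assumptions, not part of the disclosure) loads a pre-trained language model together with a small labeled text sample set:

```python
# Hypothetical sketch of step S101: load a pre-trained text type recognition
# model and a text data training sample set carrying text type labels.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

NUM_TEXT_TYPES = 4  # assumed number of text type labels

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-chinese", num_labels=NUM_TEXT_TYPES)  # pre-trained weights

# A toy text data training sample set: (text sample, text type label).
train_samples = [
    ("The team won the championship last night.", 0),
    ("Quarterly revenue grew by ten percent.", 1),
]
encodings = tokenizer([t for t, _ in train_samples],
                      padding=True, truncation=True, return_tensors="pt")
labels = torch.tensor([y for _, y in train_samples])
```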
S102: and determining a parameter selection network according to the model structure characteristics of the text type recognition model.
In this step, the parameter selection network is used to automatically select the parameters to be updated, which are readjusted in the fine-tuning process of the text type recognition model. The parameter selection network may be any differentiable network structure, such as a U-Net (U-shaped network) neural network, which is not limited by the present invention. The parameter selection network is a network structure that can be updated by back propagation; its output is binarized to 0 or 1, and the parameters to be updated of the text type recognition model are selected according to this binarized output. The parameters to be updated are the model parameters of the text type recognition model that are readjusted in the process of adapting it to the text recognition task; that is, for convenience of description, each parameter that the parameter selection network selects for updating in the text type recognition model is defined as a parameter to be updated. The model structural features are used to determine the minimum model structure unit corresponding to the parameters to be updated of the text type recognition model. Model structural features include the scale feature and the internal structural feature of the text type recognition model, that is, features of its network architecture. The minimum model structure unit is the model structure corresponding to the parameters to be updated determined by the parameter selection network; this model structure may be the minimum unit of the text type recognition model or a combination of several network structures within it. The parameter selection network, for example, enables automated parameter selection by optimizing trainable auxiliary parameters: each minimum model structure unit of the text type recognition model corresponds to a trainable auxiliary parameter, and whether that minimum model structure unit should be updated or kept unchanged is determined based on the value of the auxiliary parameter.
S103: and determining a loss function according to the predicted value of each text sample of the text data training sample set by the text type recognition model and the sparsity loss of the corresponding text type label and the text type recognition model in the training process.
In order to adapt to the text type recognition task, the text type recognition model needs to be fine-tuned individually for that task. During the fine-tuning of the text type recognition model, in order to reduce the amount of data processed by the fine-tuning task, only the parameters to be updated need to be updated during training. To further reduce the training workload, the loss function used by the text type recognition model in the fine-tuning process is determined jointly by the sparsity loss introduced by the parameter selection network during training and the loss incurred by training the text type recognition model itself; that is, the loss function considers both the precision loss of the model and the sparsity loss corresponding to the number of parameters to be updated. An effective balance between model accuracy and the number of parameters to be updated can be achieved through the precision loss and the sparsity loss. The precision loss is the difference between the type prediction output of the text type recognition model for a text sample and the real type label corresponding to that text sample, where the real label refers to the manually annotated information of interest for the text sample, namely its text type; in other words, the precision loss measures the error of the predicted output compared with the real label. The sparsity loss, namely the loss caused by the additional auxiliary parameters introduced in storage and computation, helps improve the fine-tuning efficiency of the text type recognition model and is beneficial to improving its fine-tuning precision.
S104: and utilizing the text data training sample set to adjust each parameter to be updated of the text type recognition model based on a text recognition task and the loss function.
It will be appreciated that training of a pre-trained language model involves two phases: a pre-training phase and a fine-tuning phase. In the pre-training phase, a large-scale corpus is generally used to train a large-scale neural network algorithm structure for a specific language model, and the resulting large-scale neural network structure and parameters constitute the pre-trained language model. In the fine-tuning phase, small-scale training is performed for a specific task target (downstream task) and task data (downstream data), so that the parameters of the text type recognition model are fine-tuned, and a model adapted to the specific task and data is finally obtained. In this embodiment, the task target is the text recognition task, i.e. the process of classifying text documents into different predefined types or labels, which can be applied to various technical fields such as spam filtering, sentiment analysis, topic classification, news classification and recommendation systems. The input of the text recognition task is natural language, such as document data recorded in any language or text images in which text data is recorded. The task data is the text data training sample set, which comprises several groups of text samples serving as training samples; it is not limited to text data samples and may also include image data samples recording text information. Taking the text data training sample set as the data set of the current downstream application task, i.e. the text recognition task, any pre-trained language model is fine-tuned so that it is adapted to the text recognition task, thereby obtaining the text type recognition model.
S105: and carrying out text type recognition on the input text to be processed by utilizing the adjusted text type recognition model to obtain a corresponding text type recognition result.
It can be understood that the text to be processed in this step may also be converted text data. For example, if a user wants to identify the type of text recorded in an image, the image data may first be converted into text data, and this text data is input, as the text to be processed, into the fine-tuned text type recognition model; the text type recognition model processes the input text to be processed and outputs the text type to which it belongs, and this output is taken as the text type recognition result corresponding to the text to be processed.
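A minimal sketch of this inference step, assuming a fine-tuned classification model has been saved locally (the directory name and label names are hypothetical):

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

FINE_TUNED_DIR = "./text_type_model"  # hypothetical directory of the fine-tuned model
TEXT_TYPE_NAMES = ["sports", "finance", "science", "entertainment"]  # assumed label set

tokenizer = BertTokenizer.from_pretrained(FINE_TUNED_DIR)
model = BertForSequenceClassification.from_pretrained(FINE_TUNED_DIR).eval()

def recognize_text_type(text: str) -> str:
    """Run the fine-tuned text type recognition model on a text to be processed."""
    inputs = tokenizer(text, return_tensors="pt", truncation=True)
    with torch.no_grad():
        logits = model(**inputs).logits
    return TEXT_TYPE_NAMES[int(logits.argmax(dim=-1))]

print(recognize_text_type("The index fell two percent in early trading."))
```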
According to the technical scheme provided by the invention, the optimal positions of the parameters to be updated in the fine-tuning stage can be found automatically from the text type recognition model through the parameter selection network, without relying on manual experience for selection; only the selected parameters are updated in the fine-tuning process of the text type recognition model, which effectively reduces the amount of computation of the fine-tuning task, so that fine-tuning of the text type recognition model can be carried out with more flexibility, higher optimization efficiency and better optimization performance. In addition, the text type recognition model is trained with a loss function determined jointly by the parameter selection network and the text type recognition model, so that the number of parameters that need to be fine-tuned is as small as possible while the model accuracy of the text type recognition model is guaranteed, achieving an effective balance between model accuracy and the number of parameters to be updated; this further reduces the amount of computation of the text type recognition model in the fine-tuning stage, improves the parameter adjustment efficiency and model accuracy of the text type recognition model, and can further improve the text type recognition accuracy and recognition efficiency.
The foregoing embodiment does not limit how the loss function is determined jointly from the predicted value of the text type recognition model for each text sample in the text data training sample set, the corresponding text type labels, and the sparsity loss of the text type recognition model during training. Based on this, the present invention further provides a calculation method for the loss function; referring to fig. 2, it may include the following:
s201: and acquiring precision loss information of the text type recognition model in prediction.
S202: and acquiring sparsity loss of the text type recognition model in the training process.
S203: and determining a loss function of the text type recognition model according to the precision loss information and the sparsity loss.
In this embodiment, the accuracy loss information is a model accuracy loss function of the text type recognition model itself, which is determined according to the difference between the predicted text type of the text type recognition model and the text type tag. In order to make the amount of parameters to be trimmed as small as possible without affecting the accuracy of the text type recognition model, the loss function further comprises a sparseness loss caused by parameters to be updated determined by the parameter selection network, and the sparseness loss is determined according to the values of auxiliary parameters of the parameter selection network.
The calculation mode of the sparsity loss can be as follows: obtaining an output value of a gating function of each auxiliary parameter of the parameter selection network; and calculating the sparsity loss according to the sum of the output values of the gating functions of the auxiliary parameters. For example, in order to further improve the fine tuning efficiency of the text type recognition model, a sparsity loss determination relation may be stored locally in advance, and the sparsity loss of the text type recognition model in the training process may be calculated by calling the sparsity loss determination relation; the sparsity loss determination relationship may be expressed as:
$$loss_{s} = \sum_{i} g\left(a_{i}\right)$$
where $loss_{s}$ is the sparsity loss and $g(a_{i})$ is the output value of the gating function of the i-th auxiliary parameter. The smaller $loss_{s}$ is, the smaller the corresponding auxiliary parameters are, and the auxiliary parameters can be pruned by selecting a threshold value, so that the number of auxiliary parameters is reduced.
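A one-function sketch of this sparsity loss, assuming a sigmoid-style gating function over the auxiliary parameters (the slope is an assumption):

```python
import torch

def sparsity_loss(aux_params: torch.Tensor, alpha: float = 10.0) -> torch.Tensor:
    """loss_s = sum_i g(a_i): sum of gating-function outputs over all auxiliary
    parameters of the parameter selection network (sigmoid gate, assumed slope alpha)."""
    return torch.sigmoid(alpha * aux_params).sum()
```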
For example, in order to further improve the fine tuning efficiency of the text type recognition model, a precision loss determination relation may be stored locally in advance, and precision loss information of the text type recognition model during prediction may be calculated by calling the precision loss determination relation; the loss of precision determination relationship may be expressed as:
$$loss_{jing} = -\frac{1}{N}\sum_{n=1}^{N}\sum_{j=1}^{C} y_{nj}\log\left(p_{nj}\right)$$
In the formula, $loss_{jing}$ is the precision loss information, N is the total number of text samples contained in the text data training sample set, C is the total number of text types contained in the text data training sample set, $y_{nj}$ is the label indicating that the n-th text sample belongs to the j-th text type, and $p_{nj}$ is the predicted probability that the n-th text sample belongs to the j-th text type.
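A short sketch of this precision loss as reconstructed above, computed from predicted probabilities and one-hot text type labels:

```python
import torch

def precision_loss(probs: torch.Tensor, labels_onehot: torch.Tensor) -> torch.Tensor:
    """loss_jing = -(1/N) * sum_n sum_j y_nj * log(p_nj).

    probs:         (N, C) predicted probabilities p_nj
    labels_onehot: (N, C) labels y_nj (1 if sample n belongs to type j, else 0)
    """
    n = probs.shape[0]
    return -(labels_onehot * torch.log(probs.clamp_min(1e-12))).sum() / n
```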
For example, to further improve the fine tuning efficiency of the text type recognition model, an overall loss function relation may be stored locally, the overall loss function relation may be called, and the loss function of the text type recognition model may be calculated; the overall loss function relation is as follows:
$$loss = -\frac{1}{N}\sum_{n=1}^{N}\sum_{j=1}^{C} y_{nj}\log\left(p_{nj}\right) + \sum_{i} g\left(a_{i}\right)$$
where loss represents the loss function and $g(a_{i})$ is the output value of the gating function of the i-th auxiliary parameter; this embodiment may calculate the value of each auxiliary parameter of the parameter selection network using the gating function. N is the total number of text samples contained in the text data training sample set, C is the total number of text types contained in the text data training sample set, $y_{nj}$ is the label indicating that the n-th text sample belongs to the j-th text type, and $p_{nj}$ is the predicted probability that the n-th text sample belongs to the j-th text type.
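Combining the two terms gives a sketch of the overall loss relation (the text does not specify a trade-off weight between the two terms, so an implicit weight of 1 is assumed):

```python
import torch

def overall_loss(probs: torch.Tensor, labels_onehot: torch.Tensor,
                 aux_params: torch.Tensor, alpha: float = 10.0) -> torch.Tensor:
    """loss = precision loss (cross-entropy) + sparsity loss (sum of gate outputs)."""
    n = probs.shape[0]
    loss_jing = -(labels_onehot * torch.log(probs.clamp_min(1e-12))).sum() / n
    loss_s = torch.sigmoid(alpha * aux_params).sum()
    return loss_jing + loss_s
```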
It will be appreciated that during the training process, all trainable parameters, including the auxiliary parameters and the parameters to be updated, need to be optimized to minimize the above-mentioned loss function. The present embodiment may use a back propagation algorithm to calculate the gradient of the loss function with respect to all trainable parameters and use an optimizer, such as stochastic gradient descent, to update these parameters. That is, based on the loss function, the gradient of each parameter to be trained is calculated by the back propagation algorithm, and each parameter to be trained is updated by the optimizer, so as to complete the fine-tuning of the model parameters of the text type recognition model. In order to further improve the efficiency, a parameter update relation can be stored locally in advance, and each auxiliary parameter can then be updated by directly invoking the following parameter update relation; the parameter update relation may be expressed as:
$$a_{i} \leftarrow a_{i} - \eta\,\frac{\partial\, loss}{\partial a_{i}}$$
where $a_{i}$ is the i-th auxiliary parameter, η is the learning rate, and loss represents the loss function.
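A sketch of this update step: gradients of the joint loss are computed by back propagation and an optimizer updates both the auxiliary parameters and the selected model parameters (SGD stands in for the unspecified optimizer, and the learning rate and the loss_fn signature are assumptions):

```python
import torch

def fine_tune_step(model, aux_params, optimizer, loss_fn, batch):
    """One fine-tuning step: a_i <- a_i - eta * d(loss)/d(a_i), and likewise for
    every selected model parameter (loss_fn is a placeholder for the joint loss)."""
    optimizer.zero_grad()
    loss = loss_fn(model, aux_params, batch)
    loss.backward()   # gradients of all trainable parameters via back propagation
    optimizer.step()  # optimizer update (e.g. stochastic gradient descent)
    return loss.item()

# Example optimizer over the auxiliary parameters plus the trainable model parameters:
# optimizer = torch.optim.SGD(
#     [aux_params] + [p for p in model.parameters() if p.requires_grad], lr=1e-3)
```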
From the above, the loss function and the parameter updating method according to the present embodiment can perform fine adjustment of the text type recognition model with high efficiency and high accuracy.
The above embodiment does not limit how the structural features of the text type recognition model are made to correspond to the parameter selection network. In order to further reduce the number of additional auxiliary parameters introduced, this embodiment performs parameter selection in a structured manner. Referring to fig. 3, an exemplary implementation is further provided by the present invention: based on a structured structure-search approach, a mapping relationship is established between the auxiliary parameters in the parameter selection network and the network structure of the text type recognition model, so that the auxiliary parameters are used efficiently, which in turn ensures more efficient computation reduction and acceleration. The implementation of step S102, i.e. "determining a parameter selection network according to the model structure characteristics of the text type recognition model", may include:
S301: and determining the mapping relation between the auxiliary parameters of the parameter selection network and the target model structure of the text type recognition model based on the model structure characteristics of the text type recognition model.
S302: and automatically determining parameters to be updated of the text type recognition model in a training process suitable for the text recognition task according to the parameter selection network.
In this embodiment, the mapping relationship is a correspondence relationship between a trainable parameter used for representing a parameter selection network and a target network structure of a text type recognition model, where the target model structure is a minimum model structure unit corresponding to a parameter to be updated for determining the text type recognition model. Each auxiliary parameter participates in a training process of the text type recognition model for controlling whether a corresponding target model structure is updated. The target downstream application task is a downstream task to which the text type recognition model is to be applied, that is, a downstream task when the text type recognition model performs fine tuning, that is, a text type recognition task in this embodiment.
The embodiment provides a corresponding mapping relation construction mode according to different target model structures: and acquiring a preset scale threshold, and dividing the text type recognition model into a small scale and a large scale according to the preset scale threshold. For a small-scale text type recognition model, taking the model scale of the text type recognition model as the model structure characteristic; if the model scale of the text type recognition model is smaller than or equal to a preset scale threshold, determining that the target model structure is a neuron, and each neuron of the text type recognition model is provided with a corresponding auxiliary parameter to be trained. The preset scale threshold is an experience threshold selected according to an actual application scene and is used for judging whether the text type recognition model belongs to a large-scale huge model or a small-scale model, and the neuron is the minimum unit structure of the text type recognition model, and because the scale of the text type recognition model is smaller, the neuron is used as the minimum model structural unit, a large number of auxiliary parameters are not introduced, and the fine adjustment task amount of the whole text type recognition model is not excessively increased. For example, the text type recognition model is a network model of a fully connected layer including n neurons, corresponding to a parameter selection network, there are n auxiliary parameters to be trained. Furthermore, because the parameters to be updated are involved in the fine tuning task, the corresponding relation between the parameters can be established on the basis of determining the mapping relation between the model structure and the parameter selection network. And establishing a corresponding relation between the parameters corresponding to each neuron structure and the auxiliary parameters to be trained corresponding to the parameters in the parameter selection network for the condition that the target model structure is the neuron. For example, the text type recognition model is a fully connected layer including n neurons, where the parameter of the i-th neuron is wi, and the corresponding auxiliary parameter is ai, and a corresponding relationship is established between wi and ai.
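A minimal sketch of this neuron-level mapping for a small-scale model: a single fully connected layer whose i-th output neuron (row i of the weight matrix, plus its bias) is paired with auxiliary parameter a_i (the layer sizes are assumptions):

```python
import torch

n_neurons = 16
layer = torch.nn.Linear(32, n_neurons)                    # small-scale model: one fully connected layer
aux_params = torch.nn.Parameter(torch.zeros(n_neurons))   # one auxiliary parameter a_i per neuron

# Mapping: parameters of the i-th neuron (weight row i and bias i) <-> auxiliary parameter a_i.
neuron_to_aux = {i: (layer.weight[i], layer.bias[i], aux_params[i])
                 for i in range(n_neurons)}
```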
For a large-scale text type recognition model, that is, if the model scale of the text type recognition model is larger than the preset scale threshold, two cases can be distinguished: a massive model and a non-massive model. For the non-massive model, the target model structure is determined as a combination of a plurality of neurons, namely a combination of a plurality of neurons is taken as one target model structure, and each neuron combination of the text type recognition model has a corresponding auxiliary parameter to be trained. For example, if the text type recognition model is a deep neural network model comprising 3 fully connected layers, each fully connected layer comprising n neurons, the n neurons of one fully connected layer can be combined as one target model structure, and there are then 3 auxiliary parameters to be trained. Furthermore, because the parameters to be updated participate in the fine-tuning task, the correspondence between parameters can be established on the basis of the mapping relation between the model structure and the parameter selection network: for the case that the target model structure is a neuron combination, the parameter matrix corresponding to each neuron combination is associated with the auxiliary parameter to be trained corresponding to that parameter matrix in the parameter selection network. For example, if the parameter matrix of the i-th fully connected layer is wi, the corresponding auxiliary parameter is ai, and a correspondence is established between wi and ai.
For a massive model, if a corresponding auxiliary parameter were set for every parameter when using the parameter selection network, a huge number of auxiliary parameters would be introduced, and updating them during training would add a large amount of extra computation, making the fine-tuning task of the text type recognition model enormous and hindering fine-tuning. Therefore, in this embodiment, the model scale and the internal network structure of the text type recognition model are taken as the model structural features: the case considered is that the model scale of the text type recognition model is larger than the preset scale threshold and the network structure of the text type recognition model comprises a target model substructure meeting a preset fixed configuration condition, the preset fixed configuration condition being whether the text type recognition model comprises fixed sub-modules or a model structure formed by stacking or repeatedly stacking fixed sub-modules; if the text type recognition model comprises such fixed sub-modules, it meets the preset fixed configuration condition.
For example, the text type recognition model is a Bert model, which contains 12 BertLayer (encoder layer) structures of identical configuration, and each BertLayer contains a fixed SelfAttention (self-attention) layer, so the Bert model is a text type recognition model that satisfies the preset fixed configuration condition. Correspondingly, the target model substructure is a combined network structure of a plurality of neuron groups, i.e. the target model substructure consists of a plurality of network structures, each of which is formed by combining a plurality of neurons. If the target model structure is determined to be this type of target model substructure, each target model substructure of the text type recognition model is provided with a corresponding auxiliary parameter to be trained, so that substructures of larger granularity can be selected and the potential for computational acceleration is improved. For example, the text type recognition model is a Bert model, the text types include a complete text type and a missing text type, the target model structure is the attention module of each BertLayer and/or the intermediate layer module of each BertLayer, and a corresponding auxiliary parameter to be trained is set for each such module. Furthermore, because the fine tuning task concerns the parameters to be updated, the correspondence between parameters can be established on the basis of the mapping between the model structure and the parameter selection network. For the case that the target model structure is formed by combining a plurality of neuron groups, each neuron group corresponds to a parameter matrix, and the parameter matrices of the plurality of neuron groups are recombined into one large parameter matrix, namely the target parameter matrix; that is, for each target model substructure, the target parameter matrix is constructed from the parameters corresponding to each neuron in the current target model substructure, and a correspondence is established between the target parameter matrix corresponding to each target model substructure and the auxiliary parameter to be trained corresponding to that target parameter matrix in the parameter selection network. For example, the target parameter matrix of the i-th BertLayer may be W_i = [w_1, w_2, w_3, w_4], where w_1, w_2, w_3 and w_4 are, in this order, the parameter matrices formed by the first, second, third and fourth neuron groups; the corresponding auxiliary parameter is a_i, and a correspondence is established between W_i and a_i.
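The substructure-level mapping can be sketched as follows. This is an illustrative PyTorch fragment under assumed names: the stand-in Linear modules merely represent an attention module and an intermediate layer module, and one auxiliary parameter a_i is paired with the target parameter matrix recombined from each substructure's parameter matrices.

```python
import torch
import torch.nn as nn

def target_parameter_matrix(substructure: nn.Module) -> torch.Tensor:
    # Recombine the parameter matrices of the neuron groups in one substructure
    # into a single target parameter matrix (flattened here for simplicity).
    return torch.cat([p.detach().flatten() for p in substructure.parameters() if p.dim() == 2])

# Stand-in substructures for one encoder layer (hypothetical names and shapes).
substructures = {
    "layer0.attention": nn.Linear(768, 768),
    "layer0.intermediate": nn.Linear(768, 3072),
}
aux_params = {name: nn.Parameter(torch.zeros(1)) for name in substructures}   # one a_i each
correspondence = {name: (target_parameter_matrix(m), aux_params[name])
                  for name, m in substructures.items()}
```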
Accordingly, the fine tuning process for the Bert model may be: determine, according to the value of each auxiliary parameter of the parameter selection network, the attention modules and middle layer modules that need to be updated in the Bert model, which may be defined as target attention modules and target middle layer modules; determine the loss function according to the sparsity loss of each auxiliary parameter of the parameter selection network during training, the predicted output value of the Bert model and the corresponding text type labels; and, using the text mask and the text data training sample set, update the model parameters corresponding to each target attention module and each target middle layer module based on the loss function. The text type labels are manually annotated labels of the type to which the text corresponding to each text sample belongs, and the text mask is a mask for marking text culling positions.
As can be seen from the foregoing, the present embodiment flexibly determines the scale of the fine tuning structure of the text type recognition model corresponding to the auxiliary parameters of the parameter selection network based on the scale of the text type recognition model, thereby not only realizing automatic selection of the parameters to be updated in the text type recognition model, but also avoiding excessive parameter quantity and extra calculation amount introduced by the parameter selection network, and effectively improving the fine tuning efficiency of the text type recognition model.
The above embodiment does not limit how the parameter selection network is used to select the parameters to be updated. The present invention further provides an exemplary implementation of the step of "automatically selecting the parameters to be updated that are readjusted by the text type recognition model in the fine tuning process" in the above embodiment, which may include:
Determining whether the corresponding target model structure is updated or not according to the current value of each auxiliary parameter in the parameter selection network in the updating process; and if the corresponding current target model structure is determined to be updated according to the current value of the current auxiliary parameter, the parameter of the current target model structure is the parameter to be updated.
In this embodiment, each auxiliary parameter is used to control whether the corresponding target model structure needs to be updated, and this control may be determined based on the value of the auxiliary parameter during the updating process. The present embodiment may associate these auxiliary parameters with whether the target model structure is updated by means of gating selection: for example, if the gating output is 0, the corresponding target model structure does not need its parameters updated; if the gating output is 1, the corresponding target model structure needs its parameters updated. Through this gating selection, the automatic selection of the parameters to be fine tuned is realized. In other words, the values of the auxiliary parameters may be calculated by calling a pre-built gating function, i.e. before determining whether the corresponding target model structure is updated, the value of each auxiliary parameter in the parameter selection network may be calculated by calling a pre-built gating function. For each auxiliary parameter in the parameter selection network, if the value of the current auxiliary parameter is larger than a preset gating threshold, the parameters of the current target model structure corresponding to the current auxiliary parameter are not updated; if the value of the current auxiliary parameter is smaller than or equal to the preset gating threshold, the parameters of the current target model structure corresponding to the current auxiliary parameter are the parameters to be updated. The preset gating threshold can be selected flexibly according to the actual situation without affecting the implementation of the invention. Any differentiable function may be used as the gating function to determine whether to update the parameters of the i-th target model structure (for example, a neuron). As an exemplary embodiment, the value of each auxiliary parameter in the parameter selection network may be calculated by calling an auxiliary parameter value determination calculation relation, which can be expressed as:
g(a_i) = 1 / (1 + e^(−α·a_i))

where g(a_i) is the value of the corresponding auxiliary parameter in the parameter selection network (i.e. the gating output), a_i denotes the auxiliary parameter, e is the base of the natural logarithm, and α is a hyper-parameter controlling the slope.
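A minimal sketch of such a gating function is given below; the sigmoid form is an assumption inferred from the description above (a differentiable function of the auxiliary parameter involving e and a slope hyper-parameter α), and the constants are illustrative.

```python
import math

def gate(a_i: float, alpha: float = 10.0) -> float:
    # Assumed sigmoid gate: differentiable, output in (0, 1), steepness set by alpha.
    return 1.0 / (1.0 + math.exp(-alpha * a_i))

# A strongly positive auxiliary parameter pushes the gate towards 1 (structure kept
# frozen under the threshold rule below); a negative one pushes it towards 0 (update).
print(gate(0.5), gate(-0.3))
```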
Further, in order to improve the fine tuning efficiency of the text type recognition model, on the basis of the above embodiment, the present invention may directly determine, based on a mask, which structures of the text type recognition model need to be updated during training. That is, the implementation of the step "determining whether the corresponding target model structure is updated according to the current value of at least one auxiliary parameter in the parameter selection network during the updating process" in the above embodiment may include the following:
Acquiring a preset gating threshold value, and judging whether the output value of a gating function of the current auxiliary parameter is larger than the preset gating threshold value for each auxiliary parameter of the parameter selection network; if the output value of the gating function of the current auxiliary parameter is larger than a preset gating threshold, setting a mask corresponding to the current auxiliary parameter as a first identification value; if the output value of the gating function of the current auxiliary parameter is smaller than or equal to a preset gating threshold value, setting a mask corresponding to the current auxiliary parameter as a second identification value; generating mask information of the parameter selection network according to the identification value corresponding to the mask of each auxiliary parameter; it is determined whether to update the corresponding object model structure based on the mask information.
In this embodiment, during the fine tuning of the model, the auxiliary parameters in the parameter selection network are updated first, and the parameters to be updated in the original text type recognition model are determined by comparing the values of the auxiliary parameters with the preset gating threshold. The first identification value is used to indicate that the target model structure does not perform gradient update, and the second identification value is used to indicate that the target model structure performs gradient update; both can be chosen flexibly according to the actual application scenario. Taking the target model structure as a neuron and the preset gating threshold as 0.8 as an example, the gating function is used to calculate the value of each auxiliary parameter in the parameter selection network, i.e. the output g(a_i) of the gating function corresponding to the neuron with parameter w_i: if the output of the gating function is larger than 0.8, the corresponding neuron does not need gradient update, and the Mask[i] corresponding to the neuron is set to 0; if the output of the gating function is smaller than or equal to 0.8, the corresponding neuron needs gradient update, and the Mask[i] corresponding to the neuron is set to 1. Taking a text type recognition model that is a three-layer deep neural network as an example, suppose the values of the auxiliary parameters corresponding to the three layers are 0.3, 0.5 and -0.3 respectively, and the preset gating threshold is t = 0. Because the value of the auxiliary parameter of the third layer is smaller than t, the mask of the third layer is recorded as Mask[3] = 1, and the masks of the other two layers are Mask[1] = Mask[2] = 0; correspondingly, the whole mask of the text type recognition model can be expressed as Mask = [0, 0, 1]. When the parameters are updated during the training of the text type recognition model, the parameters corresponding to the third layer are updated, and the parameters corresponding to the first and second layers are not updated.
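The mask construction described above can be sketched as follows; the gate form, threshold and parameter values are illustrative assumptions chosen to reproduce the Mask = [0, 0, 1] example.

```python
import math

def build_mask(aux_values, threshold=0.8, alpha=10.0):
    # Mask[i] = 1 -> the i-th target model structure is fine-tuned;
    # Mask[i] = 0 -> it is frozen. Gate form and constants are illustrative.
    gate = lambda a: 1.0 / (1.0 + math.exp(-alpha * a))
    return [0 if gate(a) > threshold else 1 for a in aux_values]

# Three auxiliary parameters, one per layer: only the layer whose gating output is
# at or below the threshold is marked for updating, giving Mask = [0, 0, 1].
print(build_mask([0.3, 0.5, -0.3]))
```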
The above embodiment does not limit the process of training the text type recognition model by determining each parameter to be updated according to the value of each auxiliary parameter of the parameter selection network, and based on the above embodiment, the present invention further provides an exemplary implementation manner, which may include the following:
Based on the parameters to be updated selected by the parameter selection network, adjusting a calculation chart in the training process of the text type recognition model in real time; and performing fine adjustment processing on the text type recognition model according to the adjusted calculation graph.
In this embodiment, the computational graph is a graphical model describing the relationships between variables in a mathematical expression, typically used for automatic differentiation in deep learning. In the computational graph, nodes represent operations such as addition, multiplication, summation and activation functions, and edges represent relationships between inputs and outputs. The computational graph splits complex mathematical expressions into simple operations, which facilitates derivation and gradient calculation. During forward propagation, the computational graph computes the output value of each node in turn, in order from input to output, and each node stores its computed value. During back propagation, the computational graph computes the gradient of each node in turn, in order from output to input, and transmits the gradient to the predecessor nodes, each node storing its gradient. Finally, the computational graph can calculate the gradient of the whole expression through automatic differentiation and use it to optimize the model parameters. This embodiment realizes the adjustment of the model computational graph by the automatic parameter selector based on real-time updating of the dynamic computational graph. Furthermore, the gradient identification parameter of each node in the forward-propagation computational graph of the text type recognition model can be adjusted correspondingly according to the mask of each auxiliary parameter of the parameter selection network, namely the mask information corresponding to the parameter selection network.
The gradient identification parameter is the requires_grad parameter of each node in the forward-propagation computational graph, and indicates whether the corresponding node performs gradient calculation during back propagation. When the text type recognition model is updated by back propagation, the value of the requires_grad parameter of the nodes of the forward-propagation computational graph is modified according to the overall mask information of the text type recognition model. The default value of requires_grad is True, indicating that gradient calculation is required. For each node, if requires_grad = False, the corresponding node does not need to calculate the gradient; otherwise, the gradient is calculated. In this way the back-propagation calculation of the corresponding node can be skipped, and by avoiding unnecessary back-propagation calculation the amount of the model fine tuning task is reduced and the efficiency is improved. For example, according to the mask information Mask = [0, 0, 1] of the three-layer neural network model, the requires_grad of the parameters corresponding to the first-layer and second-layer neural networks in the text type recognition model is set to False, so that the gradient calculation of the first-layer and second-layer neural networks is skipped during back propagation, and back-propagation gradient calculation is performed only for the neurons of the third-layer neural network, thereby reducing the back propagation time.
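A hedged PyTorch sketch of this mask-to-requires_grad translation (the layer list and mask values are illustrative) might look like this:

```python
import torch.nn as nn

def apply_mask(layers, mask):
    # Translate the parameter-selection mask into requires_grad flags so that
    # frozen structures are skipped during back propagation.
    for layer, keep in zip(layers, mask):
        for p in layer.parameters():
            p.requires_grad = bool(keep)   # False -> no gradient is computed

# Mask = [0, 0, 1]: only the third layer participates in the backward pass.
layers = [nn.Linear(128, 64), nn.Linear(64, 32), nn.Linear(32, 2)]
apply_mask(layers, [0, 0, 1])
```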
As can be seen from the above, the present embodiment realizes the adjustment of the model computation graph by the automatic parameter selector based on the real-time update of the dynamic computation graph, thereby skipping the unnecessary back propagation computation and further reducing the computation amount of the fine tuning task.
In order to make the technical solution of the present invention more clear for those skilled in the art, the present invention provides an exemplary embodiment, and some possible application scenarios related to the technical solution of the present invention are described by way of example, as shown in fig. 4, fig. 4 is a schematic diagram of a hardware composition frame to which a text recognition method provided by the present invention is applicable, which may include the following:
The hardware component framework may include a first electronic device 41 and a second electronic device 42, with the first electronic device 41 and the second electronic device 42 being connected through a network 43. The first electronic device 41 is provided with a processor for executing the text recognition method described in any of the above embodiments, and the second electronic device 42 is a client with a man-machine interaction interface. The pre-trained language model of this embodiment is a Bert model, i.e. the Bert model serves as the text type recognition model, and the first electronic device 41 fine-tunes the Bert model based on the flow shown in fig. 5.

As shown in fig. 6, the rectangular boxes represent model structures and the oval boxes represent model inputs or outputs. The Bert model can predict the text type to which the input text information belongs; for example, "a central processor and an artificial intelligence chip are arranged on a motherboard of the server" is a complete text, while the truncated text "a central processor and" is a missing text. The input to the Bert model is words encoded into token numbers, i.e. a sequence of input word tokens and a word mask for marking word culling positions. The model includes an embedding layer, an encoder, a pooling layer and an output layer. The encoder, as the internal body structure of the Bert model, comprises a plurality of encoder layers; the model shown in fig. 6 comprises 12 encoder layers. The output of the encoder passes through the pooling layer to obtain the final model output. Each encoder layer has the same structure and, as illustrated in fig. 7, contains fixed model substructures: an attention module and an intermediate layer module. The attention module comprises a self-attention module and an attention output matrix: text data is input from the current layer to the attention layer, processed by the self-attention module, and the output of the whole attention module is finally obtained through the attention output matrix, wherein the self-attention module comprises a self-attention layer query matrix, a self-attention layer value matrix, a self-attention layer key matrix and a random discarding (dropout) layer.

For a natural language model, the text input needs to be converted into numerical values through the embedding layer before being processed by the deep neural network. In the selection of the target model structure, since changes to the embedding layer easily cause severe fluctuation of the model performance, the embedding layer is generally skipped, and the target model structure is selected in the subsequent encoder or decoder layers. In this embodiment, corresponding auxiliary parameters to be trained are set for the attention module and the intermediate layer module of each encoder layer of the Bert model; for example, a11 corresponds to the attention module of the first encoder layer BertLayer1, a12 corresponds to the attention module of the second encoder layer BertLayer2, and so on. The self-attention layer plays an important role in the model performance of the Bert model, so when the target model structure is selected, the whole self-attention module can be selected as the target model structure, which reduces the number of auxiliary parameters and simplifies subsequent computation.
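Assuming the Hugging Face transformers implementation of BERT (which exposes the 12 encoder layers as model.encoder.layer, each with attention and intermediate submodules), the per-substructure auxiliary parameters could be registered as sketched below; the parameter names are illustrative.

```python
import torch
import torch.nn as nn
from transformers import BertModel

# Skip the embedding layer and attach one auxiliary parameter to the attention
# module and one to the intermediate module of each of the 12 encoder layers.
bert = BertModel.from_pretrained("bert-base-uncased")
aux_params = nn.ParameterDict()
for i, layer in enumerate(bert.encoder.layer):
    aux_params[f"a{i}_attention"] = nn.Parameter(torch.zeros(1))     # gates layer.attention
    aux_params[f"a{i}_intermediate"] = nn.Parameter(torch.zeros(1))  # gates layer.intermediate
```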
In the fine tuning process of the Bert model using the input data X, each auxiliary parameter in the parameter selection network is updated, and mask information of the Bert model is generated by comparing the gating output value of each auxiliary parameter with the preset gating threshold, where a mask value of 1 indicates that the corresponding parameter needs to be updated and a mask value of 0 indicates that it does not. According to the mask information, the requires_grad of the corresponding parameters in the Bert model is set to False, so that their gradients are not calculated during back propagation, thereby reducing the back propagation time. The loss function of the Bert model is determined jointly from the precision loss information of the Bert model during prediction and the sparsity loss determined by the output values of the gating functions of the auxiliary parameters of the parameter selection network, and the Bert model is trained with this loss function. During training, a back propagation algorithm is used to calculate the gradient of each parameter to be trained in the loss function, and an optimizer is used to update each parameter to be trained.
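The whole fine tuning step can be illustrated with the following self-contained PyTorch sketch; a toy three-layer network stands in for the Bert model, the sigmoid gate and the additive combination of precision loss and sparsity loss are assumptions consistent with the description above, and all names are illustrative.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
layers = nn.ModuleList([nn.Linear(16, 16), nn.Linear(16, 16), nn.Linear(16, 2)])
aux = nn.Parameter(torch.zeros(3))                       # parameter selection network
alpha, threshold = 10.0, 0.8
opt = torch.optim.Adam(list(layers.parameters()) + [aux], lr=1e-3)

x = torch.randn(8, 16)                                   # toy text features
labels = torch.randint(0, 2, (8,))                       # text type labels

gate = torch.sigmoid(alpha * aux)                        # gating outputs g(a_i)
mask = (gate <= threshold).float()                       # 1 -> update, 0 -> frozen
for layer, keep in zip(layers, mask):                    # skip back-prop for frozen layers
    for p in layer.parameters():
        p.requires_grad = bool(keep)

h = x
for layer in layers[:-1]:
    h = torch.relu(layer(h))
logits = layers[-1](h)

loss_precision = F.cross_entropy(logits, labels)         # precision loss (loss_jing)
loss_sparsity = gate.sum()                               # sparsity loss, sum_i g(a_i)
loss = loss_precision + loss_sparsity
opt.zero_grad()
loss.backward()                                          # frozen layers receive no gradients
opt.step()
```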
It should be noted that the above application scenario is only shown for the convenience of understanding the idea and principle of the present invention, and the embodiment of the present invention is not limited in any way. Rather, embodiments of the invention may be applied to any scenario where applicable.
From the above, the parameter adjustment of the Bert model can be effectively and flexibly implemented, so that the text recognition accuracy and the text recognition efficiency are improved.
It should be noted that, in the present invention, the steps are not strictly executed sequentially, so long as they conform to the logic sequence, and the steps may be executed simultaneously or may be executed according to a certain preset sequence, and fig. 1, fig. 2, and fig. 3 are only schematic, and are not meant to represent only such an execution sequence.
The invention also provides a corresponding device for the text recognition method, so that the method is more practical. The device may be described from the perspective of functional modules and from the perspective of hardware, respectively. The text recognition apparatus of the present invention, which is used to implement the text recognition method of the present invention, is described below. In this embodiment, the text recognition apparatus may include or be divided into one or more program modules, which are stored in a storage medium and executed by one or more processors to implement the text recognition method of the first embodiment of the present invention. A program module in the present invention refers to a series of computer program instruction segments capable of performing a particular function, which is more suitable than the program itself for describing the execution of the text recognition device in a storage medium. The following description specifically describes the functions of each program module of this embodiment; the text recognition apparatus described below and the text recognition method described above may be referred to in correspondence with each other.
Based on the angle of the functional modules, referring to fig. 8, fig. 8 is a block diagram of a text recognition device according to an embodiment of the present invention, where the device may include:
The data acquisition module 801 is configured to acquire a text type recognition model processed through a pre-training process, and acquire a text data training sample set carrying text type labels;
A parameter selection module 802, configured to determine a parameter selection network according to the model structure characteristics of the text type recognition model; the model structure characteristics are used for determining a minimum model structure unit corresponding to parameters to be updated of the text type recognition model; the parameter selection network is used for automatically selecting parameters to be updated, which are readjusted in the fine tuning process of the text type recognition model;
a loss function determining module 803, configured to determine a loss function according to the predicted value of the text type recognition model on each text sample of the text data training sample set and the sparsity loss of the corresponding text type label and the text type recognition model in the training process; the sparsity loss is used for controlling the total number of the parameters to be updated;
A fine tuning module 804, configured to train a sample set using the text data, and adjust each parameter to be updated of the text type recognition model based on a text recognition task and the loss function;
And the text recognition module 805 is configured to perform text type recognition on the input text to be processed by using the adjusted text type recognition model, so as to obtain a corresponding text type recognition result.
Illustratively, in some implementations of the present embodiment, the loss function determining module 803 may further be configured to:
acquiring precision loss information of the text type recognition model in prediction; the precision loss information is determined according to the difference between the predicted text type of the text type recognition model and the text type label;
Acquiring sparsity loss of the text type recognition model in the training process; the sparseness loss is determined according to the numerical value of the auxiliary parameter of the parameter selection network;
And determining a loss function of the text type recognition model according to the precision loss information and the sparsity loss.
As an exemplary implementation of the above embodiment, the above loss function determining module 803 may further be configured to:
invoking a precision loss determination relation, and calculating the precision loss information of the text type recognition model in prediction; the precision loss determination relation is:

loss_jing = -(1/N) · Σ_{n=1}^{N} Σ_{j=1}^{C} y_nj · log(p_nj)

where loss_jing is the precision loss information, N is the total number of text samples contained in the text data training sample set, C is the total number of text types contained in the text data training sample set, y_nj is the label indicating that the n-th text sample belongs to the j-th text type, and p_nj is the predicted probability value that the n-th text sample belongs to the j-th text type.
As another exemplary implementation of the above embodiment, the above loss function determining module 803 may further be configured to:
invoking a sparsity loss determination relation, and calculating the sparsity loss of the text type recognition model in the training process; the sparsity loss determination relation is:

loss_s = Σ_i g(a_i)

where loss_s is the sparsity loss, and g(a_i) is the output value of the gating function of the i-th auxiliary parameter.
Illustratively, in other implementations of the present embodiment, the foregoing trimming module 804 may be further configured to:
Based on the loss function, calculating the gradient of each parameter to be trained by using a back propagation algorithm, and updating each parameter to be trained by using an optimizer so as to finish fine adjustment of the model parameters of the text type recognition model.
As an exemplary implementation of the above embodiment, the fine tuning module 804 may be further configured to:
invoking a parameter updating relation, and updating each auxiliary parameter of the parameter selection network; the parameter updating relation is:

a_i = a_i − η · ∂loss/∂a_i

where a_i is the i-th auxiliary parameter, η is the learning rate, and loss represents the loss function.
Illustratively, in still other implementations of the present embodiment, the parameter selection module 802 described above may be further configured to:
Determining a mapping relation between auxiliary parameters of a parameter selection network and a target model structure of the text type recognition model based on model structure characteristics of the text type recognition model; the auxiliary parameters participate in the training process of the text type recognition model and are used for controlling whether the corresponding target model structure is updated or not; the target model structure is a minimum model structure unit corresponding to the parameters to be updated of the text type recognition model;
and automatically determining parameters to be updated of the text type recognition model in a training process suitable for a text recognition task according to the parameter selection network.
As an exemplary implementation of the above embodiment, the parameter selection module 802 may be further configured to: acquire a preset scale threshold, the model structure characteristics being determined according to the model scale of the text type recognition model; if the model scale of the text type recognition model is smaller than or equal to the preset scale threshold, take a neuron as the target model structure of the text type recognition model and determine the mapping relation between the auxiliary parameters of the parameter selection network and the neurons of the text type recognition model; wherein at least one neuron of the text type recognition model has a corresponding auxiliary parameter to be trained.
As an exemplary implementation of the above embodiment, the parameter selection module 802 may be further configured to: and establishing a corresponding relation between the structural parameter corresponding to at least one neuron structure and the auxiliary parameter to be trained corresponding to the structural parameter in the parameter selection network.
As another exemplary implementation of the above embodiment, the above parameter selection module 802 may be further configured to: acquire a preset scale threshold, the model structure characteristics being determined according to the model scale and the network model structure of the text type recognition model; if the model scale of the text type recognition model is larger than the preset scale threshold and the network model structure corresponding to the text type recognition model comprises a target model substructure satisfying a preset fixed configuration condition, take the target model substructure as the target model structure of the text type recognition model, and determine the mapping relation between the auxiliary parameters of the parameter selection network and the target model substructures of the text type recognition model; wherein the target model substructure is a combined network structure of a plurality of neuron groups, and at least one target model substructure of the text type recognition model is provided with a corresponding auxiliary parameter to be trained.
As an exemplary implementation of the above embodiment, the parameter selection module 802 may be further configured to:
Constructing a target parameter matrix for each target model substructure according to parameters corresponding to at least one neuron in the current target model substructure;
And establishing a corresponding relation between the target parameter matrix corresponding to each target model substructure and the auxiliary parameters to be trained corresponding to the target parameter matrix in the parameter selection network.
As another exemplary implementation of the above embodiment, the above parameter selection module 802 may be further configured to: set corresponding auxiliary parameters to be trained for the attention module and the middle layer module of the Bert model; wherein the text type recognition model adopts a Bidirectional Encoder Representations from Transformers (Bert) model, and the target model structure is the attention module and the middle layer module of at least one encoder layer.
As an exemplary implementation of the above embodiment, the parameter selection module 802 may be further configured to:
Determining a target attention module and a target middle layer module to be updated in the Bert model according to the values of the auxiliary parameters of the parameter selection network;
Determining a loss function according to sparsity loss of each auxiliary parameter of the parameter selection network in the training process, a predicted output value of the Bert model and a corresponding text type label;
training the Bert model based on the loss function by using a text mask and the text data training sample set to update model parameters of each target attention module and each target middle layer module of the Bert model.
Illustratively, in still other implementations of the present embodiment, the parameter selection module 802 described above may be further configured to:
Invoking the auxiliary parameter values to determine a calculation relation to calculate the value of at least one auxiliary parameter in the parameter selection network;
automatically selecting parameters to be updated, which are readjusted in the fine tuning process, of the text type recognition model according to the numerical values of the auxiliary parameters;
Wherein the auxiliary parameter value determination calculation relation is:

g(a_i) = 1 / (1 + e^(−α·a_i))

where g(a_i) is the value of the auxiliary parameter a_i in the parameter selection network (i.e. the gating output), e is the base of the natural logarithm, and α is a hyper-parameter for controlling the slope.
Illustratively, in still other implementations of the present embodiment, the parameter selection module 802 described above may be further configured to:
Acquiring a preset gating threshold;
Judging whether each auxiliary parameter is larger than the preset gating threshold value or not for at least one auxiliary parameter in the parameter selection network;
If the value of the current auxiliary parameter is larger than a preset gating threshold, the structural parameter of the current target model structure corresponding to the current auxiliary parameter is not updated;
And if the value of the current auxiliary parameter is smaller than or equal to the preset gating threshold, taking the structural parameter of the current target model structure corresponding to the current auxiliary parameter as the parameter to be updated.
Illustratively, in still other implementations of the present embodiment, the parameter selection module 802 described above may be further configured to:
Determining whether the corresponding target model structure is updated or not according to the current value of at least one auxiliary parameter in the parameter selection network in the updating process;
And if the corresponding current target model structure is determined to be updated according to the current value of the current auxiliary parameter, taking the structural parameter of the current target model structure as the parameter to be updated.
As an exemplary implementation of the above embodiment, the parameter selection module 802 may be further configured to:
and calling a pre-constructed gating function, and calculating the value of at least one auxiliary parameter in the parameter selection network.
As another exemplary implementation of the above embodiment, the above parameter selection module 802 may be further configured to:
Judging whether the output value of a gating function of the current auxiliary parameter is larger than a preset gating threshold value or not for at least one auxiliary parameter of the parameter selection network;
if the output value of the gating function of the current auxiliary parameter is larger than a preset gating threshold, setting a mask corresponding to the current auxiliary parameter as a first identification value;
If the output value of the gating function of the current auxiliary parameter is smaller than or equal to the preset gating threshold value, setting a mask corresponding to the current auxiliary parameter as a second identification value;
generating mask information of the parameter selection network according to the identification value corresponding to the mask of each auxiliary parameter;
Determining whether to update the corresponding object model structure based on the mask information;
The first identification value is used for indicating that the target model structure does not perform gradient update, and the second identification value is used for indicating that the target model structure performs gradient update.
Illustratively, in still other implementations of the present embodiment, the foregoing trimming module 804 may be further configured to:
Based on the parameters to be updated selected by the parameter selection network, adjusting a calculation chart in the training process of the text type recognition model in real time;
and performing fine adjustment processing on the text type recognition model according to the adjusted calculation graph.
As another exemplary implementation of the above embodiment, the fine tuning module 804 may be further configured to:
According to the mask information of the parameter selection network, correspondingly adjusting the gradient identification parameters of each node of the computational graph of the text type recognition model in forward propagation;
the gradient identification parameter is used for indicating whether the corresponding node performs gradient calculation in the back propagation process.
Illustratively, in still other implementations of the present embodiment, the loss function determining module 803 may be further configured to:
Calling an overall loss function relation, and calculating the loss function of the text type recognition model; the overall loss function relation is:

loss = -(1/N) · Σ_{n=1}^{N} Σ_{j=1}^{C} y_nj · log(p_nj) + Σ_i g(a_i)

where loss represents the loss function, g(a_i) is the output value of the gating function of the i-th auxiliary parameter, N is the total number of text samples contained in the text data training sample set, C is the total number of text types contained in the text data training sample set, y_nj is the label indicating that the n-th text sample belongs to the j-th text type, and p_nj is the predicted probability value that the n-th text sample belongs to the j-th text type.
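Under the same assumptions (additive combination of a cross-entropy precision loss and the sum of gating outputs as sparsity loss), the overall loss could be computed as in the following sketch; the function and argument names are illustrative.

```python
import torch
import torch.nn.functional as F

def overall_loss(logits: torch.Tensor, labels: torch.Tensor,
                 aux: torch.Tensor, alpha: float = 10.0) -> torch.Tensor:
    # Precision loss: cross-entropy averaged over the N samples and C text types,
    # i.e. -(1/N) sum_n sum_j y_nj * log(p_nj).
    loss_precision = F.cross_entropy(logits, labels)
    # Sparsity loss: sum of the gating outputs g(a_i) (sigmoid gate assumed).
    loss_sparsity = torch.sigmoid(alpha * aux).sum()
    return loss_precision + loss_sparsity
```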
The functions of each functional module of the text recognition device of the present invention may be specifically implemented according to the method in the above method embodiment, and the specific implementation process may refer to the related description of the above method embodiment, which is not repeated herein.
From the above, the embodiment can effectively and flexibly realize parameter adjustment of the text type recognition model, and improve the text type recognition accuracy and recognition efficiency.
The text recognition device mentioned above is described from the viewpoint of functional modules, and the present invention also provides an electronic device described from the viewpoint of hardware, as shown in fig. 9. The electronic device comprises a memory 90 for storing a computer program; a processor 91 for implementing the steps of the text recognition method as mentioned in any of the embodiments above when executing a computer program. In some embodiments, the electronic device may further include a display 92, an input/output interface 93, a communication interface 94, a power supply 95, a communication bus 96, and a sensor 97 for implementing various functions.
Processor 91 may include one or more processing cores, which may be controllers, microcontrollers, microprocessors, or artificial intelligence processors for processing computing operations involving machine learning, among other things. Memory 90 may include one or more of computer-readable storage media, high-speed random access memory, and nonvolatile memory, and may be an internal storage unit or an external storage device. The memory 90 may store application software installed on the electronic device and various types of data, such as: code of a program or the like in executing the text recognition method, and data that has been output or is to be output may also be stored temporarily. In this embodiment, the memory 90 is at least used for storing a computer program 901, which is capable of implementing relevant steps of the text recognition method disclosed in any of the previous embodiments after being loaded and executed by the processor 91, where the stored resources may further include an operating system 902 and data 903, and the operating system 902 may include Windows, unix, linux and so on. The data 903 may include, but is not limited to, data corresponding to text recognition results.
It will be appreciated that the text recognition method of the above-described embodiments may be stored in a computer readable storage medium if implemented in the form of software functional units and sold or used as a stand-alone product. Based on such understanding, the technical solution of the present invention may be embodied essentially or in part or all of the technical solution contributing to the related art, or may be embodied in the form of a software product stored in a storage medium, which performs all or part of the steps of the methods of the various embodiments of the present invention. Based on this, the invention also provides a readable storage medium storing a computer program which, when executed by a processor, performs the steps of the text recognition method according to any of the embodiments above.
The text recognition method, the text recognition device, the electronic equipment and the readable storage medium provided by the invention are described in detail. The present invention may be subject to several improvements and modifications without departing from the principles of the present invention, which also fall within the scope of the present invention.

Claims (23)

1. A method of text recognition, comprising:
acquiring a text type recognition model processed by a pre-training process, and acquiring a text data training sample set carrying text type labels;
Determining a parameter selection network according to the model structure characteristics of the text type recognition model; the model structure characteristics are used for determining the minimum model structure unit corresponding to the parameters to be updated of the text type recognition model, and the parameter selection network is used for automatically selecting the parameters to be updated, which are readjusted in the fine tuning process of the text type recognition model;
Determining a loss function jointly according to the predicted value of each text sample of the text data training sample set by the text type recognition model, the corresponding text type label and the sparsity loss of the text type recognition model in the training process; the sparsity loss is used for controlling the total number of the parameters to be updated;
Utilizing the text data training sample set to adjust each parameter to be updated of the text type recognition model based on a text recognition task and the loss function;
Performing text type recognition on the input text to be processed by using the adjusted text type recognition model to obtain a corresponding text type recognition result;
The determining a loss function according to the predicted value of each text sample of the text data training sample set by the text type recognition model, the corresponding text type label and the sparsity loss of the text type recognition model in the training process includes:
Acquiring sparsity loss of the text type recognition model in a training process and precision loss information in prediction, and determining a loss function of the text type recognition model according to the precision loss information and the sparsity loss;
The sparsity loss is calculated in the following way: acquiring the output value of the gating function of each auxiliary parameter of the parameter selection network, and calculating the sparsity loss according to the sum of the output values of the gating functions of the auxiliary parameters; the precision loss information is calculated according to:

loss_jing = -(1/N) · Σ_{n=1}^{N} Σ_{j=1}^{C} y_nj · log(p_nj)

where loss_jing is the precision loss information, N is the total number of text samples contained in the text data training sample set, C is the total number of text types contained in the text data training sample set, y_nj is the label indicating that the n-th text sample belongs to the j-th text type, and p_nj is the predicted probability value that the n-th text sample belongs to the j-th text type.
2. The method for recognizing text according to claim 1, wherein the determining the loss function according to the predicted value of the text type recognition model for each text sample in the training sample set of the text data and the corresponding text type label, the sparsity loss of the text type recognition model in the training process includes:
acquiring precision loss information of the text type recognition model in prediction; the precision loss information is determined according to the difference between the predicted text type of the text type recognition model and the text type label;
Acquiring sparsity loss of the text type recognition model in the training process; the sparseness loss is determined according to the numerical value of the auxiliary parameter of the parameter selection network;
And determining a loss function of the text type recognition model according to the precision loss information and the sparsity loss.
3. The method for text recognition according to claim 2, wherein the obtaining the sparsity loss of the text type recognition model during training comprises:
invoking a sparsity loss determination relation, and calculating the sparsity loss of the text type recognition model in the training process; the sparsity loss determination relation is:

loss_s = Σ_i g(a_i)

where loss_s is the sparsity loss, and g(a_i) is the output value of the gating function of the i-th auxiliary parameter.
4. The text recognition method of claim 1, wherein the training the sample set using the text data adjusts parameters to be updated of the text type recognition model based on a text recognition task and the loss function, comprising:
Based on the loss function, calculating the gradient of each parameter to be trained by using a back propagation algorithm, and updating each parameter to be trained by using an optimizer so as to finish fine adjustment of the model parameters of the text type recognition model.
5. The text recognition method of claim 4, wherein the training the sample set using the text data adjusts parameters to be updated of the text type recognition model based on a text recognition task and the loss function, comprising:
invoking a parameter updating relation, and updating each auxiliary parameter of the parameter selection network; the parameter updating relation is:

a_i = a_i − η · ∂loss/∂a_i

where a_i is the i-th auxiliary parameter, η is the learning rate, and loss represents the loss function.
6. The text recognition method of claim 1, wherein the determining a parametric selection network based on model structural features of the text type recognition model comprises:
Determining a mapping relation between auxiliary parameters of a parameter selection network and a target model structure of the text type recognition model based on model structure characteristics of the text type recognition model; the auxiliary parameters participate in the training process of the text type recognition model and are used for controlling whether the corresponding target model structure is updated or not; the target model structure is a minimum model structure unit corresponding to the parameters to be updated of the text type recognition model;
and automatically determining parameters to be updated of the text type recognition model in a training process suitable for a text recognition task according to the parameter selection network.
7. The text recognition method of claim 6, wherein the model structure characteristics are determined according to a model scale of the text type recognition model, and wherein the determining a mapping relationship between the auxiliary parameters of the parameter selection network and the target model structure of the text type recognition model based on the model structure characteristics of the text type recognition model comprises:
Acquiring a preset scale threshold; if the model scale of the text type recognition model is smaller than or equal to the preset scale threshold, the target model structure of the text type recognition model is a neuron, and the mapping relation between the auxiliary parameters of the parameter selection network and the neuron of the text type recognition model is determined;
Wherein at least one neuron of the text type recognition model has a corresponding auxiliary parameter to be trained.
8. The text recognition method of claim 7, wherein determining a mapping relationship between auxiliary parameters of a parameter selection network and neurons of the text type recognition model comprises:
and establishing a corresponding relation between the structural parameter corresponding to at least one neuron structure and the auxiliary parameter to be trained corresponding to the structural parameter in the parameter selection network.
9. The text recognition method of claim 6, wherein the model structure characteristics are determined according to a model size and a network model structure of the text type recognition model, and wherein the determining a mapping relationship between the auxiliary parameters of the parameter selection network and the target model structure of the text type recognition model based on the model structure characteristics of the text type recognition model comprises:
Acquiring a preset scale threshold;
If the model scale of the text type recognition model is larger than a preset scale threshold value, and a network model structure corresponding to the text type recognition model comprises a target model substructure meeting a preset fixed configuration condition, the target model structure of the text type recognition model is the target model substructure, and a mapping relation between auxiliary parameters of a parameter selection network and the target model substructure of the text type recognition model is determined;
The target model substructure is a combined network structure of a plurality of nerve cell groups, and at least one target model substructure of the text type recognition model is provided with a corresponding auxiliary parameter to be trained.
10. The text recognition method of claim 9, wherein determining a mapping relationship between auxiliary parameters of a parameter selection network and a target model substructure of the text type recognition model comprises:
Constructing a target parameter matrix for each target model substructure according to parameters corresponding to at least one neuron in the current target model substructure;
And establishing a corresponding relation between the target parameter matrix corresponding to each target model substructure and the auxiliary parameters to be trained corresponding to the target parameter matrix in the parameter selection network.
11. The text recognition method of claim 10, wherein the text type recognition model employs a Bidirectional Encoder Representations from Transformers (Bert) model, the target model structure being an attention module and a middle layer module of at least one encoder layer; determining a mapping relationship between auxiliary parameters of a parameter selection network and a target model substructure of the text type recognition model, comprising:
Corresponding auxiliary parameters to be trained are set for the attention module and the middle layer module of the Bert model.
12. The text recognition method of claim 11, wherein the training the sample set using the text data adjusts parameters to be updated of the text type recognition model based on a text recognition task and the loss function, comprising:
Determining a target attention module and a target middle layer module to be updated in the Bert model according to the values of the auxiliary parameters of the parameter selection network;
Determining a loss function according to sparsity loss of each auxiliary parameter of the parameter selection network in the training process, a predicted output value of the Bert model and a corresponding text type label;
training the Bert model based on the loss function by using a text mask and the text data training sample set to update model parameters of each target attention module and each target middle layer module of the Bert model.
13. The text recognition method of claim 1, wherein automatically selecting parameters to be updated for readjustment of the text type recognition model during fine tuning comprises:
Invoking the auxiliary parameter values to determine a calculation relation to calculate the value of at least one auxiliary parameter in the parameter selection network;
automatically selecting parameters to be updated, which are readjusted in the fine tuning process, of the text type recognition model according to the numerical values of the auxiliary parameters;
Wherein the auxiliary parameter value determination calculation relation is:

g(a_i) = 1 / (1 + e^(−α·a_i))

where g(a_i) is the value of the auxiliary parameter a_i in the parameter selection network (i.e. the gating output), e is the base of the natural logarithm, and α is a hyper-parameter for controlling the slope.
14. The text recognition method of claim 1, wherein automatically selecting parameters to be updated for readjustment of the text type recognition model during fine tuning comprises:
Acquiring a preset gating threshold;
Judging whether each auxiliary parameter is larger than the preset gating threshold value or not for at least one auxiliary parameter in the parameter selection network;
If the value of the current auxiliary parameter is larger than a preset gating threshold, the structural parameter of the current target model structure corresponding to the current auxiliary parameter is not updated;
And if the value of the current auxiliary parameter is smaller than or equal to the preset gating threshold, taking the structural parameter of the current target model structure corresponding to the current auxiliary parameter as the parameter to be updated.
15. The text recognition method of claim 6, wherein automatically determining parameters to be updated of the text type recognition model in a training process applicable to a text recognition task based on the parameter selection network comprises:
Determining whether the corresponding target model structure is updated or not according to the current value of at least one auxiliary parameter in the parameter selection network in the updating process;
And if the corresponding current target model structure is determined to be updated according to the current value of the current auxiliary parameter, taking the structural parameter of the current target model structure as the parameter to be updated.
16. The text recognition method of claim 15, wherein determining whether the corresponding object model structure is updated by the current value of at least one auxiliary parameter in the parameter selection network during the updating process, further comprises:
and calling a pre-constructed gating function, and calculating the value of at least one auxiliary parameter in the parameter selection network.
17. The text recognition method of claim 16, wherein determining whether the corresponding object model structure is updated by the current value of at least one auxiliary parameter in the parameter selection network during the updating process, comprises:
Judging whether the output value of a gating function of the current auxiliary parameter is larger than a preset gating threshold value or not for at least one auxiliary parameter of the parameter selection network;
if the output value of the gating function of the current auxiliary parameter is larger than a preset gating threshold, setting a mask corresponding to the current auxiliary parameter as a first identification value;
If the output value of the gating function of the current auxiliary parameter is smaller than or equal to the preset gating threshold value, setting a mask corresponding to the current auxiliary parameter as a second identification value;
generating mask information of the parameter selection network according to the identification value corresponding to the mask of each auxiliary parameter;
Determining whether to update the corresponding object model structure based on the mask information;
The first identification value is used for indicating that the target model structure does not perform gradient update, and the second identification value is used for indicating that the target model structure performs gradient update.
18. The text recognition method of claim 1, wherein the adjusting, by utilizing the text data training sample set, each parameter to be updated of the text type recognition model based on a text recognition task and the loss function comprises:
Adjusting, in real time, a computational graph of the training process of the text type recognition model based on the parameters to be updated selected by the parameter selection network;
and performing fine-tuning processing on the text type recognition model according to the adjusted computational graph.
19. The text recognition method of claim 18, wherein the adjusting, in real time, the computational graph of the training process of the text type recognition model based on the parameters to be updated selected by the parameter selection network comprises:
Correspondingly adjusting, according to the mask information of the parameter selection network, the gradient identification parameter of each node of the computational graph of the text type recognition model in forward propagation;
wherein the gradient identification parameter is used for indicating whether the corresponding node performs gradient calculation in the back propagation process.
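In a framework such as PyTorch, the gradient identification parameter of a node maps naturally onto `requires_grad`; the sketch below applies the mask information before forward propagation so that masked structures are excluded from gradient calculation in back propagation. The module granularity and the use of `requires_grad_` are implementation assumptions, not the claimed method itself.

```python
def apply_mask_to_graph(model_structures, mask_info):
    """Set the per-node 'gradient identification parameter' before the forward pass.

    model_structures: dict name -> torch.nn.Module (nodes of the computational graph)
    mask_info: dict name -> 1 (frozen, no gradient update) or 0 (gradient update)
    """
    for name, module in model_structures.items():
        frozen = mask_info.get(name, 0) == 1
        for p in module.parameters():
            # requires_grad=False excludes the parameter from backward-pass gradient computation.
            p.requires_grad_(not frozen)
```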
20. The method according to any one of claims 1 to 19, wherein determining a loss function according to the text type recognition model includes:
Calling an overall loss function relation to calculate the loss function of the text type recognition model; the overall loss function relation is as follows:
loss = loss_jing + Σ_i g(a_i); where loss represents the loss function, loss_jing represents the precision loss information, and g(a_i) represents the output value of the gating function of the i-th auxiliary parameter.
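Reading claim 20 together with the loss definitions in claim 21 below, one plausible fully expanded form of the overall loss is the cross-entropy precision term plus the sum of gating outputs. The unit weight on the sparsity term is an assumption, since the original equation image is not reproduced in this text:

```latex
\mathrm{loss}
  = \underbrace{-\frac{1}{N}\sum_{n=1}^{N}\sum_{j=1}^{C} y_{nj}\,\log p_{nj}}_{\text{precision loss}}
  + \underbrace{\sum_{i} g(a_i)}_{\text{sparsity loss}}
```

Here g(a_i) denotes the output value of the gating function of the i-th auxiliary parameter.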
21. A text recognition device, comprising:
the data acquisition module is used for acquiring a text type recognition model processed in a pre-training process and acquiring a text data training sample set carrying text type labels;
The parameter selection module is used for determining a parameter selection network according to the model structure characteristics of the text type recognition model; the model structure characteristics are used for determining a minimum model structure unit corresponding to parameters to be updated of the text type recognition model; the parameter selection network is used for automatically selecting parameters to be updated, which are readjusted in the fine tuning process of the text type recognition model;
The loss function determining module is used for determining a loss function together according to the predicted value of the text type recognition model on each text sample of the text data training sample set, the corresponding text type label and the sparsity loss of the text type recognition model in the training process; the sparsity loss is used for controlling the total number of the parameters to be updated;
the fine tuning module is used for adjusting, by utilizing the text data training sample set, each parameter to be updated of the text type recognition model based on a text recognition task and the loss function;
the text recognition module is used for recognizing the text type of the input text to be processed by utilizing the adjusted text type recognition model to obtain a corresponding text type recognition result;
Wherein the loss function determination module is further to:
Acquiring sparsity loss of the text type recognition model in a training process and precision loss information in prediction, and determining a loss function of the text type recognition model according to the precision loss information and the sparsity loss;
The sparsity loss is calculated in the following way: acquiring an output value of the gating function of each auxiliary parameter of the parameter selection network, and calculating the sparsity loss according to the sum of the output values of the gating functions of the auxiliary parameters; the precision loss information is calculated based on loss_jing = -(1/N) Σ_{n=1}^{N} Σ_{j=1}^{C} y_nj log(p_nj);
In the formula, loss_jing is the precision loss information, N is the total number of text samples contained in the text data training sample set, C is the total number of text types contained in the text data training sample set, y_nj is the label indicating that the n-th text sample belongs to the j-th text type, and p_nj is the probability value predicted for the n-th text sample belonging to the j-th text type.
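A compact sketch of how the two loss terms of claim 21 might be computed in PyTorch follows; the mean reduction over the N samples and the absence of a weighting coefficient on the sparsity term are assumptions, as is the choice of sigmoid for the gating function.

```python
import torch
import torch.nn.functional as F

def total_loss(logits, labels, aux_params):
    """Precision loss (cross-entropy) plus sparsity loss (sum of gating outputs).

    logits: tensor of shape (N, C), scores of N text samples over C text types
    labels: tensor of shape (N,), integer text-type labels
    aux_params: iterable of scalar auxiliary parameters of the parameter selection network
    """
    # -(1/N) * sum_n sum_j y_nj * log(p_nj): mean cross-entropy over the batch.
    precision_loss = F.cross_entropy(logits, labels)
    # Sum of gating-function outputs; sigmoid assumed as the gate.
    sparsity_loss = sum(torch.sigmoid(a) for a in aux_params)
    return precision_loss + sparsity_loss
```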
22. An electronic device comprising a processor and a memory, the processor being configured to implement the steps of the text recognition method of any one of claims 1 to 20 when executing a computer program stored in the memory.
23. A readable storage medium, characterized in that it has stored thereon a computer program which, when executed by a processor, implements the steps of the text recognition method according to any of claims 1 to 20.
CN202410131516.3A 2024-01-31 2024-01-31 Text recognition method, text recognition device, electronic equipment and readable storage medium Active CN117668563B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410131516.3A CN117668563B (en) 2024-01-31 2024-01-31 Text recognition method, text recognition device, electronic equipment and readable storage medium

Publications (2)

Publication Number Publication Date
CN117668563A (en) 2024-03-08
CN117668563B (en) 2024-04-30

Family

ID=90064494

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410131516.3A Active CN117668563B (en) 2024-01-31 2024-01-31 Text recognition method, text recognition device, electronic equipment and readable storage medium

Country Status (1)

Country Link
CN (1) CN117668563B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113688244A (en) * 2021-08-31 2021-11-23 中国平安人寿保险股份有限公司 Text classification method, system, device and storage medium based on neural network
CN114626551A (en) * 2022-03-21 2022-06-14 北京字节跳动网络技术有限公司 Training method of text recognition model, text recognition method and related device
CN115329744A (en) * 2022-10-11 2022-11-11 浪潮电子信息产业股份有限公司 Natural language processing method, system, equipment and storage medium
CN115409124A (en) * 2022-09-19 2022-11-29 小语智能信息科技(云南)有限公司 Small sample sensitive information identification method based on fine-tuning prototype network
CN116341646A (en) * 2023-03-14 2023-06-27 平安科技(深圳)有限公司 Pretraining method and device of Bert model, electronic equipment and storage medium
CN116822593A (en) * 2023-06-01 2023-09-29 西安电子科技大学 Large-scale pre-training language model compression method based on hardware perception
WO2024011902A1 (en) * 2022-07-14 2024-01-18 京东科技信息技术有限公司 Speech recognition model training method and apparatus, storage medium, and electronic device

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Surrogate network-based sparseness hyper-parameter optimization for deep expression recognition; Xie Weicheng et al.; Pattern Recognition; 2021-03-31; Vol. 111; pp. 1-14 *
Transformation of Dense and Sparse Text Representations; Hu Wenpeng et al.; arXiv; 2019-11-07; pp. 1-10 *
Ultra-High Dimensional Sparse Representations with Binarization for Efficient Text Retrieval; Jang Kyoung-Rok et al.; Association for Computational Linguistics; 2021-10-15; pp. 1-14 *

Similar Documents

Publication Publication Date Title
Shen et al. Wind speed prediction of unmanned sailboat based on CNN and LSTM hybrid neural network
CN111914085B (en) Text fine granularity emotion classification method, system, device and storage medium
CN115860081B (en) Core algorithm scheduling method, system, electronic equipment and storage medium
CN110781686B (en) Statement similarity calculation method and device and computer equipment
CN113128478A (en) Model training method, pedestrian analysis method, device, equipment and storage medium
CN111104831B (en) Visual tracking method, device, computer equipment and medium
CN113706151A (en) Data processing method and device, computer equipment and storage medium
CN116957698A (en) Electricity price prediction method based on improved time sequence mode attention mechanism
CN115017178A (en) Training method and device for data-to-text generation model
CN113627070A (en) Short-term photovoltaic power prediction method
Pietron et al. Retrain or not retrain?-efficient pruning methods of deep cnn networks
CN111461353A (en) Model training method and system
CN111522926A (en) Text matching method, device, server and storage medium
CN111783688B (en) Remote sensing image scene classification method based on convolutional neural network
CN115909336A (en) Text recognition method and device, computer equipment and computer-readable storage medium
CN114490922A (en) Natural language understanding model training method and device
CN117668563B (en) Text recognition method, text recognition device, electronic equipment and readable storage medium
CN116933931A (en) Cloud computing double-flow feature interaction electric vehicle charging pile occupation prediction method
CN116797850A (en) Class increment image classification method based on knowledge distillation and consistency regularization
CN115423091A (en) Conditional antagonistic neural network training method, scene generation method and system
CN116051388A (en) Automatic photo editing via language request
CN114861671A (en) Model training method and device, computer equipment and storage medium
Kexin et al. Prediction stock price based on CNN and LSTM models
CN112650861A (en) Personality prediction method, system and device based on task layering
Lu et al. CompNet: Neural networks growing via the compact network morphism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant