WO2023078264A1

WO2023078264A1 - Method and apparatus for training business card information extraction system, and computer-readable storage medium

Info

Publication number: WO2023078264A1
Application number: PCT/CN2022/129071
Authority: WO
Inventors: 王奥迪; 杨希
Original assignee: 中移(苏州)软件技术有限公司; 中国移动通信集团有限公司
Priority date: 2021-11-03
Filing date: 2022-11-01
Publication date: 2023-05-11
Also published as: CN116090463A

Abstract

A method and apparatus for training a business card information extraction system, and a storage medium. The method comprises: performing recognition on a business card image, so as to obtain text information (S101); then performing training by means of a preset BERT model and a preset convolutional neural network to process the text information, so as to obtain a feature vector (S102); then performing combined encoding on the feature vector, so as to obtain corresponding text fragment feature information (S103); and finally, distinguishing the text fragment feature information by using a classifier, so as to obtain a predicted classification label corresponding to the text fragment feature information (S104). By means of a preset target function, a loss value of the predicted classification label corresponding to the text fragment feature information and a preset classification label meets the requirements, thereby completing the training of a business card information extraction system; and by means of screening the predicted classification label, structured information is obtained. The method can improve the effect of a system when same performs information extraction on a business card, such that the error of structured information extracted when information extraction is performed on the business card is reduced.

Description

A training method and device for a business card information extraction system, and a computer-readable storage medium

Cross References to Related Applications

The present invention is based on a Chinese patent application with application number 202111296307.7 and a filing date of November 03, 2021, and claims the priority of this Chinese patent application. The entire content of this Chinese patent application is hereby incorporated by reference.

technical field

The invention relates to the technical field of image information processing, in particular to a training method and device for a business card information extraction system, and a computer-readable storage medium.

Background technique

The main goal of business card information extraction is to input an image of a business card and extract the structured information in the business card. The structured information includes key fields such as name, position, company, address, phone number, and email address.

In the prior art, the extraction of business card information mainly includes two processes: first, using optical character recognition (optical character recognition, OCR) technology to identify the text in the business card from the business card image; secondly, structuring the text recognized by OCR As the final output result of the system, the OCR-recognized text is structured using artificial design rules or named entity recognition technology, thereby extracting the key fields in the business card.

However, because the existing OCR technology needs to recognize "trigger words" in actual use, thereby recognizing text from images, and the layout of business cards is various, and there are many redundant information, and some business card information contains "trigger words". ", some business card information does not include "trigger words", and some business card information "trigger words" are icons; therefore, when information is extracted from business cards in the prior art, there are errors in the extracted structured information. How to obtain a system that can improve the accuracy of business card information extraction has become a technical problem to be solved urgently.

Contents of the invention

The embodiment of the present invention expects to provide a business card information extraction system training method and device, and a computer-readable storage medium, which can improve the effect of the system when extracting information from business cards, thereby reducing the extracted structured information when extracting information from business cards. information errors.

Technical scheme of the present invention is realized like this:

An embodiment of the present invention provides a business card information extraction system training method, including:

Recognizing the business card image to obtain text information; wherein, the business card image is at least one of the following: a real business card image or a simulated business card image;

A feature vector is obtained based on a preset BERT model, a preset convolutional neural network, and the text information; wherein, the feature vector represents semantic information of vocabulary in the text information;

Based on the preset cyclic neural network, the feature vectors are combined and encoded to obtain corresponding text segment feature information; wherein the text segment feature information represents text content of different combinations;

Use a classifier to discriminate the text segment feature information, so as to obtain the predicted classification label corresponding to the text segment feature information; wherein, the predicted classification label represents the text type of the text segment feature information, and the predicted classification label To obtain the basis for structured information;

Based on the preset objective function, and the predicted classification label and preset classification label corresponding to the feature information of the text segment, a loss value is obtained, and a target parameter is determined according to the loss value; wherein, the target parameter is the predicted Assume variables in the BERT model, the preset convolutional neural network, the preset cyclic neural network, and the classifier, and the target parameter represents a system for extracting business card information.

In the above solution, the loss value is obtained based on the preset objective function, and the predicted classification label and the preset classification label corresponding to the feature information of the text segment, and the target parameter is determined according to the loss value, including:

Obtain a first sub-loss value according to the first sub-objective function and the first preset weight, as well as the preset classification label and the predicted classification label;

According to the second sub-objective function, the second preset weight, and the first preset weight, the preset classification label and the predicted classification label, the second sub-loss value is obtained; wherein, the first sub-objective function and the second sub-objective function are both the preset objective function;

The loss value is obtained based on the first sub-loss value and the second sub-loss value, and the target parameter is determined according to the loss value.

In the above solution, the determination of the target parameter according to the loss value includes:

When the loss value remains in a state of not decreasing, determine the variables in the current preset BERT model, the preset convolutional neural network, the preset cyclic neural network, and the classifier as the target parameters .

In the above scheme, the feature vector is obtained based on the preset BERT model, the preset convolutional neural network, and the text information, including:

Based on the preset BERT model, the text content information is converted to obtain a word vector sequence; wherein the text information includes text content information and text position information;

Filling the word vector sequence into a preset two-dimensional grid according to the position information of the text line, so as to obtain a target two-dimensional grid;

The feature vector is obtained according to the target two-dimensional grid and the preset convolutional neural network.

In the above solution, the feature vector is obtained according to the target two-dimensional grid and the preset convolutional neural network, including:

The feature vector is obtained according to the target two-dimensional grid and the three-dimensional convolution kernel in the preset convolutional neural network.

In the above solution, the feature vector is obtained according to the target two-dimensional grid and the three-dimensional convolution kernel in the preset convolutional neural network, including:

Extracting features in the target two-dimensional grid according to the three-dimensional convolution kernel in the preset convolutional neural network, so as to obtain the feature vector.

In the above scheme, when the business card image is the simulated business card image, before the identification of the business card image and obtaining the text information, the method includes:

Collect text sample information;

The business card image is obtained according to the text sample information, a preset generative confrontation network, and preset typesetting rules.

An embodiment of the present invention provides a business card information extraction system training device, including an obtaining part and a determining part; wherein,

The obtaining part is configured to identify the business card image to obtain text information; based on a preset BERT model, a preset convolutional neural network, and the text information, obtain a feature vector; wherein the feature vector represents the Semantic information of vocabulary in the text information; based on the preset recurrent neural network, the feature vector is encoded to obtain the corresponding text segment feature information; wherein, the text segment feature information represents different combinations of text content; using a classifier to The feature information of the text segment is discriminated, so as to obtain the predicted classification label corresponding to the feature information of the text segment; wherein, the predicted classification label represents the text type of the feature information of the text segment, and the predicted classification label is to obtain a structured Information basis; based on a preset objective function, and the predicted classification label and preset classification label corresponding to the text segment feature information, a loss value is obtained, and a target parameter is determined according to the loss value;

The determining part is configured to determine a target parameter according to the loss value; wherein, the target parameter is the preset BERT model, the preset convolutional neural network, the preset cyclic neural network, and the preset Variables in the classifier, the target parameters characterize the system configured to extract business card information.

In the above solution, the obtaining part is further configured to obtain the first sub-loss value according to the first sub-objective function and the first preset weight, as well as the preset classification label and the predicted classification label; according to the second A sub-objective function, a second preset weight, and the first preset weight, the preset classification label, and the predicted classification label to obtain a second sub-loss value; wherein, the first sub-objective function and the The second sub-objective function is the preset objective function; the loss value is obtained based on the first sub-loss value and the second sub-loss value, and the objective parameter is determined according to the loss value.

In the above solution, the determining part is further configured to determine the current preset BERT model, the preset convolutional neural network, and the preset cyclic neural network when the loss value remains in a state of not decreasing. And the variables in the classifier are the target parameters.

In the above solution, the obtaining part is further configured to convert the text content information based on the preset BERT model to obtain a word vector sequence; wherein the text information includes text content information and text position information; according to the The text line position information, filling the word vector sequence into the preset two-dimensional grid, so as to obtain the target two-dimensional grid; according to the target two-dimensional grid, and the preset convolutional neural network, thus Get the eigenvectors.

In the above solution, the obtaining part is further configured to obtain the feature vector according to the target two-dimensional grid and the three-dimensional convolution kernel in the preset convolutional neural network.

In the above solution, the obtaining part is further configured to extract features in the target two-dimensional grid according to the three-dimensional convolution kernel in the preset convolutional neural network, so as to obtain the feature vector.

In the above solution, the device further includes a collection part configured to collect text sample information when the business card image is the simulated business card image;

The obtaining part is further configured to obtain the business card image according to the text sample information, a preset generative confrontation network, and preset typesetting rules.

An embodiment of the present invention provides a business card information extraction system training device, including:

a memory for storing executable data instructions;

The processor is configured to implement the business card information extraction system training method described in the embodiment of the present invention when executing the executable instructions stored in the memory.

An embodiment of the present invention provides a computer storage medium, which is characterized in that executable instructions are stored therein for causing a processor to execute to implement the business card information extraction system training method described in the embodiment of the present invention.

The embodiment of the present invention provides a business card information extraction system training method and device, and a computer storage medium. The method includes identifying the business card image to obtain text information, and then training through a preset BERT model and a preset convolutional neural network. Process the text information to obtain feature vectors, and then combine and encode the feature vectors to obtain the corresponding text segment feature information, and finally use the classifier to discriminate the text segment feature information to obtain the predicted classification labels corresponding to the text segment feature information, Through the preset objective function, the loss value of the predicted classification label corresponding to the feature information of the text segment and the preset classification label is obtained. When the loss value meets the requirements, the training of the business card information extraction system will be completed.

The embodiment of the present invention can improve the effect of the information extraction of the business card by the system, thereby reducing the error of the extracted structured information when the information is extracted from the business card.

Description of drawings

Fig. 1 is a structure diagram 1 of a business card information extraction system provided by an embodiment of the present invention;

FIG. 2 is a second architecture diagram of a business card information extraction system provided by an embodiment of the present invention;

Fig. 3 is a flow chart 1 of a training method for a business card information extraction system provided by an embodiment of the present invention;

Fig. 4 is a flow chart 2 of a training method for a business card information extraction system provided by an embodiment of the present invention;

Fig. 5 is a flowchart three of a training method for a business card information extraction system provided by an embodiment of the present invention;

Fig. 6a is a schematic diagram of text sample information of a business card information extraction system training method provided by an embodiment of the present invention;

Fig. 6b is a schematic diagram of simulated text information of a business card information extraction system training method provided by an embodiment of the present invention;

7 is a schematic diagram of a simulated business card image generated by a business card information extraction system training method provided by an embodiment of the present invention;

FIG. 8 is a flow chart of a method for extracting business card information provided by an embodiment of the present invention;

Fig. 9 is a schematic diagram of a business card image provided by an embodiment of the present invention;

FIG. 10 is a first structural diagram of a business card information extraction system training device provided by an embodiment of the present invention;

FIG. 11 is the second structure diagram of a training device for a business card information extraction system provided by an embodiment of the present invention.

Detailed ways

The following will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the drawings in the embodiments of the present invention.

Before introducing the solutions of the embodiments of the present invention, a brief description of the technical terms that may be used in the embodiments of the present invention is given:

GAN (Generative Adversarial Networks, Generative Adversarial Networks) is a deep learning model. The model produces quite good output through mutual game learning of (at least) two modules in the framework: Generative Model (Generative Model) and Discriminative Model (Discriminative Model). .

NLP (Natural Language Processing, Natural Language Processing) is a branch of artificial intelligence and linguistics. It mainly studies the use of computers to process information such as shape, sound, and meaning of natural language, that is, the input of words, words, sentences, and texts. , output, recognition, analysis, understanding, generation, etc. operation and processing. Specific manifestations of natural language processing include machine translation, text summarization, text classification, text proofreading, information extraction, speech synthesis, speech recognition, etc.

NER (Name Entity Recognition, named entity recognition) is a very basic task in NLP. NER is an important basic tool for many NLP tasks such as information extraction, question answering systems, syntax analysis, and machine translation. The purpose of named entity recognition is to identify entities of specified categories in text. The so-called named entities are people's names, organization names, place names and all other entities identified by names.

OCR (Optical Character Recognition, Optical Character Recognition) means that electronic equipment (such as a scanner or digital camera) checks characters printed on paper, determines its shape by detecting dark and bright patterns, and then uses character recognition to translate the shape into a computer The process of text; that is, for printed characters, the text in the paper document is converted into a black and white dot matrix image file by optical means, and the text in the image is converted into a text format by computer technology for further processing by word processing software editing technology

Fig. 1 is a structure diagram 1 of a business card information extraction system provided by an embodiment of the present invention. As shown in Fig. 1 , an embodiment of the present invention provides a business card information extraction system, including an input module 1, an OCR module 2, and a NER module 3 and output module 4. Wherein, the input module 1 is used to input the business card image to be recognized; the OCR module 2 is used to extract the text in the input business card image, and outputs it in text format; Recognition; the output module 4 is used to post-process the recognition result output by the NER module 3, and output the final target structured information.

In some embodiments of the present invention, FIG. 2 is a second architecture diagram of a business card information extraction system provided by an embodiment of the present invention. As shown in FIG. 2 , the NER module 3 completes OCR through the NER model set in the NER module 3 Entity recognition of the output text. The NER model includes Word Embedding layer, Bidirectional layer, Hidden layer, Span Representations layer, Fully-connected Layer layer and Span Classifier layer.

In some embodiments of the present invention, the Word Embedding layer is used to process the text output by OCR based on a preset BERT model and a preset convolutional neural network to obtain a feature vector. The Bidirectional layer is used to encode the feature vector based on the preset cyclic neural network to obtain the feature information of the corresponding text segment. The Hidden layer is used to convert the hidden text segment feature information that is not easy to be captured in the text segment feature information into readable text through the Hidden model, that is, convert the hidden text segment feature information into readable text information. For example, if The Bidirectional layer encodes the feature vector, and the obtained text segment feature information contains a piece of hidden text segment feature information "B-LOC|I-LOC|I-LOC". Through the Hidden model of the Hidden layer, the above hidden text can be passed The fragment feature information is converted into readable text information "FTA". The Span Representations layer is used to splice the feature information of the above text fragments according to preset rules. The Fully-connected Layer layer is used for feature fusion or feature weighting of the text segment feature information; the Span Classifier layer is used to distinguish the text segment feature information, obtain the predicted classification label corresponding to the text segment feature information, and filter the predicted classification label. The feature information of the target text segment will be obtained, and the structural information can be determined according to the feature information of the target text segment and the predicted classification label corresponding to the feature information of the target text segment.

Fig. 3 is a flowchart one of a training method for a business card information extraction system provided by an embodiment of the present invention. As shown in Fig. 3 , the training method for a business card information extraction system provided by an embodiment of the present invention includes:

S101. Recognize the business card image to obtain text information. Wherein, the business card image is at least one of the following: a real business card image or a simulated business card image.

In the embodiment of the present invention, it is applicable to the scene where the business card image is recognized and the text information meeting the preset requirements is acquired.

In the embodiment of the present invention, the business card image is recognized by the OCR module to obtain the required text information.

In the embodiment of the present invention, the business card image is a real business card image and/or a simulated business card image. Wherein, the real business card image represents a business card printed in real life, and in actual use, the business card printed in real life may be scanned or photographed to obtain a real business card image. The simulated business card image is constructed based on printed business cards in real life, and the simulated business card image corresponds to a business card that has not been printed in reality or is different from a printed business card. In actual use, the adversarial network can be generated through the preset, and the corresponding simulated business card image can be output according to the preset content (text sample information) and layout (preset layout rules).

In the embodiment of the present invention, the input of the business card image is completed through the input module; wherein, the business card image needs to perform a data preprocessing operation before input.

Exemplarily, the data preprocessing operation may be binarization, direction correction, distortion correction, denoising and so on.

In the embodiment of the present invention, the OCR module is mainly responsible for extracting the text in the input business card image and outputting it in a text format, so as to obtain text information. Wherein, the text information is granular in text lines, and each text line includes text content and text position information.

It can be understood that the business card image is recognized by the OCR module, so as to obtain text information in a text format, which is convenient for subsequent processing.

S102. Obtain a feature vector based on a preset BERT model, a preset convolutional neural network, and text information; wherein, the feature vector represents semantic information of words in the text information.

In the embodiment of the present invention, it is applicable to a scenario where the text information obtained in S101 is processed to obtain semantic information of vocabulary in the text information.

In the embodiment of the present invention, the feature vector representing the semantic information of the vocabulary in the text information is obtained from the text information through the preset BERT model and the preset convolutional neural network.

In the embodiment of the present invention, the BERT model (Bidirectional Encoder Representations from Transformers model) is a self-encoding language model, which can extract the relationship features of vocabulary in sentences, and can extract relationship features at multiple different levels, thereby more comprehensively reflecting sentence semantics , and the word meaning can be obtained according to the sentence context during the extraction process, so as to avoid ambiguity.

In the embodiment of the present invention, the text information will be input into the preset BERT model in the form of token, wherein the token form refers to the original word vector of each word/word in the text; in actual use, the token form of the text information can be recorded as " Text token sequence T", text token sequence T=(t ₁ ,t ₂ ,...,t _N ). Input the text token sequence T into the preset BERT model, convert the text token sequence T through the preset BERT model, and get the word vector sequence W, the word vector sequence W=(w ₁ ,w ₂ ,..., w _N ).

For example, the text information is "company address: XXXX, XX Road, XX District, XX City", and the above text information is converted into token form and input into the preset BERT model, that is, "public" is converted into t ₁ in the text token sequence T , convert "司" into t ₂ in the text token sequence T, etc., and input the converted t ₁ and t ₂ into the preset BERT model to obtain the corresponding word vector sequence.

In the embodiment of the present invention, after the text information is converted into a word vector sequence through the preset BERT model, it is necessary to fill the word vector sequence W into the preset two-dimensional grid according to the text position information in the text information; wherein, The value of each position (grid) in the two-dimensional grid corresponds to a word vector in the text token sequence T. In actual use, if there is a vacancy in the preset two-dimensional grid, the word vector of <PAD> Vector fill. Finally, the local feature capture of the above two-dimensional grid is carried out through the preset convolutional neural network, so as to obtain the feature vector.

It can be understood that the named entity recognition method based on segment classification based on text information in the embodiment of the present invention can reduce the impact of entity trigger word errors during the named entity identification process, thereby improving the effect of entity extraction.

Fig. 4 is a flowchart two of a business card information extraction system training method provided by an embodiment of the present invention. As shown in Fig. 4, S102 may also include S1021-S1023, as follows:

S1021. Based on the preset BERT model, convert the text content information to obtain a word vector sequence; wherein, the text information includes text content information and text position information.

In some embodiments of the present invention, it is applicable to a scenario of processing text content information in text information.

In some embodiments of the present invention, the text content information in the text information is input into the preset BERT model according to the format required by the preset BERT model, so as to obtain a sequence of word vectors.

In some embodiments of the present invention, the text information includes text content information and text position information, wherein the text content information refers to the text on the business card image, and the text position information refers to the coordinates of the text on the business card image, and the coordinates can be based on preset coordinates Department OK.

It can be understood that, in the embodiment of the present invention, the word vector sequence is obtained by presetting the BERT model, which can improve the accuracy of the obtained word vector sequence.

S1022. According to the position information of the text line, fill the sequence of word vectors into the preset two-dimensional grid, so as to obtain the target two-dimensional grid.

In some embodiments of the present invention, it is suitable for obtaining a target two-dimensional network and providing data support for subsequent further processing through a preset convolutional neural network.

In some embodiments of the present invention, according to the text line position information in the text information obtained in S101, as shown in the Word Embedding layer of the NER model in Figure 2, each word vector in the word vector sequence is filled to the preset Set in the two-dimensional grid, so as to obtain the target two-dimensional grid. Wherein, the specification of the preset two-dimensional grid is r*c, r is the number of text lines, and c is the maximum length of the text lines.

It can be understood that in this way, the capture of local features can be completed, and the layout information of the text on the business card image can be modeled.

S1023. Obtain a feature vector according to the target two-dimensional grid and a preset convolutional neural network.

In some embodiments of the present invention, it is applicable to the scene where after the target two-dimensional grid is obtained, the above data is subjected to subsequent processing to obtain the feature vector.

In some embodiments of the present invention, the target two-dimensional grid is input into a preset convolutional neural network to obtain a feature vector.

In some embodiments of the present invention, S1023 includes: obtaining the feature vector according to the target two-dimensional grid and presetting the three-dimensional convolution kernel in the convolutional neural network.

In the embodiment of the present invention, in the three-dimensional convolution kernel, the first dimension represents the width of the convolution kernel, and the first dimension is the same as the length of the word vector; the second dimension represents the height of the convolution kernel; the third dimension represents the convolution The size of the kernel, and the size of the third dimension is the same as the length of the word vector.

In some embodiments of the present invention, in S1023, according to the preset three-dimensional convolution kernel in the convolutional neural network, the features in the target two-dimensional grid are extracted to obtain a feature vector.

In some embodiments of the present invention, the features in the target two-dimensional grid are extracted using the three-dimensional convolution kernel in the preset convolutional neural network, wherein each word vector corresponds to at least one feature, and each feature Corresponding to multiple word vectors.

It can be understood that the feature is extracted through the three-dimensional convolution kernel to improve the accuracy of the extracted feature.

S103. Based on the preset recurrent neural network, perform combined encoding on the feature vectors to obtain corresponding text segment feature information; wherein, the text segment feature information represents text content of different combinations.

In the embodiment of the present invention, it is applicable to the scene where feature vectors are combined and coded to obtain feature information of text segments.

In the embodiment of the present invention, after the feature vector is obtained through S102, the feature vector is input into the preset cyclic neural network, and the combination encoding of the feature vector is obtained through the preset cyclic neural network, so as to obtain the text segment feature information, and the text segment feature information represents the feature The text content of different combinations formed by the combination of vectors.

In the embodiment of the present invention, the preset cyclic neural network, that is, the LSTM model, uses the preset cyclic neural network to characterize the combination of feature vectors and performs different forms of elements in the text token sequence corresponding to the text information through the preset cyclic neural network. Combination representations, such as: forward cycle representation or reverse cycle representation. In this way, the semantics of sentences formed by different combinations of elements in the above text token sequence can be obtained. Wherein, as described in S102, the text token sequence represents text information, and each element in the text token sequence represents a word in the text information.

In the embodiment of the present invention, due to the fact that some information may not be easily recognized or captured during the processing of text information, the above-mentioned text segment feature information will include some hidden text segment feature information; at this time, it may The hidden text segment feature information is converted into readable text information (text segment feature information) through the Hidden layer as shown in FIG. 2 to ensure the integrity of the text segment feature information.

It can be understood that the preset cyclic neural network can improve the accuracy rate of identifying the semantics of the text content corresponding to the feature information of the text segment.

S104. Use a classifier to discriminate the feature information of the text segment, so as to obtain the predicted classification label corresponding to the feature information of the text segment; wherein, the predicted classification label represents the text type of the feature information of the text segment, and the predicted classification label is the basis for obtaining the structured information.

In the embodiment of the present invention, it is applicable to the scenario of acquiring structured information.

In the embodiment of the present invention, a classifier is used to discriminate the feature information of the text segment to obtain a predicted classification label corresponding to the feature information of each text segment. In actual use, the predicted classification labels can be screened according to the preset classification labels. The text segment feature information corresponding to the same predicted classification label as the preset classification label is the target text segment feature information. According to the target text segment feature information and the corresponding The structured information can be obtained by predicting the classification labels, and the structured information is the extraction result of the information in the business card image input in S101.

In the embodiment of the present invention, the classifier is used to filter the feature information of the text segment according to the predicted classification label to select the same predicted classification label as the preset classification label.

Exemplarily, if the preset classification label is "address", the classifier filters the text segment feature information whose predicted classification label is "address" from the predicted classification label as the target text segment feature information; if the target text segment feature information represents The text content is: "XXXX, XX Road, XX District, XX City", and the structured information is "Address: XXXX, XX Road, XX District, XX City".

It can be understood that, in the embodiment of the present invention, a classifier is used to discriminate the text segment feature information to obtain the predicted classification label corresponding to the text segment feature information, and the target text segment feature information is obtained by filtering the predicted classification label to extract structured information. Replace the existing method of extracting structured information by identifying "trigger words"; achieve the purpose of removing the error caused by identifying "trigger words" in the process of extracting structured information, thereby reducing the error of the extracted structured information .

S105. Obtain a loss value based on the preset objective function, the predicted classification label corresponding to the feature information of the text segment and the preset classification label, and determine the target parameter according to the loss value. Among them, the target parameter is the variable in the preset BERT model, the preset convolutional neural network, the preset recurrent neural network and the classifier, and the target parameter represents a system for extracting business card information.

In the embodiment of the present invention, it is applicable to the scenario of judging the accuracy of the predicted classification label of the identified text segment feature information.

In the embodiment of the present invention, the preset objective function is used to calculate the preset classification label and the predicted classification label of the feature information of the text segment, and the loss value between the preset classification label and the predicted classification label is obtained, and the target is determined according to the loss value parameter. Among them, the loss value represents the error between the preset classification label and the predicted classification label of the feature information of the text segment.

In the embodiment of the present invention, the preset classification label of the text segment characteristic information is the actual classification label printed on the business card image and corresponding to the text segment characteristic information. In actual use, the accuracy of the structured information extracted by the business card information extraction system can be improved through the preset classification labels of the text segment feature information and the loss value of the predicted classification labels. Among them, when the loss value remains in the state of not decreasing, it can be determined that the accuracy rate of the structured information extracted by the business card information extraction system reaches the maximum value, and the business card information extraction system completes the training. At this time, the current value of the variable in the business card information extraction system is the target parameter.

In the embodiment of the present invention, if the text token sequence includes n texts in the form of tokens, the text information corresponding to the above text token sequence will contain n(n+1) pieces of text segment feature information, and each text segment feature information represents A text segment, but there is text segment feature information in the above n(n+1) text segment feature information that has no meaning in predicting classification labels, that is, there is a negative sample.

Exemplarily, if the text information obtained in S101 is "company address: No. 1XX8, XX Road, XX District, XX City", the above text information is converted into token form. Wherein, each word in the above text information is equivalent to an element in the text token sequence, for example: "公" is equivalent to t ₁ in the text token sequence T, and "司" is equivalent to t ₂ in the text token sequence T wait. Table 1 shows n(n+1) text fragments obtained based on the above text information, and Table 1 is as follows:

Table 1

As shown in Table 1, each column in Table 1 corresponds to a text segment, and only one text segment in Table 1 is meaningful: "No. 1XX8, XX Road, XX District, XX City", and its default classification label type is "Address ", while other text fragments have no practical labels, which are negative samples.

In the embodiment of the present invention, the preset objective function includes a first preset function and a second preset function; wherein, the first preset function can be SoftMax Loss, and SoftMax Loss is shown in formula (1), as follows:

in,

i is the text segment feature information index that S103 obtains, m is the total number of text segment feature information in the text segment feature information index, x _i refers to the feature information (features) of the i-th text segment in the text segment feature information that S103 obtains vector), y _i refers to the preset classification label corresponding to the feature information of the i-th text segment, j is the preset classification label index, c+1 is the total number of preset classification labels in the preset classification label index, and 1 represents Meaningless preset classification labels in the preset classification label index, W is the preset weight parameter in the classifier, γ is the first preset weight, where, 0≤γ≤1, T is used for transposition, T and S102 The text token sequence in is irrelevant.

In the embodiment of the present invention, the first preset weight is used to reduce the contribution of negative samples to the objective function. When γ=0, it means that negative samples do not participate in training at all.

In the embodiment of the present invention, the second preset function may be Center Loss, and Center Loss is as shown in formula (2), as follows:

Among them, λ is the second preset weight, i is the text segment feature information index obtained in S103, m is the total number of text segment feature information in the text segment feature information index, x _i refers to the text segment feature information obtained in S103 The feature information (feature vector) of the i-th text segment, y _i refers to the preset classification label corresponding to the feature information of the i-th text segment, j is the preset classification label index, and c+1 is in the preset classification label index The total number of preset classification labels, 1 represents the meaningless preset classification labels in the preset classification label index, W is the preset weight parameter in the classifier, γ is the first preset weight, where, 0≤γ≤1 .

In the embodiment of the present invention, the preset function is L, and the preset function L is as shown in formula (3), as follows:

Wherein, L _S is the first preset function, and L _C is the second preset function.

It can be understood that the method based on metric learning is introduced in the embodiment of the present invention. On the basis of SoftMax Loss, Center Loss in the field of face recognition is added, that is, the distance constraint between the sample in the feature space and the class center is added. , supervised classifier learning, which makes the class more aggregated and the class more separated, thereby improving the generalization ability of the algorithm and improving the effect of business card information extraction. Among them, Softmax Loss is used to constrain the ability to distinguish different types of entity text (text content), that is, to make the feature information of text fragments discriminative. Center Loss is used to constrain the aggregation of feature information in text fragments, thereby improving the generalization ability of the model.

Fig. 5 is a flowchart three of a business card information extraction system training method provided by an embodiment of the present invention. As shown in Fig. 5, S105 may also include S1051-S1053, as follows:

S1051. Obtain a first sub-loss value according to the first sub-objective function and the first preset weight, as well as the preset classification label and the predicted classification label.

In some embodiments of the present invention, it is applicable to the scene of obtaining the first sub-loss value.

In some embodiments of the present invention, formula (1) is used to calculate the first sub-loss value of the preset classification label and the predicted classification label of the text segment characteristic information, and the first sub-loss value represents the discrimination of the text segment characteristic information Spend.

It can be understood that the error between the preset classification label corresponding to the feature information of the text segment and the predicted classification label can be judged by the first loss value.

S1052. Obtain a second sub-loss value according to the second sub-objective function, the second preset weight, the first preset weight, the preset classification label, and the predicted classification label; wherein, the first sub-objective function and the second sub-objective The functions are all preset objective functions.

In some embodiments of the present invention, it is applicable to the scene of obtaining the second sub-loss value.

In some embodiments of the present invention, the second sub-loss value of the preset classification label and the predicted classification label of the text segment feature information is calculated by formula 1-2, and the second sub-loss value represents the category in the text segment feature information degree of polymerization within.

In some embodiments of the present invention, the second preset weight is used to control the proportion of the second sub-loss value in the loss value, and the first preset weight is used to control the influence of negative samples in the second sub-objective function.

It can be understood that the distance between the feature information of the text segment and the class center can be judged by the second loss value.

S1053. Obtain a loss value based on the first sub-loss value and the second sub-loss value, and determine a target parameter according to the loss value.

In the embodiment of the present invention, it is applicable to the scenario where the training ends according to the loss value.

In the embodiment of the present invention, based on the first sub-loss value and the second sub-loss value, the loss value is obtained, and the variable in the business card information extraction system is determined according to the loss value, that is, the target parameter.

In the embodiment of the present invention, when the loss value keeps no longer decreasing, it means that the business card information extraction system has completed training, and the current business card information extraction system can ensure the accuracy of the extracted structured information, that is, improve the extraction effect.

It can be understood that in the embodiment of the present invention, weight parameters are added to the preset objective function, and Center Loss is introduced into the calculation of the loss value, which can not only reduce the impact of the imbalance of positive and negative samples on the loss value, but also improve the recognition effect .

In some embodiments of the present invention, determining the target parameter according to the loss value refers to determining the current preset BERT model, preset convolutional neural network, preset cyclic neural network, and classifier when the loss value remains in a state of not decreasing. Variables are target parameters.

In some embodiments of the present invention, the preset BERT model, preset convolutional neural network, preset cyclic neural network, and classifier will adjust the value of the variable according to the loss value, and repeat the processing on the text information obtained in the business card image , to determine the variable value when the loss value is the smallest as the target parameter.

It can be understood that, through the business card information extraction system training method provided by the embodiment of the present invention, the trained business card information extraction system will improve the recognition effect.

In some embodiments of the present invention, when the business card image is a simulated business card image, before S101, the business card information extraction system training method provided in the embodiment of the present invention further includes:

S106. Collect text sample information.

In some embodiments of the present invention, it is applicable to a scene of sample collection before extracting business card information.

In some embodiments of the present invention, data is crawled on each platform, so as to collect text sample information.

In some embodiments of the present invention, data is collected for each target field of the business card. The source of the data may be a platform, and the data may include public information such as name, company, address, email address, website, mobile phone, telephone number, and fax. For individual target fields, the naming convention can be summarized, based on the preset rules, and constructed on the basis of the collected data. For example, some mailbox field data can be constructed through the rule of "name pinyin + mailbox domain name". For English fields, including English name, position, company, and address, you can use the translation function to translate them.

It is understandable that this can ensure the quantity of text sample information and provide data support for training.

S107. Obtain a simulated business card image according to the text sample information, the preset generative adversarial network, and the preset typesetting rules.

In some embodiments of the present invention, it is suitable for constructing a scene of simulating a business card image.

In some embodiments of the present invention, according to the text sample information obtained in S106, the typesetting of the text sample information is imitated to generate simulated text information; based on the text sample information and the simulated text information, through the preset generative confrontation network, according to the preset typesetting The rule gets simulated business card images.

In some embodiments of the present invention, the preset typesetting rules are obtained by replacing the text content in the business card and exchanging the order of the text position based on the typesetting of the existing business card.

Exemplarily, the text sample information is shown in Figure 6a, and the black box in Figure 6a marks the text layout of the text sample information; the simulated text information is simulated text information generated according to the text layout shown in Figure 6a, as shown in Figure 6a 6b, the black box in FIG. 6b marks the text layout of the simulated text information.

In some embodiments of the present invention, FIG. 7 is a schematic diagram of a simulated business card image generated by a business card information extraction method provided by an embodiment of the present invention. The preset generative confrontation network includes the generator (generator) and the discriminator in FIG. 7 (discriminator). The generator is used to generate simulated business card images, and the discriminator is used to recognize the simulated business card images generated by the generator. Among them, the generator is an encoder-decoder structure; and the discriminator is composed of an image discriminator (Image Discriminator) and a text matcher (Text Matcher), and the image discriminator is used to distinguish the visual features such as the style and background of the simulated business card image output by the generator authenticity. The purpose of the text matcher is to determine the similarity between the text on the simulated business card image and the real text input into the generator, so as to ensure that the text information on the simulated business card image has been replaced with the newly input text. In actual use, input a real business card image (real image with text_a) into the generator, that is, the business card text of the real business card image is text_a, and the encoder performs feature extraction on the real image to obtain the image feature vector corresponding to the real image (Image Embedding), that is, hidden code (latent code); input the text text_b to be replaced into Text Matcher to obtain the text feature vector (Text Embedding) of text_b; use the feature vector corresponding to text_b, the image feature vector of real image and the added random Noise z (Random Noise) is input to the decoder to obtain a simulated business card image (fake image with text_b), that is, the business card text of the simulated business card image is text_b; OCR recognition is performed on the simulated business card image to obtain fake text_b; the real business card image (real image with text_a) and the simulated business card image (fake image with text_b) are input to the image discriminator in the discriminator to distinguish the true and false (Real/Fake) of the business card image, and fake text_b and text_b are input to the text in the discriminator In the matcher, it is judged whether the two are the same (Same) or different (Different), so as to improve the simulation degree of the simulated business card image. It is understandable that, in actual use, through the cooperation of the generator and the discriminator, the simulation degree of the simulated business card image generated by the generator can be guaranteed, and through the preset generative confrontation network, as well as the real business card image and text sample information, A simulated business card image is generated for the training of the business card information extraction system, which achieves the purpose of expanding the data set and increases the diversity of the data.

Fig. 8 is a flowchart of a method for extracting business card information provided by an embodiment of the present invention. As shown in Fig. 8, it is suitable for the business card information extraction system trained by the business card information extraction system training method provided by the embodiment of the present invention, including:

S201. Input the business card image to the OCR character recognition module (OCR module).

In some embodiments of the invention, the business card image needs to be pre-processed before input. FIG. 9 is a schematic diagram of a business card image provided by an embodiment of the present invention. As shown in FIG. 9 , it is the business card image input to the OCR character recognition module in S201.

It can be understood that this can ensure the accuracy of the subsequently extracted information.

S202. The OCR text recognition module extracts text from the input business card image, and outputs text information in a text format.

In some embodiments of the present invention, based on FIG. 8, the OCR text recognition module recognizes the above-mentioned business card image, and the text information obtained will be as follows:

Word: -A X Yun|9X8 pos:284,46,440,84

Word: Austrian XXXXXX cloud service provider pos: 290,80,435,103

Word: Luo Moumou pos: 29,137,162,170

Word: 131XXXX 1111pos: 360,174,440,192

Word: XXX@XXXXXXX-inc.com pos: 295,186,441,209

Word: XX Group-XXXXX business group pos: 27,178,176,202

Word: China, No. 1XX, XX Road, XX District, XX City pos:276,201,442,226

Word: XXXXX (XX) Co., Ltd. pos: 27,195,176,221

Word: XXXX Center X Block X Floor pos: 339,222,443,244

Word: XXXXXX expert pos: 25,212,119,233

Word: www.xxxxxx.com pos: 356,242,444,260

It can be understood that the OCR character recognition module can not only recognize the text content, but also obtain the text position information corresponding to the text content.

S203. The NER named entity recognition module (NER module) performs entity recognition on the text information output by the OCR text recognition module, obtains the corresponding text segment feature information and the corresponding predicted classification label, and obtains the target text segment by filtering the predicted classification label The feature information and the predicted classification label corresponding to the feature information of the target text segment.

In some embodiments of the present invention, as shown in FIG. 2, when the NER named entity recognition module performs entity recognition on the text information output by the OCR character recognition module, it needs to process the text information according to S102. Exemplarily, as shown in FIG. As shown in the Word Embedding layer in 2, the grid in the figure is a two-dimensional grid, and the information filled in the two-dimensional grid is the word vector obtained after processing the text in S202. When the three-dimensional convolution kernel compares the two-dimensional grid After the weighted average of the word vectors in each grid in , that is, after the local feature capture of the above two-dimensional grid, the feature vector will be obtained, x ₁ , x ₂ , x ₃ , x ₄ , x ₅ in the Bidirectional layer It is the feature vector input to the Bidirectional layer for encoding, such as: x ₁ represents the word vector of "A" in the two-dimensional grid. After the Bidirectional layer combines and encodes the above feature vectors, the text segment feature information will be obtained, and then the hidden text segment feature information in the above text segment feature information is converted through the Hidden layer, and all the text segment feature information will be obtained. Right now

Wherein, and respectively correspond to x ₁ expressed in forward order and x ₁ expressed in reverse order. After inputting the above text segment feature information into the Span Representations layer for splicing, a sentence composed of text segment feature information will be obtained

Finally, the feature information of the text segment is classified through the Fully-connected Layer and the Span Classifier layer, and the predicted classification label corresponding to the feature information of the text segment is obtained. Exemplarily, based on the text obtained in S202, the feature information of the text segment and the predicted classification label corresponding to the feature information of the text segment may be as follows:

<name>Luo Moumou</name>

<department>XX Group-XXXXX Business Group</department>

<company>XXXXX (XX) Co., Ltd.</company>

<position>XXXXXX expert</position>

-A X cloud | 9X8

Austrian XXXXXX cloud service provider

<mail>XXX@XXXXXXX-inc.com</mail>

<addr>No. 1XX, XX Road, XX District, XX City, China

XXXX Center X Block X Floor</addr>

<url>www.xxxxxx.com</url>

Among them, "Luo XX" is the feature information of the text segment, "<name></name>" is the predicted classification label of "Luo XX", and "-阿X云|9X8", "奥XXXXXX cloud service provider" is a negative sample.

In some embodiments of the present invention, the predicted classification tags are screened according to the preset classification tags, wherein the key field for entity identification, ie, the preset classification tags, can be set as required. Exemplarily, if the preset classification tags are: name, department, company, position, mobile phone, email, address and website. Then, after filtering the above predicted classification labels, the obtained target text fragment feature information and the predicted classification labels corresponding to the target text fragment feature information will be as follows:

<name>Luo Moumou</name>

<department>XX Group-XXXXX Business Group</department>

<company>XXXXX (XX) Co., Ltd.</company>

<position>XXXXXX expert</position>

<mail>XXX@XXXXXXX-inc.com</mail>

<addr>No. 1XX, XX Road, XX District, XX City, China

XXXX Center X Block X Floor</addr>

<url>www.xxxxxx.com</url>

It can be understood that the feature information of the target text segment can be obtained through the NER named entity recognition module.

S204. The output module performs subsequent processing on the feature information of the target text segment and the predicted classification labels corresponding to the feature information of the target text segment, and outputs final structured information.

In the embodiment of the present invention, the output module is used to extract the target field from the feature information of the target text segment, and combine the predicted classification labels corresponding to the feature information of the target text segment to obtain the final target structured information. Wherein, the subsequent processing includes: removing blank characters, invalid characters, and the like. Exemplarily, based on the feature information of the target text segment obtained in S203 and the predicted classification label corresponding to the feature information of the target text segment, the output structured information will be as follows:

Name: Luo Moumou

Department: XX Group-XXXXX Business Group

Company: XXXXX (XX) Co., Ltd.

Position: XXXXXX Expert

Mobile: 131 XXXX 1111

Email: XXX@XXXXXXX-inc.com

Address: Floor X, Block X, XXXX Center, No. 1XX, XX Road, XX District, XX City, China

URL: www.xxxxxx.com

Among them, "name" corresponds to "<name></name>" in S203, which is the predicted classification label, that is, the text category; "Luo XX" is the feature information of the target text segment, that is, the text content.

It can be understood that the output module will complete the sorting of the feature information of the target text segment and the predicted classification labels corresponding to the feature information of the target text segment to obtain structured information.

Fig. 10 is a structure diagram 1 of a business card information extraction system training device provided by the embodiment of the present invention. As shown in Fig. 10, the embodiment of the present invention provides a business card information extraction system training device 3, which is suitable for providing The training method of the business card information extraction system includes obtaining a part 31 and a determining part 32; wherein,

The obtaining part 31 is configured to identify the business card image to obtain text information; based on a preset BERT model, a preset convolutional neural network, and the text information, obtain a feature vector; wherein, the feature vector represents the Semantic information of vocabulary in the text information; Based on the preset recurrent neural network, the feature vector is encoded to obtain the corresponding text segment feature information; wherein, the text segment feature information represents text content of different combinations; using a classifier Discriminating the feature information of the text segment, so as to obtain the predicted classification label corresponding to the feature information of the text segment; wherein, the predicted classification label represents the text type of the characteristic information of the text segment, and the predicted classification label is the obtained structure Based on the basis of the culturalization information; based on the preset objective function, and the predicted classification label and the preset classification label corresponding to the text segment feature information, a loss value is obtained, and an objective parameter is determined according to the loss value.

The determining part 32 is configured to determine a target parameter according to the loss value; wherein, the target parameter is the preset BERT model, the preset convolutional neural network, the preset cyclic neural network, and the preset variables in the classifier, the target parameter characterizes a system configured to extract business card information.

In some embodiments of the present invention, the obtaining part 31 is further configured to obtain the first sub-loss according to the first sub-objective function and the first preset weight, as well as the actual classification label and the predicted classification label value; and according to the second sub-objective function, the second preset weight, and the first preset weight, the preset classification label and the predicted classification label, the second sub-loss value is obtained; wherein, the first Both a sub-objective function and the second sub-objective function are the preset objective functions; and the loss value is obtained based on the first sub-loss value and the second sub-loss value, and determined according to the loss value The target parameter.

In some embodiments of the present invention, the determining part 32 is further configured to determine the current preset BERT model, the preset convolutional neural network, the The variables in the preset cyclic neural network and the classifier are the target parameters.

In some embodiments of the present invention, the obtaining part 31 is further configured to convert the text content information based on the preset BERT model to obtain a sequence of word vectors; wherein the text information includes text content information and text position information; and filling the word vector sequence into a preset two-dimensional grid according to the text line position information, thereby obtaining a target two-dimensional grid; and according to the target two-dimensional grid, and the A convolutional neural network is preset to obtain the feature vector.

In some embodiments of the present invention, the obtaining part 31 is further configured to obtain the feature vector according to the target two-dimensional grid and the three-dimensional convolution kernel in the preset convolutional neural network .

In some embodiments of the present invention, the obtaining part 31 is further configured to extract features in the target two-dimensional grid according to the three-dimensional convolution kernel in the preset convolutional neural network, so that Get the eigenvectors.

In some embodiments of the present invention, the device further includes a collection part 33 configured to collect text sample information when the business card image is the simulated business card image;

The obtaining part 31 is further configured to obtain the business card image according to the text sample information, a preset generative confrontation network, and preset typesetting rules.

Figure 11 is the second architecture diagram of a business card information extraction system training device provided by the embodiment of the present invention. As shown in Figure 11, the embodiment of the present invention provides a business card information extraction system training device, and corresponds to a business card information The business card information extraction system training of the extraction system training device, the business card information extraction system training device 4 includes a processor 401, a memory 402 and a communication bus 404, the memory 402 communicates with the processor 401 through the communication bus 404, and the memory 402 stores the processor 401 one or more executable programs, when the one or more programs are executed, the processor 401 executes the business card information extraction system training method according to the embodiment of the present invention, specifically, the business card information extraction system training device 4 also includes a communication component 403 for data transmission, wherein at least one processor 401 is provided.

In the embodiment of the present invention, various components in the training device 4 of the business card information extraction system are coupled together through the bus 404, and the bus 404 is used to realize connection and communication between these components. In addition to the data bus, the bus 404 also includes a power bus, a control bus and a status signal bus. However, the various buses are labeled as pass bus 404 in FIG. 9 for clarity of illustration.

An embodiment of the present invention provides a computer-readable storage medium, the computer-readable storage medium stores executable instructions, and when the executable instructions are executed, the processor 401 is used to cause the processor 401 to perform the operation described in any one of the above embodiments. The training method of business card information extraction system.

Those skilled in the art should understand that the embodiments of the present invention may be provided as methods, systems, or computer program products. Accordingly, the present invention can take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, optical storage, etc.) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It should be understood that each procedure and/or block in the flowchart and/or block diagram, and a combination of procedures and/or blocks in the flowchart and/or block diagram can be realized by computer program instructions. These computer program instructions may be provided to a general purpose computer, special purpose computer, embedded processor, or processor of other programmable data processing equipment to produce a machine such that the instructions executed by the processor of the computer or other programmable data processing equipment produce a An apparatus for realizing the functions specified in one or more procedures of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means, the instructions The device realizes the function specified in one or more procedures of the flowchart and/or one or more blocks of the block diagram.

These computer program instructions can also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed on the computer or other programmable device to produce a computer-implemented process, thereby The instructions provide steps for implementing the functions specified in the flow chart or blocks of the flowchart and/or the block or blocks of the block diagrams.

The above descriptions are only preferred embodiments of the present invention, and are not intended to limit the protection scope of the present invention.

Industrial Applicability

The embodiment of the present invention discloses a business card information extraction system training method, device, and storage medium. The method includes identifying the business card image to obtain text information, and then training the pair through a preset BERT model and a preset convolutional neural network. The text information is processed to obtain the feature vector, and then the feature vector is combined and encoded to obtain the corresponding text segment feature information. Finally, the classifier is used to discriminate the text segment feature information to obtain the predicted classification label corresponding to the text segment feature information. The preset objective function makes the loss value of the predicted classification label corresponding to the feature information of the text segment and the preset classification label meet the requirements, so as to complete the training of the business card information extraction system, and the structured information will be obtained by filtering the predicted classification label. The embodiment of the present invention can improve the effect of the information extraction of the business card by the system, thereby reducing the error of the extracted structured information when the information is extracted from the business card.

Claims

A business card information extraction system training method, comprising:

Recognizing the business card image to obtain text information; wherein, the business card image is at least one of the following: a real business card image or a simulated business card image;

A feature vector is obtained based on a preset BERT model, a preset convolutional neural network, and the text information; wherein, the feature vector represents semantic information of vocabulary in the text information;

Based on the preset cyclic neural network, the feature vectors are combined and encoded to obtain corresponding text segment feature information; wherein the text segment feature information represents text content of different combinations;

Use a classifier to discriminate the text segment feature information, so as to obtain the predicted classification label corresponding to the text segment feature information; wherein, the predicted classification label represents the text type of the text segment feature information, and the predicted classification label To obtain the basis for structured information;

Based on the preset objective function, and the predicted classification label and preset classification label corresponding to the feature information of the text segment, a loss value is obtained, and a target parameter is determined according to the loss value; wherein, the target parameter is the predicted Assume variables in the BERT model, the preset convolutional neural network, the preset cyclic neural network, and the classifier, and the target parameter represents a system for extracting business card information.
The method according to claim 1, wherein the loss value is obtained based on the preset objective function, and the predicted classification label and the preset classification label corresponding to the feature information of the text segment, and is determined according to the loss value Target parameters, including:

Obtain a first sub-loss value according to the first sub-objective function and the first preset weight, as well as the preset classification label and the predicted classification label;

According to the second sub-objective function, the second preset weight, and the first preset weight, the preset classification label and the predicted classification label, the second sub-loss value is obtained; wherein, the first sub-objective function and the second sub-objective function are both the preset objective function;

The loss value is obtained based on the first sub-loss value and the second sub-loss value, and the target parameter is determined according to the loss value.
The method according to any one of claims 1 or 2, wherein said determining a target parameter according to said loss value comprises:

When the loss value remains in a state of not decreasing, determine the variables in the current preset BERT model, the preset convolutional neural network, the preset cyclic neural network, and the classifier as the target parameters .
The method according to claim 1, wherein, the feature vector is obtained based on the preset BERT model, the preset convolutional neural network, and the text information, including:

Based on the preset BERT model, the text content information is converted to obtain a word vector sequence; wherein the text information includes text content information and text position information;

Filling the word vector sequence into a preset two-dimensional grid according to the position information of the text line, so as to obtain a target two-dimensional grid;

The feature vector is obtained according to the target two-dimensional grid and the preset convolutional neural network.
The method according to claim 4, wherein said obtaining said feature vector according to said target two-dimensional grid and said preset convolutional neural network comprises:

The feature vector is obtained according to the target two-dimensional grid and the three-dimensional convolution kernel in the preset convolutional neural network.
The method according to claim 5, wherein the feature vector is obtained according to the target two-dimensional grid and the three-dimensional convolution kernel in the preset convolutional neural network, comprising:

Extract features in the target two-dimensional grid according to the three-dimensional convolution kernel in the preset convolutional neural network, so as to obtain the feature vector.
The method according to claim 1, wherein, when the business card image is the simulated business card image, before identifying the business card image and obtaining text information, the method further comprises:

Collect text sample information;

The simulated business card image is obtained according to the text sample information, a preset generative confrontation network, and preset typesetting rules.
A training device for a business card information extraction system, including an obtaining part and a determining part; wherein,

The obtaining part is configured to identify the business card image to obtain text information; based on a preset BERT model, a preset convolutional neural network, and the text information, obtain a feature vector; wherein the feature vector represents the Semantic information of vocabulary in the text information; based on the preset recurrent neural network, the feature vector is encoded to obtain the corresponding text segment feature information; wherein, the text segment feature information represents different combinations of text content; using a classifier to The feature information of the text segment is discriminated, so as to obtain the predicted classification label corresponding to the feature information of the text segment; wherein, the predicted classification label represents the text type of the feature information of the text segment, and the predicted classification label is to obtain a structured Information basis; based on a preset objective function, and the predicted classification label and preset classification label corresponding to the text segment feature information, a loss value is obtained, and a target parameter is determined according to the loss value;

The determining part is configured to determine a target parameter according to the loss value; wherein, the target parameter is the preset BERT model, the preset convolutional neural network, the preset cyclic neural network, and the preset Variables in the classifier, the target parameters characterize the system used to extract business card information.
The business card information extraction system training device according to claim 8, wherein,

The obtaining part is further configured to obtain a first sub-loss value according to the first sub-objective function and the first preset weight, as well as the preset classification label and the predicted classification label; according to the second sub-objective function, The second preset weight, as well as the first preset weight, the preset classification label, and the predicted classification label, obtain a second sub-loss value; wherein, the first sub-objective function and the second sub-objective function The objective functions are all the preset objective functions;

The determining part is further configured to obtain the loss value based on the first sub-loss value and the second sub-loss value, and determine the target parameter according to the loss value.
The business card information extraction system training device according to claim 8 or 9, wherein,

The determination part is further configured to determine the current preset BERT model, the preset convolutional neural network, the preset cyclic neural network, and the classification when the loss value remains in a state of not decreasing. The variable in the device is the target parameter.
The business card information extraction system training device according to claim 8, wherein,

The obtaining part is also configured to convert the text content information based on the preset BERT model to obtain a word vector sequence; wherein the text information includes text content information and text position information; according to the text line position Information, filling the word vector sequence into the preset two-dimensional grid, so as to obtain the target two-dimensional grid; according to the target two-dimensional grid, and the preset convolutional neural network, thereby obtaining the feature vector.
The business card information extraction system training device according to claim 11, wherein,

The obtaining part is further configured to obtain the feature vector according to the target two-dimensional grid and the three-dimensional convolution kernel in the preset convolutional neural network.
The business card information extraction system training device according to claim 12, wherein,

The obtaining part is further configured to extract features in the target two-dimensional grid according to the three-dimensional convolution kernel in the preset convolutional neural network, so as to obtain the feature vector.
According to the business card information extraction system training device according to claim 8, said business card information extraction system training device also includes a collection part, wherein,

The collection part is configured to collect text sample information;

The obtaining part is further configured to obtain the simulated business card image according to the text sample information, a preset generative confrontation network, and preset typesetting rules.
A business card information extraction system training device, comprising:

a memory for storing executable data instructions;

The processor is configured to implement the training method of the business card information extraction system according to any one of claims 1 to 7 when executing the executable instructions stored in the memory.
A computer-readable and machine-readable storage medium, storing executable instructions, used to cause a processor to implement the training method of the business card information extraction system described in any one of claims 1 to 7.