WO2023078264A1 - Method and apparatus for training business card information extraction system, and computer-readable storage medium - Google Patents

Method and apparatus for training business card information extraction system, and computer-readable storage medium

Info

Publication number
WO2023078264A1
WO2023078264A1 · PCT/CN2022/129071 · CN2022129071W
Authority
WO
WIPO (PCT)
Prior art keywords
preset
information
text
business card
classification label
Prior art date
Application number
PCT/CN2022/129071
Other languages
French (fr)
Chinese (zh)
Inventor
王奥迪
杨希
Original Assignee
中移(苏州)软件技术有限公司
中国移动通信集团有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中移(苏州)软件技术有限公司 and 中国移动通信集团有限公司
Publication of WO2023078264A1 publication Critical patent/WO2023078264A1/en

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • G06N3/084Backpropagation, e.g. using gradient descent
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10Character recognition
    • G06V30/19Recognition using electronic means
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40Document-oriented image-based pattern recognition

Definitions

  • the present invention is based on a Chinese patent application with application number 202111296307.7 and a filing date of November 03, 2021, and claims the priority of this Chinese patent application.
  • the entire content of this Chinese patent application is hereby incorporated by reference.
  • the invention relates to the technical field of image information processing, in particular to a training method and device for a business card information extraction system, and a computer-readable storage medium.
  • the main goal of business card information extraction is to input an image of a business card and extract the structured information in the business card.
  • the structured information includes key fields such as name, position, company, address, phone number, and email address.
  • the extraction of business card information mainly includes two processes: first, using optical character recognition (OCR) technology to identify the text in the business card from the business card image; secondly, structuring the OCR-recognized text.
  • the OCR-recognized text is structured using manually designed rules or named entity recognition technology, thereby extracting the key fields in the business card.
  • the embodiments of the present invention expect to provide a business card information extraction system training method and device, and a computer-readable storage medium, which can improve the system's performance when extracting information from business cards, thereby reducing errors in the extracted structured information.
  • An embodiment of the present invention provides a business card information extraction system training method, including:
  • the business card image is at least one of the following: a real business card image or a simulated business card image;
  • a feature vector is obtained based on a preset BERT model, a preset convolutional neural network, and the text information; wherein, the feature vector represents semantic information of vocabulary in the text information;
  • the feature vectors are combined and encoded to obtain corresponding text segment feature information; wherein the text segment feature information represents text content of different combinations;
  • the predicted classification label represents the text type of the text segment feature information, and the predicted classification label is the basis for obtaining structured information;
  • a loss value is obtained, and a target parameter is determined according to the loss value; wherein the target parameters are the variables in the preset BERT model, the preset convolutional neural network, the preset recurrent neural network, and the classifier, and the target parameters characterize a system for extracting business card information.
  • the loss value is obtained based on the preset objective function and the predicted classification label and preset classification label corresponding to the text segment feature information, and the target parameter is determined according to the loss value, including:
  • obtaining the second sub-loss value according to the second sub-objective function, the second preset weight, the first preset weight, the preset classification label, and the predicted classification label; wherein the first sub-objective function and the second sub-objective function together constitute the preset objective function;
  • obtaining the loss value based on the first sub-loss value and the second sub-loss value, and determining the target parameter according to the loss value.
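The weighted two-part loss described above can be sketched as follows. The split into a positive-span term and a negative-span term, and how the weights are applied, are assumptions for illustration: the claim only specifies two weighted sub-objective functions combined into one loss value.

```python
import math

def cross_entropy(probs, gold):
    # negative log-likelihood of the gold label index
    return -math.log(probs[gold])

def span_loss(predictions, w1=1.0, w2=0.5):
    """Combine two sub-losses into one loss value.

    Hypothetical split: sub-loss 1 covers spans whose gold label is a
    real entity type, sub-loss 2 covers negative spans (label "O"),
    so the many negative spans can be down-weighted.
    `predictions` is a list of (probs, gold_index, gold_label).
    """
    pos = [cross_entropy(p, g) for p, g, lbl in predictions if lbl != "O"]
    neg = [cross_entropy(p, g) for p, g, lbl in predictions if lbl == "O"]
    l1 = w1 * sum(pos) / max(len(pos), 1)
    l2 = w2 * sum(neg) / max(len(neg), 1)
    return l1 + l2
```

During training the loss is computed per batch of candidate spans and back-propagated through the classifier, the recurrent network, the convolutional network, and the BERT model.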
  • the determination of the target parameter according to the loss value includes:
  • the feature vector is obtained based on the preset BERT model, the preset convolutional neural network, and the text information, including:
  • the text content information is converted to obtain a word vector sequence; wherein the text information includes text content information and text position information;
  • the feature vector is obtained according to the target two-dimensional grid and the preset convolutional neural network.
  • the feature vector is obtained according to the target two-dimensional grid and the preset convolutional neural network, including:
  • the feature vector is obtained according to the target two-dimensional grid and the three-dimensional convolution kernel in the preset convolutional neural network.
  • the feature vector is obtained according to the target two-dimensional grid and the three-dimensional convolution kernel in the preset convolutional neural network, including:
  • when the business card image is the simulated business card image, before the identifying of the business card image to obtain the text information, the method includes:
  • the business card image is obtained according to the text sample information, a preset generative adversarial network, and preset typesetting rules.
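As a toy illustration of the typesetting-rule half of this step (the GAN styling stage is omitted, and the field names and layout rules below are invented for the sketch), sampled text fields can be laid out top-down into (text, bounding-box) pairs that a renderer or GAN stylizer could then consume:

```python
def typeset(fields, width=600, line_height=40, margin=20):
    """Apply simple top-down typesetting rules to sampled field text.

    Returns a list of (text, (x0, y0, x1, y1)) pairs, one per line.
    The field order and geometry are illustrative assumptions.
    """
    lines = []
    y = margin
    for key in ("name", "position", "company", "address", "phone", "email"):
        if key in fields:
            box = (margin, y, width - margin, y + line_height)
            lines.append((fields[key], box))
            y += line_height
    return lines
```

The resulting boxes double as the text position information that the later grid-filling step relies on, which is one reason simulated cards are convenient training data.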
  • An embodiment of the present invention provides a business card information extraction system training device, including an obtaining part and a determining part; wherein,
  • the obtaining part is configured to: identify the business card image to obtain text information; obtain a feature vector based on a preset BERT model, a preset convolutional neural network, and the text information, wherein the feature vector represents semantic information of vocabulary in the text information; encode the feature vector based on the preset recurrent neural network to obtain the corresponding text segment feature information, wherein the text segment feature information represents text content of different combinations; use a classifier to discriminate the text segment feature information so as to obtain the predicted classification label corresponding to the text segment feature information, wherein the predicted classification label represents the text type of the text segment feature information and is the basis for obtaining structured information; and obtain a loss value based on a preset objective function and the predicted classification label and preset classification label corresponding to the text segment feature information, a target parameter being determined according to the loss value;
  • the determining part is configured to determine a target parameter according to the loss value; wherein the target parameters are the variables in the preset BERT model, the preset convolutional neural network, the preset recurrent neural network, and the classifier, and the target parameters characterize the system configured to extract business card information.
  • the obtaining part is further configured to obtain the first sub-loss value according to the first sub-objective function and the first preset weight, as well as the preset classification label and the predicted classification label; and to obtain a second sub-loss value according to the second sub-objective function, the second preset weight, the first preset weight, the preset classification label, and the predicted classification label; wherein the first sub-objective function and the second sub-objective function together constitute the preset objective function; the loss value is obtained based on the first sub-loss value and the second sub-loss value, and the target parameter is determined according to the loss value.
  • the determining part is further configured to determine, when the loss value remains in a state of not decreasing, that the current variables in the preset BERT model, the preset convolutional neural network, the preset recurrent neural network, and the classifier are the target parameters.
  • the obtaining part is further configured to convert the text content information based on the preset BERT model to obtain a word vector sequence, wherein the text information includes text content information and text position information; fill the word vector sequence into the preset two-dimensional grid according to the text line position information, so as to obtain the target two-dimensional grid; and obtain the feature vector according to the target two-dimensional grid and the preset convolutional neural network.
  • the obtaining part is further configured to obtain the feature vector according to the target two-dimensional grid and the three-dimensional convolution kernel in the preset convolutional neural network.
  • the obtaining part is further configured to extract features in the target two-dimensional grid according to the three-dimensional convolution kernel in the preset convolutional neural network, so as to obtain the feature vector.
  • the device further includes a collection part configured to collect text sample information when the business card image is the simulated business card image;
  • the obtaining part is further configured to obtain the business card image according to the text sample information, a preset generative confrontation network, and preset typesetting rules.
  • An embodiment of the present invention provides a business card information extraction system training device, including:
  • a memory for storing executable instructions;
  • the processor is configured to implement the business card information extraction system training method described in the embodiment of the present invention when executing the executable instructions stored in the memory.
  • An embodiment of the present invention provides a computer storage medium in which executable instructions are stored, for causing a processor to implement the business card information extraction system training method described in the embodiments of the present invention when executed.
  • the embodiment of the present invention provides a business card information extraction system training method and device, and a computer storage medium.
  • the method includes: identifying the business card image to obtain text information; processing the text information through a preset BERT model and a preset convolutional neural network to obtain feature vectors; combining and encoding the feature vectors to obtain the corresponding text segment feature information; and finally using the classifier to discriminate the text segment feature information to obtain the predicted classification labels corresponding to the text segment feature information. Through the preset objective function, the loss value between the predicted classification label corresponding to the text segment feature information and the preset classification label is obtained; when the loss value meets the requirements, the training of the business card information extraction system is completed.
  • in this way, the embodiments of the present invention can improve the system's performance when extracting information from business cards, thereby reducing errors in the extracted structured information.
  • Fig. 1 is a first architecture diagram of a business card information extraction system provided by an embodiment of the present invention.
  • Fig. 2 is a second architecture diagram of a business card information extraction system provided by an embodiment of the present invention.
  • Fig. 3 is a first flowchart of a training method for a business card information extraction system provided by an embodiment of the present invention.
  • Fig. 4 is a second flowchart of a training method for a business card information extraction system provided by an embodiment of the present invention.
  • Fig. 5 is a third flowchart of a training method for a business card information extraction system provided by an embodiment of the present invention.
  • Fig. 6a is a schematic diagram of text sample information in a business card information extraction system training method provided by an embodiment of the present invention.
  • Fig. 6b is a schematic diagram of simulated text information in a business card information extraction system training method provided by an embodiment of the present invention.
  • Fig. 7 is a schematic diagram of a simulated business card image generated by a business card information extraction system training method provided by an embodiment of the present invention.
  • Fig. 8 is a flowchart of a method for extracting business card information provided by an embodiment of the present invention.
  • Fig. 9 is a schematic diagram of a business card image provided by an embodiment of the present invention.
  • Fig. 10 is a first structural diagram of a business card information extraction system training device provided by an embodiment of the present invention.
  • Fig. 11 is a second structural diagram of a business card information extraction system training device provided by an embodiment of the present invention.
  • GAN: Generative Adversarial Network
  • a Generative Adversarial Network is a deep learning model.
  • the model produces quite good output through mutual game learning between (at least) two modules in the framework: a generative model and a discriminative model.
  • NLP: Natural Language Processing
  • Natural Language Processing is a branch of artificial intelligence and linguistics. It mainly studies the use of computers to process information such as the form, sound, and meaning of natural language, that is, operations such as the input, output, recognition, analysis, understanding, and generation of characters, words, sentences, and texts.
  • Specific manifestations of natural language processing include machine translation, text summarization, text classification, text proofreading, information extraction, speech synthesis, speech recognition, etc.
  • NER: Named Entity Recognition
  • NER is an important basic tool for many NLP tasks such as information extraction, question answering systems, syntax analysis, and machine translation.
  • the purpose of named entity recognition is to identify entities of specified categories in text.
  • the so-called named entities are people's names, organization names, place names and all other entities identified by names.
  • OCR: Optical Character Recognition
  • OCR refers to the process by which electronic equipment such as a scanner or digital camera examines printed characters, determines their shapes by detecting dark and light patterns, and then translates the shapes into computer text using character recognition methods; that is, for printed characters, the text in a paper document is converted into a black-and-white dot-matrix image file by optical means, and the text in the image is then converted into a text format by computer technology for further editing by word processing software.
  • Fig. 1 is a structure diagram 1 of a business card information extraction system provided by an embodiment of the present invention.
  • an embodiment of the present invention provides a business card information extraction system, including an input module 1, an OCR module 2, an NER module 3, and an output module 4.
  • the input module 1 is used to input the business card image to be recognized;
  • the OCR module 2 is used to extract the text in the input business card image, and outputs it in text format;
  • the output module 4 is used to post-process the recognition result output by the NER module 3, and output the final target structured information.
  • FIG. 2 is a second architecture diagram of a business card information extraction system provided by an embodiment of the present invention.
  • the NER module 3 completes entity recognition of the OCR output text through the NER model set in the NER module 3.
  • the NER model includes a Word Embedding layer, a Bidirectional layer, a Hidden layer, a Span Representations layer, a Fully-connected layer, and a Span Classifier layer.
  • the Word Embedding layer is used to process the text output by OCR based on a preset BERT model and a preset convolutional neural network to obtain a feature vector.
  • the Bidirectional layer is used to encode the feature vector based on the preset recurrent neural network to obtain the corresponding text segment feature information.
  • the Hidden layer is used to convert the hidden text segment feature information that is not easy to capture in the text segment feature information into readable text through the Hidden model, that is, to convert the hidden text segment feature information into readable text information. For example, if the Bidirectional layer encodes the feature vector and the obtained text segment feature information contains a piece of hidden text segment feature information "B-LOC", the hidden text segment feature information is converted into readable text information "FTA".
  • the Span Representations layer is used to splice the feature information of the above text fragments according to preset rules.
  • the Fully-connected layer is used for feature fusion or feature weighting of the text segment feature information; the Span Classifier layer is used to discriminate the text segment feature information, obtain the predicted classification label corresponding to the text segment feature information, and filter the predicted classification labels.
  • the target text segment feature information is thereby obtained, and the structured information can be determined according to the target text segment feature information and the predicted classification label corresponding to the target text segment feature information.
  • Fig. 3 is a flowchart one of a training method for a business card information extraction system provided by an embodiment of the present invention. As shown in Fig. 3 , the training method for a business card information extraction system provided by an embodiment of the present invention includes:
  • the business card image is at least one of the following: a real business card image or a simulated business card image.
  • the business card image is recognized by the OCR module to obtain the required text information.
  • the business card image is a real business card image and/or a simulated business card image.
  • the real business card image represents a business card printed in real life, and in actual use, the business card printed in real life may be scanned or photographed to obtain a real business card image.
  • the simulated business card image is constructed based on printed business cards in real life, and the simulated business card image corresponds to a business card that has not been printed in reality or is different from a printed business card.
  • the preset generative adversarial network can output the corresponding simulated business card image according to preset content (text sample information) and layout (preset typesetting rules).
  • the input of the business card image is completed through the input module; wherein a data preprocessing operation needs to be performed on the business card image before input.
  • the data preprocessing operation may be binarization, direction correction, distortion correction, denoising and so on.
  • the OCR module is mainly responsible for extracting the text in the input business card image and outputting it in a text format, so as to obtain text information.
  • the text information has text-line granularity, and each text line includes text content and text position information.
  • the business card image is recognized by the OCR module, so as to obtain text information in a text format, which is convenient for subsequent processing.
  • S102 Obtain a feature vector based on a preset BERT model, a preset convolutional neural network, and text information; wherein, the feature vector represents semantic information of words in the text information.
  • the feature vector representing the semantic information of the vocabulary in the text information is obtained from the text information through the preset BERT model and the preset convolutional neural network.
  • the BERT model (Bidirectional Encoder Representations from Transformers) is an autoencoding language model, which can extract the relationship features of vocabulary in sentences at multiple different levels, thereby more comprehensively reflecting sentence semantics; word meaning can be obtained according to the sentence context during extraction, so as to avoid ambiguity.
  • the text information is "company address: XXXX, XX Road, XX District, XX City"; the above text information is converted into token form and input into the preset BERT model, that is, the character "公" is converted into t1 in the text token sequence T, the character "司" is converted into t2 in the text token sequence T, and so on, and the converted t1 and t2 are input into the preset BERT model to obtain the corresponding word vector sequence.
  • the word vector sequence W is filled into the preset two-dimensional grid according to the text position information in the text information; wherein the value of each position (grid cell) in the two-dimensional grid corresponds to a word vector in the text token sequence T, and positions not covered by any word vector are filled with the word vector of <PAD>.
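A minimal sketch of this grid-filling step, with word vectors as plain lists and `<PAD>` padding for unused cells; r is the number of text lines and c is the length of the longest line:

```python
def fill_grid(lines, pad_vec):
    """Fill per-line word vectors into an r x c grid.

    `lines` is a list of text lines, each a list of word vectors
    (lists of floats), already ordered by the OCR text position
    information; cells beyond a line's end get the <PAD> vector.
    """
    r = len(lines)
    c = max(len(line) for line in lines)
    grid = [[list(pad_vec) for _ in range(c)] for _ in range(r)]
    for i, line in enumerate(lines):
        for j, vec in enumerate(line):
            grid[i][j] = list(vec)
    return grid
```

The resulting grid preserves the spatial layout of the card, which is what lets the convolutional network exploit position as well as semantics.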
  • the named entity recognition method based on segment classification of text information in the embodiment of the present invention can reduce the impact of entity trigger word errors during named entity recognition, thereby improving the effect of entity extraction.
  • Fig. 4 is a flowchart two of a business card information extraction system training method provided by an embodiment of the present invention. As shown in Fig. 4, S102 may also include S1021-S1023, as follows:
  • the text content information in the text information is input into the preset BERT model according to the format required by the preset BERT model, so as to obtain a sequence of word vectors.
  • the text information includes text content information and text position information, wherein the text content information refers to the text on the business card image, and the text position information refers to the coordinates of the text on the business card image; the coordinates can be determined based on a preset coordinate system.
  • the word vector sequence is obtained by presetting the BERT model, which can improve the accuracy of the obtained word vector sequence.
  • it is suitable for obtaining a target two-dimensional grid and providing data support for subsequent further processing through a preset convolutional neural network.
  • each word vector in the word vector sequence is filled into the preset two-dimensional grid, so as to obtain the target two-dimensional grid.
  • the specification of the preset two-dimensional grid is r*c, where r is the number of text lines and c is the maximum length of a text line.
  • it is applicable to the scene in which, after the target two-dimensional grid is obtained, the above data is subjected to subsequent processing to obtain the feature vector.
  • the target two-dimensional grid is input into a preset convolutional neural network to obtain a feature vector.
  • S1023 includes: obtaining the feature vector according to the target two-dimensional grid and presetting the three-dimensional convolution kernel in the convolutional neural network.
  • the first dimension represents the width of the convolution kernel and is the same as the length of the word vector; the second dimension represents the height of the convolution kernel; the third dimension represents the size of the convolution kernel, and the size of the third dimension is the same as the length of the word vector.
  • the features in the target two-dimensional grid are extracted to obtain a feature vector.
  • the features in the target two-dimensional grid are extracted using the three-dimensional convolution kernel in the preset convolutional neural network, wherein each word vector corresponds to at least one feature, and each feature corresponds to multiple word vectors.
  • the features are extracted through the three-dimensional convolution kernel, which improves the accuracy of the extracted features.
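The convolution over the word-vector grid can be sketched as follows: a small kernel slides over rows and columns while spanning the full embedding depth, mirroring the statement that one kernel dimension matches the word-vector length. The kernel sizes are illustrative assumptions, and a real system would use a deep-learning framework rather than explicit loops.

```python
def conv3d_over_grid(grid, kernel, kh=1, kw=2):
    """Slide a kh x kw x d kernel over an r x c grid of d-dim word vectors.

    Returns a 2-D map of scalar feature responses; each response mixes
    information from multiple neighbouring word vectors.
    """
    r, c, d = len(grid), len(grid[0]), len(grid[0][0])
    out = []
    for i in range(r - kh + 1):
        row = []
        for j in range(c - kw + 1):
            s = 0.0
            for di in range(kh):
                for dj in range(kw):
                    for k in range(d):
                        s += grid[i + di][j + dj][k] * kernel[di][dj][k]
            row.append(s)
        out.append(row)
    return out
```

With many such kernels, each grid position gets a feature vector whose components are the per-kernel responses.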
  • the feature vector is input into the preset recurrent neural network, and combination encoding of the feature vector is performed through the preset recurrent neural network, so as to obtain the text segment feature information; the text segment feature information represents the text content of different combinations formed by combining the feature vectors.
  • the preset recurrent neural network characterizes combinations of the feature vectors and produces different combination representations of the elements in the text token sequence corresponding to the text information, such as a forward recurrent representation or a backward recurrent representation. In this way, the semantics of sentences formed by different combinations of elements in the above text token sequence can be obtained.
  • the text token sequence represents text information
  • each element in the text token sequence represents a word in the text information.
  • the above-mentioned text segment feature information may include some hidden text segment feature information; at this time, the hidden text segment feature information may be converted into readable text information (text segment feature information) through the Hidden layer as shown in FIG. 2 to ensure the integrity of the text segment feature information.
  • the preset recurrent neural network can improve the accuracy of identifying the semantics of the text content corresponding to the text segment feature information.
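The span-encoding idea can be sketched as below: run a forward and a backward recurrent pass over the token features, then represent every contiguous segment (i, j) by the forward state at j and the backward state at i. A single-unit toy RNN with scalar inputs stands in for the real network, and this particular boundary-state combination is a common span-encoding scheme rather than the patent's exact formula.

```python
import math

def rnn_states(vectors, w_in=0.5, w_rec=0.5):
    """One-unit toy recurrent pass: h_t = tanh(w_in*x_t + w_rec*h_{t-1})."""
    h, states = 0.0, []
    for x in vectors:
        h = math.tanh(w_in * x + w_rec * h)
        states.append(h)
    return states

def span_features(vectors):
    """Represent every contiguous span (i, j) by the forward state at j
    paired with the backward state at i."""
    fwd = rnn_states(vectors)
    bwd = rnn_states(vectors[::-1])[::-1]
    return {(i, j): (fwd[j], bwd[i])
            for i in range(len(vectors))
            for j in range(i, len(vectors))}
```

Every candidate segment thus gets a fixed-size representation that the downstream classifier can score.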
  • S104 Use a classifier to discriminate the feature information of the text segment, so as to obtain the predicted classification label corresponding to the feature information of the text segment; wherein, the predicted classification label represents the text type of the feature information of the text segment, and the predicted classification label is the basis for obtaining the structured information.
  • a classifier is used to discriminate the feature information of the text segment to obtain a predicted classification label corresponding to the feature information of each text segment.
  • the predicted classification labels can be screened according to the preset classification labels.
  • the text segment feature information corresponding to the same predicted classification label as the preset classification label is the target text segment feature information.
  • the structured information can be obtained by predicting the classification labels, and the structured information is the extraction result of the information in the business card image input in S101.
  • the classifier is used to filter the text segment feature information according to the predicted classification labels, so as to select the text segment feature information whose predicted classification label is the same as the preset classification label.
  • the classifier filters out the text segment feature information whose predicted classification label is "address" as the target text segment feature information; if the text content represented by the target text segment feature information is "XXXX, XX Road, XX District, XX City", the structured information is "Address: XXXX, XX Road, XX District, XX City".
  • a classifier is used to discriminate the text segment feature information to obtain the predicted classification label corresponding to the text segment feature information, and the target text segment feature information is obtained by filtering the predicted classification label to extract structured information.
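A minimal sketch of the discrimination-and-filtering step: score each candidate span with a softmax classifier and keep only the spans whose predicted label is a real field type. The label set and the use of "O" for the negative class are illustrative assumptions.

```python
import math

def softmax(logits):
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    s = sum(exps)
    return [e / s for e in exps]

LABELS = ["O", "name", "address"]  # illustrative label set

def classify_spans(span_logits):
    """Predict a label for each span and keep only non-'O' spans,
    i.e. the target text segment feature information."""
    results = []
    for span, logits in span_logits.items():
        probs = softmax(logits)
        label = LABELS[probs.index(max(probs))]
        if label != "O":
            results.append((span, label))
    return results
```

The surviving (span, label) pairs map directly to structured fields such as "Address: ...".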
  • compared with the existing method of extracting structured information by identifying "trigger words", this removes the error caused by identifying "trigger words" in the process of extracting structured information, thereby reducing errors in the extracted structured information.
  • the target parameter is the variable in the preset BERT model, the preset convolutional neural network, the preset recurrent neural network and the classifier, and the target parameter represents a system for extracting business card information.
  • the preset objective function is used to calculate, from the preset classification label and the predicted classification label of the text segment feature information, the loss value between the preset classification label and the predicted classification label, and the target parameter is determined according to the loss value.
  • the loss value represents the error between the preset classification label and the predicted classification label of the feature information of the text segment.
  • the preset classification label of the text segment characteristic information is the actual classification label printed on the business card image and corresponding to the text segment characteristic information.
  • the accuracy of the structured information extracted by the business card information extraction system can be improved through the preset classification labels of the text segment feature information and the loss value of the predicted classification labels.
  • when the loss value no longer decreases, it can be determined that the accuracy rate of the structured information extracted by the business card information extraction system has reached its maximum, and the business card information extraction system has completed training.
  • the current value of the variable in the business card information extraction system is the target parameter.
  • the text information corresponding to the above text token sequence will contain n(n+1) pieces of text segment feature information, each representing one text segment; however, among these n(n+1) pieces of text segment feature information there are entries that are meaningless for predicting classification labels, that is, negative samples.
  • each character in the above text information corresponds to an element of the text token sequence; for example, the first character corresponds to t1 in the text token sequence T, the second character corresponds to t2, and so on.
  • Table 1 shows n(n+1) text fragments obtained based on the above text information, and Table 1 is as follows:
  • each column in Table 1 corresponds to a text segment, and only one text segment in Table 1 is meaningful: "No. 1XX8, XX Road, XX District, XX City", whose preset classification label is "Address"; the other text segments have no actual labels and are negative samples.
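The candidate text segments of Table 1 are the contiguous sub-sequences of the token sequence. A minimal sketch of this enumeration follows (the token list and helper name are illustrative; note that this simple enumeration yields n(n+1)/2 spans for a sequence of length n, so the n(n+1) count in the text may reflect a slightly different span definition):

```python
def enumerate_spans(tokens):
    """Return every contiguous sub-sequence (candidate text segment)."""
    spans = []
    n = len(tokens)
    for i in range(n):
        for j in range(i, n):
            spans.append(tokens[i:j + 1])
    return spans

tokens = list("ABCD")          # stand-in for the characters of the text
spans = enumerate_spans(tokens)
print(len(spans))              # 10 spans for a 4-token sequence
```

Most of these spans carry no meaningful label, which is why the objective function below down-weights negative samples.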
  • the preset objective function includes a first preset function and a second preset function; wherein the first preset function may be SoftMax Loss, as shown in formula (1):

    $L_S = -\frac{1}{m}\sum_{i=1}^{m} w_{y_i}\,\log\frac{e^{W_{y_i}^{T}x_i}}{\sum_{j=1}^{c+1} e^{W_{j}^{T}x_i}}$, with $w_{y_i}=\varepsilon$ if $y_i$ is the meaningless label and $w_{y_i}=1$ otherwise  (1)

  • i is the index of the text segment feature information obtained in S103
  • m is the total number of entries in the text segment feature information index
  • x_i is the feature vector of the i-th text segment in the text segment feature information obtained in S103
  • y_i is the preset classification label corresponding to the i-th text segment feature information
  • j is the preset classification label index
  • c+1 is the total number of preset classification labels in the preset classification label index, one of which represents the meaningless (negative-sample) class
  • W is the preset weight parameter in the classifier, W_j being its j-th column
  • ε is the first preset weight, where 0 < ε < 1; the superscript T denotes transposition and is unrelated to the text token sequence T in S102.
  • the first preset weight ε is used to reduce the contribution of negative samples to the objective function.
  • the second preset function may be Center Loss, as shown in formula (2):

    $L_C = \frac{1}{2}\sum_{i=1}^{m} w_{y_i}\,\lVert x_i - c_{y_i}\rVert_2^2$  (2)

  • $c_{y_i}$ is the class center of label $y_i$ in the feature space
  • λ is the second preset weight, which controls the proportion of Center Loss in the total loss value
  • i, m, x_i, y_i, j, c+1 and W have the same meanings as in formula (1)
  • ε is the first preset weight, where 0 < ε < 1, again down-weighting the meaningless (negative-sample) label.
  • the preset objective function is L, as shown in formula (3):

    $L = L_S + \lambda L_C$  (3)

  • L_S is the first preset function (SoftMax Loss)
  • L_C is the second preset function (Center Loss).
  • Center Loss, from the field of face recognition, is added to SoftMax Loss, that is, a distance constraint between each sample and its class center in the feature space is added.
  • this supervises classifier learning so that samples within a class are more aggregated and different classes are more separated, thereby improving the generalization ability of the algorithm and the effect of business card information extraction.
  • SoftMax Loss is used to constrain the ability to distinguish different types of entity text (text content), that is, to make the text segment feature information discriminative.
  • Center Loss is used to constrain the intra-class aggregation of the text segment feature information, thereby improving the generalization ability of the model.
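A minimal NumPy sketch of the combined objective described above — SoftMax Loss with the first preset weight ε down-weighting negative samples, plus Center Loss scaled by the second preset weight λ — is given below; the function signature, shapes, and default values are assumptions for illustration, not the patent's implementation:

```python
import numpy as np

def combined_loss(X, y, W, centers, neg_label, eps=0.5, lam=0.1):
    """X: (m, d) span feature vectors; y: (m,) integer labels;
    W: (d, c+1) classifier weights; centers: (c+1, d) class centers.
    eps is the first preset weight (0 < eps < 1) down-weighting negative
    samples; lam is the second preset weight controlling the share of
    Center Loss in the total loss."""
    m = X.shape[0]
    w = np.where(y == neg_label, eps, 1.0)               # per-sample weight
    logits = X @ W
    logits = logits - logits.max(axis=1, keepdims=True)  # numeric stability
    log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    softmax_loss = -(w * log_probs[np.arange(m), y]).mean()        # formula (1)
    center_loss = 0.5 * (w * ((X - centers[y]) ** 2).sum(axis=1)).mean()  # (2)
    return softmax_loss + lam * center_loss                        # formula (3)
```

Down-weighting the meaningless label with ε keeps the very large number of negative spans from dominating the gradient, while the Center Loss term pulls same-class span features toward a shared center, matching the two constraints stated above.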
  • Fig. 5 is a flowchart three of a business card information extraction system training method provided by an embodiment of the present invention. As shown in Fig. 5, S105 may also include S1051-S1053, as follows:
  • formula (1) is used to calculate the first sub-loss value between the preset classification label and the predicted classification label of the text segment feature information; the first sub-loss value represents the discriminability of the text segment feature information.
  • the error between the preset classification label and the predicted classification label corresponding to the text segment feature information can be judged by the first sub-loss value.
  • formula (2) is used to calculate the second sub-loss value between the preset classification label and the predicted classification label of the text segment feature information; the second sub-loss value represents the intra-class aggregation degree of the text segment feature information.
  • the second preset weight is used to control the proportion of the second sub-loss value in the loss value
  • the first preset weight is used to control the influence of negative samples in the second sub-objective function
  • the distance between the text segment feature information and its class center can be judged by the second sub-loss value.
  • the loss value is obtained, and the variables in the business card information extraction system, that is, the target parameters, are determined according to the loss value.
  • when the loss value no longer decreases, the business card information extraction system has completed training, and the current business card information extraction system can ensure the accuracy of the extracted structured information, that is, the extraction effect is improved.
  • weight parameters are added to the preset objective function, and Center Loss is introduced into the calculation of the loss value, which can not only reduce the impact of the imbalance between positive and negative samples on the loss value, but also improve the recognition effect.
  • determining the target parameter according to the loss value means that, when the loss value no longer decreases, the current variables in the preset BERT model, preset convolutional neural network, preset recurrent neural network, and classifier are determined to be the target parameters.
  • the preset BERT model, preset convolutional neural network, preset recurrent neural network, and classifier adjust the values of their variables according to the loss value, and the processing of the text information obtained from the business card image is repeated, so that the variable values at which the loss value is smallest are determined as the target parameters.
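The "stop when the loss no longer decreases" criterion above can be sketched as a simple training-loop wrapper; `step_fn`, `patience`, and `tol` are hypothetical names introduced here for illustration:

```python
# Sketch of the stopping criterion: keep updating while the loss still
# decreases; once it stays flat, freeze the current variables as the
# target parameters. step_fn() updates the model and returns the loss.
def train(step_fn, max_steps=1000, patience=3, tol=1e-6):
    best_loss, best_step, stale = float("inf"), -1, 0
    for step in range(max_steps):
        loss = step_fn()
        if loss < best_loss - tol:
            best_loss, best_step, stale = loss, step, 0
        else:
            stale += 1
            if stale >= patience:    # loss "no longer decreases"
                break
    return best_loss, best_step
```

The variable values in effect at `best_step` would be kept as the target parameters.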
  • the trained business card information extraction system will improve the recognition effect.
  • the business card information extraction system training method provided in the embodiment of the present invention further includes:
  • S106 Collect text sample information.
  • it is applicable to a scene of sample collection before extracting business card information.
  • data is crawled on each platform, so as to collect text sample information.
  • data is collected for each target field of the business card.
  • the source of the data may be a platform, and the data may include public information such as name, company, address, email address, website, mobile phone, telephone number, and fax.
  • naming conventions can be summarized as preset rules, and additional data can be constructed on the basis of the collected data according to these rules.
  • some mailbox field data can be constructed through the rule of "name pinyin + mailbox domain name".
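The "name pinyin + mailbox domain name" construction rule might be sketched as follows; the pinyin lookup table and domain list are placeholder data for illustration, not values from the patent:

```python
# Hypothetical lookup table and domains; real data would come from the
# collected text samples, not from this hard-coded dictionary.
PINYIN = {"Luo XX": "luoxx", "Wang YY": "wangyy"}
DOMAINS = ["@example.com", "@mail.example.cn"]

def build_emails(names):
    """Construct mailbox fields by the 'name pinyin + domain' rule."""
    return [PINYIN[n] + d for n in names if n in PINYIN for d in DOMAINS]

print(build_emails(["Luo XX"]))
```

Rules like this let the training set cover field formats that are rare in the crawled data.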
  • it is suitable for constructing a scene of simulating a business card image.
  • the typesetting of the text sample information is imitated to generate simulated text information; based on the text sample information and the simulated text information, simulated business card images are obtained through a preset generative adversarial network according to preset typesetting rules.
  • the preset typesetting rules are obtained by replacing the text content in the business card and exchanging the order of the text position based on the typesetting of the existing business card.
  • the text sample information is shown in Figure 6a, where the black boxes mark the text layout of the text sample information;
  • the simulated text information is generated according to the text layout shown in Figure 6a, as shown in Figure 6b, where the black boxes mark the text layout of the simulated text information.
  • FIG. 7 is a schematic diagram of a simulated business card image generated by a business card information extraction method provided by an embodiment of the present invention.
  • the preset generative adversarial network includes the generator and the discriminator shown in FIG. 7.
  • the generator is used to generate simulated business card images, and the discriminator is used to recognize the simulated business card images generated by the generator.
  • the generator has an encoder-decoder structure; the discriminator is composed of an image discriminator (Image Discriminator) and a text matcher (Text Matcher), and the image discriminator is used to judge the authenticity of visual features, such as the style and background, of the simulated business card image output by the generator.
  • the purpose of the text matcher is to determine the similarity between the text on the simulated business card image and the real text input into the generator, so as to ensure that the text information on the simulated business card image has been replaced with the newly input text.
  • a real business card image (real image with text_a) is input into the generator;
  • the encoder performs feature extraction on the real image to obtain the image feature vector corresponding to the real image (Image Embedding), that is, the latent code;
  • the text to be substituted, text_b, is input into the Text Matcher to obtain the text feature vector (Text Embedding) of text_b;
  • the feature vector corresponding to text_b, the image feature vector of the real image, and added random noise z (Random Noise) are input into the decoder to obtain a simulated business card image (fake image with text_b), that is, the business card text of the simulated business card image is text_b; OCR recognition is then performed on the simulated business card image (fake image with text_b) to obtain its business card text for comparison.
  • in this way, the simulation degree of the simulated business card image generated by the generator can be guaranteed; through the preset generative adversarial network, together with real business card images and text sample information, simulated business card images are generated for training the business card information extraction system, which achieves the purpose of expanding the data set and increases the diversity of the data.
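The generator data flow described above (encoder → latent code, Text Matcher → text embedding, decoder(latent code, text embedding, random noise) → fake image with text_b) can be sketched with placeholder operations; every function body and shape below is an illustrative stand-in for the real networks:

```python
import numpy as np

rng = np.random.default_rng(0)
D = 16                                  # toy embedding size

def encoder(real_image):                # real image -> latent code
    return real_image.mean(axis=0)      # placeholder feature extraction

def text_matcher(text):                 # text -> text embedding
    vec = np.zeros(D)
    for i, ch in enumerate(text):
        vec[i % D] += ord(ch)
    return vec / max(len(text), 1)

def decoder(latent, text_emb, noise):   # -> "fake image with text_b"
    return np.stack([latent, text_emb, noise])  # placeholder synthesis

real_image = rng.normal(size=(8, D))    # "real image with text_a"
latent = encoder(real_image)
text_emb = text_matcher("text_b")
fake = decoder(latent, text_emb, rng.normal(size=D))
print(fake.shape)
```

The point of the sketch is the wiring: style comes from the real image's latent code, content comes from the new text's embedding, and noise adds variation, which is how the replaced-text simulated card is produced.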
  • Fig. 8 is a flowchart of a business card information extraction method provided by an embodiment of the present invention. As shown in Fig. 8, the method is applied to the business card information extraction system trained by the business card information extraction system training method provided by the embodiment of the present invention, and includes:
  • FIG. 9 is a schematic diagram of a business card image provided by an embodiment of the present invention. As shown in FIG. 9 , it is the business card image input to the OCR character recognition module in S201.
  • the OCR text recognition module extracts text from the input business card image, and outputs text information in a text format.
  • the OCR text recognition module recognizes the above-mentioned business card image, and the text information obtained will be as follows:
  • the OCR character recognition module can not only recognize the text content, but also obtain the text position information corresponding to the text content.
  • the NER named entity recognition module performs entity recognition on the text information output by the OCR text recognition module, obtains the corresponding text segment feature information and the corresponding predicted classification label, and obtains the target text segment by filtering the predicted classification label The feature information and the predicted classification label corresponding to the feature information of the target text segment.
  • when the NER named entity recognition module performs entity recognition on the text information output by the OCR character recognition module, it needs to process the text information according to S102.
  • the grid in the figure is a two-dimensional grid, and the information filled in the two-dimensional grid is the word vector obtained after processing the text in S202.
  • after the three-dimensional convolution kernel computes a weighted average of the word vectors in each cell of the two-dimensional grid, that is, after local feature capture is performed on the above two-dimensional grid, the feature vectors are obtained; x1, x2, x3, x4, x5 in the Bidirectional layer are the feature vectors input to the Bidirectional layer for encoding, for example, x1 represents the word vector of "A" in the two-dimensional grid. After the Bidirectional layer combines and encodes the above feature vectors, the text segment feature information is obtained; the hidden text segment feature information among them is then converted through the Hidden layer, yielding all of the text segment feature information.
  • the feature information of the text segment is classified through the Fully-connected Layer and the Span Classifier layer, and the predicted classification label corresponding to the feature information of the text segment is obtained.
  • the feature information of the text segment and the predicted classification label corresponding to the feature information of the text segment may be as follows:
  • "Luo XX" is the text segment feature information (text content)
  • "<name></name>" is the predicted classification label of "Luo XX"
  • "9X8" is a negative sample.
  • the predicted classification tags are screened according to the preset classification tags, wherein the key field for entity identification, ie, the preset classification tags, can be set as required.
  • the preset classification tags are: name, department, company, position, mobile phone, email, address and website.
  • the feature information of the target text segment can be obtained through the NER named entity recognition module.
  • the output module performs subsequent processing on the feature information of the target text segment and the predicted classification labels corresponding to the feature information of the target text segment, and outputs final structured information.
  • the output module is used to extract the target field from the feature information of the target text segment, and combine the predicted classification labels corresponding to the feature information of the target text segment to obtain the final target structured information.
  • the subsequent processing includes: removing blank characters, invalid characters, and the like.
  • the output structured information will be as follows:
  • "name" corresponds to "<name></name>" in S203, which is the predicted classification label, that is, the text category; "Luo XX" is the target text segment feature information, that is, the text content.
  • the output module will complete the sorting of the feature information of the target text segment and the predicted classification labels corresponding to the feature information of the target text segment to obtain structured information.
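The output module's subsequent processing (removing blank and invalid characters, then pairing each target text segment with its predicted classification label) might look like the sketch below; the character class treated as "invalid" is an assumption for illustration:

```python
import re

INVALID = re.compile(r"[\x00-\x1f\u200b]")   # control chars, zero-width space

def postprocess(pairs):
    """pairs: list of (predicted_label, span_text); returns structured
    information with blank and invalid characters removed."""
    out = {}
    for label, text in pairs:
        cleaned = INVALID.sub("", text).strip()
        if cleaned:                      # drop entries that become empty
            out[label] = cleaned
    return out

print(postprocess([("name", "  Luo XX \u200b"), ("email", " ")]))
```

Entries that clean down to nothing are dropped, so only usable label/content pairs reach the final structured output.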
  • Fig. 10 is the first architecture diagram of a business card information extraction system training device provided by an embodiment of the present invention.
  • an embodiment of the present invention provides a business card information extraction system training device 3, which is adapted to the business card information extraction system training method provided above; the device includes an obtaining part 31 and a determining part 32; wherein,
  • the obtaining part 31 is configured to: identify the business card image to obtain text information; obtain a feature vector based on a preset BERT model, a preset convolutional neural network, and the text information, wherein the feature vector represents the semantic information of the vocabulary in the text information; encode the feature vector based on the preset recurrent neural network to obtain the corresponding text segment feature information, wherein the text segment feature information represents text content of different combinations; use a classifier to discriminate the text segment feature information so as to obtain the predicted classification label corresponding to the text segment feature information, wherein the predicted classification label represents the text type of the text segment feature information and is the basis for obtaining the structured information; and obtain a loss value based on the preset objective function and the predicted classification label and the preset classification label corresponding to the text segment feature information, and determine a target parameter according to the loss value.
  • the determining part 32 is configured to determine a target parameter according to the loss value; wherein, the target parameter is the preset BERT model, the preset convolutional neural network, the preset cyclic neural network, and the preset variables in the classifier, the target parameter characterizes a system configured to extract business card information.
  • the obtaining part 31 is further configured to obtain the first sub-loss value according to the first sub-objective function and the first preset weight, as well as the actual classification label and the predicted classification label; and to obtain the second sub-loss value according to the second sub-objective function, the second preset weight, the first preset weight, the preset classification label, and the predicted classification label; wherein both the first sub-objective function and the second sub-objective function belong to the preset objective function; and to obtain the loss value based on the first sub-loss value and the second sub-loss value, and determine the target parameter according to the loss value.
  • the determining part 32 is further configured to determine, when the loss value no longer decreases, that the current variables in the preset BERT model, the preset convolutional neural network, the preset recurrent neural network, and the classifier are the target parameters.
  • the obtaining part 31 is further configured to convert the text content information based on the preset BERT model to obtain a sequence of word vectors; wherein the text information includes text content information and text position information; and filling the word vector sequence into a preset two-dimensional grid according to the text line position information, thereby obtaining a target two-dimensional grid; and according to the target two-dimensional grid, and the A convolutional neural network is preset to obtain the feature vector.
  • the obtaining part 31 is further configured to obtain the feature vector according to the target two-dimensional grid and the three-dimensional convolution kernel in the preset convolutional neural network .
  • the obtaining part 31 is further configured to extract features from the target two-dimensional grid according to the three-dimensional convolution kernel in the preset convolutional neural network, so as to obtain the feature vectors.
  • the device further includes a collection part 33 configured to collect text sample information when the business card image is the simulated business card image;
  • the obtaining part 31 is further configured to obtain the business card image according to the text sample information, a preset generative adversarial network, and preset typesetting rules.
  • FIG. 11 is the second architecture diagram of a business card information extraction system training device provided by an embodiment of the present invention.
  • an embodiment of the present invention provides a business card information extraction system training device 4, corresponding to the business card information extraction system training method; the business card information extraction system training device 4 includes a processor 401, a memory 402, and a communication bus 404; the memory 402 communicates with the processor 401 through the communication bus 404, and the memory 402 stores one or more programs executable by the processor 401; when the one or more programs are executed, the processor 401 executes the business card information extraction system training method according to the embodiment of the present invention; specifically, the business card information extraction system training device 4 also includes a communication component 403 for data transmission, and at least one processor 401 is provided.
  • the various components in the business card information extraction system training device 4 are coupled together through the bus 404, which is used to realize connection and communication between these components.
  • in addition to a data bus, the bus 404 also includes a power bus, a control bus, and a status signal bus.
  • for clarity of illustration, the various buses are all labeled as bus 404 in FIG. 11.
  • an embodiment of the present invention provides a computer-readable storage medium storing executable instructions; when the executable instructions are executed, the processor 401 is caused to perform the business card information extraction system training method described in any one of the above embodiments.
  • the embodiments of the present invention may be provided as methods, systems, or computer program products. Accordingly, the present invention can take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including but not limited to disk storage, optical storage, etc.) having computer-usable program code embodied therein.
  • these computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to operate in a specific manner, such that the instructions stored in the computer-readable memory produce an article of manufacture comprising instruction means, and the instruction means realizes the functions specified in one or more procedures of the flowchart and/or one or more blocks of the block diagram.
  • the embodiment of the present invention discloses a business card information extraction system training method, device, and storage medium.
  • the method includes: identifying the business card image to obtain text information; processing the text information through a preset BERT model and a preset convolutional neural network to obtain feature vectors; and then combining and encoding the feature vectors to obtain the corresponding text segment feature information.
  • the classifier is used to discriminate the text segment feature information to obtain the predicted classification label corresponding to the text segment feature information.
  • the preset objective function makes the loss value of the predicted classification label corresponding to the feature information of the text segment and the preset classification label meet the requirements, so as to complete the training of the business card information extraction system, and the structured information will be obtained by filtering the predicted classification label.
  • the embodiment of the present invention can improve the system's effect when extracting information from business cards, thereby reducing the error of the structured information extracted when information extraction is performed on business cards.


Abstract

A method and apparatus for training a business card information extraction system, and a storage medium. The method comprises: performing recognition on a business card image, so as to obtain text information (S101); then performing training by means of a preset BERT model and a preset convolutional neural network to process the text information, so as to obtain a feature vector (S102); then performing combined encoding on the feature vector, so as to obtain corresponding text fragment feature information (S103); and finally, distinguishing the text fragment feature information by using a classifier, so as to obtain a predicted classification label corresponding to the text fragment feature information (S104). By means of a preset target function, a loss value of the predicted classification label corresponding to the text fragment feature information and a preset classification label meets the requirements, thereby completing the training of a business card information extraction system; and by means of screening the predicted classification label, structured information is obtained. The method can improve the effect of a system when same performs information extraction on a business card, such that the error of structured information extracted when information extraction is performed on the business card is reduced.

Description

一种名片信息抽取系统训练方法及装置、计算机可读存储介质A training method and device for a business card information extraction system, and a computer-readable storage medium
相关申请的交叉引用Cross References to Related Applications
本发明基于申请号为202111296307.7、申请日为2021年11月03日的中国专利申请提出,并要求该中国专利申请的优先权,该中国专利申请的全部内容在此引入本发明作为参考。The present invention is based on a Chinese patent application with application number 202111296307.7 and a filing date of November 03, 2021, and claims the priority of this Chinese patent application. The entire content of this Chinese patent application is hereby incorporated by reference.
技术领域technical field
本发明涉及图像信息处理技术领域,尤其涉及一种名片信息抽取系统训练方法及装置、计算机可读存储介质。The invention relates to the technical field of image information processing, in particular to a training method and device for a business card information extraction system, and a computer-readable storage medium.
背景技术Background technique
名片信息抽取的主要目标是,输入一张名片的图像,抽取出名片中的结构化信息,其中,结构化信息包括姓名、职位、公司、地址、电话、邮箱等关键字段。The main goal of business card information extraction is to input an image of a business card and extract the structured information in the business card. The structured information includes key fields such as name, position, company, address, phone number, and email address.
现有技术中对于名片信息的抽取,主要包括两个流程:首先,使用光学字符识别(optical character recognition,OCR)技术从名片图像中识别出名片中的文本;其次,对OCR识别的文本进行结构化,作为最终的系统输出结果,之后采用人工设计规则或命名实体识别技术对OCR识别的文本进行结构化,从而提取名片中的关键字段。In the prior art, the extraction of business card information mainly includes two processes: first, using optical character recognition (optical character recognition, OCR) technology to identify the text in the business card from the business card image; secondly, structuring the text recognized by OCR As the final output result of the system, the OCR-recognized text is structured using artificial design rules or named entity recognition technology, thereby extracting the key fields in the business card.
但是,由于现有的OCR技术在实际使用中需要通过识别“触发词”,从而从图像中识别文本,而名片的布局形式多样,且冗余信息较多,有的名片信息中包含“触发词”,有的名片信息不包括“触发词”,有的名片信息中“触发词”是图标;因此,现有技术中对名片进行信息抽取时,存在抽取到的结构化信息具有误差。使得如何得到一种能够提高名片信息抽取精度的系统成为目前亟待解决的技术问题。However, because the existing OCR technology needs to recognize "trigger words" in actual use, thereby recognizing text from images, and the layout of business cards is various, and there are many redundant information, and some business card information contains "trigger words". ", some business card information does not include "trigger words", and some business card information "trigger words" are icons; therefore, when information is extracted from business cards in the prior art, there are errors in the extracted structured information. How to obtain a system that can improve the accuracy of business card information extraction has become a technical problem to be solved urgently.
Summary of the Invention
Embodiments of the present invention aim to provide a method and apparatus for training a business card information extraction system, and a computer-readable storage medium, which can improve the system's performance when extracting information from business cards, thereby reducing errors in the structured information extracted from business cards.
The technical solution of the present invention is implemented as follows:
An embodiment of the present invention provides a method for training a business card information extraction system, including:
recognizing a business card image to obtain text information, where the business card image is at least one of the following: a real business card image or a simulated business card image;
obtaining a feature vector based on a preset BERT model, a preset convolutional neural network, and the text information, where the feature vector represents semantic information of the words in the text information;
performing combined encoding on the feature vector based on a preset recurrent neural network to obtain corresponding text segment feature information, where the text segment feature information represents different combinations of text content;
using a classifier to discriminate the text segment feature information to obtain a predicted classification label corresponding to the text segment feature information, where the predicted classification label represents the text type of the text segment feature information and serves as the basis for obtaining structured information; and
obtaining a loss value based on a preset objective function and the predicted classification label and preset classification label corresponding to the text segment feature information, and determining target parameters according to the loss value, where the target parameters are variables in the preset BERT model, the preset convolutional neural network, the preset recurrent neural network, and the classifier, and the target parameters characterize the system used to extract business card information.
In the above solution, obtaining the loss value based on the preset objective function and the predicted classification label and preset classification label corresponding to the text segment feature information, and determining the target parameters according to the loss value, includes:
obtaining a first sub-loss value according to a first sub-objective function, a first preset weight, the preset classification label, and the predicted classification label;
obtaining a second sub-loss value according to a second sub-objective function, a second preset weight, the first preset weight, the preset classification label, and the predicted classification label, where the first sub-objective function and the second sub-objective function are both the preset objective function; and
obtaining the loss value based on the first sub-loss value and the second sub-loss value, and determining the target parameters according to the loss value.
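The combination of the two weighted sub-losses described above can be sketched as follows. The use of cross-entropy as the shared sub-objective function, and the way the first preset weight enters the second term, are assumptions for illustration; the function names and weight values are hypothetical, not from the disclosure.

```python
import math

def cross_entropy(pred_probs, gold_index):
    """Negative log-likelihood of the gold label under the predicted distribution."""
    return -math.log(pred_probs[gold_index])

def combined_loss(pred_probs, gold_index, w1=0.7, w2=0.3):
    """Weighted sum of a first and a second sub-loss value.

    Both sub-objective functions are the same cross-entropy here, mirroring
    the statement that the first and second sub-objective functions are both
    the preset objective function. Scaling the second term by (1 - w1) is one
    illustrative reading of "the second sub-loss depends on the first preset
    weight"; the weights w1 and w2 are assumed values.
    """
    first_sub_loss = w1 * cross_entropy(pred_probs, gold_index)
    second_sub_loss = w2 * (1.0 - w1) * cross_entropy(pred_probs, gold_index)
    return first_sub_loss + second_sub_loss
```

A more confident prediction of the preset (gold) label then yields a strictly smaller combined loss.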
In the above solution, determining the target parameters according to the loss value includes:
when the loss value remains in a state of no longer decreasing, determining the current variables in the preset BERT model, the preset convolutional neural network, the preset recurrent neural network, and the classifier as the target parameters.
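The stopping criterion — freezing the current model variables once the loss value stays in a non-decreasing state — can be sketched as a simple patience check. This is a hedged sketch: `patience` and `min_delta` are illustrative parameters, not values from the disclosure.

```python
def should_stop(loss_history, patience=3, min_delta=1e-4):
    """Return True when the loss has not decreased for `patience` steps.

    Compares the most recent `patience` loss values against the best loss
    seen before them; if none improved by more than `min_delta`, the loss is
    considered to be in a non-decreasing state, and the current variables can
    be taken as the target parameters.
    """
    if len(loss_history) <= patience:
        return False
    best_before = min(loss_history[:-patience])
    recent = loss_history[-patience:]
    return all(loss >= best_before - min_delta for loss in recent)
```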
In the above solution, obtaining the feature vector based on the preset BERT model, the preset convolutional neural network, and the text information includes:
converting text content information based on the preset BERT model to obtain a word vector sequence, where the text information includes the text content information and text position information;
filling the word vector sequence into a preset two-dimensional grid according to the text line position information to obtain a target two-dimensional grid; and
obtaining the feature vector according to the target two-dimensional grid and the preset convolutional neural network.
In the above solution, obtaining the feature vector according to the target two-dimensional grid and the preset convolutional neural network includes:
obtaining the feature vector according to the target two-dimensional grid and a three-dimensional convolution kernel in the preset convolutional neural network.
In the above solution, obtaining the feature vector according to the target two-dimensional grid and the three-dimensional convolution kernel in the preset convolutional neural network includes:
extracting features in the target two-dimensional grid according to the three-dimensional convolution kernel in the preset convolutional neural network to obtain the feature vector.
In the above solution, when the business card image is the simulated business card image, before recognizing the business card image to obtain the text information, the method includes:
collecting text sample information; and
obtaining the business card image according to the text sample information, a preset generative adversarial network, and preset typesetting rules.
An embodiment of the present invention provides an apparatus for training a business card information extraction system, including an obtaining part and a determining part, where:
the obtaining part is configured to: recognize a business card image to obtain text information; obtain a feature vector based on a preset BERT model, a preset convolutional neural network, and the text information, where the feature vector represents semantic information of the words in the text information; encode the feature vector based on a preset recurrent neural network to obtain corresponding text segment feature information, where the text segment feature information represents different combinations of text content; use a classifier to discriminate the text segment feature information to obtain a predicted classification label corresponding to the text segment feature information, where the predicted classification label represents the text type of the text segment feature information and serves as the basis for obtaining structured information; and obtain a loss value based on a preset objective function and the predicted classification label and preset classification label corresponding to the text segment feature information, and determine target parameters according to the loss value; and
the determining part is configured to determine target parameters according to the loss value, where the target parameters are variables in the preset BERT model, the preset convolutional neural network, the preset recurrent neural network, and the classifier, and the target parameters characterize a system configured to extract business card information.
In the above solution, the obtaining part is further configured to: obtain a first sub-loss value according to a first sub-objective function, a first preset weight, the preset classification label, and the predicted classification label; obtain a second sub-loss value according to a second sub-objective function, a second preset weight, the first preset weight, the preset classification label, and the predicted classification label, where the first sub-objective function and the second sub-objective function are both the preset objective function; and obtain the loss value based on the first sub-loss value and the second sub-loss value and determine the target parameters according to the loss value.
In the above solution, the determining part is further configured to, when the loss value remains in a state of no longer decreasing, determine the current variables in the preset BERT model, the preset convolutional neural network, the preset recurrent neural network, and the classifier as the target parameters.
In the above solution, the obtaining part is further configured to: convert text content information based on the preset BERT model to obtain a word vector sequence, where the text information includes the text content information and text position information; fill the word vector sequence into a preset two-dimensional grid according to the text line position information to obtain a target two-dimensional grid; and obtain the feature vector according to the target two-dimensional grid and the preset convolutional neural network.
In the above solution, the obtaining part is further configured to obtain the feature vector according to the target two-dimensional grid and a three-dimensional convolution kernel in the preset convolutional neural network.
In the above solution, the obtaining part is further configured to extract features in the target two-dimensional grid according to the three-dimensional convolution kernel in the preset convolutional neural network to obtain the feature vector.
In the above solution, the apparatus further includes a collecting part configured to collect text sample information when the business card image is the simulated business card image; and
the obtaining part is further configured to obtain the business card image according to the text sample information, a preset generative adversarial network, and preset typesetting rules.
An embodiment of the present invention provides an apparatus for training a business card information extraction system, including:
a memory for storing executable instructions; and
a processor configured to implement the method for training a business card information extraction system described in the embodiments of the present invention when executing the executable instructions stored in the memory.
An embodiment of the present invention provides a computer storage medium storing executable instructions for causing a processor to implement, when executed, the method for training a business card information extraction system described in the embodiments of the present invention.
Embodiments of the present invention provide a method and apparatus for training a business card information extraction system, and a computer storage medium. The method includes: recognizing a business card image to obtain text information; processing the text information through a preset BERT model and a preset convolutional neural network to obtain a feature vector; performing combined encoding on the feature vector to obtain corresponding text segment feature information; using a classifier to discriminate the text segment feature information to obtain a predicted classification label corresponding to the text segment feature information; and obtaining, through a preset objective function, a loss value between the predicted classification label corresponding to the text segment feature information and a preset classification label. When the loss value meets the requirements, the training of the business card information extraction system is completed.
Embodiments of the present invention can improve the performance of the system when extracting information from business cards, thereby reducing errors in the structured information extracted from business cards.
Description of the Drawings
FIG. 1 is a first architecture diagram of a business card information extraction system provided by an embodiment of the present invention;
FIG. 2 is a second architecture diagram of a business card information extraction system provided by an embodiment of the present invention;
FIG. 3 is a first flowchart of a method for training a business card information extraction system provided by an embodiment of the present invention;
FIG. 4 is a second flowchart of a method for training a business card information extraction system provided by an embodiment of the present invention;
FIG. 5 is a third flowchart of a method for training a business card information extraction system provided by an embodiment of the present invention;
FIG. 6a is a schematic diagram of text sample information in a method for training a business card information extraction system provided by an embodiment of the present invention;
FIG. 6b is a schematic diagram of simulated text information in a method for training a business card information extraction system provided by an embodiment of the present invention;
FIG. 7 is a schematic diagram of generating a simulated business card image in a method for training a business card information extraction system provided by an embodiment of the present invention;
FIG. 8 is a flowchart of a method for extracting business card information provided by an embodiment of the present invention;
FIG. 9 is a schematic diagram of a business card image provided by an embodiment of the present invention;
FIG. 10 is a first architecture diagram of an apparatus for training a business card information extraction system provided by an embodiment of the present invention;
FIG. 11 is a second architecture diagram of an apparatus for training a business card information extraction system provided by an embodiment of the present invention.
Detailed Description of the Embodiments
The technical solutions in the embodiments of the present invention will be described clearly and completely below with reference to the drawings in the embodiments of the present invention.
Before introducing the solutions of the embodiments of the present invention, technical terms that may be used in the embodiments of the present invention are briefly explained:
GAN (Generative Adversarial Network) is a deep learning model that produces high-quality output through adversarial learning between (at least) two modules in its framework: a generative model and a discriminative model.
NLP (Natural Language Processing) is a branch of artificial intelligence and linguistics. It mainly studies the use of computers to process the form, sound, and meaning of natural language, that is, operations such as the input, output, recognition, analysis, understanding, and generation of characters, words, sentences, and texts. Specific applications of natural language processing include machine translation, text summarization, text classification, text proofreading, information extraction, speech synthesis, and speech recognition.
NER (Named Entity Recognition) is a fundamental task in NLP and an important basic tool for many NLP tasks such as information extraction, question answering, syntactic analysis, and machine translation. The purpose of named entity recognition is to identify entities of specified categories in text. Named entities are person names, organization names, place names, and all other entities identified by a name.
OCR (Optical Character Recognition) refers to the process in which an electronic device (such as a scanner or digital camera) examines characters printed on paper, determines their shapes by detecting dark and bright patterns, and then translates the shapes into computer text using a character recognition method; that is, for printed characters, it is a technology that optically converts the text of a paper document into a black-and-white dot-matrix image file and converts the text in the image into text format through computer technology for further editing and processing by word processing software.
FIG. 1 is a first architecture diagram of a business card information extraction system provided by an embodiment of the present invention. As shown in FIG. 1, an embodiment of the present invention provides a business card information extraction system, including an input module 1, an OCR module 2, an NER module 3, and an output module 4. The input module 1 is used to input a business card image to be recognized; the OCR module 2 is used to extract the text in the input business card image and output it in text format; the NER module 3 is responsible for performing entity recognition on the text (text information) output by OCR; and the output module 4 is used to post-process the recognition result output by the NER module 3 and output the final target structured information.
In some embodiments of the present invention, FIG. 2 is a second architecture diagram of a business card information extraction system provided by an embodiment of the present invention. As shown in FIG. 2, the NER module 3 completes entity recognition of the text output by OCR through an NER model provided in the NER module 3. The NER model includes a Word Embedding layer, a Bidirectional layer, a Hidden layer, a Span Representations layer, a Fully-connected Layer, and a Span Classifier layer.
In some embodiments of the present invention, the Word Embedding layer is used to process the text output by OCR based on a preset BERT model and a preset convolutional neural network to obtain a feature vector. The Bidirectional layer is used to encode the feature vector based on a preset recurrent neural network to obtain corresponding text segment feature information. The Hidden layer is used to convert, through a Hidden model, implicit text segment feature information that is not easily captured in the text segment feature information into readable text; exemplarily, if the text segment feature information obtained by encoding the feature vector in the Bidirectional layer contains a piece of implicit text segment feature information "B-LOC|I-LOC|I-LOC", the Hidden model of the Hidden layer can convert this implicit text segment feature information into the readable text information "free trade zone". The Span Representations layer is used to splice the above text segment feature information according to preset rules.
The Fully-connected Layer is used to perform feature fusion or feature weighting on the text segment feature information. The Span Classifier layer is used to discriminate the text segment feature information to obtain the predicted classification label corresponding to the text segment feature information; by filtering the predicted classification labels, target text segment feature information is obtained, and the structured information can be determined according to the target text segment feature information and its corresponding predicted classification label.
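The conversion of an implicit tag sequence such as "B-LOC|I-LOC|I-LOC" into a readable span can be illustrated with standard BIO-tag decoding. This is a generic sketch of the tagging convention, not the disclosed Hidden model.

```python
def decode_bio(tokens, tags):
    """Group tokens into (entity_type, text) spans from BIO tags.

    Standard BIO decoding: a 'B-X' tag opens a span of type X, subsequent
    'I-X' tags extend it, and anything else closes it. Shows how a sequence
    like B-LOC|I-LOC|I-LOC maps to a single readable location span.
    """
    spans, current_type, current_tokens = [], None, []
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current_type is not None:
                spans.append((current_type, "".join(current_tokens)))
            current_type, current_tokens = tag[2:], [token]
        elif tag.startswith("I-") and current_type == tag[2:]:
            current_tokens.append(token)
        else:
            if current_type is not None:
                spans.append((current_type, "".join(current_tokens)))
            current_type, current_tokens = None, []
    if current_type is not None:
        spans.append((current_type, "".join(current_tokens)))
    return spans
```

For the three characters of "自贸区" (free trade zone) tagged B-LOC, I-LOC, I-LOC, this yields one LOC span covering the whole phrase.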
FIG. 3 is a first flowchart of a method for training a business card information extraction system provided by an embodiment of the present invention. As shown in FIG. 3, the method for training a business card information extraction system provided by an embodiment of the present invention includes:
S101: Recognize a business card image to obtain text information, where the business card image is at least one of the following: a real business card image or a simulated business card image.
The embodiment of the present invention is applicable to the scenario of recognizing a business card image and obtaining text information that meets preset requirements.
In the embodiment of the present invention, the business card image is recognized by the OCR module to obtain the required text information.
In the embodiment of the present invention, the business card image is a real business card image and/or a simulated business card image. The real business card image represents a business card printed in real life; in actual use, such a printed business card may be scanned or photographed to obtain a real business card image. The simulated business card image is constructed based on business cards printed in real life, and the business card corresponding to the simulated business card image has not been printed in reality or differs from printed business cards. In actual use, a preset generative adversarial network may output a corresponding simulated business card image according to preset content (text sample information) and layout (preset typesetting rules).
In the embodiment of the present invention, the input of the business card image is completed through the input module, where the business card image needs to undergo a data preprocessing operation before being input.
Exemplarily, the data preprocessing operation may be binarization, orientation correction, distortion correction, denoising, and the like.
In the embodiment of the present invention, the OCR module is mainly responsible for extracting the text in the input business card image and outputting it in text format, thereby obtaining the text information. The text information is at text-line granularity, and each text line includes text content and text position information.
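The text-line-granular OCR output described above — each line carrying its content and its position — might be represented as follows. The field names and coordinate conventions are assumptions for illustration, not part of the disclosure.

```python
from dataclasses import dataclass

@dataclass
class TextLine:
    """One OCR-recognized text line: its content and its position on the
    business card image (coordinates in a preset image coordinate system).
    Field names are illustrative, not from the disclosure."""
    content: str
    x: int       # left edge of the text line
    y: int       # top edge of the text line
    width: int
    height: int

# The OCR module's output for one card is then simply a list of such lines:
card_text_info = [
    TextLine("张三", 40, 30, 120, 28),
    TextLine("Company address: No. 1, X Road", 40, 180, 400, 22),
]
```

Downstream steps read `content` for the text content information and the coordinate fields for the text position information.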
It can be understood that recognizing the business card image with the OCR module yields text information in text format, which facilitates subsequent processing.
S102: Obtain a feature vector based on a preset BERT model, a preset convolutional neural network, and the text information, where the feature vector represents semantic information of the words in the text information.
The embodiment of the present invention is applicable to the scenario of processing the text information obtained in S101 to obtain the semantic information of the words in the text information.
In the embodiment of the present invention, a feature vector representing the semantic information of the words in the text information is obtained from the text information through the preset BERT model and the preset convolutional neural network.
In the embodiment of the present invention, the BERT model (Bidirectional Encoder Representations from Transformers) is a self-encoding language model that can extract the relational features of words in a sentence at multiple different levels, thereby reflecting sentence semantics more comprehensively; during extraction, word meaning can be obtained from the sentence context, thereby avoiding ambiguity.
In the embodiment of the present invention, the text information is input into the preset BERT model in token form, where the token form refers to the original word vector of each character/word in the text. In actual use, the token form of the text information may be recorded as the text token sequence T = (t1, t2, ..., tN). The text token sequence T is input into the preset BERT model, which converts it to obtain the word vector sequence W = (w1, w2, ..., wN).
Exemplarily, if the text information is "Company address: No. XXXX, XX Road, XX District, XX City", the above text information is converted into token form and input into the preset BERT model: the first character ("公") is converted into t1 in the text token sequence T, the second character ("司") is converted into t2, and so on, and the converted t1, t2, ... are input into the preset BERT model to obtain the corresponding word vector sequence.
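The character-level tokenization and word-vector conversion in this example can be sketched with a toy embedding lookup. The real preset BERT model produces contextual vectors; the random table below only illustrates the shapes of T and W, and all sizes are assumptions.

```python
import numpy as np

# Character-level tokenization: each character of the text line becomes one
# token t_i of the text token sequence T, as in the example above.
text = "公司地址:XX市XX区"
T = list(text)  # T = (t1, t2, ..., tN)

# Toy stand-in for the preset BERT model: a random embedding table mapping
# each distinct token to a d-dimensional word vector. A real BERT model would
# produce context-dependent vectors; this lookup only shows the shapes.
d = 8
rng = np.random.default_rng(0)
vocab = {tok: i for i, tok in enumerate(dict.fromkeys(T))}
embedding_table = rng.normal(size=(len(vocab), d))

W = np.stack([embedding_table[vocab[tok]] for tok in T])  # W = (w1, ..., wN)
```

The resulting W has one d-dimensional row per token of T, which is exactly the word vector sequence that is later filled into the two-dimensional grid.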
In the embodiment of the present invention, after the text information is converted into the word vector sequence through the preset BERT model, the word vector sequence W needs to be filled into a preset two-dimensional grid according to the text position information in the text information, where the value of each position (cell) in the two-dimensional grid corresponds to one word vector of the text token sequence T. In actual use, any vacancy in the preset two-dimensional grid is filled with the word vector of <PAD>. Finally, local features of the above two-dimensional grid are captured through the preset convolutional neural network to obtain the feature vector.
It can be understood that the named entity recognition method based on segment classification of text information in the embodiment of the present invention can reduce the impact of entity trigger word errors during named entity recognition, thereby improving the effect of entity extraction.
FIG. 4 is a second flowchart of a method for training a business card information extraction system provided by an embodiment of the present invention. As shown in FIG. 4, S102 may further include S1021-S1023, as follows:
S1021: Convert text content information based on the preset BERT model to obtain a word vector sequence, where the text information includes text content information and text position information.
In some embodiments of the present invention, this is applicable to the scenario of processing the text content information in the text information.
In some embodiments of the present invention, the text content information in the text information is input into the preset BERT model in the format required by the preset BERT model, thereby obtaining the word vector sequence.
In some embodiments of the present invention, the text information includes text content information and text position information, where the text content information refers to the text on the business card image, and the text position information refers to the coordinates of the text on the business card image; the coordinates may be determined according to a preset coordinate system.
It can be understood that, in the embodiment of the present invention, obtaining the word vector sequence through the preset BERT model can improve the accuracy of the obtained word vector sequence.
S1022、根据文本行位置信息,将词向量序列填充至预设二维网格中,从而得到目标二维网格。S1022. According to the position information of the text line, fill the sequence of word vectors into the preset two-dimensional grid, so as to obtain the target two-dimensional grid.
在本发明的一些实施例中，适用于得到目标二维网格，为后续通过预设卷积神经网络进行进一步处理提供数据支持的场景。In some embodiments of the present invention, this is applicable to the scenario of obtaining the target two-dimensional grid, providing data support for subsequent further processing through the preset convolutional neural network.
在本发明的一些实施例中，根据S101中得到的文本信息中的文本行位置信息，如图2中的NER模型的Word Embedding层所示，将词向量序列中的每个词向量填充至预设二维网格中，从而得到目标二维网格。其中，预设二维网格的规格为r*c，r为文本行的个数，c为文本行的最大长度。In some embodiments of the present invention, according to the text line position information in the text information obtained in S101, each word vector in the word vector sequence is filled into the preset two-dimensional grid, as shown in the Word Embedding layer of the NER model in FIG. 2, so as to obtain the target two-dimensional grid. The size of the preset two-dimensional grid is r*c, where r is the number of text lines and c is the maximum length of a text line.
可以理解的是,这样即可完成局部特征的捕获,对名片图像上的文本的布局信息进行建模。It can be understood that in this way, the capture of local features can be completed, and the layout information of the text on the business card image can be modeled.
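The grid-filling step above can be sketched as follows. This is a minimal illustration, assuming word vectors are plain Python lists and that lines shorter than the maximum length c are zero-padded (the padding choice and the function name are assumptions, not specified in the text):

```python
def fill_grid(line_vectors, dim=4):
    """Fill per-line word vectors into an r*c grid, where r is the number of
    text lines and c is the maximum line length; shorter lines are padded
    with zero vectors of the same dimension `dim`."""
    r = len(line_vectors)
    c = max(len(line) for line in line_vectors)
    pad = [0.0] * dim
    return [line + [pad] * (c - len(line)) for line in line_vectors]

# Two text lines with 3 and 1 word vectors respectively -> a 2x3 grid.
lines = [[[1.0] * 4, [2.0] * 4, [3.0] * 4], [[4.0] * 4]]
grid = fill_grid(lines)
```

In this sketch the layout of the business card text is preserved because each row of the grid corresponds to one text line on the image.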
S1023、根据目标二维网格,以及预设卷积神经网络,从而得到特征向量。S1023. Obtain a feature vector according to the target two-dimensional grid and a preset convolutional neural network.
在本发明的一些实施例中,适用于得到目标二维网格后,对上述数据进行后续处理,得到特征向量的场景。In some embodiments of the present invention, it is applicable to the scene where after the target two-dimensional grid is obtained, the above data is subjected to subsequent processing to obtain the feature vector.
在本发明的一些实施例中,将目标二维网格输入至预设卷积神经网络中,从而得到特征向量。In some embodiments of the present invention, the target two-dimensional grid is input into a preset convolutional neural network to obtain a feature vector.
在本发明的一些实施例中，S1023包括：根据目标二维网格，以及预设卷积神经网络中的三维卷积核，从而得到特征向量。In some embodiments of the present invention, S1023 includes: obtaining the feature vector according to the target two-dimensional grid and the three-dimensional convolution kernel in the preset convolutional neural network.
在本发明的实施例中，三维卷积核中，第一维表征卷积核的宽，且第一维与词向量的长度相同；第二维表征卷积核的高；第三维表征卷积核的大小，且第三维的大小与词向量的长度相同。In the embodiment of the present invention, in the three-dimensional convolution kernel, the first dimension represents the width of the convolution kernel and is the same as the length of the word vector; the second dimension represents the height of the convolution kernel; the third dimension represents the size of the convolution kernel, and the size of the third dimension is the same as the length of the word vector.
在本发明的一些实施例中,S1023中根据预设卷积神经网络中的三维卷积核,抽取目标二维网格中的特征,从而得到特征向量。In some embodiments of the present invention, in S1023, according to the preset three-dimensional convolution kernel in the convolutional neural network, the features in the target two-dimensional grid are extracted to obtain a feature vector.
在本发明的一些实施例中，利用预设卷积神经网络中的三维卷积核，对目标二维网格中的特征进行抽取，其中，每个词向量对应至少一个特征，而每个特征对应多个词向量。In some embodiments of the present invention, the three-dimensional convolution kernel in the preset convolutional neural network is used to extract features from the target two-dimensional grid, wherein each word vector corresponds to at least one feature, and each feature corresponds to multiple word vectors.
可以理解的是，通过三维卷积核对特征进行抽取，提高抽取到的特征的准确率。It can be understood that extracting features through the three-dimensional convolution kernel improves the accuracy of the extracted features.
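The idea of a kernel whose depth matches the word-vector length can be illustrated with the toy sketch below: a window of kh×kw grid cells is slid over the grid, and each position yields one scalar feature. The all-ones kernel and the function name are illustrative assumptions, not the patent's trained parameters:

```python
def conv_over_grid(grid, kh, kw):
    """Slide a kh x kw window over an r x c grid of d-dimensional word
    vectors; the kernel spans the full embedding depth d (mirroring the
    third kernel dimension matching the word-vector length), so each
    window position produces one scalar feature."""
    r, c, d = len(grid), len(grid[0]), len(grid[0][0])
    kernel = [[[1.0] * d for _ in range(kw)] for _ in range(kh)]  # toy kernel
    feats = []
    for i in range(r - kh + 1):
        for j in range(c - kw + 1):
            s = 0.0
            for di in range(kh):
                for dj in range(kw):
                    s += sum(kernel[di][dj][k] * grid[i + di][j + dj][k]
                             for k in range(d))
            feats.append(s)
    return feats

# A 2x2 grid of 2-dimensional word vectors, all ones.
grid = [[[1.0, 1.0], [1.0, 1.0]], [[1.0, 1.0], [1.0, 1.0]]]
feats = conv_over_grid(grid, kh=2, kw=2)
```

In a real system this would be a learned 3D convolution (e.g., a deep-learning framework's Conv3d layer) rather than a fixed kernel.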
S103、基于预设循环神经网络,对特征向量进行组合编码,得到对应的文本片段特征信息;其中,文本片段特征信息表征不同组合的文本内容。S103. Based on the preset recurrent neural network, perform combined encoding on the feature vectors to obtain corresponding text segment feature information; wherein, the text segment feature information represents text content of different combinations.
本发明实施例中,适用于对特征向量进行组合编码,从而得到文本片段特征信息的场景。In the embodiment of the present invention, it is applicable to the scene where feature vectors are combined and coded to obtain feature information of text segments.
本发明实施例中，通过S102得到特征向量后，将特征向量输入预设循环神经网络中，通过预设循环神经网络对特征向量进行组合编码，从而得到文本片段特征信息，文本片段特征信息表征特征向量组合编码后形成的不同组合的文本内容。In the embodiment of the present invention, after the feature vectors are obtained through S102, the feature vectors are input into the preset recurrent neural network, which performs combined encoding on the feature vectors to obtain the text segment feature information; the text segment feature information represents the text content of the different combinations formed by the combined encoding of the feature vectors.
本发明实施例中，预设循环神经网络，即LSTM模型，通过预设循环神经网络对特征向量的组合编码，表征通过预设循环神经网络对文本信息对应的文本token序列中的元素进行不同形式的组合表示，如：正序循环表示或倒序循环表示，从而获取上述文本token序列中的元素经过不同组合后形成句子的语义。其中，如S102中所述，文本token序列表征文本信息，文本token序列中的每个元素均表征文本信息中的一个字。In the embodiment of the present invention, the preset recurrent neural network is an LSTM model. Combined encoding of the feature vectors by the preset recurrent neural network means representing the elements of the text token sequence corresponding to the text information in different combined forms, such as a forward-order recurrent representation or a reverse-order recurrent representation, so as to obtain the semantics of the sentences formed by different combinations of the elements of the text token sequence. As described in S102, the text token sequence represents the text information, and each element in the text token sequence represents one character of the text information.
本发明实施例中，由于文本信息在处理过程中，可能会有部分信息不易被识别或捕捉到，因此，上述文本片段特征信息中将包含部分隐含文本片段特征信息；此时，可以通过如图2所示的Hidden层将上述隐含文本片段特征信息转换成可读文本信息（文本片段特征信息），保证文本片段特征信息的完整性。In the embodiment of the present invention, since some information may not be easily recognized or captured during the processing of the text information, the text segment feature information will contain some hidden text segment feature information; in this case, the hidden text segment feature information can be converted into readable text information (text segment feature information) through the Hidden layer shown in FIG. 2, ensuring the integrity of the text segment feature information.
可以理解的是,预设循环神经网络可以提高识别出文本片段特征信息对应的文本内容的语义的正确率。It can be understood that the preset cyclic neural network can improve the accuracy rate of identifying the semantics of the text content corresponding to the feature information of the text segment.
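One common way to form a feature for a text segment (span) from per-token recurrent hidden states is to combine the boundary states. The patent does not fix the exact combination, so the boundary-difference form below is only an illustrative choice:

```python
def span_features(hidden, i, j):
    """Represent the span tokens[i..j] from per-token hidden states by the
    element-wise difference of the boundary states (a common span-encoding
    choice; illustrative only). `hidden` is a list of d-dim state vectors."""
    d = len(hidden[0])
    return [hidden[j][k] - (hidden[i - 1][k] if i > 0 else 0.0)
            for k in range(d)]

# Cumulative-style states for a 3-token sentence.
hidden = [[1.0, 1.0], [2.0, 2.0], [4.0, 4.0]]
feat = span_features(hidden, 1, 2)  # span covering tokens 1..2
```

With forward and backward LSTM passes, the forward and backward variants of such span features can be concatenated to capture both reading directions.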
S104、利用分类器对文本片段特征信息进行判别,从而得到文本片段特征信息对应的预测分类标签;其中,预测分类标签表征文本片段特征信息的文本类型,预测分类标签为得到结构化信息的依据。S104. Use a classifier to discriminate the feature information of the text segment, so as to obtain the predicted classification label corresponding to the feature information of the text segment; wherein, the predicted classification label represents the text type of the feature information of the text segment, and the predicted classification label is the basis for obtaining the structured information.
本发明实施例中,适用于获取结构化信息的场景。In the embodiment of the present invention, it is applicable to the scenario of acquiring structured information.
本发明实施例中，利用分类器对文本片段特征信息进行判别，得到每个文本片段特征信息对应的预测分类标签。在实际使用中，可以根据预设分类标签对预测分类标签进行筛选，与预设分类标签相同的预测分类标签对应的文本片段特征信息即为目标文本片段特征信息，根据目标文本片段特征信息以及对应的预测分类标签即可得到结构化信息，结构化信息为对S101中输入的名片图像中信息的抽取结果。In the embodiment of the present invention, a classifier is used to discriminate the text segment feature information to obtain the predicted classification label corresponding to each piece of text segment feature information. In actual use, the predicted classification labels can be filtered according to the preset classification labels: the text segment feature information whose predicted classification label is the same as a preset classification label is the target text segment feature information, and the structured information can be obtained from the target text segment feature information and its corresponding predicted classification label. The structured information is the extraction result of the information in the business card image input in S101.
本发明实施例中，分类器根据预测分类标签对文本片段特征信息进行筛选，指选择出与预设分类标签相同的预测分类标签。In the embodiment of the present invention, filtering the text segment feature information by the classifier according to the predicted classification labels means selecting the predicted classification labels that are the same as the preset classification labels.
示例性的，若预设分类标签为“地址”，则分类器从预测分类标签中筛选预测分类标签为“地址”的文本片段特征信息作为目标文本片段特征信息；若目标文本片段特征信息表征的文本内容为：“XX市XX区XX路XXXX号”，则结构化信息为“地址：XX市XX区XX路XXXX号”。Exemplarily, if the preset classification label is "address", the classifier selects from the predicted classification labels the text segment feature information whose predicted classification label is "address" as the target text segment feature information; if the text content represented by the target text segment feature information is "No. XXXX, XX Road, XX District, XX City", the structured information is "Address: No. XXXX, XX Road, XX District, XX City".
可以理解的是，本发明实施例通过分类器对文本片段特征信息进行判别得到文本片段特征信息对应的预测分类标签，通过筛选预测分类标签得到目标文本片段特征信息，以抽取结构化信息的方式，取代现有技术中通过识别“触发词”抽取结构化信息的方式；达到去除在结构化信息的抽取过程中识别“触发词”带来的误差的目的，从而减少抽取到的结构化信息的误差。It can be understood that, in the embodiment of the present invention, a classifier discriminates the text segment feature information to obtain the corresponding predicted classification labels, and the target text segment feature information is obtained by filtering the predicted classification labels so as to extract the structured information. This replaces the prior-art approach of extracting structured information by identifying "trigger words", removing the error introduced by identifying "trigger words" during the extraction of structured information and thereby reducing the error of the extracted structured information.
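The filtering step in the example above can be sketched as follows; the data layout (a list of text/label pairs and the "O" label for meaningless segments) is an assumption for illustration:

```python
def extract_structured(spans, preset_labels):
    """Keep only spans whose predicted label is one of the preset labels
    and assemble the structured 'label -> text' result."""
    return {label: text for text, label in spans if label in preset_labels}

# Predicted (text, label) pairs; "O" marks a segment with no meaningful label.
spans = [("XX市XX区XX路XXXX号", "地址"), ("无意义片段", "O")]
result = extract_structured(spans, {"地址", "姓名"})
```

Only the span whose predicted label matches a preset label survives, which is exactly the selection of the target text segment feature information described above.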
S105、基于预设目标函数,以及文本片段特征信息对应的预测分类标签和预设分类标签,得到损失值,并根据损失值确定目标参数。其中,目标参数为预设BERT模型、预设卷积神经网络、预设循环神经网络以及分类器中的变量,目标参数表征用于抽取名片信息的系统。S105. Obtain a loss value based on the preset objective function, the predicted classification label corresponding to the feature information of the text segment and the preset classification label, and determine the target parameter according to the loss value. Among them, the target parameter is the variable in the preset BERT model, the preset convolutional neural network, the preset recurrent neural network and the classifier, and the target parameter represents a system for extracting business card information.
本发明实施例中,适用于判断识别出的文本片段特征信息的预测分类标签的准确度的场景。In the embodiment of the present invention, it is applicable to the scenario of judging the accuracy of the predicted classification label of the identified text segment feature information.
本发明实施例中，通过预设目标函数，对文本片段特征信息的预设分类标签和预测分类标签进行计算，得到预设分类标签和预测分类标签之间的损失值，并根据损失值确定目标参数。其中，损失值表征文本片段特征信息的预设分类标签和预测分类标签之间的误差。In the embodiment of the present invention, the preset classification label and the predicted classification label of the text segment feature information are evaluated with the preset objective function to obtain the loss value between the preset classification label and the predicted classification label, and the target parameters are determined according to the loss value. The loss value represents the error between the preset classification label and the predicted classification label of the text segment feature information.
本发明实施例中，文本片段特征信息的预设分类标签为名片图像上印制的与文本片段特征信息对应的实际分类标签。在实际使用中，通过文本片段特征信息的预设分类标签和预测分类标签的损失值，可以提高名片信息抽取系统抽取到的结构化信息的准确率。其中，当损失值保持不减少的状态时，则可以确定名片信息抽取系统抽取到的结构化信息的准确率达到最大值，名片信息抽取系统完成训练，此时名片信息抽取系统中变量的当前数值为目标参数。In the embodiment of the present invention, the preset classification label of the text segment feature information is the actual classification label, corresponding to the text segment feature information, printed on the business card image. In actual use, the accuracy of the structured information extracted by the business card information extraction system can be improved through the loss value between the preset and predicted classification labels of the text segment feature information. When the loss value stops decreasing, it can be determined that the accuracy of the structured information extracted by the business card information extraction system has reached its maximum and the training of the system is complete; at this point, the current values of the variables in the business card information extraction system are the target parameters.
本发明实施例中，若文本token序列包括n个token形式的文本，则上述文本token序列对应的文本信息将包含有n(n+1)个文本片段特征信息，每个文本片段特征信息均表征一个文本片段，但是上述n(n+1)个文本片段特征信息中存在预测分类标签没有意义的文本片段特征信息，即存在负样本。In the embodiment of the present invention, if the text token sequence includes n tokens, the text information corresponding to the text token sequence will contain n(n+1) pieces of text segment feature information, each of which represents one text segment; however, among these n(n+1) pieces of text segment feature information there are pieces whose predicted classification labels are meaningless, i.e., there are negative samples.
示例性的，若S101中获取到的文本信息为“公司地址：XX市XX区XX路1XX8号”，将上述文本信息转换成token形式。其中，上述文本信息中的每个字都相当于文本token序列中的一个元素，例如：“公”相当于文本token序列T中的t1，“司”相当于文本token序列T中的t2，等等。表1为基于上述文本信息得到的n(n+1)个文本片段，表1如下：Exemplarily, if the text information obtained in S101 is "Company address: No. 1XX8, XX Road, XX District, XX City", the text information is converted into token form, where each character of the text information corresponds to one element of the text token sequence; for example, "公" corresponds to t1 in the text token sequence T, "司" corresponds to t2 in the text token sequence T, and so on. Table 1 shows the n(n+1) text segments obtained based on the above text information, as follows:
Figure PCTCN2022129071-appb-000001
表1 Table 1
如表1所示，表1中的每一列均对应一个文本片段，且表1中仅有一个文本片段有意义：“XX市XX区XX路1XX8号”，其预设分类标签类型为“地址”，而其他文本片段均没有实际意义的标签，即为负样本。As shown in Table 1, each column in Table 1 corresponds to a text segment, and only one text segment in Table 1 is meaningful: "No. 1XX8, XX Road, XX District, XX City", whose preset classification label type is "address"; the other text segments have no meaningful labels, i.e., they are negative samples.
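Enumerating all candidate text segments of the example sentence can be sketched as below. Note that for n tokens there are n*(n+1)/2 contiguous spans under the usual counting convention (the text writes n(n+1), which may follow a different convention); either way, most spans are negative samples:

```python
def enumerate_spans(tokens):
    """All contiguous token spans tokens[i..j] of the sequence; for n tokens
    there are n*(n+1)//2 such spans, most of which carry no meaningful
    label (negative samples)."""
    n = len(tokens)
    return ["".join(tokens[i:j + 1]) for i in range(n) for j in range(i, n)]

spans = enumerate_spans(list("公司地址"))
```

Here only the span "地址" (and in the full example, the full address span) would receive a meaningful preset classification label.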
本发明实施例中,预设目标函数包括第一预设函数和第二预设函数;其中,第一预设函数可以是SoftMax Loss,SoftMax Loss如公式(1)所示,如下:In the embodiment of the present invention, the preset objective function includes a first preset function and a second preset function; wherein, the first preset function can be SoftMax Loss, and SoftMax Loss is shown in formula (1), as follows:
Figure PCTCN2022129071-appb-000002
其中，
Figure PCTCN2022129071-appb-000003
i为S103得到的文本片段特征信息索引，m为文本片段特征信息索引中的文本片段特征信息总个数，x_i是指S103得到的文本片段特征信息中第i个文本片段的特征信息（特征向量），y_i是指第i个文本片段特征信息对应的预设分类标签，j为预设分类标签索引，c+1为预设分类标签索引中的预设分类标签总个数，1表征预设分类标签索引中无意义的预设分类标签，W为分类器中预设的权重参数，γ为第一预设权重，其中，0≤γ≤1，T用于转置，T与S102中的文本token序列无关。Where i is the index of the text segment feature information obtained in S103, m is the total number of pieces of text segment feature information in the index, x_i is the feature information (feature vector) of the i-th text segment obtained in S103, y_i is the preset classification label corresponding to the i-th piece of text segment feature information, j is the preset classification label index, c+1 is the total number of preset classification labels in the preset classification label index, with 1 representing the meaningless preset classification label in the index, W is the preset weight parameter in the classifier, γ is the first preset weight with 0≤γ≤1, and T denotes transposition (T is unrelated to the text token sequence in S102).
本发明实施例中，通过第一预设权重降低负样本对目标函数的贡献，当γ=0时，相当于负样本完全不参与训练。In the embodiment of the present invention, the first preset weight is used to reduce the contribution of negative samples to the objective function; when γ=0, negative samples do not participate in training at all.
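The role of the first preset weight γ can be illustrated with a plain softmax cross-entropy in which negative samples are down-weighted. This is a sketch of the idea only, not a reproduction of formula (1) (whose image is not rendered here):

```python
import math

def weighted_softmax_loss(logits, labels, negative_label, gamma):
    """Mean softmax cross-entropy in which samples whose label equals
    `negative_label` are weighted by gamma (0 <= gamma <= 1); with
    gamma == 0 negative samples drop out of training entirely."""
    total, m = 0.0, len(labels)
    for z, y in zip(logits, labels):
        log_sum = math.log(sum(math.exp(v) for v in z))
        nll = log_sum - z[y]                      # -log softmax(z)[y]
        w = gamma if y == negative_label else 1.0
        total += w * nll
    return total / m

# With gamma = 0, a batch made only of negative samples contributes nothing.
loss_ignored = weighted_softmax_loss([[0.0, 0.0]], [1], negative_label=1,
                                     gamma=0.0)
```

In the patent's setting the logits would be W^T x_i over the c+1 classes, with the extra class being the meaningless (negative) label.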
本发明实施例中,第二预设函数可以是Center Loss,Center Loss如公式(2)所示,如下:In the embodiment of the present invention, the second preset function may be Center Loss, and Center Loss is as shown in formula (2), as follows:
Figure PCTCN2022129071-appb-000004
其中，λ为第二预设权重，i为S103得到的文本片段特征信息索引，m为文本片段特征信息索引中的文本片段特征信息总个数，x_i是指S103得到的文本片段特征信息中第i个文本片段的特征信息（特征向量），y_i是指第i个文本片段特征信息对应的预设分类标签，j为预设分类标签索引，c+1为预设分类标签索引中的预设分类标签总个数，1表征预设分类标签索引中无意义的预设分类标签，W为分类器中预设的权重参数，γ为第一预设权重，其中，0≤γ≤1。Where λ is the second preset weight, i is the index of the text segment feature information obtained in S103, m is the total number of pieces of text segment feature information in the index, x_i is the feature information (feature vector) of the i-th text segment obtained in S103, y_i is the preset classification label corresponding to the i-th piece of text segment feature information, j is the preset classification label index, c+1 is the total number of preset classification labels in the preset classification label index, with 1 representing the meaningless preset classification label in the index, W is the preset weight parameter in the classifier, and γ is the first preset weight with 0≤γ≤1.
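The intra-class aggregation constraint can be illustrated with the common Center Loss form below. The 1/2 factor and the fixed (rather than learned) class centers are assumptions of this sketch, not taken from the image of formula (2):

```python
def center_loss(features, labels, centers, lam):
    """Center Loss sketch: lam/2 times the mean squared distance between
    each feature vector and its class center, pulling same-class features
    together in feature space. `centers` maps label -> center vector."""
    m = len(features)
    total = 0.0
    for x, y in zip(features, labels):
        c = centers[y]
        total += sum((xi - ci) ** 2 for xi, ci in zip(x, c))
    return lam * total / (2 * m)

loss = center_loss([[1.0, 0.0]], [0], {0: [0.0, 0.0]}, lam=1.0)
```

In training, the centers themselves are typically updated alongside the model parameters so that each class's features cluster around its center.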
本发明实施例中，预设目标函数为L，预设目标函数L如公式(3)所示，如下：In the embodiment of the present invention, the preset objective function is L, as shown in formula (3), as follows:
Figure PCTCN2022129071-appb-000005
其中，L_S为第一预设函数，L_C为第二预设函数。Where L_S is the first preset function and L_C is the second preset function.
可以理解的是，本发明实施例中引入了基于度量学习的方法，在SoftMax Loss的基础上，加上人脸识别领域的Center Loss，即加上了样本在特征空间上与类中心的距离约束，监督分类器学习，使得类内更加聚合，类间更加分离，从而提升算法的泛化能力，提高名片信息抽取的效果。其中，SoftMax Loss用于约束能够区分不同类型的实体文本（文本内容），即使得文本片段特征信息具有判别性；Center Loss用于约束文本片段特征信息类内更加聚合，从而提高模型泛化能力。It can be understood that a metric-learning-based method is introduced in the embodiment of the present invention: on the basis of SoftMax Loss, Center Loss from the field of face recognition is added, i.e., a constraint on the distance between a sample and its class center in feature space, to supervise classifier learning so that classes are more compact internally and more separated from each other, thereby improving the generalization ability of the algorithm and the effect of business card information extraction. SoftMax Loss constrains the model to distinguish different types of entity text (text content), i.e., makes the text segment feature information discriminative; Center Loss constrains the text segment feature information to be more compact within each class, thereby improving the generalization ability of the model.
图5是本发明实施例提供的一种名片信息抽取系统训练方法的流程图三,如图5所示,S105还可以包括S1051-S1053,如下:Fig. 5 is a flowchart three of a business card information extraction system training method provided by an embodiment of the present invention. As shown in Fig. 5, S105 may also include S1051-S1053, as follows:
S1051、根据第一子目标函数和第一预设权重,以及预设分类标签和预测分类标签,得到第一子损失值。S1051. Obtain a first sub-loss value according to the first sub-objective function and the first preset weight, as well as the preset classification label and the predicted classification label.
在本发明的一些实施例中,适用于获取第一子损失值的场景。In some embodiments of the present invention, it is applicable to the scene of obtaining the first sub-loss value.
在本发明的一些实施例中，通过公式(1)，对文本片段特征信息的预设分类标签和预测分类标签的第一子损失值进行计算，第一子损失值表示文本片段特征信息的判别度。In some embodiments of the present invention, the first sub-loss value of the preset classification label and the predicted classification label of the text segment feature information is calculated by formula (1); the first sub-loss value represents the discriminability of the text segment feature information.
可以理解的是，通过第一子损失值可以判断文本片段特征信息对应的预设分类标签和预测分类标签之间的误差。It can be understood that the error between the preset classification label and the predicted classification label corresponding to the text segment feature information can be judged by the first sub-loss value.
S1052、根据第二子目标函数、第二预设权重，以及第一预设权重、预设分类标签和预测分类标签，得到第二子损失值；其中，第一子目标函数和第二子目标函数均为预设目标函数。S1052. Obtain a second sub-loss value according to the second sub-objective function, the second preset weight, the first preset weight, the preset classification label, and the predicted classification label; wherein both the first sub-objective function and the second sub-objective function are preset objective functions.
在本发明的一些实施例中,适用于获取第二子损失值的场景。In some embodiments of the present invention, it is applicable to the scene of obtaining the second sub-loss value.
在本发明的一些实施例中，通过公式(2)，对文本片段特征信息的预设分类标签和预测分类标签的第二子损失值进行计算，第二子损失值表示文本片段特征信息类内的聚合度。In some embodiments of the present invention, the second sub-loss value of the preset classification label and the predicted classification label of the text segment feature information is calculated by formula (2); the second sub-loss value represents the intra-class aggregation degree of the text segment feature information.
在本发明的一些实施例中,第二预设权重用于控制第二子损失值在损失值中的占比,第一预设权重用于控制负样本在第二子目标函数中的影响。In some embodiments of the present invention, the second preset weight is used to control the proportion of the second sub-loss value in the loss value, and the first preset weight is used to control the influence of negative samples in the second sub-objective function.
可以理解的是，通过第二子损失值可以判断文本片段特征信息与类中心的距离。It can be understood that the distance between the text segment feature information and the class center can be judged by the second sub-loss value.
S1053、基于第一子损失值和第二子损失值,得到损失值,并根据损失值确定目标参数。S1053. Obtain a loss value based on the first sub-loss value and the second sub-loss value, and determine a target parameter according to the loss value.
本发明实施例中,适用于根据损失值,结束训练的场景。In the embodiment of the present invention, it is applicable to the scenario where the training ends according to the loss value.
本发明实施例中，基于第一子损失值和第二子损失值，得到损失值，并根据损失值确定名片信息抽取系统中的变量，即目标参数。In the embodiment of the present invention, the loss value is obtained based on the first sub-loss value and the second sub-loss value, and the variables in the business card information extraction system, i.e., the target parameters, are determined according to the loss value.
本发明实施例中,当损失值保持不再减小的状态时,表示名片信息抽取系统完成训练,当前的名片信息抽取系统可以保证抽取到的结构化信息的准确率,即提高抽取效果。In the embodiment of the present invention, when the loss value keeps no longer decreasing, it means that the business card information extraction system has completed training, and the current business card information extraction system can ensure the accuracy of the extracted structured information, that is, improve the extraction effect.
可以理解的是，本发明实施例，在预设目标函数中加入权重参数，并在损失值的计算中引入Center Loss，不仅可以减少正负样本不均衡对损失值的影响，且提升了识别效果。It can be understood that, in the embodiment of the present invention, adding weight parameters to the preset objective function and introducing Center Loss into the calculation of the loss value not only reduces the impact of the imbalance between positive and negative samples on the loss value, but also improves the recognition effect.
在本发明的一些实施例中，根据损失值确定目标参数指当损失值保持不减少的状态时，确定当前预设BERT模型、预设卷积神经网络、预设循环神经网络以及分类器中的变量为目标参数。In some embodiments of the present invention, determining the target parameters according to the loss value means that, when the loss value stops decreasing, the current variables of the preset BERT model, the preset convolutional neural network, the preset recurrent neural network, and the classifier are determined as the target parameters.
在本发明的一些实施例中，预设BERT模型、预设卷积神经网络、预设循环神经网络以及分类器会根据损失值，调整变量的数值，并对名片图像中获取的文本信息重复处理，以确定损失值最小时的变量数值作为目标参数。In some embodiments of the present invention, the preset BERT model, the preset convolutional neural network, the preset recurrent neural network, and the classifier adjust the values of their variables according to the loss value and repeatedly process the text information obtained from the business card image, so that the variable values at the minimum loss value are determined as the target parameters.
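The loop of adjusting variables by the loss value and keeping the values seen at the minimum loss can be sketched as follows, with `step_fn` standing in for one full update pass through the BERT + CNN + RNN + classifier pipeline; the patience-based stop is an assumed concrete reading of "the loss value no longer decreases":

```python
def train(step_fn, params, max_epochs=100, patience=3):
    """Keep training while the loss still decreases; once it fails to
    improve for `patience` consecutive epochs, return the parameters
    observed at the minimum loss (the 'target parameters')."""
    best_loss, best_params, bad = float("inf"), params, 0
    for _ in range(max_epochs):
        params, loss = step_fn(params)
        if loss < best_loss:
            best_loss, best_params, bad = loss, params, 0
        else:
            bad += 1
            if bad >= patience:
                break
    return best_params, best_loss

# Scripted losses: improvement, then stagnation -> training stops.
losses = iter([3.0, 2.0, 2.5, 2.6, 2.7])
best_params, best_loss = train(lambda p: (p, next(losses)), params=0)
```

The returned `best_params` corresponds to the target parameters of the business card information extraction system.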
可以理解的是,通过本发明实施例提供的名片信息抽取系统训练方法,训练得到的名片信息抽取系统将提升识别效果。It can be understood that, through the business card information extraction system training method provided by the embodiment of the present invention, the trained business card information extraction system will improve the recognition effect.
在本发明的一些实施例中,当名片图像为模拟名片图像时,S101之前,本发明实施例提供的名片信息抽取系统训练方法还包括:In some embodiments of the present invention, when the business card image is a simulated business card image, before S101, the business card information extraction system training method provided in the embodiment of the present invention further includes:
S106、采集文本样本信息。S106. Collect text sample information.
在本发明的一些实施例中,适用于对名片信息进行抽取之前的样本采集场景。In some embodiments of the present invention, it is applicable to a scene of sample collection before extracting business card information.
在本发明的一些实施例中,在各平台爬取数据,从而采集文本样本信息。In some embodiments of the present invention, data is crawled on each platform, so as to collect text sample information.
在本发明的一些实施例中，针对名片的各个目标字段进行数据搜集，数据来源可以是平台，数据可以包含姓名、公司、地址、邮箱、网址、手机、电话、传真等公开的信息。对于个别目标字段，可以总结命名规范，基于预设规则，在已搜集到的数据的基础上进行构造，示例性的，可以通过“姓名拼音+邮箱域名”规则构造一些邮箱字段数据。对于英文字段，包括英文姓名、职位、公司、地址，可以使用翻译功能翻译得到。In some embodiments of the present invention, data is collected for each target field of the business card. The data source may be a platform, and the data may include public information such as name, company, address, email address, website, mobile phone, telephone, and fax. For individual target fields, a naming convention can be summarized and data constructed on the basis of the collected data according to preset rules; for example, some email-field data can be constructed through the "name pinyin + mailbox domain name" rule. English fields, including English name, position, company, and address, can be obtained by translation.
可以理解的是,这样可以保证文本样本信息的数量,为训练提供数据支持。It is understandable that this can ensure the quantity of text sample information and provide data support for training.
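The "name pinyin + mailbox domain name" construction rule mentioned above can be sketched as follows; the separator and lowercase normalization are illustrative assumptions, and the domain is a placeholder:

```python
def make_email(pinyin_name, domain):
    """Construct a synthetic email-field sample from a pinyin name and a
    mailbox domain, joining name parts with '.' (an assumed convention)."""
    return pinyin_name.lower().replace(" ", ".") + "@" + domain

sample = make_email("Luo Moumou", "example.com")
```

Applying such rules over the collected name data yields additional email-field samples without requiring real addresses.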
S107、根据文本样本信息,以及预设生成式对抗网络、预设排版规则,得到模拟名片图像。S107. Obtain a simulated business card image according to the text sample information, the preset generative adversarial network, and the preset typesetting rules.
在本发明的一些实施例中,适用于构建模拟名片图像的场景。In some embodiments of the present invention, it is suitable for constructing a scene of simulating a business card image.
在本发明的一些实施例中，根据S106得到的文本样本信息，模仿文本样本信息的排版，生成模拟文本信息；基于文本样本信息以及模拟文本信息，通过预设生成式对抗网络，按照预设排版规则得到模拟名片图像。In some embodiments of the present invention, simulated text information is generated according to the text sample information obtained in S106 by imitating the typesetting of the text sample information; based on the text sample information and the simulated text information, a simulated business card image is obtained through the preset generative adversarial network according to the preset typesetting rules.
在本发明的一些实施例中,预设排版规则通过基于现有名片的排版,对名片中的文本内容替换和文本位置顺序的调换得到。In some embodiments of the present invention, the preset typesetting rules are obtained by replacing the text content in the business card and exchanging the order of the text position based on the typesetting of the existing business card.
示例性的，文本样本信息如图6a所示，图6a中的黑色方框标注的是文本样本信息的文本排版；模拟文本信息是根据图6a所示的文本排版生成的模拟文本信息，如图6b所示，图6b中的黑色方框标注的是模拟文本信息的文本排版。Exemplarily, the text sample information is shown in FIG. 6a, where the black boxes mark the text layout of the text sample information; the simulated text information, generated according to the text layout shown in FIG. 6a, is shown in FIG. 6b, where the black boxes mark the text layout of the simulated text information.
在本发明的一些实施例中，图7是本发明实施例提供的一种名片信息抽取方法的生成模拟名片图像示意图，预设生成式对抗网络包括图7中的生成器（generator）和判别器（discriminator）。生成器用于生成模拟名片图像，判别器用于识别由生成器生成的模拟名片图像。其中，生成器为encoder-decoder结构；而判别器由图像判别器（Image Discriminator）和文本匹配器（Text Matcher）构成，图像判别器用于判别生成器输出的模拟名片图像的样式、背景等视觉特征的真实度。文本匹配器用于判定模拟名片图像上的文本与输入至生成器中的真实文本的相似度，保证模拟名片图像上的文字信息已被替换成新输入的文本。在实际使用中，在生成器中输入真实的名片图像（real image with text_a），即真实的名片图像的名片文本为text_a，encoder对real image进行特征提取，得到real image对应的图像特征向量（Image Embedding），即隐藏编码（latent code）；将待替换的文本text_b输入Text Matcher中，得到text_b的文本特征向量（Text Embedding）；将text_b对应的特征向量、real image的图像特征向量和添加的随机噪声z（Random Noise）输入至decoder，得到模拟名片图像（fake image with text_b），即模拟名片图像的名片文本为text_b；对模拟名片图像进行OCR识别，得到fake text_b；将真实的名片图像（real image with text_a）和模拟名片图像（fake image with text_b）输入到判别器中的图像判别器，对名片图像的真假（Real/Fake）进行区分，将fake text_b和text_b输入至判别器中的文本匹配器中，判断二者相同（Same）还是不同（Different），进而提高模拟名片图像的仿真度。可以理解的是，在实际使用中，通过生成器与判别器的配合，可以保证生成器生成的模拟名片图像的仿真度，而通过预设生成式对抗网络，以及真实名片图像、文本样本信息，生成模拟名片图像，用于名片信息抽取系统的训练，达到了扩充数据集的目的，增加了数据的多样性。In some embodiments of the present invention, FIG. 7 is a schematic diagram of generating a simulated business card image in a business card information extraction method provided by an embodiment of the present invention. The preset generative adversarial network includes the generator and the discriminator shown in FIG. 7. The generator is used to generate simulated business card images, and the discriminator is used to identify the simulated business card images generated by the generator. The generator has an encoder-decoder structure, while the discriminator consists of an Image Discriminator and a Text Matcher. The Image Discriminator judges the realism of visual features such as the style and background of the simulated business card image output by the generator. The Text Matcher judges the similarity between the text on the simulated business card image and the real text input into the generator, ensuring that the text information on the simulated business card image has been replaced with the newly input text. In actual use, a real business card image (real image with text_a) is input into the generator, i.e., the business card text of the real business card image is text_a; the encoder performs feature extraction on the real image to obtain the corresponding image feature vector (Image Embedding), i.e., the latent code. The text text_b to be substituted in is input into the Text Matcher to obtain the text feature vector (Text Embedding) of text_b. The feature vector corresponding to text_b, the image feature vector of the real image, and added random noise z (Random Noise) are input into the decoder to obtain a simulated business card image (fake image with text_b), i.e., the business card text of the simulated business card image is text_b. OCR recognition is performed on the simulated business card image to obtain fake text_b. The real business card image (real image with text_a) and the simulated business card image (fake image with text_b) are input into the Image Discriminator of the discriminator to distinguish real business card images from fake ones (Real/Fake), and fake text_b and text_b are input into the Text Matcher of the discriminator to judge whether the two are the same (Same) or different (Different), thereby improving the fidelity of the simulated business card image. It can be understood that, in actual use, the cooperation of the generator and the discriminator guarantees the fidelity of the simulated business card images generated by the generator; generating simulated business card images for training the business card information extraction system through the preset generative adversarial network together with real business card images and text sample information achieves the purpose of expanding the data set and increases the diversity of the data.
FIG. 8 is a flowchart of a business card information extraction method provided by an embodiment of the present invention. As shown in FIG. 8, the method is applicable to a business card information extraction system trained with the business card information extraction system training method provided by the embodiments of the present invention, and includes:
S201: Input a business card image into the OCR text recognition module (OCR module).
In some embodiments of the present invention, the business card image needs to be preprocessed before being input. FIG. 9 is a schematic diagram of a business card image provided by an embodiment of the present invention; as shown in FIG. 9, it is the business card image input into the OCR text recognition module in S201.
It can be understood that this ensures the accuracy of the subsequently extracted information.
S202: The OCR text recognition module extracts the text from the input business card image and outputs it in text format to obtain text information.
In some embodiments of the present invention, based on FIG. 9, the OCR text recognition module recognizes the above business card image, and the resulting text information is as follows:
Word: -A X Yun|9X8  pos: 284,46,440,84
Word: Ao XXXXXX cloud service provider  pos: 290,80,435,103
Word: Luo Moumou  pos: 29,137,162,170
Word: 131XXXX 1111  pos: 360,174,440,192
Word: XXX@XXXXXXX-inc.com  pos: 295,186,441,209
Word: XX Group-XXXXX business group  pos: 27,178,176,202
Word: No. 1XX, XX Road, XX District, XX City, China  pos: 276,201,442,226
Word: XXXXX (XX) Co., Ltd.  pos: 27,195,176,221
Word: XXXX Center Block X, Floor X  pos: 339,222,443,244
Word: XXXXXX expert  pos: 25,212,119,233
Word: www.xxxxxx.com  pos: 356,242,444,260
It can be understood that the OCR text recognition module can recognize not only the text content but also the text position information corresponding to the text content.
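The `Word:`/`pos:` lines above can be parsed into (text content, bounding box) records with a short helper. The line format follows the example output; the function name is an illustrative assumption:

```python
import re
from typing import List, Tuple

# Matches lines like "Word: Luo Moumou  pos: 29,137,162,170"
LINE_RE = re.compile(r"Word:\s*(?P<text>.*?)\s*pos:\s*(?P<box>\d+,\d+,\d+,\d+)")

def parse_ocr_lines(lines: List[str]) -> List[Tuple[str, Tuple[int, int, int, int]]]:
    """Parse each OCR output line into (text content, (x1, y1, x2, y2))."""
    records = []
    for line in lines:
        m = LINE_RE.match(line.strip())
        if m:
            box = tuple(int(v) for v in m.group("box").split(","))
            records.append((m.group("text"), box))
    return records

sample = [
    "Word: Luo Moumou  pos: 29,137,162,170",
    "Word: www.xxxxxx.com  pos: 356,242,444,260",
]
print(parse_ocr_lines(sample))
# → [('Luo Moumou', (29, 137, 162, 170)), ('www.xxxxxx.com', (356, 242, 444, 260))]
```

The bounding boxes are what the later grid-filling step uses to place word vectors by position.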
S203: The NER named entity recognition module (NER module) performs entity recognition on the text information output by the OCR text recognition module to obtain the corresponding text segment feature information and the corresponding predicted classification labels, and then filters the predicted classification labels to obtain the target text segment feature information and the predicted classification labels corresponding to the target text segment feature information.
In some embodiments of the present invention, as shown in FIG. 2, when the NER named entity recognition module performs entity recognition on the text information output by the OCR text recognition module, the text information first needs to be processed according to S102. Exemplarily, as shown in the Word Embedding layer in FIG. 2, the grid in the figure is a two-dimensional grid, and the information filled into the two-dimensional grid is the word vectors obtained by processing the text of S202. After the three-dimensional convolution kernel computes a weighted average of the word vectors in each cell of the two-dimensional grid, that is, after local feature capture is performed on the two-dimensional grid, feature vectors are obtained. In the Bidirectional layer, x_1, x_2, x_3, x_4 and x_5 are the feature vectors input into the Bidirectional layer for encoding; for example, x_1 represents the word vector of "阿" in the two-dimensional grid. After the Bidirectional layer combines and encodes the above feature vectors, the text segment feature information is obtained; then, after the hidden text segment feature information therein is converted through the Hidden layer, all the text segment feature information is obtained, namely the forward hidden vector $\overrightarrow{h}_1$ and the backward hidden vector $\overleftarrow{h}_1$, which correspond to x_1 encoded in forward order and x_1 encoded in reverse order, respectively. After the above text segment feature information is input into the Span Representations layer for concatenation, a sentence composed of text segment feature information is obtained, i.e. $s = \{[\overrightarrow{h}_1; \overleftarrow{h}_1], \ldots, [\overrightarrow{h}_5; \overleftarrow{h}_5]\}$. Finally, the text segment feature information is classified through the Fully-connected Layer and the Span Classifier layer to obtain the predicted classification labels corresponding to the text segment feature information. Exemplarily, based on the text obtained in S202, the text segment feature information and the corresponding predicted classification labels may be as follows:
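The forward-order and reverse-order encoding plus span concatenation can be illustrated with a toy bidirectional pass over token vectors. The running-sum recurrent cell is a deliberate simplification for illustration, not the patented network:

```python
from typing import List

Vec = List[float]

def run_sum_cell(prev: Vec, x: Vec) -> Vec:
    """Toy recurrent cell: the hidden state is a running element-wise sum."""
    return [p + v for p, v in zip(prev, x)]

def bidirectional_encode(xs: List[Vec]) -> List[Vec]:
    """Return per-token states [h_fwd_i ; h_bwd_i], i.e. the concatenation of
    the forward-order and reverse-order encodings."""
    dim = len(xs[0])
    fwd, h = [], [0.0] * dim
    for x in xs:                 # forward pass over x_1 .. x_n
        h = run_sum_cell(h, x)
        fwd.append(h)
    bwd, h = [], [0.0] * dim
    for x in reversed(xs):       # backward pass over x_n .. x_1
        h = run_sum_cell(h, x)
        bwd.append(h)
    bwd.reverse()
    return [f + b for f, b in zip(fwd, bwd)]   # list concat = vector concat

def span_representation(states: List[Vec], i: int, j: int) -> Vec:
    """Represent the span [i, j] by concatenating its endpoint states."""
    return states[i] + states[j]

xs = [[1.0], [2.0], [3.0]]       # x_1 .. x_3, one-dimensional for clarity
states = bidirectional_encode(xs)
print(states[0])                          # → [1.0, 6.0]
print(span_representation(states, 0, 2))  # → [1.0, 6.0, 6.0, 3.0]
```

The span representations would then be fed to a fully-connected layer and a span classifier, as in the figure.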
<name>Luo Moumou</name>
<department>XX Group-XXXXX business group</department>
<company>XXXXX (XX) Co., Ltd.</company>
<position>XXXXXX expert</position>
-A X Yun|9X8
Ao XXXXXX cloud service provider
<mobile>131 XXXX 1111</mobile>
<mail>XXX@XXXXXXX-inc.com</mail>
<addr>No. 1XX, XX Road, XX District, XX City, China
XXXX Center Block X, Floor X</addr>
<url>www.xxxxxx.com</url>
Among them, "Luo Moumou" is text segment feature information, "<name></name>" is the predicted classification label of "Luo Moumou", and "-A X Yun|9X8" and "Ao XXXXXX cloud service provider" are negative samples.
In some embodiments of the present invention, the predicted classification labels are filtered according to the preset classification labels, where the key fields for entity recognition, i.e., the preset classification labels, can be set as required. Exemplarily, suppose the preset classification labels are: name, department, company, position, mobile, mail, address and URL. After the above predicted classification labels are filtered, the obtained target text segment feature information and the corresponding predicted classification labels are as follows:
<name>Luo Moumou</name>
<department>XX Group-XXXXX business group</department>
<company>XXXXX (XX) Co., Ltd.</company>
<position>XXXXXX expert</position>
<mobile>131 XXXX 1111</mobile>
<mail>XXX@XXXXXXX-inc.com</mail>
<addr>No. 1XX, XX Road, XX District, XX City, China
XXXX Center Block X, Floor X</addr>
<url>www.xxxxxx.com</url>
It can be understood that the target text segment feature information can be obtained through the NER named entity recognition module.
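The filtering step, which keeps only spans whose predicted label belongs to the preset classification label set, can be sketched directly. The label names follow the example above; the `None` marker for negative samples is an illustrative assumption:

```python
# Preset classification labels: the key fields configured for entity recognition.
PRESET_LABELS = {"name", "department", "company", "position",
                 "mobile", "mail", "addr", "url"}

def filter_by_preset(predictions):
    """Keep (label, text) pairs whose label is a preset classification label;
    negative samples (label None) and unknown labels are dropped."""
    return [(label, text) for label, text in predictions
            if label in PRESET_LABELS]

predictions = [
    ("name", "Luo Moumou"),
    (None, "-A X Yun|9X8"),        # negative sample, no entity label
    ("url", "www.xxxxxx.com"),
]
print(filter_by_preset(predictions))
# → [('name', 'Luo Moumou'), ('url', 'www.xxxxxx.com')]
```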
S204: The output module performs subsequent processing on the target text segment feature information and the corresponding predicted classification labels, and outputs the final structured information.
In the embodiments of the present invention, the output module is configured to extract the target fields from the target text segment feature information and combine them with the predicted classification labels corresponding to the target text segment feature information to obtain the final target structured information. The subsequent processing includes removing whitespace characters, invalid characters, and the like. Exemplarily, based on the target text segment feature information obtained in S203 and the corresponding predicted classification labels, the output structured information is as follows:
Name: Luo Moumou
Department: XX Group-XXXXX business group
Company: XXXXX (XX) Co., Ltd.
Position: XXXXXX expert
Mobile: 131 XXXX 1111
Mail: XXX@XXXXXXX-inc.com
Address: Floor X, Block X, XXXX Center, No. 1XX, XX Road, XX District, XX City, China
URL: www.xxxxxx.com
Among them, "Name" corresponds to "<name></name>" in S203 and is the predicted classification label, i.e., the text category; "Luo Moumou" is the target text segment feature information, i.e., the text content.
It can be understood that the output module completes the arrangement of the target text segment feature information and the corresponding predicted classification labels to obtain the structured information.
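The output module's post-processing (stripping whitespace and invalid characters, merging multi-line fields such as the address, and mapping tag names to display names) might look like the following sketch; the display-name mapping is an assumption based on the example output:

```python
# Assumed mapping from predicted classification labels to display field names.
DISPLAY_NAMES = {"name": "Name", "department": "Department", "company": "Company",
                 "position": "Position", "mobile": "Mobile", "mail": "Mail",
                 "addr": "Address", "url": "URL"}

def clean(text: str) -> str:
    """Remove control characters and collapse whitespace runs in a field value."""
    text = "".join(ch for ch in text if ch.isprintable() or ch == "\n")
    return " ".join(text.split())

def to_structured(fields):
    """Turn (label, raw text) pairs into the final structured record.
    Repeated labels (e.g. a two-line address) are merged into one value."""
    record = {}
    for label, raw in fields:
        key = DISPLAY_NAMES.get(label, label)
        value = clean(raw)
        record[key] = (record[key] + " " + value) if key in record else value
    return record

fields = [("name", " Luo Moumou \n"),
          ("addr", "No. 1XX, XX Road"),
          ("addr", "XXXX Center Block X, Floor X")]
print(to_structured(fields))
# → {'Name': 'Luo Moumou', 'Address': 'No. 1XX, XX Road XXXX Center Block X, Floor X'}
```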
FIG. 10 is a first architecture diagram of a business card information extraction system training apparatus provided by an embodiment of the present invention. As shown in FIG. 10, an embodiment of the present invention provides a business card information extraction system training apparatus 3, which is applicable to the business card information extraction system training method provided by the embodiments of the present invention and includes an obtaining part 31 and a determining part 32; wherein,
The obtaining part 31 is configured to: recognize a business card image to obtain text information; obtain feature vectors based on a preset BERT model, a preset convolutional neural network and the text information, where the feature vectors represent the semantic information of the words in the text information; encode the feature vectors based on a preset recurrent neural network to obtain corresponding text segment feature information, where the text segment feature information represents text content of different combinations; discriminate the text segment feature information using a classifier to obtain the predicted classification labels corresponding to the text segment feature information, where a predicted classification label represents the text type of the text segment feature information and the predicted classification labels are the basis for obtaining structured information; and obtain a loss value based on a preset objective function and the predicted classification labels and preset classification labels corresponding to the text segment feature information, and determine target parameters according to the loss value.
The determining part 32 is configured to determine the target parameters according to the loss value, where the target parameters are the variables in the preset BERT model, the preset convolutional neural network, the preset recurrent neural network and the classifier, and the target parameters characterize the system configured to extract business card information.
In some embodiments of the present invention, the obtaining part 31 is further configured to: obtain a first sub-loss value according to a first sub-objective function and a first preset weight, as well as the actual classification labels and the predicted classification labels; obtain a second sub-loss value according to a second sub-objective function, a second preset weight, the first preset weight, the preset classification labels and the predicted classification labels, where both the first sub-objective function and the second sub-objective function are the preset objective function; and obtain the loss value based on the first sub-loss value and the second sub-loss value, and determine the target parameters according to the loss value.
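The exact sub-objective functions and weight combination are not spelled out in this passage, so the sketch below assumes plain cross-entropy terms, with the second term scaled by both preset weights; it only illustrates how a weighted two-part loss could be assembled:

```python
import math

def cross_entropy(probs, target_idx):
    """Single-sample cross-entropy: -log p(target)."""
    return -math.log(probs[target_idx])

def combined_loss(probs, actual_idx, preset_idx, w1, w2):
    """Assumed combination: the first sub-loss is weighted by w1 against the
    actual label; the second sub-loss is weighted by (1 - w1) * w2 against the
    preset label. The patent's exact weighting scheme may differ."""
    l1 = w1 * cross_entropy(probs, actual_idx)
    l2 = (1.0 - w1) * w2 * cross_entropy(probs, preset_idx)
    return l1 + l2

probs = [0.7, 0.2, 0.1]   # classifier output over 3 labels
loss = combined_loss(probs, actual_idx=0, preset_idx=1, w1=0.8, w2=0.5)
print(round(loss, 4))
# → 0.4463
```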
In some embodiments of the present invention, the determining part 32 is further configured to determine, when the loss value remains in a non-decreasing state, that the current variables in the preset BERT model, the preset convolutional neural network, the preset recurrent neural network and the classifier are the target parameters.
In some embodiments of the present invention, the obtaining part 31 is further configured to: convert the text content information based on the preset BERT model to obtain a word vector sequence, where the text information includes text content information and text position information; fill the word vector sequence into a preset two-dimensional grid according to the text line position information to obtain a target two-dimensional grid; and obtain the feature vectors according to the target two-dimensional grid and the preset convolutional neural network.
In some embodiments of the present invention, the obtaining part 31 is further configured to obtain the feature vectors according to the target two-dimensional grid and a three-dimensional convolution kernel in the preset convolutional neural network.
In some embodiments of the present invention, the obtaining part 31 is further configured to extract the features in the target two-dimensional grid according to the three-dimensional convolution kernel in the preset convolutional neural network, so as to obtain the feature vectors.
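Filling word vectors into a two-dimensional grid by text position and letting a kernel average each cell's neighborhood (the local feature capture described above) can be sketched as follows; the grid size and the uniform 3x3 averaging kernel are illustrative assumptions, not the patented kernel:

```python
from typing import Dict, List, Tuple

Vec = List[float]

def fill_grid(rows: int, cols: int, dim: int,
              placed: Dict[Tuple[int, int], Vec]) -> List[List[Vec]]:
    """Build a rows x cols grid of word vectors; empty cells hold zero vectors."""
    grid = [[[0.0] * dim for _ in range(cols)] for _ in range(rows)]
    for (r, c), vec in placed.items():
        grid[r][c] = vec
    return grid

def conv3x3_avg(grid: List[List[Vec]], r: int, c: int) -> Vec:
    """Average the word vectors in the 3x3 neighborhood of cell (r, c),
    channel-wise across the embedding dimension (a uniform kernel)."""
    rows, cols, dim = len(grid), len(grid[0]), len(grid[0][0])
    out, count = [0.0] * dim, 0
    for dr in (-1, 0, 1):
        for dc in (-1, 0, 1):
            rr, cc = r + dr, c + dc
            if 0 <= rr < rows and 0 <= cc < cols:
                out = [o + v for o, v in zip(out, grid[rr][cc])]
                count += 1
    return [o / count for o in out]

# Two word vectors placed at (row, col) cells derived from text line positions.
grid = fill_grid(3, 3, 2, {(0, 0): [1.0, 2.0], (1, 1): [3.0, 4.0]})
print(conv3x3_avg(grid, 1, 1))   # local feature at the center cell
```

A learned kernel would use per-cell, per-channel weights instead of a uniform average, which is what makes the kernel three-dimensional.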
In some embodiments of the present invention, the apparatus further includes a collecting part 33, and the collecting part 33 is configured to collect text sample information when the business card image is a simulated business card image;
The obtaining part 31 is further configured to obtain the business card image according to the text sample information, a preset generative adversarial network and preset typesetting rules.
FIG. 11 is a second architecture diagram of a business card information extraction system training apparatus provided by an embodiment of the present invention. As shown in FIG. 11, an embodiment of the present invention provides a business card information extraction system training apparatus 4 corresponding to business card information extraction system training applied to the apparatus. The business card information extraction system training apparatus 4 includes a processor 401, a memory 402 and a communication bus 404. The memory 402 communicates with the processor 401 through the communication bus 404, and the memory 402 stores one or more programs executable by the processor 401. When the one or more programs are executed, the processor 401 performs the business card information extraction system training method according to the embodiments of the present invention. Specifically, the business card information extraction system training apparatus 4 further includes a communication component 403 for data transmission, and at least one processor 401 is provided.
In the embodiments of the present invention, the components in the business card information extraction system training apparatus 4 are coupled together through the bus 404, and the bus 404 is used to realize connection and communication between these components. In addition to a data bus, the bus 404 also includes a power bus, a control bus and a status signal bus. However, for clarity of illustration, the various buses are all labeled as the bus 404 in FIG. 11.
An embodiment of the present invention provides a computer-readable storage medium storing executable instructions which, when executed, cause the processor 401 to perform the business card information extraction system training method according to any one of the above embodiments.
Those skilled in the art should understand that the embodiments of the present invention may be provided as a method, a system, or a computer program product. Accordingly, the present invention may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to disk storage, optical storage, and the like) containing computer-usable program code.
The present invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems) and computer program products according to embodiments of the present invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks in the flowcharts and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general-purpose computer, a special-purpose computer, an embedded processor or another programmable data processing device to produce a machine, so that the instructions executed by the processor of the computer or other programmable data processing device produce an apparatus for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a specific manner, so that the instructions stored in the computer-readable memory produce an article of manufacture including an instruction apparatus which implements the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, so that a series of operation steps are performed on the computer or other programmable device to produce computer-implemented processing, whereby the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
The above descriptions are only preferred embodiments of the present invention and are not intended to limit the protection scope of the present invention.
Industrial Applicability
The embodiments of the present invention disclose a business card information extraction system training method and apparatus, and a storage medium. The method includes: recognizing a business card image to obtain text information; processing the text information through a preset BERT model and a preset convolutional neural network to obtain feature vectors; combining and encoding the feature vectors to obtain corresponding text segment feature information; and discriminating the text segment feature information using a classifier to obtain the predicted classification labels corresponding to the text segment feature information. Through a preset objective function, the loss value between the predicted classification labels corresponding to the text segment feature information and the preset classification labels is made to meet the requirements, thereby completing the training of the business card information extraction system, and structured information is obtained by filtering the predicted classification labels. The embodiments of the present invention can improve the effect of the system when extracting information from business cards, thereby reducing the error of the structured information extracted from business cards.

Claims (16)

  1. A business card information extraction system training method, comprising:
    recognizing a business card image to obtain text information, wherein the business card image is at least one of the following: a real business card image or a simulated business card image;
    obtaining feature vectors based on a preset BERT model, a preset convolutional neural network and the text information, wherein the feature vectors represent semantic information of words in the text information;
    combining and encoding the feature vectors based on a preset recurrent neural network to obtain corresponding text segment feature information, wherein the text segment feature information represents text content of different combinations;
    discriminating the text segment feature information using a classifier to obtain predicted classification labels corresponding to the text segment feature information, wherein a predicted classification label represents a text type of the text segment feature information, and the predicted classification labels are the basis for obtaining structured information;
    obtaining a loss value based on a preset objective function and the predicted classification labels and preset classification labels corresponding to the text segment feature information, and determining target parameters according to the loss value, wherein the target parameters are variables in the preset BERT model, the preset convolutional neural network, the preset recurrent neural network and the classifier, and the target parameters characterize a system for extracting business card information.
  2. The method according to claim 1, wherein the obtaining a loss value based on a preset objective function and the predicted classification labels and preset classification labels corresponding to the text segment feature information, and determining target parameters according to the loss value comprises:
    obtaining a first sub-loss value according to a first sub-objective function and a first preset weight, as well as the preset classification labels and the predicted classification labels;
    obtaining a second sub-loss value according to a second sub-objective function, a second preset weight, the first preset weight, the preset classification labels and the predicted classification labels, wherein both the first sub-objective function and the second sub-objective function are the preset objective function;
    obtaining the loss value based on the first sub-loss value and the second sub-loss value, and determining the target parameters according to the loss value.
  3. The method according to claim 1 or 2, wherein the determining target parameters according to the loss value comprises:
    when the loss value remains in a non-decreasing state, determining that current variables in the preset BERT model, the preset convolutional neural network, the preset recurrent neural network and the classifier are the target parameters.
  4. The method according to claim 1, wherein the obtaining feature vectors based on a preset BERT model, a preset convolutional neural network and the text information comprises:
    converting text content information based on the preset BERT model to obtain a word vector sequence, wherein the text information comprises text content information and text position information;
    filling the word vector sequence into a preset two-dimensional grid according to the text line position information to obtain a target two-dimensional grid;
    obtaining the feature vectors according to the target two-dimensional grid and the preset convolutional neural network.
  5. The method according to claim 4, wherein the obtaining the feature vectors according to the target two-dimensional grid and the preset convolutional neural network comprises:
    obtaining the feature vectors according to the target two-dimensional grid and a three-dimensional convolution kernel in the preset convolutional neural network.
  6. The method according to claim 5, wherein the obtaining the feature vectors according to the target two-dimensional grid and the three-dimensional convolution kernel in the preset convolutional neural network comprises:
    extracting features in the target two-dimensional grid according to the three-dimensional convolution kernel in the preset convolutional neural network, so as to obtain the feature vectors.
  7. The method according to claim 1, wherein, when the business card image is the simulated business card image, before the recognizing a business card image to obtain text information, the method further comprises:
    collecting text sample information;
    obtaining the simulated business card image according to the text sample information, a preset generative adversarial network and preset typesetting rules.
  8. A business card information extraction system training apparatus, comprising an obtaining part and a determining part, wherein
    the obtaining part is configured to: recognize a business card image to obtain text information; obtain feature vectors based on a preset BERT model, a preset convolutional neural network and the text information, wherein the feature vectors represent semantic information of words in the text information; encode the feature vectors based on a preset recurrent neural network to obtain corresponding text segment feature information, wherein the text segment feature information represents text content of different combinations; discriminate the text segment feature information using a classifier to obtain predicted classification labels corresponding to the text segment feature information, wherein a predicted classification label represents a text type of the text segment feature information, and the predicted classification labels are the basis for obtaining structured information; and obtain a loss value based on a preset objective function and the predicted classification labels and preset classification labels corresponding to the text segment feature information, and determine target parameters according to the loss value;
    the determining part is configured to determine the target parameters according to the loss value, wherein the target parameters are variables in the preset BERT model, the preset convolutional neural network, the preset recurrent neural network and the classifier, and the target parameters characterize a system for extracting business card information.
  9. The training apparatus for a business card information extraction system according to claim 8, wherein
    the obtaining part is further configured to: obtain a first sub-loss value according to a first sub-objective function, a first preset weight, the preset classification label and the predicted classification label; and obtain a second sub-loss value according to a second sub-objective function, a second preset weight, the first preset weight, the preset classification label and the predicted classification label, wherein the first sub-objective function and the second sub-objective function both belong to the preset objective function;
    the determining part is further configured to obtain the loss value based on the first sub-loss value and the second sub-loss value, and determine the target parameter according to the loss value.
  10. The training apparatus for a business card information extraction system according to claim 8 or 9, wherein
    the determining part is further configured to, when the loss value no longer decreases, determine the current variables in the preset BERT model, the preset convolutional neural network, the preset recurrent neural network and the classifier as the target parameter.
  11. The training apparatus for a business card information extraction system according to claim 8, wherein
    the obtaining part is further configured to: convert text content information based on the preset BERT model to obtain a word vector sequence, wherein the text information comprises the text content information and text position information; fill the word vector sequence into a preset two-dimensional grid according to the text line position information to obtain a target two-dimensional grid; and obtain the feature vector according to the target two-dimensional grid and the preset convolutional neural network.
  12. The training apparatus for a business card information extraction system according to claim 11, wherein
    the obtaining part is further configured to obtain the feature vector according to the target two-dimensional grid and a three-dimensional convolution kernel in the preset convolutional neural network.
  13. The training apparatus for a business card information extraction system according to claim 12, wherein
    the obtaining part is further configured to extract features from the target two-dimensional grid according to the three-dimensional convolution kernel in the preset convolutional neural network to obtain the feature vector.
  14. The training apparatus for a business card information extraction system according to claim 8, further comprising a collecting part, wherein
    the collecting part is configured to collect text sample information;
    the obtaining part is further configured to obtain a simulated business card image according to the text sample information, a preset generative adversarial network and preset typesetting rules.
  15. A training apparatus for a business card information extraction system, comprising:
    a memory, configured to store executable instructions; and
    a processor, configured to implement the training method for a business card information extraction system according to any one of claims 1 to 7 when executing the executable instructions stored in the memory.
  16. A computer-readable storage medium storing executable instructions which, when executed by a processor, cause the training method for a business card information extraction system according to any one of claims 1 to 7 to be implemented.
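The feature-extraction chain in claims 8 and 11 to 13 (BERT word vectors filled into a two-dimensional grid by text-line position, then pooled with a three-dimensional convolution kernel) can be sketched in NumPy. The embedding function, grid size, and kernel shape below are illustrative assumptions, not the patent's actual models:

```python
import numpy as np

rng = np.random.default_rng(0)
EMB = 16                # assumed word-vector width (a real BERT base gives 768)
GRID_H, GRID_W = 4, 8   # assumed 2D grid resolution over the card

def embed(tokens):
    """Stand-in for the preset BERT model: token -> word vector."""
    return np.stack([rng.standard_normal(EMB) for _ in tokens])

def fill_grid(tokens, positions):
    """Fill word vectors into a preset 2D grid by text-line position (claim 11)."""
    grid = np.zeros((GRID_H, GRID_W, EMB))
    for vec, (row, col) in zip(embed(tokens), positions):
        grid[row, col] = vec
    return grid

def conv3d_features(grid, kernel):
    """Slide a 3D kernel over the (H, W, EMB) grid to capture local layout
    context across neighboring lines and embedding channels (claims 12-13)."""
    kh, kw, kd = kernel.shape
    H, W, D = grid.shape
    out = np.zeros((H - kh + 1, W - kw + 1, D - kd + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = np.sum(grid[i:i+kh, j:j+kw, k:k+kd] * kernel)
    return out

tokens = ["Zhang", "San", "Engineer", "ACME"]
positions = [(0, 0), (0, 1), (1, 0), (2, 0)]   # (line, column) from OCR, assumed
grid = fill_grid(tokens, positions)
feats = conv3d_features(grid, np.ones((2, 2, 4)))
print(grid.shape, feats.shape)   # (4, 8, 16) (3, 7, 13)
```

The resulting feature volume would then feed the preset recurrent neural network and classifier of claim 8.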
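Claims 9 and 10 fix only the structure of the objective: two weighted sub-objective functions summed into one loss, with training stopping once the loss no longer decreases. A minimal sketch, with assumed sub-objectives (cross-entropy plus a squared-error term) and assumed weights:

```python
import numpy as np

def cross_entropy(pred, label):
    # negative log-likelihood of the preset classification label
    return -np.log(pred[label] + 1e-12)

def combined_loss(pred, label, w1=0.7, w2=0.3):
    # first sub-loss scaled by the first preset weight, second by the second
    # (claim 9); the concrete sub-objective functions here are assumptions
    l1 = w1 * cross_entropy(pred, label)
    l2 = w2 * (1.0 - pred[label]) ** 2
    return l1 + l2

# claim 10: once the loss stops decreasing, the current model variables
# are kept as the target parameters
losses = [2.1, 1.4, 0.9, 0.91, 0.93]
stop_epoch = next(i for i in range(1, len(losses)) if losses[i] >= losses[i - 1])
print(stop_epoch)  # 3 -> the variables from epoch 2 become the target parameters
```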
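Claim 14 pairs a preset generative adversarial network with preset typesetting rules to synthesize training cards from collected text samples. The GAN half (rendering realistic card images) is out of scope here; this sketch shows only the rule-based layout half, with made-up sample fields and an assumed top-to-bottom ordering rule:

```python
import random

# hypothetical text sample pools; a real system would use collected samples
FIELDS = {
    "name":    ["Li Wei", "Wang Fang"],
    "title":   ["Engineer", "Manager"],
    "company": ["ACME Ltd.", "Globex Inc."],
    "phone":   ["138-0000-0000", "139-1111-1111"],
}
LAYOUT = ["name", "title", "company", "phone"]   # assumed typesetting rule

def simulate_card(seed=None):
    """Draw one text sample per field and lay them out in rule order.
    Each line carries (text, field label), so the sample is self-annotated
    with the preset classification labels used during training."""
    rng = random.Random(seed)
    return [(rng.choice(FIELDS[f]), f) for f in LAYOUT]

card = simulate_card(seed=42)
for text, label in card:
    print(f"{label:8s} {text}")
```

Because every generated line is born with its label, no manual annotation is needed for the synthesized portion of the training set.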
PCT/CN2022/129071 2021-11-03 2022-11-01 Method and apparatus for training business card information extraction system, and computer-readable storage medium WO2023078264A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111296307.7 2021-11-03
CN202111296307.7A CN116090463A (en) 2021-11-03 2021-11-03 Business card information extraction system training method and device and storage medium

Publications (1)

Publication Number Publication Date
WO2023078264A1 2023-05-11

Family

ID=86197758

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/129071 WO2023078264A1 (en) 2021-11-03 2022-11-01 Method and apparatus for training business card information extraction system, and computer-readable storage medium

Country Status (2)

Country Link
CN (1) CN116090463A (en)
WO (1) WO2023078264A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116469111B (en) * 2023-06-08 2023-09-15 江西师范大学 Character generation model training method and target character generation method

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108304911A (en) * 2018-01-09 2018-07-20 中国科学院自动化研究所 Knowledge Extraction Method and system based on Memory Neural Networks and equipment
CN109918671A (en) * 2019-03-12 2019-06-21 西南交通大学 Electronic health record entity relation extraction method based on convolution loop neural network
WO2019174130A1 (en) * 2018-03-14 2019-09-19 平安科技(深圳)有限公司 Bill recognition method, server, and computer readable storage medium
CN110413785A (en) * 2019-07-25 2019-11-05 淮阴工学院 A kind of Automatic document classification method based on BERT and Fusion Features
CN113051914A (en) * 2021-04-09 2021-06-29 淮阴工学院 Enterprise hidden label extraction method and device based on multi-feature dynamic portrait


Also Published As

Publication number Publication date
CN116090463A (en) 2023-05-09

Similar Documents

Publication Publication Date Title
CN111476067B (en) Character recognition method and device for image, electronic equipment and readable storage medium
CN111160343B (en) Off-line mathematical formula symbol identification method based on Self-Attention
CN114841972B (en) Transmission line defect identification method based on saliency map and semantic embedded feature pyramid
CN110598800A (en) Garbage classification and identification method based on artificial intelligence
CN112541501A (en) Scene character recognition method based on visual language modeling network
CN111428593A (en) Character recognition method and device, electronic equipment and storage medium
CN116245513B (en) Automatic operation and maintenance system and method based on rule base
CN113806548A (en) Petition factor extraction method and system based on deep learning model
WO2023078264A1 (en) Method and apparatus for training business card information extraction system, and computer-readable storage medium
CN113177435A (en) Test paper analysis method and device, storage medium and electronic equipment
CN113723330A (en) Method and system for understanding chart document information
CN114218391A (en) Sensitive information identification method based on deep learning technology
CN116304042A (en) False news detection method based on multi-modal feature self-adaptive fusion
CN112667813A (en) Method for identifying sensitive identity information of referee document
CN115953788A (en) Green financial attribute intelligent identification method and system based on OCR (optical character recognition) and NLP (non-line-segment) technologies
CN117496542B (en) Document information extraction method, device, electronic equipment and storage medium
CN117558011B (en) Image text tampering detection method based on self-consistency matrix and multi-scale loss
CN112597925B (en) Handwriting recognition/extraction and erasure method, handwriting recognition/extraction and erasure system and electronic equipment
CN114372532A (en) Method, device, equipment, medium and product for determining label marking quality
CN117709317A (en) Report file processing method and device and electronic equipment
CN111523301B (en) Contract document compliance checking method and device
CN113554021A (en) Intelligent seal identification method
CN112966676A (en) Document key information extraction method based on zero sample learning
CN110032716B (en) Character encoding method and device, readable storage medium and electronic equipment
CN116524520A (en) Text recognition method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22889287

Country of ref document: EP

Kind code of ref document: A1