CN118262363A - Watermark identification method, service spare part verification method, device and storage medium - Google Patents


Info

Publication number
CN118262363A
Authority
CN
China
Prior art keywords
certificate
information
standard
character
watermark
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410517321.2A
Other languages
Chinese (zh)
Inventor
刘云龙
李宝
高超
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Seashell Housing Beijing Technology Co Ltd
Original Assignee
Seashell Housing Beijing Technology Co Ltd
Application filed by Seashell Housing Beijing Technology Co Ltd
Priority to CN202410517321.2A
Publication of CN118262363A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/14 Image acquisition
    • G06V30/148 Segmentation of character regions
    • G06V30/153 Segmentation of character regions using recognition of characters or words
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/191 Design or setup of recognition systems or techniques; Extraction of features in feature space; Clustering techniques; Blind source separation
    • G06V30/19173 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/42 Document-oriented image-based pattern recognition based on the type of document
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Character Input (AREA)

Abstract

Embodiments of the disclosure provide a watermark identification method, a service spare part verification method, a device, and a storage medium. The method includes: performing preset field information identification on a certificate image containing a target certificate, based on the standard certificate specification corresponding to the target certificate, to obtain standard field information in the certificate image, where the preset field information is the field information that the standard certificate specification requires the target certificate to include, and the standard certificate specification indicates the character specifications and position specifications of the preset fields that the target certificate must include; performing character recognition on the certificate image to obtain a character recognition result; filtering the character recognition result based on the standard field information to obtain candidate character information, that is, the characters in the character recognition result other than the standard field information; and performing information identification on the candidate character information to obtain watermark information. The embodiments can efficiently separate the watermark from the certificate's original characters, thereby obtaining the watermark information contained in the certificate image and achieving efficient and accurate automatic watermark extraction.

Description

Watermark identification method, service spare part verification method, device and storage medium
Technical Field
The disclosure relates to the technical field of image recognition, and in particular to a watermark identification method, a service spare part verification method, a device, and a storage medium.
Background
A watermark is text or an image used to identify provenance or prevent copying. Currently, when a user uploads certificate information to a data platform, a watermark is usually added to indicate how the certificate may be used. Because the content of such marks is not standardized, certificate information that contains a watermark may be restricted in use, which in turn restricts normal business handling.
When the data platform uses certificate information, the watermark must be identified and extracted. Because the watermark pattern cannot be determined in advance, watermark extraction in the related art relies mostly on manual processing: auditors identify and extract the watermark content by hand. As a result, the labor cost of watermark extraction and recognition is very high and the efficiency is low.
Disclosure of Invention
Embodiments of the disclosure provide a watermark identification method, a service spare part verification method, a device, and a storage medium, so as to solve the problems in the related art at least to some extent.
In one aspect of the embodiments of the present disclosure, a watermark identifying method is provided, including:
performing preset field information identification on a certificate image containing a target certificate, based on a standard certificate specification corresponding to the target certificate, to obtain standard field information in the certificate image, wherein the preset field information is the field information that the standard certificate specification requires the target certificate to include, and the standard certificate specification indicates the character specifications and position specifications of the preset fields to be included in the target certificate;
performing character recognition on the certificate image to obtain a character recognition result;
filtering the character recognition result based on the standard field information to obtain candidate character information, that is, the characters in the character recognition result other than the standard field information; and
performing information identification on the candidate character information to obtain watermark information.
In an exemplary embodiment, the identifying, based on the standard certificate specification corresponding to the target certificate, preset field information of a certificate image including the target certificate to obtain standard field information in the certificate image includes:
determining corresponding information areas of all preset fields in the certificate image based on character specifications and position specifications of the preset fields in the standard certificate specifications;
And carrying out optical character recognition on the information area corresponding to each preset field to obtain the standard field information.
In an exemplary embodiment, the preset fields include a certificate number field and other fields than the certificate number field;
Determining corresponding information areas of all preset fields in the certificate image based on character specifications and position specifications of preset fields in the standard certificate specifications; performing optical character recognition on the information area corresponding to each preset field to obtain the standard field information, wherein the method comprises the following steps:
determining a certificate number area of the certificate number field in the certificate image based on a character specification and a position specification of the certificate number field in the standard certificate specification;
performing optical character recognition on the certificate number area to obtain a certificate number recognition result;
determining the position relation between the certificate number area and the information areas corresponding to other fields based on the standard certificate specification;
determining other information areas of other fields in the certificate image according to the position relation between the certificate number area and the information areas corresponding to the other fields;
Performing optical character recognition on the other information areas to obtain field information recognition results of the other fields; and the standard field information comprises a certificate number identification result and field information identification results of other fields.
In an exemplary embodiment, the determining a certificate number area of the certificate number field in the certificate image based on the character specification and the position specification of the certificate number field in the standard certificate specification includes:
determining the certificate number area based on the number of characters indicated in the character specification of the certificate number field and the aspect ratio of the information area indicated by the position specification of the certificate number field.
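The selection rule above can be illustrated with a minimal sketch. It assumes that an upstream text detector has already produced candidate boxes with an estimated character count; the `TextBox` type, the 18-character count, the aspect ratio of 12, and the tolerance are all hypothetical values chosen for illustration and are not taken from the disclosure.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TextBox:
    """A candidate text region (hypothetical structure for illustration)."""
    char_count: int   # estimated number of characters in the region
    width: float      # long side of the region, in pixels
    height: float     # short side of the region, in pixels


def find_number_box(
    boxes: List[TextBox],
    expected_chars: int = 18,      # assumed value from the character specification
    expected_ratio: float = 12.0,  # assumed width/height from the position specification
    tolerance: float = 0.25,       # assumed relative tolerance on the aspect ratio
) -> Optional[TextBox]:
    """Return the first box matching both the character count and the aspect ratio."""
    for box in boxes:
        if box.height <= 0:
            continue  # skip degenerate boxes
        ratio = box.width / box.height
        ratio_ok = abs(ratio - expected_ratio) <= tolerance * expected_ratio
        if box.char_count == expected_chars and ratio_ok:
            return box
    return None
```

Requiring both conditions keeps a long watermark line with the right character count but the wrong shape from being mistaken for the certificate number.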
In an exemplary embodiment, the determining the other information areas of the other fields in the document image according to the document number area and the positional relationship between the document number area and the information areas corresponding to the other fields includes:
acquiring the center coordinates of the certificate number area and the length and width of the certificate number area;
Determining the center coordinates of the other information areas based on the relative position relation and the center coordinates of the certificate number areas;
determining the length and the width of the other information areas based on the length and the width of the certificate number area;
determining the inclination of the other information areas based on the inclination of the certificate number areas;
And determining the region position of the other information region in the certificate image based on the central coordinates, the length, the width and the gradient of the other information region.
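The geometric derivation in the steps above can be sketched as follows. This is only one possible reading: it assumes the standard certificate specification expresses each other field as an offset and a scale relative to the certificate number area, measured along the number area's own axes. The `Region` and `RelativeSpec` types and all numeric values are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Region:
    cx: float         # center x in image coordinates
    cy: float         # center y in image coordinates
    length: float     # long side
    width: float      # short side
    angle_deg: float  # inclination of the long side


@dataclass
class RelativeSpec:
    """Hypothetical relative layout of a field with respect to the number area."""
    dx: float            # center offset along the number area's long axis, in number-area lengths
    dy: float            # center offset along its short axis, in number-area widths
    length_scale: float  # field length / number-area length
    width_scale: float   # field width / number-area width


def locate_field(number: Region, spec: RelativeSpec) -> Region:
    """Derive another field's region from the certificate number region."""
    theta = math.radians(number.angle_deg)
    off_x = spec.dx * number.length
    off_y = spec.dy * number.width
    # Rotate the offset by the number area's inclination so the layout follows the card.
    cx = number.cx + off_x * math.cos(theta) - off_y * math.sin(theta)
    cy = number.cy + off_x * math.sin(theta) + off_y * math.cos(theta)
    return Region(
        cx=cx,
        cy=cy,
        length=number.length * spec.length_scale,
        width=number.width * spec.width_scale,
        angle_deg=number.angle_deg,  # the field inherits the number area's inclination
    )
```

Expressing offsets in units of the number area's own size makes the derived regions independent of the image resolution and of how the card was rotated when photographed.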
In an exemplary embodiment, after performing optical character recognition on the certificate number area to obtain a certificate number recognition result, the method further includes:
checking the certificate number recognition result based on the number format indicated in the character specification of the certificate number field and the check code contained in the certificate number recognition result, to obtain a check result indicating whether the certificate number recognition result passes the check;
in response to the check result indicating that the certificate number recognition result passes the check, performing the operation of determining the positional relationship between the certificate number area and the information areas corresponding to the other fields based on the standard certificate specification.
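As an illustration of such a check, the sketch below assumes that the target certificate is a Chinese resident identity card, whose 18-character number carries an ISO 7064 MOD 11-2 check character as defined in GB 11643-1999. The disclosure does not name a particular check scheme, so this choice and the function name are assumptions.

```python
import re

# Position weights and check characters of the MOD 11-2 scheme used by GB 11643-1999.
WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
CHECK_CHARS = "10X98765432"
# Format check: 17 digits followed by a digit or X.
ID_PATTERN = re.compile(r"[0-9]{17}[0-9X]")


def id_number_passes_check(number: str) -> bool:
    """Return True if the recognized number has a valid format and check character."""
    number = number.strip().upper()
    if not ID_PATTERN.fullmatch(number):
        return False
    total = sum(int(digit) * weight for digit, weight in zip(number[:17], WEIGHTS))
    return CHECK_CHARS[total % 11] == number[17]
```

A failed check usually indicates an OCR error or a wrongly located number area, which is why it is useful to run the check before the other field areas are derived from the number area.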
In an exemplary embodiment, the filtering the character recognition result based on the standard field information to obtain candidate character information except the standard field information in the character recognition result further includes:
Filtering the character strings corresponding to the standard field information in the character recognition result to obtain candidate character strings;
acquiring each character string region in which each character string contained in the character recognition result is located;
determining an area filtering standard based on an information specification of the standard field information; filtering out, from the character string areas, the information areas that meet the area filtering standard to obtain non-standard character string areas, wherein the area filtering standard comprises at least one of a length standard, a width standard and an inclination standard;
and screening the candidate character information from the candidate character strings based on the nonstandard character string area.
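The two-pass screening described above can be sketched as follows. The sketch assumes that each OCR string comes with the length, width, and inclination of its region, and that a region is treated as standard only if it satisfies all three limits; the `OcrString` and `RegionStandard` types and the numeric limits are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class OcrString:
    text: str
    length: float     # long side of the string region
    width: float      # short side (text height) of the string region
    angle_deg: float  # inclination of the string region


@dataclass
class RegionStandard:
    """Hypothetical limits derived from the specification of the standard fields."""
    max_length: float
    min_width: float
    max_width: float
    angle_deg: float        # inclination of the standard fields
    angle_tolerance: float  # allowed deviation in degrees


def is_standard_region(s: OcrString, std: RegionStandard) -> bool:
    """True if the region looks like an ordinary printed field of the certificate."""
    return (
        s.length <= std.max_length
        and std.min_width <= s.width <= std.max_width
        and abs(s.angle_deg - std.angle_deg) <= std.angle_tolerance
    )


def candidate_strings(
    ocr: Iterable[OcrString], standard_texts: Iterable[str], std: RegionStandard
) -> List[str]:
    """Pass 1 drops strings equal to standard field information; pass 2 drops standard-looking regions."""
    standard_set = set(standard_texts)
    by_text = [s for s in ocr if s.text not in standard_set]
    return [s.text for s in by_text if not is_standard_region(s, std)]
```

The second pass matters for printed text that is not field information, such as field labels: it survives the string comparison but is removed because its region has the size and inclination of ordinary certificate text.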
In an exemplary embodiment, after obtaining the candidate character information in the character recognition result except for the standard field information, the method further includes:
Filtering the candidate character information based on a preset standard to obtain a filtered character string;
the performing information identification on the candidate character information to obtain watermark information includes:
and carrying out semantic recognition on the filtered character string to obtain the watermark information.
In an exemplary embodiment, the performing information recognition on the filtered character string to obtain the watermark information includes:
matching the filtered character string with a preset universal watermark character to obtain a matching result;
in response to the matching result including a target character string that matches the preset universal watermark characters, extracting the watermark information from the target character string.
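A minimal sketch of the pre-filtering and matching steps is given below. The list of universal watermark phrases, the minimum length, and the function names are hypothetical; a real deployment would use phrases in the language of the certificates being processed.

```python
from typing import Iterable, List

# Hypothetical examples of preset universal watermark characters.
GENERIC_WATERMARK_WORDS = ("only for", "for use in", "not valid for")


def prefilter(strings: Iterable[str], min_len: int = 4) -> List[str]:
    """Drop very short strings and strings with no letters or digits (likely OCR noise)."""
    kept = []
    for s in strings:
        stripped = s.strip()
        if len(stripped) >= min_len and any(ch.isalnum() for ch in stripped):
            kept.append(stripped)
    return kept


def extract_watermarks(strings: Iterable[str]) -> List[str]:
    """Return the filtered strings that contain a preset universal watermark phrase."""
    hits = []
    for s in prefilter(strings):
        lowered = s.lower()
        if any(word in lowered for word in GENERIC_WATERMARK_WORDS):
            hits.append(s)
    return hits
```

Filtering before matching keeps noise fragments out of the more expensive recognition step, which is the efficiency benefit the disclosure attributes to this ordering.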
In an exemplary embodiment, before the identifying the preset standard field information of the document image including the target document based on the standard document specification corresponding to the target document, the method further includes:
Classifying the certificate image to obtain the certificate type of the target certificate included in the certificate image;
and acquiring a standard certificate specification corresponding to the target certificate based on the certificate type.
In another aspect of the embodiment, a method for checking a service spare part is provided, including:
Receiving a service request, wherein the service request comprises a certificate image serving as a service spare part and a service type, and the certificate image comprises an image of a target certificate;
based on the watermark identification method in any embodiment, watermark identification is performed on the certificate image to obtain watermark information of the certificate image;
And based on the watermark information of the certificate image, confirming whether the service spare part is authorized to handle the service corresponding to the service type.
In another aspect of the embodiment, there is provided a watermark identifying apparatus including:
The field identification module is used for performing preset field information identification on a certificate image containing the target certificate, based on a standard certificate specification corresponding to the target certificate, to obtain standard field information in the certificate image, wherein the preset field information is the field information that the standard certificate specification requires the target certificate to include, and the standard certificate specification indicates the character specifications and position specifications of the preset fields to be included in the target certificate;
the character recognition module is used for carrying out character recognition on the certificate image to obtain a character recognition result;
The character filtering module is used for filtering the character recognition result based on the standard field information to obtain candidate character information except the standard field information in the character recognition result;
And the watermark identification module is used for carrying out information identification on the candidate character information to obtain watermark information.
In another aspect of the present embodiment, there is provided an electronic apparatus including:
A memory for storing a computer program;
And a processor for executing the computer program stored in the memory, and when the computer program is executed, implementing the method according to any one of the above embodiments.
In another aspect of this embodiment, a computer readable storage medium is provided, on which a computer program is stored, which when executed by a processor, implements the method according to any of the embodiments described above.
In the embodiments of the disclosure, the certificate image is recognized based on the format specification and position specification of the certificate information indicated in the standard certificate specification, yielding the standard field information that corresponds to the certificate information in the image. All character strings contained in the certificate image are then filtered using the standard field information to obtain candidate character strings that may be watermarks, and semantic recognition is finally performed on the candidate character strings to obtain the watermark information. Because certificates are produced to a standard specification, recognition guided by that specification can accurately identify the field information belonging to the certificate itself, so the remaining character strings can be screened out as a set of possible watermarks. The watermark is thereby separated efficiently from the original certificate information, which improves watermark identification efficiency. Combined with semantic recognition, this also improves the accuracy of watermark identification, so that the watermark information contained in the certificate image is obtained and efficient, accurate automatic watermark extraction is achieved.
In the embodiments of the disclosure, the certificate number, which has distinctive features, is identified first to obtain the number area in which it is located. The remaining information areas are then determined by combining the position of the number area with its relative positional relationship to the areas of the remaining information. In this way the layout of a standard certificate can be reconstructed within the certificate image and the original certificate information can be obtained accurately; filtering based on that original information then improves the accuracy of watermark extraction.
In the embodiments of the disclosure, non-standard character string areas are obtained by screening against the area filtering standard, and non-standard character strings are obtained by screening against the string content of the standard field information. This two-pass screening yields an accurate set of non-standard character strings, and extracting the watermark from that set improves the accuracy of watermark extraction. In addition, after the candidate character information is obtained, non-watermark strings such as garbled short strings are filtered out before semantic recognition is performed, which improves the efficiency of watermark identification.
The technical scheme of the present disclosure is described in further detail below through the accompanying drawings and examples.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the disclosure and together with the description, serve to explain the principles of the disclosure.
The disclosure may be more clearly understood from the following detailed description taken in conjunction with the accompanying drawings in which:
FIG. 1 is a flowchart of a watermark identification method provided in an exemplary embodiment of the present disclosure;
FIG. 2 is a flowchart of a standard field information identification process provided in an exemplary embodiment of the present disclosure;
FIG. 3 is a flowchart of a character string filtering process provided in an exemplary embodiment of the present disclosure;
FIG. 4 is a flowchart of a service spare part verification method provided in an exemplary embodiment of the present disclosure;
FIG. 5 is a schematic structural diagram of a watermark identification device provided in an exemplary embodiment of the present disclosure;
FIG. 6 is a schematic structural diagram of an application embodiment of the electronic device of the present disclosure.
Detailed Description
Various exemplary embodiments of the present disclosure will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present disclosure unless it is specifically stated otherwise.
It will be appreciated by those of skill in the art that the terms "first," "second," etc. in embodiments of the present disclosure are used merely to distinguish between different steps, devices or modules, etc., and do not represent any particular technical meaning nor necessarily logical order between them.
It should also be understood that in embodiments of the present disclosure, "plurality" may refer to two or more, and "at least one" may refer to one, two or more.
It should also be appreciated that any component, data, or structure referred to in the presently disclosed embodiments may be generally understood as one or more without explicit limitation or the contrary in the context.
In addition, the term "and/or" in this disclosure is merely an association relationship describing an association object, and indicates that three relationships may exist, for example, a and/or B may indicate: a exists alone, A and B exist together, and B exists alone. In addition, the character "/" in the present disclosure generally indicates that the front and rear association objects are an or relationship.
It should also be understood that the description of the various embodiments of the present disclosure emphasizes the differences between the various embodiments, and that the same or similar features may be referred to each other, and for brevity, will not be described in detail.
Meanwhile, it should be understood that the sizes of the respective parts shown in the drawings are not drawn in actual scale for convenience of description.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the disclosure, its application, or uses.
Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail, but are intended to be part of the specification where appropriate.
It should be noted that: like reference numerals and letters denote like items in the following figures, and thus once an item is defined in one figure, no further discussion thereof is necessary in subsequent figures.
Embodiments of the present disclosure may be applicable to electronic devices such as terminal devices, computer systems, servers, etc., which may operate with numerous other general purpose or special purpose computing system environments or configurations. Examples of well known terminal devices, computing systems, environments, and/or configurations that may be suitable for use with the terminal device, computer system, server, or other electronic device include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, hand-held or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, network personal computers, small computer systems, mainframe computer systems, and distributed cloud computing technology environments that include any of the foregoing, and the like.
Electronic devices such as terminal devices, computer systems, servers, etc. may be described in the general context of computer system-executable instructions, such as program modules, being executed by a computer system. Generally, program modules may include routines, programs, objects, components, logic, data structures, etc., that perform particular tasks or implement particular abstract data types. The computer system/server may be implemented in a distributed cloud computing environment in which tasks are performed by remote processing devices that are linked through a communications network. In a distributed cloud computing environment, program modules may be located in both local and remote computing system storage media including memory storage devices.
In order to solve the problems in the related art, the embodiment of the disclosure provides a watermark identification method, which utilizes standard certificate specifications to identify and obtain standard certificate information, separates original certificate information from other information, and screens out watermark information from other information, so that the efficiency and accuracy of watermark identification can be improved. The method provided by the embodiment of the disclosure can be used in any data using platform needing to use the certificate information or a background server of a client, the background server can be the electronic device, and the electronic device can identify watermark information on the certificate image by using the method provided by the embodiment of the disclosure so as to judge whether the certificate image is available.
FIG. 1 is a flowchart of a watermark identification method provided in an exemplary embodiment of the present disclosure. The method can be used in an electronic device and, as shown in FIG. 1, includes the following steps:
Step 101, based on a standard certificate specification corresponding to a target certificate, carrying out preset field information identification on a certificate image containing the target certificate to obtain standard field information in the certificate image, wherein the preset field information is field information to be included in the target certificate specified by a standard certificate specification, and the standard certificate specification indicates character specifications and position specifications of preset fields to be included in the target certificate.
The target certificate refers to a user certificate used in business processing. Illustratively, the target document may be an identity card, a residence card, a student card, a passport, a professional qualification, or the like. When the electronic device receives the certificate image, the whole image identification can be firstly carried out on the certificate image to obtain the information contained in the image, and whether the target certificate is contained in the certificate image or not is determined based on the information contained in the image. Illustratively, when identifying whether the identity card is contained in the document image, the identity card is determined to be contained in the document image when detecting that the name, sex, ethnicity, birth, address, and citizen identity number characters are contained in the document image.
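The identity card example above amounts to checking that every expected field label appears in the whole-image recognition result. A minimal sketch follows; the English labels stand in for the labels printed on the actual card, and the function name is hypothetical.

```python
from typing import Iterable

# English stand-ins for the field labels named in the example above.
ID_CARD_LABELS = ("name", "sex", "ethnicity", "birth", "address", "citizen identity number")


def contains_id_card(recognized_text: str, required: Iterable[str] = ID_CARD_LABELS) -> bool:
    """Return True only if every required field label occurs in the recognized text."""
    lowered = recognized_text.lower()
    return all(label in lowered for label in required)
```

Requiring all labels rather than any one of them reduces false positives on other documents that happen to contain a word such as "name" or "address".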
When the certificate image is detected to contain the target certificate, the electronic device can identify the preset field information according to the standard certificate specification corresponding to the target certificate. The standard certificate specification is the production specification followed when the certificate is made, and it comprises a character specification and a position specification. The character specification comprises a content specification as well as specifications for the format, spacing, size, and the like of the characters; the position specification states where in the certificate the characters of each preset field must be located. The content specification indicates the fields to be identified, so the preset field information can be identified in the certificate image by combining the content specification, the character format specification, and the character position specification, yielding the standard field information. The standard field information obtained includes each field and the field information corresponding to it.
Illustratively, when the target certificate is an identity card, the standard field information obtained by recognition includes a name field and the corresponding name information, a gender field and the corresponding gender information, an ethnicity field and the corresponding ethnicity information, a birth field and the corresponding birth information, an address field and the corresponding address information, and a number field and the corresponding citizen identity number information.
In one possible implementation, after the certificate image is detected to contain the target certificate, image processing may be applied to the certificate image before standard field recognition is performed on the processed image. The image processing includes adaptive binarization, edge detection, erosion and dilation operations, and the like. The electronic device can apply adaptive binarization to the certificate image, perform edge detection on the binarized image to obtain the edge of the target certificate, and then process the edge with erosion and dilation to make it clearer. The certificate area where the target certificate is located is then determined from the detected edge, and preset field information identification is performed on the certificate area to obtain the standard field information in it.
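The morphological clean-up step mentioned above can be illustrated with a small pure-Python sketch of erosion followed by dilation (a morphological opening) on a binary mask with a 3x3 neighborhood. The disclosure does not specify the order of the two operations or the kernel size, so both are assumptions; in practice an image library would normally be used, for example the `cv2.erode` and `cv2.dilate` functions of OpenCV.

```python
from typing import List

Mask = List[List[int]]  # binary image: 1 = foreground, 0 = background

# Offsets of the 3x3 neighborhood (assumed kernel).
NEIGHBORS = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]


def erode(mask: Mask) -> Mask:
    """Keep a pixel only if its whole 3x3 neighborhood is foreground and inside the image."""
    h, w = len(mask), len(mask[0])
    return [
        [
            int(all(0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
                    for dy, dx in NEIGHBORS))
            for x in range(w)
        ]
        for y in range(h)
    ]


def dilate(mask: Mask) -> Mask:
    """Set a pixel if any pixel in its 3x3 neighborhood is foreground."""
    h, w = len(mask), len(mask[0])
    return [
        [
            int(any(0 <= y + dy < h and 0 <= x + dx < w and mask[y + dy][x + dx]
                    for dy, dx in NEIGHBORS))
            for x in range(w)
        ]
        for y in range(h)
    ]


def opening(mask: Mask) -> Mask:
    """Erosion followed by dilation: removes small specks and keeps larger shapes."""
    return dilate(erode(mask))
```

Applied to a binarized certificate image, this combination removes isolated noise pixels while preserving larger structures such as the certificate outline.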
Step 102, character recognition is performed on the certificate image to obtain a character recognition result.
In the embodiment of the disclosure, the original document information contained in the document image is detected by using the standard document specification, and all characters contained in the document image are screened based on the original document information, so that a character set which is possibly a watermark is obtained.
In one possible implementation, the electronic device may perform optical character recognition (OCR) on the certificate image to obtain a character recognition result, where the character recognition result includes all characters in the certificate image. It should be noted that the embodiment of the present disclosure only illustrates how preset field information recognition and character recognition are implemented and does not limit their execution order: the two processes may be performed simultaneously or one after the other, which is not limited in this embodiment.
Step 103, filtering the character recognition result based on the standard field information to obtain candidate character information other than the standard field information in the character recognition result.
In the embodiment of the disclosure, the electronic device filters the character recognition result based on the standard field information, and can separate the character corresponding to the standard field information from other characters to obtain a character set (i.e., candidate character information) which is possibly watermark, so as to be used in the watermark recognition process.
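The content-based filtering of step 103 can be sketched as follows. This is a minimal illustration, assuming the character recognition result is a list of strings and the standard field information is a mapping from field label to field value; the function name `filter_candidates` and the sample data are hypothetical and do not come from the disclosure.

```python
def filter_candidates(ocr_strings, standard_fields):
    """Return OCR strings that are not part of the standard field information."""
    # Both the field labels (e.g. "Name") and the field values count as
    # standard text that must be removed from the recognition result.
    standard_texts = set()
    for label, value in standard_fields.items():
        standard_texts.add(label)
        standard_texts.add(value)
    # Whatever remains is candidate character information (possible watermark).
    return [s for s in ocr_strings if s not in standard_texts]


# Hypothetical sample data for illustration only.
ocr_strings = ["Name", "Zhang San", "Gender", "Male", "For rental application only"]
standard_fields = {"Name": "Zhang San", "Gender": "Male"}
candidates = filter_candidates(ocr_strings, standard_fields)
```

Exact string comparison is used here for simplicity; a real system would likely need to tolerate OCR noise when comparing strings.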
Step 104, performing information recognition on the candidate character information to obtain watermark information.
After the candidate character information is obtained, information identification can be performed on the candidate character information. In one possible implementation manner, in the information recognition process, the candidate character information and the preset universal watermark character can be matched, and then watermark information is extracted from the characters matched with the preset universal watermark character.
In another possible implementation, the candidate character information may be semantically identified using a semantic identification model. Inputting the candidate character information into a semantic recognition model for semantic analysis to obtain semantics corresponding to the candidate character information, and determining watermark information contained in the candidate character information according to the semantics.
In summary, in the embodiment of the present disclosure, the certificate image is recognized based on the format specification and the position specification of the certificate information indicated in the standard certificate specification, so as to obtain the standard field information corresponding to the certificate information in the certificate image. All character strings included in the certificate image are then filtered using the standard field information to obtain candidate character strings that may be watermarks, and finally semantic recognition is performed on the candidate character strings to obtain the watermark information. Because certificates are manufactured to corresponding standard specifications, recognition based on the certificate specification can accurately identify the field information corresponding to the certificate information in the certificate image. The character strings other than the certificate information can therefore be screened out to obtain a set of character strings that may be watermarks, so that the watermark is separated from the original certificate information efficiently and the watermark recognition efficiency is improved. Combined with semantic recognition, the accuracy of watermark recognition can also be improved, so that the watermark information contained in the certificate image is obtained and efficient, accurate automatic watermark extraction is achieved.
In the foregoing embodiment, the identification of the preset standard field is performed according to the standard certificate specification of the target certificate, and in a possible implementation manner, when the watermark in the certificate image is detected, the certificate type of the target certificate included in the certificate image is first determined, and then the standard certificate specification is acquired for subsequent detection. The process comprises the following steps:
Step 1a, classifying the certificate image to obtain the certificate type of the target certificate included in the certificate image.
After receiving the document image, the document image is first subjected to image classification to determine the document type of the target document included in the document image. In one possible implementation, global OCR may be performed on the document image to obtain the characters contained therein. And determining the certificate type of the target certificate contained in the certificate image based on the characters contained in the certificate image.
After the characters contained in the certificate image are detected, the characters can be matched against the field information that each preset certificate type should contain. When the characters completely match the field information that a preset target certificate type should contain, the certificate type of the target certificate contained in the certificate image is determined to be that target certificate type.
Illustratively, when the characters detected in the certificate image contain a name field, a gender field, an ethnicity field, a birth field, an address field and an identity card number field, the characters completely match the field information that an identity card should contain, and the target certificate contained in the certificate image is determined to be an identity card.
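The field-matching classification described above can be sketched as follows. The sketch assumes that field labels have already been extracted from the global OCR result and normalized to short keys; the table `REQUIRED_FIELDS`, the second certificate type and the function name are hypothetical. "Complete match" is interpreted here as "every required field is present", which is an assumption, since extra text such as a watermark may also appear in the image.

```python
# Hypothetical table of the fields each preset certificate type should contain.
REQUIRED_FIELDS = {
    "identity_card": {"name", "gender", "ethnicity", "birth", "address", "id_number"},
    "hypothetical_permit": {"name", "permit_number", "valid_until"},
}


def classify_certificate(detected_fields, required_fields=REQUIRED_FIELDS):
    """Return the first certificate type whose required fields are all detected."""
    detected = set(detected_fields)
    for cert_type, required in required_fields.items():
        # Subset test: all required fields must appear among the detected ones.
        if required <= detected:
            return cert_type
    return None  # no preset type matched


detected = ["name", "gender", "ethnicity", "birth", "address", "id_number", "other_text"]
cert_type = classify_certificate(detected)
```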
In another possible implementation, the document image may be image classified using a pre-trained image classification model to obtain the document type of the target document contained in the document image. The image classification model can be obtained by training a large number of sample images in advance, the sample images comprise images corresponding to different certificate types, and the trained image classification model can be used for classifying the certificate images and identifying the certificate types of the certificates in the images.
Step 1b, obtaining standard certificate specifications corresponding to the target certificate based on the certificate type.
After the certificate type corresponding to the target certificate is determined, the corresponding standard certificate specification can be retrieved by querying with the certificate type and used in the subsequent recognition process. The standard certificate specifications corresponding to the various certificate types can be stored locally or in the cloud, and the electronic device can obtain the standard certificate specification for the certificate type of the target certificate from either location.
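One way to picture the specification lookup is a local table with an optional remote fallback. The sketch below is an assumption about how such a lookup could be organized: the `FieldSpec` structure, the `fetch_remote` callback and all numeric values are hypothetical placeholders, not values taken from any real certificate standard.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional, Tuple


@dataclass(frozen=True)
class FieldSpec:
    char_count: Optional[int]        # character specification (None if variable)
    rel_center: Tuple[float, float]  # position: center as a fraction of the certificate size
    rel_size: Tuple[float, float]    # area proportion: (length, width) as fractions


CertificateSpec = Dict[str, FieldSpec]

# Placeholder numbers for illustration only.
LOCAL_SPECS: Dict[str, CertificateSpec] = {
    "identity_card": {"id_number": FieldSpec(18, (0.6, 0.85), (0.6, 0.12))},
}


def get_standard_spec(
    cert_type: str,
    local_specs: Dict[str, CertificateSpec] = LOCAL_SPECS,
    fetch_remote: Optional[Callable[[str], CertificateSpec]] = None,
) -> CertificateSpec:
    """Look up the specification locally first, then via the remote callback."""
    if cert_type in local_specs:
        return local_specs[cert_type]
    if fetch_remote is not None:
        spec = fetch_remote(cert_type)
        local_specs[cert_type] = spec  # cache the remote result locally
        return spec
    raise KeyError(f"no standard specification for {cert_type!r}")


spec = get_standard_spec("identity_card")
```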
In one possible implementation manner, the identification of the preset field information can be directly performed on the document image according to the standard document specification, as shown in fig. 2, and based on the standard document specification corresponding to the target document, the identification of the preset field information on the document image containing the target document includes the following steps:
Step 1021, determining corresponding information areas of the preset fields in the certificate image based on the character specification and the position specification of the preset fields in the standard certificate specification.
The character specification indicates the character size and character spacing of the characters corresponding to the certificate information (the preset fields), and the position specification indicates the relative position of those characters within the whole certificate. From the character size and character spacing of a preset field, the proportion of the whole certificate area occupied by the characters of that field can be determined. From this area proportion, the size of the area corresponding to the preset field information in the certificate image can be determined, and the information area corresponding to the preset field in the certificate image is then determined by combining this size with the relative position of the preset field information in the certificate.
And when the information areas are determined, the information areas corresponding to each preset field are determined by combining the area proportion and the relative positions corresponding to each preset field.
Illustratively, when the target certificate is an identity card, the information areas in which the name information, gender information, ethnicity information, birth information, address information and citizen identity number information are located are determined respectively.
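The computation in step 1021 can be sketched as below, assuming the certificate area is an axis-aligned box and that the specification gives each field's center and size as fractions of the certificate's length and height. The function name and the sample numbers are hypothetical.

```python
def field_region(cert_box, rel_center, rel_size):
    """Map a field's relative position and area proportion to pixel coordinates.

    cert_box   -- (left, top, width, height) of the certificate area in pixels
    rel_center -- field center as fractions of the certificate width/height
    rel_size   -- field width/height as fractions of the certificate width/height
    Returns (left, top, width, height) of the information area in pixels.
    """
    left, top, width, height = cert_box
    cx = left + rel_center[0] * width
    cy = top + rel_center[1] * height
    w = rel_size[0] * width
    h = rel_size[1] * height
    return (cx - w / 2, cy - h / 2, w, h)


# Hypothetical certificate area and field specification.
region = field_region((100, 50, 800, 500), rel_center=(0.5, 0.5), rel_size=(0.25, 0.125))
```

Each preset field would be passed through the same function with its own relative position and area proportion.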
Step 1022, performing optical character recognition on the information area corresponding to each preset field to obtain standard field information.
OCR recognition can be performed on each information area to obtain field information (standard field information) contained in each information area.
In another possible implementation, the field information of one item of certificate information can first be recognized based on the information specification of that item. The information areas in which the other items are located in the certificate image are then determined from the relative positions between that item and the other items, and character recognition is performed on those information areas to obtain the field information of the other items, thereby improving the accuracy of standard field information recognition.
Wherein, the step of determining the information area in the certificate image further comprises the following steps:
Step 10211, determining a certificate number area of the certificate number field in the certificate image based on the character specification and the position specification of the certificate number field in the standard certificate specification.
The target certificate contains the certificate number, and the certificate number has obvious characteristics in the certificate, so that the certificate number in the certificate image can be identified first. In one possible implementation, the identification number area where the identification number field is located is determined based on the character specification and the position specification of the identification number field in the standard identification specification, and then characters in the identification number area are identified to obtain the number field information.
The character specification of the certificate number field indicates the number of characters in the certificate number, and the position specification of the certificate number field indicates the aspect ratio of the number area. Based on the number of characters and the aspect ratio, the certificate image can be searched for an area that meets both criteria, and that area is determined to be the certificate number area in which the number field is located.
Illustratively, when the target document is an identification card, the document number is 18 digits and the aspect ratio is 5, and the document number area can be searched in the document image based on the standard.
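A minimal sketch of this search, assuming a global OCR pass has already produced text boxes as `(text, (left, top, width, height))` pairs. The 18-character count and the aspect ratio of 5 come from the example above; the tolerance, the acceptance of a trailing "X" and the function name are assumptions added for illustration.

```python
import re


def find_number_region(ocr_boxes, char_count=18, aspect_ratio=5.0, tolerance=0.2):
    """Return the first box whose text and shape match the number specification."""
    # Assumption: the last character may be the check character "X".
    pattern = re.compile(r"[0-9]{%d}|[0-9]{%d}[Xx]" % (char_count, char_count - 1))
    for text, (left, top, width, height) in ocr_boxes:
        if not pattern.fullmatch(text):
            continue  # wrong number of characters or non-digit content
        if height <= 0:
            continue
        ratio = width / height
        # Accept ratios within a relative tolerance of the specified value.
        if abs(ratio - aspect_ratio) <= tolerance * aspect_ratio:
            return (left, top, width, height)
    return None


# Hypothetical OCR boxes for illustration only.
boxes = [
    ("Zhang San", (120, 80, 150, 40)),
    ("11010519491231002X", (200, 400, 300, 60)),
]
number_region = find_number_region(boxes)
```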
Step 10212, performing optical character recognition on the certificate number area to obtain a certificate number recognition result.
In one possible implementation, after the certificate number area is searched, image capturing may be performed on the certificate number area, and OCR recognition may be performed on the captured image to obtain a certificate number recognition result. The certificate number identification result contains the identified number field information.
Step 10213, determining the positional relationship between the certificate number area and the information area corresponding to the other fields based on the standard certificate specification.
In a standard certificate, the relative positional relationship between the certificate number and the other certificate information is fixed by the standard. The relative positional relationship between the position of the certificate number and the positions of the other information can therefore be obtained from the standard certificate specification, giving the positional relationship between the certificate number area and the information areas corresponding to the other fields. Combined with the certificate number area, this relationship is used to determine the other information areas in which the other certificate information is located.
Step 10214, determining other information areas of other fields in the document image according to the document number area and the position relation between the document number area and the information areas corresponding to the other fields.
Wherein the process of determining the other information area comprises the following steps 1-5:
Step 1, obtaining the center coordinates of the certificate number area and the length and width of the certificate number area.
In one possible implementation, a two-dimensional coordinate system may be established for the certificate image, and the center coordinates corresponding to the center position of the certificate number area may be determined in that coordinate system. Optionally, the origin of the two-dimensional coordinate system may be the lower left corner, the lower right corner, the center of the image, or the like.
And the length and width of the document number area can be determined based on the edge coordinates of the document number area in the two-dimensional coordinate system.
Step 2, determining the center coordinates of the other information areas based on the relative positional relationship and the center coordinates of the certificate number area.
The other information areas corresponding to the other information are the image areas of the certificate image that contain the other information. The distance between the certificate number area and each other information area is in a fixed proportion, and the center coordinates of the other information areas can be determined according to this fixed distance proportion.
In one possible implementation, the center coordinates of the certificate number areas may be normalized to obtain normalized center coordinates, and the normalized center coordinates of the other information areas may be determined based on the fixed distance ratio indicated by the normalized center coordinates and the relative positional relationship.
Illustratively, when the target certificate is an identity card and the center coordinates of the name area are to be determined from the center coordinates of the certificate number area, the normalized center coordinates of the certificate number area can be calculated first, and the normalized center coordinates of the name area can then be determined based on the fixed distance proportion between the number and the name.
Step 3, determining the length and width of the other information areas based on the length and width of the certificate number area.
The length of the document number area is in a fixed length proportion to the length of the other information areas, and the width of the number area is in a fixed width proportion to the width of the other information areas. The electronic equipment can determine the length of other information areas according to the length of the certificate number area and the fixed length proportion; and the width of other information areas can be determined according to the width of the number area and the fixed width proportion.
By combining the above examples, the length and width of the certificate number area can be calculated according to the edge coordinates of the certificate number area, and the length of the name area can be calculated based on the fixed length ratio between the certificate number area and the name area; and the width of the name area is calculated based on the fixed width ratio between the certificate number area and the name area.
Step 4, determining the inclination of the other information areas based on the inclination of the certificate number area.
The target document in the document image may have a certain inclination, and the inclination of the document number area is the same as the inclination corresponding to other document information, i.e. the inclination of the other information area is the same as the inclination of the document number area.
In one possible embodiment, the inclination of the document number area may be determined based on the angle between the edge of the document number area and the coordinate axis, and the inclination of the document number area may be determined as the inclination of the other information area.
Step 5, determining the positions of the other information areas in the certificate image based on the center coordinates, length, width and inclination of the other information areas.
According to the normalized center coordinates of the other information areas, the center position of each other information area in the certificate image can be determined; combined with the length, width and inclination of the other information areas, the other information areas can then be located in the certificate image.
Illustratively, in combination with the above example, the name area corresponding to the name field information in the certificate image may be determined according to the normalized center coordinates, the length and width, and the inclination of the name area relative to the coordinate axes.
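Steps 1 to 5 can be sketched together as below. The sketch makes one interpretive assumption that the text does not spell out: the "fixed distance proportion" is expressed as an offset measured in multiples of the number area's length along the certificate's own axes, and that offset is rotated by the number area's inclination. The `Region` class, the function name and all ratios are hypothetical.

```python
import math
from dataclasses import dataclass


@dataclass
class Region:
    cx: float        # center x in image coordinates
    cy: float        # center y in image coordinates
    length: float    # extent along the text direction
    width: float     # extent across the text direction
    tilt_deg: float  # inclination relative to the x axis, in degrees


def derive_region(number, offset_ratio, length_ratio, width_ratio):
    """Derive another information area from the certificate number area."""
    theta = math.radians(number.tilt_deg)
    # Step 2: offset in units of the number area's length, in the certificate frame.
    dx = offset_ratio[0] * number.length
    dy = offset_ratio[1] * number.length
    # Rotate the offset by the inclination so that it follows a tilted certificate.
    rx = dx * math.cos(theta) - dy * math.sin(theta)
    ry = dx * math.sin(theta) + dy * math.cos(theta)
    return Region(
        cx=number.cx + rx,
        cy=number.cy + ry,
        length=number.length * length_ratio,  # Step 3: fixed length proportion
        width=number.width * width_ratio,     # Step 3: fixed width proportion
        tilt_deg=number.tilt_deg,             # Step 4: same inclination
    )


# Hypothetical number area and ratios for a "name" area.
number_area = Region(cx=500.0, cy=400.0, length=300.0, width=60.0, tilt_deg=0.0)
name_area = derive_region(number_area, offset_ratio=(-0.5, -1.0),
                          length_ratio=0.5, width_ratio=1.0)
```

The resulting `Region` holds the center, length, width and inclination that step 5 uses to locate the area in the certificate image.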
The certificate information other than the certificate number may comprise several items. In one possible implementation, the information area of each item in the certificate image can be determined according to the relative positional relationship between the position of the certificate number and the position of that item. Illustratively, when the target certificate is an identity card, the name area can be determined according to the relative positional relationship between the position of the identity card number and the position of the name; the gender area can be determined according to the relative positional relationship between the position of the identity card number and the position of the gender; similarly, the birth area, ethnicity area and address area can be determined.
After detecting the field areas corresponding to the field information, character recognition is carried out on the field areas to obtain standard field information.
Step 10215, performing optical character recognition on other information areas to obtain field information recognition results of other fields; the information areas corresponding to the preset fields in the certificate image comprise a certificate number area and other information areas, and the standard field information comprises a certificate number identification result and field information identification results of other fields.
For each other information area, image interception can be performed respectively, and then OCR recognition is performed on the intercepted image to obtain the other field information included in the other information areas.
In the above process, after the certificate number area is determined, the other information areas corresponding to the other information are determined directly from the certificate number area so that the other information can be recognized. However, in one possible scenario the target certificate may not be genuine, or the certificate number may have been misrecognized; in that case the certificate cannot be used and there is no need to recognize the subsequent information. Thus, in one possible implementation, it is first detected whether the certificate number in the certificate number area is a standard number, and only then are the above steps of determining the other information areas based on the number area performed. The detection process comprises the following steps a-b:
Step a, verifying the certificate number recognition result based on the number information format indicated in the character specification of the certificate number field and the number check code contained in the certificate number recognition result, to obtain a verification result indicating whether the certificate number recognition result passes verification.
The certificate number has a standard number information format indicating the information indicated by each number. In one possible implementation, the number field information included in the identification result of the document number may be subjected to format detection to determine whether each number is a number indicating the corresponding information. Illustratively, the identification card number includes numbers indicating the area, date of birth and sex, and it can be detected whether each number meets the rule. And under the condition of conforming to the rule, combining the number check code contained in the certificate number identification result to carry out secondary detection.
The certificate number carries a check code. The verification method indicated in the certificate number specification can be used to compute the standard check code corresponding to the number contained in the certificate number recognition result, and the standard check code is then compared with the number check code contained in the certificate number recognition result to obtain the verification result. If the standard check code is identical to the recognized check code, the certificate number contained in the recognition result is determined to be a standard certificate number and the verification passes; if they differ, the certificate number is determined to be non-standard, which may indicate false information or a recognition error, and detection and recognition can be performed again.
Step b, in response to the verification result indicating that the certificate number recognition result passes verification, performing the operation of determining the positional relationship between the certificate number area and the information areas corresponding to the other fields based on the standard certificate specification.
When the verification result indicates that the identification result of the certificate number passes the verification, namely the identified certificate number is a standard certificate number, other information areas corresponding to other certificate information are determined based on the certificate number area.
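For the identity card example, steps a and b can be sketched as below. The disclosure does not name the check algorithm; the sketch assumes the 18-digit PRC citizen identity number, whose check character is computed with the ISO 7064 MOD 11-2 weights defined in GB 11643-1999. Only the birth-date part of the format check is illustrated, and the function names are hypothetical.

```python
import re
from datetime import datetime

# ISO 7064 MOD 11-2 weights and check-character table (GB 11643-1999).
WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
CHECK_CODES = "10X98765432"


def expected_check_code(first17):
    """Compute the standard check code for the first 17 digits."""
    total = sum(int(d) * w for d, w in zip(first17, WEIGHTS))
    return CHECK_CODES[total % 11]


def verify_id_number(number):
    """Return True if the recognized number passes format and check-code tests."""
    number = number.upper()
    # Format check: 17 digits followed by a digit or "X".
    if not re.fullmatch(r"[0-9]{17}[0-9X]", number):
        return False
    # Format check: characters 7-14 must form a valid calendar date of birth.
    try:
        datetime.strptime(number[6:14], "%Y%m%d")
    except ValueError:
        return False
    # Secondary check: compare the computed and recognized check codes.
    return expected_check_code(number[:17]) == number[17]


# Widely used illustrative number, not a real person's identity number.
passed = verify_id_number("11010519491231002X")
```

Only when `passed` is true would the flow continue to determine the other information areas; otherwise detection and recognition would be repeated.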
In the embodiment of the disclosure, firstly, the certificate number with obvious characteristics is identified to obtain a number area where the certificate number is located, and then the relative position relation between the area position where the number area is located and the area where the rest information is located is combined to determine the rest information area, so that a standard certificate can be constructed in a certificate image, the certificate information of the certificate image is obtained, the original certificate information in the certificate image can be accurately obtained, and then the filtering is performed based on the original certificate information, thereby improving the accuracy of watermark extraction.
In one possible implementation manner, after standard field information corresponding to certificate information in a certificate image is identified, characters corresponding to the standard field information can be filtered, so that candidate character information which can be a watermark can be obtained.
And filtering the number field information and the rest field information in all the characters contained in the certificate image to obtain candidate character information. The number field information and the rest field information form the original document information of the target document contained in the document image, and the original document information is filtered to obtain other non-document information.
In another possible implementation, to improve extraction accuracy, filtering by character position area may be combined with filtering by character content. As shown in fig. 3, the filtering process comprises the following steps:
Step 1031, filtering the character strings corresponding to the standard field information in the character recognition result to obtain candidate character strings.
The character recognition result comprises a plurality of character strings, and character strings corresponding to standard field information (comprising a certificate number recognition result and a field information recognition result) are filtered to obtain non-standard character strings, namely candidate character strings.
Step 1032, obtaining each character string region where each character string included in the character recognition result is located.
When character recognition is carried out on the certificate image, besides character content, the position area of the character in the certificate image is obtained, and the character string area of each character string in the certificate image is obtained so as to carry out filtering again based on the position area information.
Step 1033, determining a region filtering standard based on the information specification of the standard field information, and filtering the information region in each character string region based on the region filtering standard to obtain a non-standard character string region.
In the above process, the electronic device has determined the number area corresponding to the certificate number and other information areas corresponding to other information. In one possible implementation, the number area and other information areas in the character string area where each detected character string is located are filtered to obtain a non-standard character string area.
The standard field information differs from other character information in character size, spacing, position specification and the like, so the information areas corresponding to the standard field information also differ in their characteristics from the non-standard character string areas corresponding to other character information. In another possible implementation, the area filtering criteria may be determined based on the information specification of the standard field information, and the information areas corresponding to the standard field information may be filtered out of the character string areas based on the area filtering criteria to obtain the non-standard character string areas. The area filtering criteria include at least one of a length criterion, a width criterion and an inclination criterion.
Optionally, the region filtering criteria includes a length criteria. The length standard can be determined according to the length of the area occupied by each standard field information, and the length of the area occupied by each standard field information is used as the length standard for area filtering. The non-standard character string area can be obtained by filtering the area which accords with the length standard in each character string area.
Optionally, the area filtering criteria includes a width criterion, and similarly, the width criterion may be determined by a width of an area occupied by each standard field information, and the width of the area occupied by each standard field information is used as the width criterion for area filtering. The non-standard character string area can be obtained by filtering the area which accords with the width standard in each character string area.
Optionally, the region filtering criteria includes a slope criteria. The area occupied by each standard field information in the target certificate should have the same inclination. And screening areas different from the inclination standard from the character string areas by taking the inclination of the number area and other information areas as the inclination standard, so as to obtain non-standard character string areas.
Optionally, the region filtering criteria may further include a length criterion, a width criterion, and an inclination criterion, and the filtering may be performed based on the length criterion, the width criterion, and the inclination criterion, respectively, to obtain corresponding nonstandard character string regions, and the set of nonstandard character string regions obtained by filtering based on the three criteria is determined as the nonstandard character string region obtained by final filtering.
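The three area filtering criteria can be sketched as below. Two points are assumptions rather than statements from the text: the results of the three criteria are combined as a union (an area is treated as non-standard if it fails any one criterion), and a relative tolerance is applied when comparing measured sizes with the standard. The dictionary keys, tolerances and sample data are hypothetical.

```python
def matches_any(value, standards, tol):
    """True if value is within a relative tolerance of any standard value."""
    return any(abs(value - s) <= tol * s for s in standards)


def non_standard_regions(regions, standard_regions, tol=0.1, tilt_tol_deg=2.0):
    """Return the string areas that fail the length, width or inclination criterion."""
    lengths = [r["length"] for r in standard_regions]
    widths = [r["width"] for r in standard_regions]
    # All standard fields on one certificate share the same inclination.
    tilt = standard_regions[0]["tilt"]
    result = []
    for r in regions:
        fails_length = not matches_any(r["length"], lengths, tol)
        fails_width = not matches_any(r["width"], widths, tol)
        fails_tilt = abs(r["tilt"] - tilt) > tilt_tol_deg
        # Assumption: union of the three individual filtering results.
        if fails_length or fails_width or fails_tilt:
            result.append(r)
    return result


# Hypothetical areas: two standard fields and one slanted watermark.
standard = [
    {"text": "number", "length": 300.0, "width": 60.0, "tilt": 0.0},
    {"text": "name", "length": 150.0, "width": 60.0, "tilt": 0.0},
]
watermark = {"text": "watermark", "length": 400.0, "width": 40.0, "tilt": 30.0}
remaining = non_standard_regions(standard + [watermark], standard)
```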
Step 1034, screening the candidate character information from the candidate character strings based on the non-standard character string areas.
First, the non-standard character strings in the non-standard character string areas can be recognized. The character strings common to the non-standard character strings and the candidate character strings are then taken as the candidate character information, which is the set of characters that may be a watermark.
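The second screening in step 1034 amounts to an intersection of the two results. A minimal sketch, assuming both inputs are lists of strings and that exact string equality is an acceptable comparison; the function name and sample data are hypothetical.

```python
def select_candidates(candidate_strings, non_standard_strings):
    """Keep candidate strings that also appear in a non-standard string area."""
    allowed = set(non_standard_strings)
    # Preserve the order of the content-filtered candidates.
    return [s for s in candidate_strings if s in allowed]


# Hypothetical results of the content filter and the area filter.
content_filtered = ["For rental use only", "Issued by", "#@%"]
area_filtered = ["For rental use only", "#@%", "Zhang San"]
candidate_info = select_candidates(content_filtered, area_filtered)
```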
In one possible implementation, the candidate character information may still include obvious non-watermark character strings such as garbled characters and short strings. The garbled and short character strings may be filtered out first to improve the efficiency of watermark recognition. The process may comprise the following steps:
Step 1041, filtering the candidate character information based on a preset standard to obtain filtered character strings.
The preset standard refers to a character standard corresponding to a preset watermark character. The preset standard comprises a text format and a character string length.
Character strings composed of garbled, non-text characters in the candidate character information can be filtered out based on the text format. In addition, because a watermark character string has a certain length, a length threshold can be set and character strings shorter than the threshold can be filtered out. After garbled strings are filtered based on the text format and short strings are filtered based on the character string length, the remaining filtered character strings are the candidate watermark character strings.
Step 1042, information recognition is performed on the filtered character string to obtain watermark information.
Information recognition is performed on the candidate watermark character strings obtained by filtering to obtain the watermark information contained in the filtered character strings. This reduces the amount of information recognition required and improves watermark recognition efficiency.
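Step 1041 can be sketched as a text-format check followed by a length check. The allowed character set (ASCII letters and digits, whitespace and common CJK ideographs) and the length threshold of 4 are assumptions chosen for illustration; the disclosure does not give concrete values.

```python
import re

# Assumed "text format": letters, digits, whitespace and CJK unified ideographs.
TEXT_PATTERN = re.compile(r"[0-9A-Za-z\u4e00-\u9fff\s]+")


def filter_strings(candidates, min_length=4):
    """Drop garbled strings and strings shorter than the length threshold."""
    kept = []
    for s in candidates:
        s = s.strip()
        if len(s) < min_length:
            continue  # too short to be a watermark string
        if not TEXT_PATTERN.fullmatch(s):
            continue  # contains characters outside the text format
        kept.append(s)
    return kept


# Hypothetical candidate character information.
candidates = ["仅供办理租房使用", "#@%&*!~", "ab", "For rental use only"]
filtered = filter_strings(candidates)
```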
In the embodiment of the disclosure, non-standard character string areas can be obtained by screening based on the area filtering criteria, and non-standard character strings can be obtained by screening based on the character string content of the standard field information. This two-stage screening yields an accurate set of non-standard character strings, and extracting the watermark from that set improves the accuracy of watermark extraction. In addition, after the candidate character information is obtained through screening, non-watermark character strings such as garbled strings and short strings are filtered out before semantic recognition is performed, which improves watermark recognition efficiency.
In one possible implementation manner, in order to improve the semantic recognition efficiency, a general watermark character may be preset, and the information recognition is performed on the filtered character string based on the general watermark character, where the process includes the following steps:
step one, matching the filtered character string with a preset universal watermark character to obtain a matching result.
The preset universal watermark characters comprise characters commonly found in permitted watermarks. Illustratively, when the permitted watermark indicates the intended use of the certificate, the universal watermark characters may include "for", "use", "permitted", and the like. The filtered character string can be matched with the preset universal watermark characters to obtain a matching result.
In a possible implementation manner, when the character in the filtered character string matches with a part of the preset universal watermark character, the filtered character string is determined to match with the preset universal watermark character, and the filtered character string can be determined to be the target watermark character string. In another possible implementation manner, in order to improve watermark identification accuracy, in a case that the filtered character string is completely matched with the preset universal watermark character string, it is determined that the filtered character string is matched with the preset universal watermark character, and the filtered character string may be determined as the target watermark character string.
Step two, in response to the character string recognition result including a target character string matched with the preset universal watermark character, watermark information is extracted from the target character string.
When the character string identification result indicates that a target watermark character string with characters matched with the preset universal watermark characters exists in the filtered character string, the target watermark character string can be extracted from the certificate image.
And extracting information from the target watermark character string to obtain watermark information contained in the certificate image.
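The two matching variants described above (matching part of the universal watermark characters versus matching all of them) can be sketched as follows. The names `GENERIC_WATERMARK_WORDS`, `matches_generic`, and `find_target_strings` are hypothetical, the word list reuses only the examples given in this description, and plain substring matching is an assumption made for brevity.

```python
# Hypothetical word list; "for", "use" and "permitted" are the examples in the text.
GENERIC_WATERMARK_WORDS = ("for", "use", "permitted")


def matches_generic(text, words=GENERIC_WATERMARK_WORDS, require_all=False):
    """Check a filtered string against the preset universal watermark characters.

    require_all=False: a match with any one word is enough (partial match).
    require_all=True: every word must occur in the string (complete match).
    """
    lowered = text.lower()
    hits = [word in lowered for word in words]
    return all(hits) if require_all else any(hits)


def find_target_strings(filtered, words=GENERIC_WATERMARK_WORDS, require_all=False):
    """Return the filtered strings that qualify as target watermark strings."""
    return [s for s in filtered if matches_generic(s, words, require_all)]
```

Substring matching suits Chinese text, where words are not separated by spaces, but for English text it can produce accidental hits inside longer words, so a word-boundary match may be preferable there.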
When the client or the data use platform uses the service spare parts, the service spare parts are required to be checked to determine whether the service spare parts are authorized to be used. In one possible implementation manner, watermark identification can be performed on a certificate image serving as a service spare part to obtain watermark information therein, and whether the service spare part is authorized or not is determined according to the watermark information. The embodiment of the disclosure also provides a verification method of the service spare part, comprising the following steps:
Step 201, a service request is received, wherein the service request includes a certificate image as a service spare part and a service type.
The service spare part refers to a certificate image of a target certificate that the user is required to provide when transacting a service. When handling a service, the user uploads the certificate image of the target certificate as the service spare part. After the certificate image is uploaded, sending of a service request is triggered, and the electronic device can receive the service request. The service request includes the certificate image uploaded by the user as the service spare part and the service type corresponding to the service being handled; the certificate image serving as the service spare part includes an image of the target certificate.
Step 202, performing watermark identification on the certificate image based on the watermark identification method described in the above embodiment, to obtain watermark information of the certificate image.
After obtaining the certificate image as the service spare part, the electronic device can identify the watermark information included in the certificate image by using the watermark identification method for the certificate image of the target certificate provided by the embodiment. The specific watermark identification process may refer to the above embodiment, and will not be described herein.
Step 203, based on the watermark information of the certificate image, it is confirmed whether the service spare part is authorized to handle the service corresponding to the service type.
Different service types have different service spare part authorization specifications. The watermark information of the certificate image can be checked according to the service spare part authorization specification corresponding to the service type in the service request, so as to determine whether the service spare part is authorized for the service corresponding to the service type.
The service spare part authorization specification indicates the authorization characters that a service spare part must carry in order to be authorized for the service corresponding to the service type. The identified watermark information can be matched against these authorization characters. If the watermark information is the same as the authorization characters indicated in the service spare part authorization specification, the watermark information is determined to comply with the authorization specification, the service spare part is authorized for handling the service corresponding to the service type, and the certificate image can be stored for service handling. If the watermark information differs from the authorization characters, the watermark information is determined not to comply with the authorization specification and the service spare part is not authorized for the service corresponding to the service type; in that case, unauthorized information can be fed back to the user through the client, instructing the user to provide again a certificate image that complies with the authorization specification.
By the method provided by the embodiment, under the condition that the service spare part verification service requirement exists, the uploaded certificate image serving as the service spare part can be automatically verified, whether the service spare part is authorized to be used or not is determined, and the verification efficiency of the service spare part is improved.
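As a rough illustration of step 203, the check against a per-service authorization specification might look like the sketch below. The table `AUTHORIZATION_SPECS`, its entries, the `VerificationResult` type, and the handling of an unknown service type are assumptions introduced for the example; treating an image without a watermark as usable follows the flow of fig. 4 (step 414).

```python
from dataclasses import dataclass

# Hypothetical authorization specifications: service type -> authorization characters.
AUTHORIZATION_SPECS = {
    "rental": "for rental use only",
    "loan": "for loan use only",
}


@dataclass
class VerificationResult:
    authorized: bool
    message: str


def verify_spare_part(service_type, watermark_info, specs=AUTHORIZATION_SPECS):
    """Decide whether a certificate image may be used for the given service type."""
    expected = specs.get(service_type)
    if expected is None:
        # Assumption: a service type without a specification is rejected.
        return VerificationResult(False, "no authorization specification for this service type")
    if watermark_info is None:
        # No watermark detected: the image is treated as usable (fig. 4, step 414).
        return VerificationResult(True, "no watermark detected")
    if watermark_info.strip().lower() == expected:
        return VerificationResult(True, "watermark complies with the authorization specification")
    return VerificationResult(False, "watermark does not comply; please upload a compliant image")
```

A caller would store the certificate image when `authorized` is true and otherwise return the image together with `message` to the client.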
Fig. 4 is a flowchart of a method for verifying a service spare part according to an exemplary embodiment. Taking an identity card as the service spare part as an example, the method comprises the following steps:
step 401, acquiring a document image.
Step 402, detecting whether the certificate image is an identity card, if so, executing step 403, otherwise, executing step 401.
Step 403, image processing is performed on the document image.
Image processing includes adaptive binarization, edge detection, erosion, and dilation processing.
Step 404, detecting the certificate number area based on the character number of the identification card number and the length-width ratio.
Based on the 18 digits of the identification card number and the feature of aspect ratio 5, the identification card number area is detected in the identification card image.
Step 405, acquiring the center coordinates, length, and width of the certificate number area, and cropping the certificate number area from the image.
Step 406, character recognition is performed on the intercepted image.
And carrying out character recognition on the screenshot corresponding to the number area to obtain the identification card number in the certificate image.
Step 407, obtain the identification result, determine whether the identification result accords with the identification card number rule, if yes, go to steps 408 and 409.
The recognized identification card number is checked based on the number composition rule and the number check code to detect whether it is a standard identification card number. If it is a standard identification card number, the information areas corresponding to the other certificate information are detected.
In step 408, the center coordinates, length, and width of the other information areas are determined based on the center coordinates, length, and width of the certificate number area.
Step 409, performing character recognition on the certificate image to obtain each character string contained in the certificate image and determining a character string area of each character string.
In step 410, each string region is filtered based on the certificate number region and other information regions to obtain candidate strings.
In step 411, short character and messy code filtering are performed on the candidate character strings, so as to obtain candidate watermark character strings.
Step 412, information identification is performed on the candidate watermark character string, and it is determined whether the candidate watermark character string is a watermark character string, if yes, step 413 is performed, and if not, step 414 is performed.
Step 413, determining whether the watermark meets the service spare part authorization specification, if yes, executing step 416, and if not, executing step 415.
Step 414, determining that there is no watermark, and storing the document image.
In the event that no watermark is detected, the credential image is determined to be available and the credential image may be stored.
Step 415, returning the document image and feeding back the non-compliance information.
And under the condition that the watermark is determined to be out of compliance with the service spare part authorization specification, returning the certificate image and feeding back the watermark non-compliance information.
Step 416, determining watermark compliance, and storing the document image.
When the watermark is a standard watermark, the watermark compliance is determined, and the certificate image can be stored.
It should be noted that, the specific implementation of this embodiment may refer to the above embodiment, and will not be described herein.
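The check in step 407 can be illustrated with the sketch below. It assumes the 18-character resident identity number format whose last character is an ISO 7064 MOD 11-2 check character, as defined in GB 11643-1999; the disclosure itself only refers to a number composition rule and a number check code, and the function name `is_valid_id_number` is hypothetical. Validation of the embedded region code and birth date is omitted.

```python
import re

# Weights and check characters of the MOD 11-2 scheme used by 18-character ID numbers.
_WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
_CHECK_CHARS = "10X98765432"
_FORMAT = re.compile(r"\d{17}[\dX]")


def is_valid_id_number(number):
    """Return True if the recognized string has a valid format and check character."""
    number = number.strip().upper()
    # Composition rule: 17 digits followed by a digit or "X".
    if not _FORMAT.fullmatch(number):
        return False
    # Check code: weighted sum of the first 17 digits, reduced modulo 11.
    total = sum(int(digit) * weight for digit, weight in zip(number[:17], _WEIGHTS))
    return _CHECK_CHARS[total % 11] == number[17]
```

A failed check usually indicates a recognition error in the cropped number area, so the flow can stop before the other information areas are located.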
Fig. 5 is a block diagram of a watermark identification apparatus provided in an exemplary embodiment of the present disclosure. As shown in fig. 5, the apparatus includes:
The field identifying module 501 is configured to identify preset field information of a document image including a target document based on a standard document specification corresponding to the target document, so as to obtain standard field information in the document image, where the preset field information refers to field information to be included in the target document specified by the standard document specification, and the standard document specification indicates a character specification and a position specification of preset characters included in the target document;
The character recognition module 502 is configured to perform character recognition on the certificate image to obtain a character recognition result;
a character filtering module 503, configured to filter the character recognition result based on the standard field information, so as to obtain candidate character information except the standard field information in the character recognition result;
and the watermark identifying module 504 is configured to identify the candidate character information to obtain watermark information.
In an exemplary embodiment, the field identifying module 501 is further configured to:
determining corresponding information areas of all preset fields in the certificate image based on character specifications and position specifications of the preset fields in the standard certificate specifications;
And carrying out optical character recognition on the information area corresponding to each preset field to obtain the standard field information.
In an exemplary embodiment, the preset fields include a certificate number field and other fields than the certificate number field;
the field identifying module 501 is further configured to:
determining a certificate number area of the certificate number field in the certificate image based on a character specification and a position specification of the certificate number field in the standard certificate specification;
performing optical character recognition on the certificate number area to obtain a certificate number recognition result;
determining the position relation between the certificate number area and the information areas corresponding to other fields based on the standard certificate specification;
determining other information areas of other fields in the certificate image according to the position relation between the certificate number area and the information areas corresponding to the other fields;
Performing optical character recognition on the other information areas to obtain field information recognition results of the other fields; and the standard field information comprises a certificate number identification result and field information identification results of other fields.
In an exemplary embodiment, the field identifying module 501 is further configured to:
the credential number area is determined based on the number of characters indicated in the character specification of the credential number field and the aspect ratio of the information area indicated by the location specification of the credential number field.
In an exemplary embodiment, the field identifying module 501 is further configured to:
acquiring the center coordinates of the certificate number area and the length and width of the certificate number area;
Determining the center coordinates of the other information areas based on the relative position relation and the center coordinates of the certificate number areas;
determining the length and the width of the other information areas based on the length and the width of the certificate number area;
determining the inclination of the other information areas based on the inclination of the certificate number areas;
And determining the region position of the other information region in the certificate image based on the central coordinates, the length, the width and the gradient of the other information region.
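One way to picture the derivation of another information area from the certificate number area is the sketch below. The `Region` type, the function `locate_field`, and the convention that offsets and lengths are expressed as multiples of the number area's length (and widths as multiples of its width) are assumptions made for illustration; the standard certificate specification would supply the actual relative layout.

```python
import math
from dataclasses import dataclass


@dataclass
class Region:
    cx: float         # center x in image pixels
    cy: float         # center y in image pixels
    length: float     # extent along the text direction
    width: float      # extent across the text direction
    angle_deg: float  # inclination of the text direction


def locate_field(number_region, rel_dx, rel_dy, rel_length, rel_width):
    """Derive another field's region from the certificate number region.

    rel_dx and rel_dy are the center offset in units of the number region's
    length, measured in the certificate's own (unrotated) coordinate frame.
    """
    theta = math.radians(number_region.angle_deg)
    dx = rel_dx * number_region.length
    dy = rel_dy * number_region.length
    # Rotate the offset by the number region's inclination.
    cx = number_region.cx + dx * math.cos(theta) - dy * math.sin(theta)
    cy = number_region.cy + dx * math.sin(theta) + dy * math.cos(theta)
    return Region(
        cx=cx,
        cy=cy,
        length=rel_length * number_region.length,
        width=rel_width * number_region.width,
        angle_deg=number_region.angle_deg,  # other fields share the inclination
    )
```

Scaling by the number area's measured size makes the layout independent of image resolution, and reusing its inclination keeps the derived areas aligned with a rotated certificate.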
In an exemplary embodiment, the apparatus further comprises:
The verification module is used for verifying the certificate number recognition result based on the number information format indicated in the character specification of the certificate number field and the number verification code contained in the certificate number recognition result to obtain a verification result of whether the certificate number recognition result passes the verification;
The field identifying module 501 is further configured to perform the operation of determining the positional relationship between the certificate number area and the information areas corresponding to other fields based on the standard certificate specification in response to the verification result being that the certificate number identifying result passes verification.
In an exemplary embodiment, the character filtering module 503 is further configured to:
Filtering the character strings corresponding to the standard field information in the character recognition result to obtain candidate character strings;
acquiring each character string region in which each character string contained in the character recognition result is located;
Determining an area filtering criterion based on an information specification of the criterion field information; filtering the information areas in the character string areas based on the area filtering standard to obtain non-standard character string areas, wherein the area filtering standard comprises at least one of a length standard, a width standard and an inclination standard;
and screening the candidate character information from the candidate character strings based on the nonstandard character string area.
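The two-stage screening performed by the character filtering module can be sketched as follows. The `StringRegion` type, the function `select_candidates`, and the tolerance values are hypothetical; the sketch uses only the width and inclination standards, which is one of the combinations allowed by "at least one of a length standard, a width standard and an inclination standard".

```python
from dataclasses import dataclass


@dataclass
class StringRegion:
    text: str
    length: float
    width: float
    angle_deg: float


def select_candidates(regions, standard_texts, std_width, std_angle_deg,
                      width_tol=0.2, angle_tol_deg=3.0):
    """Keep strings that are neither standard field content nor in a standard-looking region."""
    candidates = []
    for region in regions:
        # Stage 1: filter by content, dropping strings recognized as standard fields.
        if region.text in standard_texts:
            continue
        # Stage 2: filter by region, dropping regions whose geometry matches
        # the standard fields (assumed width and inclination tolerances).
        width_ok = abs(region.width - std_width) <= width_tol * std_width
        angle_ok = abs(region.angle_deg - std_angle_deg) <= angle_tol_deg
        if width_ok and angle_ok:
            continue
        candidates.append(region.text)
    return candidates
```

The region stage catches standard text that the content stage misses, for example a field whose recognized string differs slightly between the two recognition passes.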
In an exemplary embodiment, the character filtering module 503 is further configured to:
Filtering the candidate character information based on a preset standard to obtain a filtered character string;
the watermark identifying module 504 is further configured to:
and carrying out semantic recognition on the filtered character string to obtain the watermark information.
In an exemplary embodiment, the watermark identifying module 504 is further configured to:
matching the filtered character string with a preset universal watermark character to obtain a matching result;
And responding to the character string identification result to comprise a target character string matched with the preset universal watermark character, and extracting the watermark information from the target character string.
In an exemplary embodiment, the apparatus further comprises:
The classification module is used for classifying the certificate images to obtain the certificate types of the target certificates included in the certificate images;
and the acquisition module is used for acquiring the standard certificate specification corresponding to the target certificate based on the certificate type.
The watermark identifying device in the embodiment of the present disclosure corresponds to the embodiment of the watermark identifying method in the present disclosure, and the related content may be referred to each other, which is not described herein again. The corresponding advantageous technical effects of the watermark identifying device in the embodiments of the present disclosure may be referred to the corresponding advantageous technical effects of the corresponding exemplary method section described above, and will not be described herein.
In addition, the embodiment of the disclosure further provides a verification device for service spare parts, including:
The receiving module is used for receiving a service request, wherein the service request comprises a certificate image serving as a service spare part and a service type, and the certificate image comprises an image of a target certificate;
The identification module is used for carrying out watermark identification on the certificate image based on the method in any embodiment to obtain watermark information of the certificate image;
And the verification module is used for confirming whether the service spare part is authorized to transact the service corresponding to the service type based on the watermark information of the certificate image.
In addition, the embodiment of the disclosure also provides an electronic device, which comprises:
A memory for storing a computer program;
And a processor, configured to execute the computer program stored in the memory, and when the computer program is executed, implement the method according to any one of the embodiments of the disclosure.
Fig. 6 is a schematic structural diagram of an application embodiment of the electronic device of the present disclosure. Next, an electronic device according to an embodiment of the present disclosure is described with reference to fig. 6. The electronic device may be either or both of the first device and the second device, or a stand-alone device independent thereof, which may communicate with the first device and the second device to receive the acquired input signals therefrom.
As shown in fig. 6, the electronic device includes one or more processors and memory.
The processor may be a Central Processing Unit (CPU) or other form of processing unit having data processing and/or instruction execution capabilities, and may control other components in the electronic device to perform the desired functions.
The memory may include one or more computer program products that may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, Random Access Memory (RAM) and/or cache memory (cache), and the like. The non-volatile memory may include, for example, Read-Only Memory (ROM), hard disk, flash memory, and the like. One or more computer program instructions may be stored on the computer-readable storage medium and executed by a processor to implement the methods of the various embodiments of the present disclosure described above and/or other desired functions.
In one example, the electronic device may further include: input devices and output devices, which are interconnected by a bus system and/or other forms of connection mechanisms (not shown).
In addition, the input device may include, for example, a keyboard, a mouse, and the like.
The output device may output various information including the determined distance information, direction information, etc., to the outside. The output devices may include, for example, a display, speakers, a printer, and a communication network and remote output devices connected thereto, etc.
Of course, only some of the components of the electronic device relevant to the present disclosure are shown in fig. 6, with components such as buses, input/output interfaces, etc. omitted for simplicity. In addition, the electronic device may include any other suitable components depending on the particular application.
In addition to the methods and apparatus described above, embodiments of the present disclosure may also be a computer program product comprising computer program instructions which, when executed by a processor, cause the processor to perform the steps in a watermark identification method according to various embodiments of the present disclosure described in the above section of the present description.
For the computer program product, program code for performing the operations of embodiments of the present disclosure may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java and C++ and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server.
Furthermore, embodiments of the present disclosure may also be a computer-readable storage medium, having stored thereon computer program instructions, which when executed by a processor, cause the processor to perform steps in a method according to various embodiments of the present disclosure described in the above section of the present disclosure.
The computer-readable storage medium may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. The readable storage medium may include, for example, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include the following: an electrical connection having one or more wires, a portable disk, a hard disk, Random Access Memory (RAM), Read-Only Memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disk read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the above method embodiments may be implemented by hardware associated with program instructions, where the foregoing program may be stored in a computer readable storage medium, and when executed, the program performs steps including the above method embodiments; and the aforementioned storage medium includes: various media that can store program code, such as ROM, RAM, magnetic or optical disks.
The basic principles of the present disclosure have been described above in connection with specific embodiments, but it should be noted that the advantages, benefits, effects, etc. mentioned in the present disclosure are merely examples and not limiting, and these advantages, benefits, effects, etc. are not to be considered as necessarily possessed by the various embodiments of the present disclosure. Furthermore, the specific details disclosed herein are for purposes of illustration and understanding only, and are not intended to be limiting, since the disclosure is not necessarily limited to practice with the specific details described.
In this specification, each embodiment is described in a progressive manner, and each embodiment is mainly described in a different manner from other embodiments, so that the same or similar parts between the embodiments are mutually referred to. For system embodiments, the description is relatively simple as it essentially corresponds to method embodiments, and reference should be made to the description of method embodiments for relevant points.
The block diagrams of the devices, apparatuses, equipment, and systems referred to in this disclosure are merely illustrative examples and are not intended to require or imply that the connections, arrangements, and configurations must be made in the manner shown in the block diagrams. As will be appreciated by one of skill in the art, these devices, apparatuses, equipment, and systems may be connected, arranged, and configured in any manner. Words such as "including," "comprising," "having," and the like are open-ended words that mean "including but not limited to," and are used interchangeably therewith. The terms "or" and "and" as used herein refer to, and are used interchangeably with, the term "and/or" unless the context clearly indicates otherwise. The term "such as" as used herein refers to, and is used interchangeably with, the phrase "such as, but not limited to."
The methods and apparatus of the present disclosure may be implemented in a number of ways. For example, the methods and apparatus of the present disclosure may be implemented by software, hardware, firmware, or any combination of software, hardware, firmware. The above-described sequence of steps for the method is for illustration only, and the steps of the method of the present disclosure are not limited to the sequence specifically described above unless specifically stated otherwise. Furthermore, in some embodiments, the present disclosure may also be implemented as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the methods according to the present disclosure. Thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
It is also noted that in the apparatus, devices and methods of the present disclosure, components or steps may be disassembled and/or assembled. Such decomposition and/or recombination should be considered equivalent to the present disclosure.
The previous description of the disclosed aspects is provided to enable any person skilled in the art to make or use the present disclosure. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects without departing from the scope of the disclosure. Thus, the present disclosure is not intended to be limited to the aspects shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
The foregoing description has been presented for purposes of illustration and description. Furthermore, this description is not intended to limit the embodiments of the disclosure to the form disclosed herein. Although a number of example aspects and embodiments have been discussed above, a person of ordinary skill in the art will recognize certain variations, modifications, alterations, additions, and subcombinations thereof.

Claims (13)

1. A method of watermark identification, the method comprising:
Based on a standard certificate specification corresponding to a target certificate, carrying out preset field information identification on a certificate image containing the target certificate to obtain standard field information in the certificate image, wherein the preset field information refers to field information which is specified by the standard certificate specification and is to be included in the target certificate, and the standard certificate specification indicates character specifications and position specifications of preset fields which are to be included in the target certificate;
Performing character recognition on the certificate image to obtain a character recognition result;
filtering the character recognition result based on the standard field information to obtain candidate character information except the standard field information in the character recognition result;
And carrying out information identification on the candidate character information to obtain watermark information.
2. The method according to claim 1, wherein the identifying the preset field information of the document image containing the target document based on the standard document specification corresponding to the target document to obtain the standard field information in the document image includes:
determining corresponding information areas of all preset fields in the certificate image based on character specifications and position specifications of the preset fields in the standard certificate specifications;
And carrying out optical character recognition on the information area corresponding to each preset field to obtain the standard field information.
3. The method of claim 2, wherein the preset fields include a certificate number field and other fields than the certificate number field;
Determining corresponding information areas of all preset fields in the certificate image based on character specifications and position specifications of preset fields in the standard certificate specifications; performing optical character recognition on the information area corresponding to each preset field to obtain the standard field information, wherein the method comprises the following steps:
determining a certificate number area of the certificate number field in the certificate image based on a character specification and a position specification of the certificate number field in the standard certificate specification;
performing optical character recognition on the certificate number area to obtain a certificate number recognition result;
determining the position relation between the certificate number area and the information areas corresponding to other fields based on the standard certificate specification;
determining other information areas of other fields in the certificate image according to the position relation between the certificate number area and the information areas corresponding to the other fields;
Performing optical character recognition on the other information areas to obtain field information recognition results of the other fields; and the standard field information comprises a certificate number identification result and field information identification results of other fields.
4. The method of claim 3, wherein the determining a certificate number area of the certificate number field in the certificate image based on a character specification and a position specification of the certificate number field in the standard certificate specification comprises:
determining the certificate number area based on the number of characters indicated in the character specification of the certificate number field and the aspect ratio of the information area indicated in the position specification of the certificate number field.
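The selection by character count and aspect ratio can be sketched as follows. This is a minimal illustration, not the claimed implementation: it assumes text boxes have already been detected and recognized, and the names `TextBox`, `find_number_area` and the `tolerance` parameter are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class TextBox:
    """A detected text box (hypothetical structure, not from the source)."""
    text: str      # recognized characters
    cx: float      # center x, pixels
    cy: float      # center y, pixels
    length: float  # long side, pixels
    width: float   # short side, pixels
    angle: float   # inclination, degrees


def find_number_area(boxes: List[TextBox], char_count: int,
                     aspect_ratio: float,
                     tolerance: float = 0.25) -> Optional[TextBox]:
    """Pick the box with the expected character count whose length/width
    ratio is closest to the expected aspect ratio (within `tolerance`)."""
    best: Optional[TextBox] = None
    best_err = tolerance
    for box in boxes:
        if len(box.text) != char_count or box.width <= 0:
            continue
        # Relative deviation from the aspect ratio given by the specification.
        err = abs(box.length / box.width - aspect_ratio) / aspect_ratio
        if err <= best_err:
            best, best_err = box, err
    return best
```

Returning `None` when no box qualifies lets the caller fall back to another localization strategy or reject the image.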
5. The method of claim 3, wherein the determining other information areas of the other fields in the certificate image based on the certificate number area and the positional relationship between the certificate number area and the information areas corresponding to the other fields comprises:
acquiring the center coordinates of the certificate number area and the length and width of the certificate number area;
determining the center coordinates of the other information areas based on the positional relationship and the center coordinates of the certificate number area;
determining the length and the width of the other information areas based on the length and the width of the certificate number area;
determining the inclination of the other information areas based on the inclination of the certificate number area;
and determining the position of the other information areas in the certificate image based on the center coordinates, the length, the width and the inclination of the other information areas.
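The geometric derivation in these steps can be sketched as below. This is one possible reading under stated assumptions: the positional relationship is encoded as offsets and scale factors relative to the certificate number area, and the inclination is a rotation in image coordinates. `Region`, `RelativeLayout` and `locate_field` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass
class Region:
    cx: float      # center x, pixels
    cy: float      # center y, pixels
    length: float  # long side, pixels
    width: float   # short side, pixels
    angle: float   # inclination, degrees


@dataclass
class RelativeLayout:
    """Assumed encoding of the positional relationship in the specification."""
    dx: float            # center offset along the long axis, in number-area lengths
    dy: float            # center offset along the short axis, in number-area widths
    length_scale: float  # field length / number-area length
    width_scale: float   # field width / number-area width


def locate_field(number: Region, layout: RelativeLayout) -> Region:
    """Derive another field's region from the certificate number region."""
    theta = math.radians(number.angle)
    # Offsets expressed in the number area's own (possibly tilted) axes.
    ox = layout.dx * number.length
    oy = layout.dy * number.width
    # Rotate the offset into image coordinates and shift the center.
    cx = number.cx + ox * math.cos(theta) - oy * math.sin(theta)
    cy = number.cy + ox * math.sin(theta) + oy * math.cos(theta)
    return Region(cx, cy,
                  number.length * layout.length_scale,
                  number.width * layout.width_scale,
                  number.angle)  # the field inherits the same inclination
```

Scaling by the measured size of the number area makes the layout independent of image resolution, which is presumably why the claim derives length and width from it.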
6. The method of claim 3, wherein after performing optical character recognition on the certificate number area to obtain the certificate number recognition result, the method further comprises:
verifying the certificate number recognition result based on the number format indicated in the character specification of the certificate number field and the check code contained in the certificate number recognition result, to obtain a verification result indicating whether the certificate number recognition result passes verification;
and in response to the verification result indicating that the certificate number recognition result passes verification, performing the operation of determining the positional relationship between the certificate number area and the information areas corresponding to the other fields based on the standard certificate specification.
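As an illustration of a format and check-code verification, the sketch below assumes the target certificate is an 18-character resident identity card number, whose last character is an ISO 7064 MOD 11-2 check character (GB 11643-1999). The claim does not name a certificate type, so this choice and the function name `number_passes_check` are assumptions.

```python
import re

# Assumed number format: 17 digits followed by a digit or 'X'.
NUMBER_FORMAT = re.compile(r"[0-9]{17}[0-9X]")
# Position weights and check characters of the MOD 11-2 scheme.
WEIGHTS = (7, 9, 10, 5, 8, 4, 2, 1, 6, 3, 7, 9, 10, 5, 8, 4, 2)
CHECK_CHARS = "10X98765432"


def number_passes_check(number: str) -> bool:
    """Return True if the OCR result matches the format and its check code."""
    number = number.strip().upper()
    if NUMBER_FORMAT.fullmatch(number) is None:
        return False
    # Weighted sum of the first 17 digits selects the expected check character.
    total = sum(int(d) * w for d, w in zip(number[:17], WEIGHTS))
    return CHECK_CHARS[total % 11] == number[17]
```

Gating the later steps on this check avoids locating the other fields from a misrecognized or mislocated number area.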
7. The method of claim 2, wherein filtering the character recognition result based on the standard field information to obtain candidate character information other than the standard field information in the character recognition result comprises:
filtering out the character strings corresponding to the standard field information from the character recognition result to obtain candidate character strings;
acquiring the character string area in which each character string contained in the character recognition result is located;
determining an area filtering standard based on an information specification of the standard field information, and filtering out the information areas from the character string areas based on the area filtering standard to obtain non-standard character string areas, wherein the area filtering standard comprises at least one of a length standard, a width standard and an inclination standard;
and screening the candidate character information from the candidate character strings based on the non-standard character string areas.
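The two filters combine as an intersection: a string is kept only if it is not standard field text and its area does not look like a standard information area. The sketch below assumes a width range and an inclination limit as the area filtering standard; `OcrString`, `AreaStandard` and `candidate_watermark_strings` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass
class OcrString:
    text: str
    length: float  # long side of the string area, pixels
    width: float   # short side of the string area, pixels
    angle: float   # inclination, degrees


@dataclass
class AreaStandard:
    """Assumed area filtering standard (width and inclination only)."""
    min_width: float
    max_width: float
    max_abs_angle: float

    def matches(self, s: OcrString) -> bool:
        return (self.min_width <= s.width <= self.max_width
                and abs(s.angle) <= self.max_abs_angle)


def candidate_watermark_strings(ocr_strings: Iterable[OcrString],
                                standard_values: Iterable[str],
                                standard: AreaStandard) -> List[str]:
    """Keep strings that are neither standard field text nor located in an
    area that conforms to the standard field geometry."""
    known = set(standard_values)
    kept = []
    for s in ocr_strings:
        if s.text in known:      # string filter: standard field information
            continue
        if standard.matches(s):  # area filter: looks like a standard area
            continue
        kept.append(s.text)
    return kept
```

The area filter catches printed text that OCR returned but that was not matched as a field value, such as field labels, since these share the size and orientation of standard areas.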
8. The method according to any one of claims 1 to 7, wherein after obtaining the candidate character information other than the standard field information in the character recognition result, the method further comprises:
filtering the candidate character information based on a preset standard to obtain a filtered character string;
and wherein performing information recognition on the candidate character information to obtain watermark information comprises:
performing semantic recognition on the filtered character string to obtain the watermark information.
9. The method of claim 8, wherein the performing information recognition on the filtered character string to obtain the watermark information comprises:
matching the filtered character string against preset universal watermark characters to obtain a matching result;
and in response to the matching result indicating that the filtered character string includes a target character string matching the preset universal watermark characters, extracting the watermark information from the target character string.
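Matching against generic watermark wording can be sketched with regular expressions. The patterns below are illustrative English stand-ins for the preset universal watermark characters, which the claim does not enumerate. `GENERIC_WATERMARK_PATTERNS` and `extract_watermark` are hypothetical names.

```python
import re
from typing import Dict, Iterable, Optional

# Hypothetical generic watermark phrasings; a real deployment would load its own.
GENERIC_WATERMARK_PATTERNS = [
    re.compile(r"\bfor\s+(?P<purpose>.+?)\s+only\b", re.IGNORECASE),
    re.compile(r"\bonly\s+for\s+(?P<purpose>.+)$", re.IGNORECASE),
]


def extract_watermark(filtered_strings: Iterable[str]) -> Optional[Dict[str, str]]:
    """Return the first string that matches a generic watermark pattern,
    together with the purpose captured from it, or None if nothing matches."""
    for text in filtered_strings:
        for pattern in GENERIC_WATERMARK_PATTERNS:
            match = pattern.search(text)
            if match:
                return {"source": text, "purpose": match.group("purpose")}
    return None
```

The captured `purpose` is the part of the watermark that a later step can compare with the requested service type.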
10. The method according to any one of claims 1 to 7, wherein before recognizing the preset standard field information in the certificate image including the target certificate based on the standard certificate specification corresponding to the target certificate, the method further comprises:
classifying the certificate image to obtain the certificate type of the target certificate included in the certificate image;
and acquiring the standard certificate specification corresponding to the target certificate based on the certificate type.
11. A method for verifying a service spare part, the method comprising:
receiving a service request, wherein the service request comprises a service type and a certificate image serving as a service spare part, and the certificate image comprises an image of a target certificate;
performing watermark identification on the certificate image using the watermark identification method of any one of claims 1 to 10 to obtain watermark information of the certificate image;
and determining, based on the watermark information of the certificate image, whether the service spare part is authorized for handling the service corresponding to the service type.
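The final authorization decision can be sketched as a lookup of the watermark purpose against the purposes accepted for the requested service type. The mapping, the service type names and the treatment of images without a watermark are all assumptions made for illustration. The claim only states that authorization is confirmed based on the watermark information.

```python
from typing import Optional

# Hypothetical mapping from service type to accepted watermark purposes.
ACCEPTED_PURPOSES = {
    "rental_contract": {"rental", "rental use"},
    "loan_application": {"loan application"},
}


def is_authorized(watermark_purpose: Optional[str], service_type: str,
                  allow_unwatermarked: bool = True) -> bool:
    """Decide whether a certificate image may be used for the service type."""
    if watermark_purpose is None:
        # Assumed policy: no watermark means no stated usage restriction.
        return allow_unwatermarked
    accepted = ACCEPTED_PURPOSES.get(service_type, set())
    return watermark_purpose.strip().lower() in accepted
```

Exposing `allow_unwatermarked` as a parameter keeps the policy for unwatermarked images explicit instead of hard-coding it.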
12. An electronic device, comprising:
a memory for storing a computer program;
a processor for executing the computer program stored in the memory, wherein the computer program, when executed, implements the method of any one of claims 1 to 11.
13. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the method of any one of claims 1 to 11.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410517321.2A CN118262363A (en) 2024-04-26 2024-04-26 Watermark identification method, service spare part verification method, device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410517321.2A CN118262363A (en) 2024-04-26 2024-04-26 Watermark identification method, service spare part verification method, device and storage medium

Publications (1)

Publication Number Publication Date
CN118262363A true CN118262363A (en) 2024-06-28

Family

ID=91607713

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410517321.2A Pending CN118262363A (en) 2024-04-26 2024-04-26 Watermark identification method, service spare part verification method, device and storage medium

Country Status (1)

Country Link
CN (1) CN118262363A (en)

Similar Documents

Publication Publication Date Title
US20200184210A1 (en) Multi-modal document feature extraction
US11055524B2 (en) Data extraction pipeline
US9626555B2 (en) Content-based document image classification
US10140511B2 (en) Building classification and extraction models based on electronic forms
CN106446816B (en) Face recognition method and device
RU2018145499A (en) AUTOMATION OF PERFORMANCE CHECK
CN109740417B (en) Invoice type identification method, invoice type identification device, storage medium and computer equipment
CN110795714A (en) Identity authentication method and device, computer equipment and storage medium
US11574492B2 (en) Efficient location and identification of documents in images
CN111353491A (en) Character direction determining method, device, equipment and storage medium
CN110647895B (en) Phishing page identification method based on login box image and related equipment
CN112487982A (en) Merchant information auditing method, system and storage medium
CN114448664A (en) Phishing webpage identification method and device, computer equipment and storage medium
CN115688107B (en) Fraud-related APP detection system and method
CN111429110A (en) Store standardization auditing method, device, equipment and storage medium
JPWO2017146229A1 (en) Information processing apparatus, suspect information generation method and program
CN114266267B (en) Automatic identification method, device and storage medium for integrating two-dimension codes, documents, certificates and faces
CN113569839B (en) Certificate identification method, system, equipment and medium
CN118262363A (en) Watermark identification method, service spare part verification method, device and storage medium
CN112115836B (en) Information verification method and device, computer readable storage medium and electronic equipment
CN114443834A (en) Method and device for extracting license information and storage medium
CN113591657A (en) OCR (optical character recognition) layout recognition method and device, electronic equipment and medium
US20200104588A1 (en) Character authenticity determination
CN114782971B (en) Financial document image recognition method and system
CN114092743B (en) Compliance detection method and device for sensitive picture, storage medium and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination