CN116612486A - Risk group identification method and system based on image identification and graph calculation - Google Patents

Risk group identification method and system based on image identification and graph calculation

Info

Publication number
CN116612486A
Authority
CN
China
Prior art keywords
template
detected
historical
image
similarity
Legal status
Pending
Application number
CN202310666659.XA
Other languages
Chinese (zh)
Inventor
杜用
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN202310666659.XA
Publication of CN116612486A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40 Document-oriented image-based pattern recognition
    • G06V30/41 Analysis of document content
    • G06V30/412 Layout analysis of documents structured with printed lines or input boxes, e.g. business forms or tables
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 Arrangements for image or video recognition or understanding
    • G06V10/70 Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82 Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00 Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/10 Character recognition
    • G06V30/19 Recognition using electronic means
    • G06V30/19007 Matching; Proximity measures
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

Disclosed is a risk group identification method based on image identification and graph calculation, the method comprising: acquiring an image of a material to be detected; extracting the issuer and the template of the material to be detected from the image; acquiring historical materials of the extracted issuer; determining whether the template of the material to be detected is similar to the template of the historical materials; if the template of the material to be detected is dissimilar to the template of the historical materials, constructing a relationship graph of the user who submitted the material to be detected; and clustering the relationship graph to identify risk groups. A risk group identification system and a computer-readable storage medium based on image identification and graph calculation are also disclosed.

Description

Risk group identification method and system based on image identification and graph calculation
Technical Field
The disclosure relates generally to the field of risk control technologies, and in particular, to a risk group identification method and system based on image identification and graph calculation.
Background
Users are required to provide credential materials as evidence in operations such as insurance claims and credit applications for repayment deferral or loan extension. However, credential materials are highly heterogeneous and originate from different regions and institutions, so relying on manual identification is extremely difficult. In scenarios where overdue credit or insurance fraud is pursued by illegal means, credential materials are often falsified or counterfeited. Conventional identification of abnormal materials relies mainly on manual expert review, image tampering detection, and image similarity judgment. As fraudsters escalate their countermeasures and fraud increasingly follows organized group behavior patterns, rules based on human experience gradually lose effectiveness. A more efficient means of identification is needed that combines image recognition capabilities with graph algorithms to rapidly discover anomalies and accurately identify risk groups.
It is therefore desirable to provide an improved risk group identification scheme based on image identification and graph computation.
Disclosure of Invention
The following presents a simplified summary of one or more aspects in order to provide a basic understanding of such aspects. This summary is not an extensive overview of all contemplated aspects, and is intended to neither identify key or critical elements of all aspects nor delineate the scope of any or all aspects. Its sole purpose is to present some concepts of one or more aspects in a simplified form as a prelude to the more detailed description that is presented later.
The disclosure provides a risk group identification method based on image identification and graph calculation, which comprises the following steps: acquiring an image of a material to be detected; extracting the issuer and the template of the material to be detected from the image; acquiring historical materials of the extracted issuer; determining whether the template of the material to be detected is similar to the template of the historical materials; if the template of the material to be detected is dissimilar to the template of the historical materials, constructing a relationship graph of the user who submitted the material to be detected; and clustering the relationship graph to identify risk groups.
In an embodiment of the present disclosure, the method further comprises: initially classifying the material to be detected before extracting the issuer and the template of the material to be detected.
In an embodiment of the present disclosure, the issuer and the template of the material to be detected are extracted from the image by Optical Character Recognition (OCR).
In an embodiment of the present disclosure, acquiring historical materials of the extracted issuer includes at least one of: acquiring all historical materials of the extracted issuer; or acquiring historical materials of the extracted issuer within a predetermined period of time.
In an embodiment of the present disclosure, determining whether the template of the material to be detected is similar to the template of the historical material further comprises: determining the similarity between the template of the material to be detected and the template of the historical material; if the similarity meets a preset threshold, the template of the material to be detected is similar to the template of the historical material; and if the similarity does not meet the preset threshold, the template of the material to be detected is dissimilar to the template of the historical material.
In an embodiment of the present disclosure, determining whether the template of the material to be detected is similar to the template of the historical material is performed based on deep learning.
In an embodiment of the present disclosure, the method further comprises: cross-auditing the identified risk groups.
In an embodiment of the disclosure, the template includes the body content, official seal, and layout of the material to be detected.
The disclosure also provides a risk group identification system based on image identification and graph calculation, comprising: an image acquisition module for acquiring an image of the material to be detected; an extraction module for extracting the issuer and the template of the material to be detected from the image; a historical material acquisition module for acquiring historical materials of the extracted issuer; a template similarity determination module for determining whether the template of the material to be detected is similar to the template of the historical materials; a relationship graph construction module configured to construct a relationship graph of the user who submitted the material to be detected if the template of the material to be detected is dissimilar to the template of the historical materials; and a risk group identification module for clustering the relationship graph to identify risk groups.
In an embodiment of the present disclosure, the system further comprises an initial classification module configured to initially classify the material to be detected before the issuer and the template of the material to be detected are extracted.
In an embodiment of the disclosure, the template similarity determination module is further configured to: determining the similarity between the template of the material to be detected and the template of the historical material; if the similarity meets a preset threshold, the template of the material to be detected is similar to the template of the historical material; and if the similarity does not meet the preset threshold, the template of the material to be detected is dissimilar to the template of the historical material.
In an embodiment of the present disclosure, the system further comprises a cross-auditing module configured to cross-audit the identified risk groups.
The present disclosure also proposes a computer-readable storage medium storing a computer program executable by a processor to perform the aforementioned risk group identification method based on image identification and graph calculation.
According to the technical scheme of the present disclosure, deep-learning-based image anomaly recognition is combined with graph computation in risk control scenarios. Because only the submitters of abnormal materials seed the relationship network, the scale of the network used to identify risk groups is greatly reduced, which improves computational efficiency and greatly lowers storage and computing costs. Meanwhile, subsequent expert review combines the automatic recognition capability of machine learning with manual cross-comparison analysis, further improving the accuracy of risk group identification.
Drawings
The features, nature, and advantages of the present disclosure will become more apparent from the detailed description set forth below when taken in conjunction with the drawings. In the drawings, like reference numerals designate corresponding parts throughout the different views. It is noted that the drawings described are only schematic and are non-limiting. In the drawings, the size of some of the elements may be exaggerated and not drawn on scale for illustrative purposes.
Fig. 1 shows a system schematic diagram of risk group identification based on image identification and graph calculation according to an embodiment of the present disclosure.
Fig. 2 illustrates an exemplary flowchart of a risk group identification method based on image identification and graph computation in accordance with an embodiment of the present disclosure.
FIG. 3 illustrates an exemplary process of determining template similarity in accordance with an embodiment of the present disclosure.
Fig. 4 illustrates a risk group identification architecture based on image identification and graph computation in accordance with an embodiment of the present disclosure.
Fig. 5 shows a block diagram of a risk group identification system based on image identification and graph computation in accordance with an embodiment of the present disclosure.
Fig. 6 illustrates a block diagram of a device including a risk group identification system based on image recognition and graph computation in accordance with an embodiment of the present disclosure.
Detailed Description
For the purposes of promoting an understanding of the principles and advantages of the disclosure, reference will now be made to the embodiments illustrated in the drawings and specific language will be used to describe the same. In the following detailed description, numerous specific details are set forth in order to provide a thorough understanding of the described exemplary embodiments. It will be apparent, however, to one skilled in the art, that the described embodiments may be practiced without some or all of these specific details. In other exemplary embodiments, well-known structures have not been described in detail in order to avoid unnecessarily obscuring the concepts of the present disclosure. It should be understood that the specific embodiments described herein are merely illustrative of the present disclosure and are not intended to limit the present disclosure. Meanwhile, the various aspects described in the embodiments may be arbitrarily combined without conflict.
At present, the identification schemes of abnormal materials in the businesses of credit, insurance claim settlement and the like are mainly divided into three types: manual review, tamper identification based on image information, and image similarity determination.
For manual review, audit experts mainly look for the following types of anomalies: (1) Template forgery: a user edits a template in Photoshop (PS) or downloads one from the Internet, then modifies and fills it in. Because of their historical review experience, audit experts are very familiar with material templates and can accurately identify risks such as an image that does not match the expected template or characters that are not in the correct font. (2) Incorrect key diagnostic descriptions: audit experts are familiar with how various medical diagnoses and certificates are worded and can identify materials with abnormal descriptions. (3) Official seal identification: official seals with legal effect have specific formats, so audit experts can check whether a seal's format is consistent, and they can also treat traces of the stamping process, such as how the seal overlaps the characters or how the ink is blurred, as abnormal features. The drawback of manual review is that it depends entirely on the personal experience of audit experts, and each expert is familiar with only a limited set of templates, official seal formats, and the like, so carefully processed images cannot be identified.
For tamper identification based on image information, EXIF (Exchangeable Image File Format) is a standard for recording the attribute information and shooting data of digital photographs, defined specifically for digital camera photos. EXIF data can be embedded in files such as JPEG, TIFF, and RIFF, and contains information about the camera and its shooting parameters as well as thumbnail and image-processing-software version information. By extracting EXIF information automatically, a machine can accurately judge whether two images are the same picture and whether a picture has been modified by image processing software. The disadvantage of EXIF-based identification is that the information can be deleted manually, so this approach fails in scenarios with even modest adversarial behavior.
For image similarity judgment, a deep learning method such as a convolutional neural network is generally used to determine whether images contain splicing, removal, or copying, or a traditional image comparison method is used to judge the cross-similarity between images. This approach is highly automated, but image recognition algorithms have high machine costs and require a large number of pre-labeled samples as a training set. Material counterfeiting, however, is a small-sample scenario that often lacks enough negative samples for training.
The technical scheme of the present disclosure combines deep-learning-based image anomaly recognition with graph computation, which improves computational efficiency and greatly reduces storage and computing costs.
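To make the EXIF check above concrete, the following Python sketch inspects EXIF tags that have already been read into a plain dictionary keyed by tag name (how the tags are read is not shown). `Software`, `DateTime`, and `DateTimeOriginal` are standard EXIF tag names; the list of editing programs and the function name are hypothetical. As the text notes, an empty result proves little, because EXIF can be stripped.

```python
# Minimal sketch: EXIF-based tamper hints on an already-parsed tag dictionary.
# The editor list below is a hypothetical example, not an exhaustive rule set.
EDITING_SOFTWARE_HINTS = ("photoshop", "gimp", "lightroom", "snapseed")


def exif_tamper_hints(exif: dict) -> list:
    """Return human-readable hints that an image may have been edited."""
    if not exif:
        # Absence is inconclusive: EXIF can be stripped deliberately or by re-encoding.
        return ["no EXIF data (possibly stripped)"]
    hints = []
    software = str(exif.get("Software", "")).lower()
    for name in EDITING_SOFTWARE_HINTS:
        if name in software:
            hints.append(f"edited with {name}")
    captured = exif.get("DateTimeOriginal")  # when the photo was taken
    modified = exif.get("DateTime")          # when the file was last changed
    if captured and modified and captured != modified:
        hints.append("modification time differs from capture time")
    return hints


print(exif_tamper_hints({"Software": "Adobe Photoshop 24.0"}))  # ['edited with photoshop']
```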
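As one example of a traditional (non-learning) image comparison method, the sketch below computes an average hash (aHash) and compares two images by Hamming distance. aHash is a commonly used technique swapped in here for illustration; the text does not name a specific method. Images are assumed to be grayscale lists of rows with at least 8 x 8 pixels.

```python
def average_hash(gray, size=8):
    """Average hash: block-average a grayscale image down to size x size cells,
    then emit one bit per cell (1 if brighter than the mean of all cells).
    Assumes gray is a list of rows with at least size rows and size columns."""
    h, w = len(gray), len(gray[0])
    cells = []
    for i in range(size):
        for j in range(size):
            rows = range(i * h // size, (i + 1) * h // size)
            cols = range(j * w // size, (j + 1) * w // size)
            block = [gray[r][c] for r in rows for c in cols]
            cells.append(sum(block) / len(block))
    mean = sum(cells) / len(cells)
    bits = 0
    for value in cells:
        bits = (bits << 1) | (1 if value > mean else 0)
    return bits


def hamming(a, b):
    """Number of differing bits; small distances indicate visually similar images."""
    return bin(a ^ b).count("1")


left_dark = [[0 if c < 8 else 255 for c in range(16)] for _ in range(16)]
inverted = [[255 - p for p in row] for row in left_dark]
print(hamming(average_hash(left_dark), average_hash(inverted)))  # 64
```

Small edits leave the hash almost unchanged, which is why such methods tolerate recompression but, as the text notes, struggle to explain what was altered.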
Fig. 1 shows a system diagram 100 of risk group identification based on image identification and graph computation in accordance with an embodiment of the present disclosure.
As shown, after the risk group identification is started, image classification may be performed first.
In an embodiment of the present disclosure, the system may receive the material to be detected submitted by the user and categorize it based on the structured information of the user interaction interface. For example, the material to be detected may be classified as medical proof (discharge summaries, diagnostic certificates), poverty certificates, disability certificates, and the like. In a specific implementation, other material types can be defined according to actual needs.
Subsequently, Optical Character Recognition (OCR) may be performed on the classified material. Specifically, the text information in the material may be processed through operations such as noise removal, skew correction, layout analysis, and character recognition. Through OCR, material information may be extracted, mainly including the issuer of the material (e.g., the hospital that issued a diagnostic certificate), the main descriptive content, the official seal, and the layout.
After the material information is extracted, historical materials from the same issuer can be obtained. In some implementations, all historical materials of the same category from the same issuer may be obtained (e.g., by querying a database). In other implementations, historical materials of the same category from the same issuer within a particular period of time may be obtained. For example, all diagnostic certificates issued by the same hospital in the past month may be obtained.
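The OCR output still has to be turned into the structured fields listed above. The sketch below assumes a hypothetical OCR result format (text lines with a vertical position) and a hypothetical list of institution suffixes for spotting the issuer line; a production system would more likely match against a curated directory of issuers. The normalized line positions stand in for the layout feature.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical suffixes that mark an issuing institution in English-language OCR text.
ISSUER_SUFFIXES = ("hospital", "clinic", "bureau", "committee")


@dataclass
class OcrLine:
    text: str
    y: float  # top edge of the text box in pixels (assumed OCR output field)


@dataclass
class MaterialInfo:
    issuer: Optional[str]
    body: str
    layout: List[float]  # line positions normalized by page height


def extract_material_info(lines: List[OcrLine], page_height: float) -> MaterialInfo:
    """Pick the first institution-like line as the issuer; keep the rest as body text."""
    ordered = sorted(lines, key=lambda line: line.y)  # top-to-bottom reading order
    issuer = None
    body_parts = []
    for line in ordered:
        text = line.text.strip()
        if issuer is None and text.lower().endswith(ISSUER_SUFFIXES):
            issuer = text
        else:
            body_parts.append(text)
    layout = [round(line.y / page_height, 3) for line in ordered]
    return MaterialInfo(issuer, " ".join(body_parts), layout)


info = extract_material_info(
    [OcrLine("Diagnostic Certificate", 120), OcrLine("City Central Hospital", 40),
     OcrLine("Diagnosis: fracture", 300)],
    page_height=1000,
)
print(info.issuer)  # City Central Hospital
```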
If the template similarity judgment confirms that the template difference between the material to be detected and the historical material is large, the material can be regarded as abnormal material. The relationship graph can then be constructed with the submitting user of the abnormal material as a node and the relationship of the submitting user with other users as an edge.
Regarding abnormal material identification, the traditional approach is supervised learning: the correct templates among the materials are labeled manually, and materials that deviate significantly from them are flagged. This approach requires manual pre-labeling, but in business practice the correct template for a material is often unknown, so pre-labeling is not possible. The abnormal material identification approach of the present disclosure compares a material only against historical materials from the same issuer, requires no such labels, and therefore overcomes this shortcoming of the prior art.
Because forging materials is costly, the amount requested in any single claim or loan extension is relatively low, and claims or extensions involving higher amounts trigger offline investigation, in practice many cases appear under different identities with many materials sharing the same forged template. Moreover, material submitters often have multiple types of associations with one another. Therefore, materials whose templates deviate significantly from most material templates are selected, the submitters of these abnormal materials are used as seed nodes to search for related users in the same scenario, and a relationship graph is constructed with the relationships among users as edges.
After the relationship graph is constructed, risk groups can be found from the relationship graph using an unsupervised learning clustering algorithm.
When conventional relational-clustering methods are used for group identification, risk groups are difficult to identify directly because they are objectively associated with a large number of normal users. Because the technical scheme of the present disclosure starts only from the submitters of abnormal materials, it greatly reduces the scale of the relational network, keeps computing resources under control, and improves efficiency.
Finally, the identified risk groups are handed over to manual audit experts for cross-auditing and, if necessary, to offline investigation to verify authenticity.
Placing manual expert review after the machine judgment makes full use of the experts' cross-comparison analysis skills, further improving identification accuracy.
Fig. 2 illustrates an exemplary flowchart of a risk group identification method 200 based on image identification and graph computation in accordance with an embodiment of the present disclosure.
The method 200 begins at step 202. At step 202, an image of the material to be inspected is acquired.
In some implementations, an image of the material to be detected may be acquired by an image capture device (such as a camera, webcam, etc.). For example, a presenter of the material to be detected may be prompted to place the material to be detected in a particular location or range for the image capture device to acquire an image.
In other implementations, the user may be prompted through the interactive interface to upload an image of the material to be detected. Meanwhile, the image uploaded by the user may be required to meet certain requirements, such as requirements regarding image size, definition, and the like. In some embodiments, the material uploaded by the user may also be image pre-processed, such as cropping, scaling, brightness correction, and so forth.
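The size and definition requirements above could be enforced with a simple gate such as the following sketch. The thresholds and function names are hypothetical, and sharpness is estimated with the variance of a 4-neighbour Laplacian, a common blur heuristic chosen here for illustration rather than one named in the text. Images are assumed to be grayscale lists of rows, at least 3 x 3 pixels.

```python
# Hypothetical upload requirements; real values would be set by the business.
MIN_WIDTH, MIN_HEIGHT = 600, 800
MIN_SHARPNESS = 100.0


def laplacian_variance(gray):
    """Variance of the 4-neighbour Laplacian over interior pixels.

    Low values mean few sharp edges, which usually indicates blur.
    """
    h, w = len(gray), len(gray[0])
    values = []
    for r in range(1, h - 1):
        for c in range(1, w - 1):
            values.append(gray[r - 1][c] + gray[r + 1][c] + gray[r][c - 1]
                          + gray[r][c + 1] - 4 * gray[r][c])
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / len(values)


def check_upload(gray, min_width=MIN_WIDTH, min_height=MIN_HEIGHT, min_sharpness=MIN_SHARPNESS):
    """Return a list of problems; an empty list means the upload is accepted."""
    problems = []
    h, w = len(gray), len(gray[0])
    if w < min_width or h < min_height:
        problems.append(f"image too small: {w}x{h}")
    if laplacian_variance(gray) < min_sharpness:
        problems.append("image too blurry")
    return problems


flat = [[128] * 10 for _ in range(10)]
print(check_upload(flat, min_width=5, min_height=5))  # ['image too blurry']
```

With OpenCV available, `cv2.Laplacian(img, cv2.CV_64F).var()` computes a comparable sharpness score on real images.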
In a medical proof scenario, the material to be detected may include, for example, a discharge summary, a diagnostic certificate, and the like.
In step 204, the issuer and template of the material to be detected are extracted from the image.
For example, the issuer of a diagnostic certificate may be a specific hospital, and the template may include the format used for that hospital's diagnostic certificates, the hospital's official seal, the main descriptive content of the certificate, and so on. Note that different issuers may use different templates. For example, diagnostic certificates issued by different hospitals may use different formats and descriptive content, and bear different official seals.
Optionally, prior to step 204, the material to be detected may be initially classified, for example as medical proof, poverty proof, disability certification, and the like. The initial classification may be based on a classification prompt given by the material submitter, or on the content of the material to be detected. For example, the material may be initially classified as medical proof before step 204; after its issuer and template are extracted in step 204, information in the template (e.g., the main descriptive content) can reveal that the material is specifically a diagnostic certificate within the medical proof category.
At step 206, historical materials of the extracted issuer are obtained.
After the issuer of the material to be detected is determined, historical materials with the same issuer can be found.
For example, after the hospital that issued a diagnostic certificate is determined, historical diagnostic certificates issued by that hospital may be found.
In an embodiment of the present disclosure, obtaining historical materials of the extracted issuer includes at least one of: acquiring all historical materials of the extracted issuer; or acquiring historical materials of the extracted issuer within a predetermined period of time. For example, all historical diagnostic certificates from the hospital that issued the diagnostic certificate to be detected may be acquired, or only those issued within a specific period of time. In some cases, if the diagnostic certificate template used by a hospital has changed, only historical diagnostic certificates that use the same template as the diagnostic certificate to be detected may be obtained.
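The two ways of making the initial classification (a submitter's prompt, or the material's content) can be sketched as follows. The keyword table and category identifiers are hypothetical; a real system might instead use a trained text or image classifier.

```python
from typing import Optional

# Hypothetical keyword table; the categories follow the examples in the text.
CATEGORY_KEYWORDS = {
    "medical_proof": ("diagnosis", "hospital", "discharge"),
    "poverty_proof": ("poverty", "low income", "subsistence"),
    "disability_certificate": ("disability", "disabled"),
}


def initial_category(submitter_hint: Optional[str], text: str) -> str:
    """Prefer the submitter's classification prompt; otherwise score keywords in the content."""
    if submitter_hint in CATEGORY_KEYWORDS:
        return submitter_hint
    lowered = text.lower()
    scores = {cat: sum(kw in lowered for kw in kws) for cat, kws in CATEGORY_KEYWORDS.items()}
    best = max(scores, key=scores.get)
    return best if scores[best] > 0 else "other"


print(initial_category(None, "Discharge summary from City Hospital"))  # medical_proof
```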
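Selecting the comparison set can be pictured as a filter over stored materials, as in this sketch. The `Material` record, its `template_version` field (used to honor a hospital's template change), and the in-memory list standing in for a database are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class Material:
    material_id: str
    issuer: str
    category: str
    issued_at: datetime
    template_version: Optional[str] = None  # hypothetical marker for template changes


def historical_materials(store: List[Material], issuer: str, category: str,
                         now: Optional[datetime] = None,
                         window: Optional[timedelta] = None,
                         template_version: Optional[str] = None) -> List[Material]:
    """Filter a material store down to the comparison set for one issuer."""
    now = now or datetime.now()
    result = []
    for m in store:
        if m.issuer != issuer or m.category != category:
            continue  # only the same issuer and the same category are comparable
        if window is not None and now - m.issued_at > window:
            continue  # outside the predetermined period
        if template_version is not None and m.template_version != template_version:
            continue  # issuer changed its template; compare like with like
        result.append(m)
    return result
```

For example, `historical_materials(store, "City Hospital", "diagnosis", window=timedelta(days=30))` would return the past month's diagnostic certificates from that hospital.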
At step 208, it is determined whether the template of the material to be detected is similar to the template of the historical material.
In an embodiment of the present disclosure, determining whether the template of the material to be detected is similar to the template of the historical material further comprises: determining the similarity between the template of the material to be detected and the template of the historical material; if the similarity meets a preset threshold, the template of the material to be detected is similar to the template of the historical material; and if the similarity does not meet the preset threshold, the template of the material to be detected is dissimilar to the template of the historical material.
The detailed process for template similarity determination will be further described below in conjunction with fig. 3.
At step 210, if the template of the material to be detected is dissimilar to the template of the historical materials, a relationship graph of the user who submitted the material to be detected is constructed.
A dissimilar template indicates that the material to be detected deviates substantially from most of the historical materials, so it can be regarded as abnormal material (i.e., material that is likely to have been tampered with or counterfeited). For abnormal material, the user who submitted it may be identified and a relationship graph constructed from that user's relationship network. Building a relationship graph essentially reconstructs, in reverse, the group behavior patterns of the real world, since individual users in a group often have multiple types of associations (e.g., fund relationships, employment relationships, family relationships).
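The seeded construction described here can be sketched as a bounded breadth-first expansion. The input format (tuples of user, user, relation type), the function name, and the `max_hops` limit are hypothetical; the point is that only users near an abnormal-material submitter enter the graph, and several relation types between the same pair of users are merged onto one edge.

```python
from collections import defaultdict, deque


def build_relationship_graph(seeds, relations, max_hops=1):
    """Build {user: {neighbour: set(relation_types)}} around abnormal-material submitters.

    seeds: submitters of abnormal materials (the seed nodes).
    relations: iterable of (user_a, user_b, relation_type) tuples.
    max_hops: how far to expand from the seeds (hypothetical parameter).
    """
    relations = list(relations)
    neighbours = defaultdict(set)
    for a, b, _ in relations:
        neighbours[a].add(b)
        neighbours[b].add(a)

    # Breadth-first expansion from all seeds at once, bounded by max_hops.
    depth = {s: 0 for s in seeds}
    queue = deque(depth)
    while queue:
        user = queue.popleft()
        if depth[user] == max_hops:
            continue
        for other in sorted(neighbours[user]):
            if other not in depth:
                depth[other] = depth[user] + 1
                queue.append(other)

    # Keep every relation whose endpoints are both inside the expanded node set.
    graph = {user: {} for user in depth}
    for a, b, rel in relations:
        if a in graph and b in graph and a != b:
            graph[a].setdefault(b, set()).add(rel)
            graph[b].setdefault(a, set()).add(rel)
    return graph


relations = [("u1", "u2", "fund"), ("u2", "u3", "relative"), ("u3", "u4", "employment"),
             ("u1", "u2", "relative"), ("u9", "u8", "fund")]
print(sorted(build_relationship_graph(["u1"], relations)))  # ['u1', 'u2']
```

Users such as `u8` and `u9`, who are unconnected to any seed, never enter the graph, which is how the seeding keeps the network small.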
Finally, at step 212, the relationship graph is clustered to identify risk groups.
Clustering refers to partitioning a data set into different classes or clusters according to some particular criteria (e.g., distance) such that the similarity of data within the same cluster is as large as possible, and the variability of data in different clusters is as large as possible. Common clustering algorithms include K-means clustering, mean shift clustering, DBSCAN clustering, hierarchical clustering, and the like. Clustering algorithms are well known in the machine learning art and are not described in detail herein.
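The algorithms listed above are typically applied to feature vectors; for a relationship graph, a graph community-detection method is a natural fit. The sketch below swaps in asynchronous label propagation, which is not named in the text, and assumes the adjacency-map format `{user: {neighbour: set_of_relation_types}}`, with edge weight taken (as an assumption) to be the number of relation types between two users.

```python
from collections import Counter, defaultdict


def label_propagation(graph, max_iter=20):
    """Community detection by asynchronous label propagation.

    graph: {node: {neighbour: set_of_relation_types}}, an undirected adjacency map.
    Returns clusters as sets of nodes, largest first.
    """
    labels = {node: node for node in graph}  # every node starts in its own cluster
    for _ in range(max_iter):
        changed = False
        for node in sorted(graph):  # fixed order keeps the result deterministic
            if not graph[node]:
                continue  # isolated nodes keep their own label
            votes = Counter()
            for neighbour, rels in graph[node].items():
                votes[labels[neighbour]] += len(rels)
            top = max(votes.values())
            best = min(label for label, weight in votes.items() if weight == top)
            if best != labels[node]:
                labels[node] = best
                changed = True
        if not changed:
            break  # converged: no label changed in a full pass
    clusters = defaultdict(set)
    for node, label in labels.items():
        clusters[label].add(node)
    return sorted(clusters.values(), key=lambda c: (-len(c), min(c)))


graph = {
    "a": {"b": {"fund"}, "c": {"fund"}},
    "b": {"a": {"fund"}, "c": {"fund"}},
    "c": {"a": {"fund"}, "b": {"fund"}},
    "x": {"y": {"relative"}},
    "y": {"x": {"relative"}},
}
print([sorted(c) for c in label_propagation(graph)])  # [['a', 'b', 'c'], ['x', 'y']]
```

Each resulting cluster is a candidate risk group; a minimum group size before handing clusters to reviewers would be a further, business-specific choice.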
It should be noted that, in conventional relational-clustering algorithms, risk groups are difficult to identify directly because, during group identification, they are objectively associated with a large number of normal users (i.e., users who submit normal materials). In contrast, the technical scheme of the present disclosure greatly reduces the scale of the relational network and saves computing resources.
Optionally, after risk groups are identified, they may be cross-audited (e.g., manually audited). Placing manual expert review after the machine judgment makes full use of the experts' cross-comparison analysis skills and further improves the accuracy and reliability of risk group identification.
FIG. 3 illustrates an exemplary process 300 of determining template similarity in accordance with an embodiment of the present disclosure.
As shown in fig. 3, after the issuer of the material to be detected is determined, the historical materials of that issuer can be found. Thus, both the template of the material to be detected and the templates of the historical materials can be obtained.
After the template of the material to be detected and the templates of the historical materials are obtained, their similarity can be judged.
In particular implementations, the template similarity may be determined in a variety of different ways. For ease of illustration, two ways of determining template similarity are shown in FIG. 3.
In the first approach, the similarity (e.g., distance-based similarity, cosine similarity, etc.) between the template of the material to be detected and the template of the historical materials is first determined. The determined similarity is then compared with a preset similarity threshold. If the similarity is smaller than the threshold, the material to be detected has low similarity to the historical materials, and the two templates are considered dissimilar. Otherwise, if the similarity is greater than or equal to the threshold, the material to be detected has high similarity to the historical materials, and the two templates can be considered similar.
In a second approach, the degree of deviation of the template of the material to be detected from the template of the historical materials is first determined. The determined degree of deviation is then compared with a preset deviation threshold. If the deviation is smaller than the threshold, the material to be detected deviates little from the historical materials, and the two templates can be considered similar. Otherwise, if the deviation is greater than or equal to the threshold, the material to be detected deviates substantially from the historical materials, and the two templates are considered dissimilar.
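The first approach might look like the sketch below, assuming each template has already been turned into a feature vector by a deep learning model (not shown). The threshold of 0.9 and the function names are hypothetical. Aggregating with the median is also an assumption: because forgers reuse one template across many cases, taking the maximum similarity would let a single earlier forgery vouch for the next one, whereas the median reflects "most of the historical materials".

```python
import math
from statistics import median


def cosine_similarity(u, v):
    """Cosine similarity of two equal-length feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def is_template_similar(candidate, history, threshold=0.9):
    """First approach: similar if the median similarity to historical templates
    reaches the preset threshold (the median aggregation is an assumption)."""
    if not history:
        raise ValueError("no historical templates to compare against")
    sims = [cosine_similarity(candidate, h) for h in history]
    return median(sims) >= threshold


history = [[1.0, 0.0, 0.0], [0.99, 0.1, 0.0], [0.98, 0.0, 0.1]]
print(is_template_similar([1.0, 0.05, 0.0], history))  # True
print(is_template_similar([0.0, 1.0, 0.0], history))   # False
```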
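For the second approach, one possible definition of "degree of deviation" (an assumption, since the text does not fix one) is the distance from the historical centroid measured in units of the historical templates' average spread around it. The threshold of 3.0 and the function names are hypothetical, and template feature vectors are again assumed to come from an upstream model.

```python
import math


def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def deviation_degree(candidate, history):
    """Distance from the historical centroid, in units of the historical mean spread."""
    if not history:
        raise ValueError("no historical templates to compare against")
    dim = len(candidate)
    centroid = [sum(h[i] for h in history) / len(history) for i in range(dim)]
    spread = sum(euclidean(h, centroid) for h in history) / len(history)
    dist = euclidean(candidate, centroid)
    if spread == 0:
        # All historical templates are identical: any difference is an unbounded deviation.
        return 0.0 if dist == 0 else math.inf
    return dist / spread


def is_template_similar_by_deviation(candidate, history, threshold=3.0):
    """Second approach: similar if the deviation degree is below the preset threshold."""
    return deviation_degree(candidate, history) < threshold


history = [[0, 0], [2, 0], [0, 2], [2, 2]]
print(round(deviation_degree([11, 1], history), 2))  # 7.07
```

Normalizing by the spread lets one threshold work across issuers whose templates vary more or less from certificate to certificate.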
Finally, the determination result may be output.
It should be noted that although two ways of judging template similarity are shown in FIG. 3, the present disclosure is not limited thereto. In practice, a person skilled in the art may determine template similarity in other ways according to actual requirements.
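The two threshold rules described for FIG. 3 can be illustrated with a minimal Python sketch. It assumes each template has already been encoded as a fixed-length numeric feature vector; that encoding, the use of cosine similarity for the first approach and Euclidean distance as the deviation measure for the second, and the threshold values 0.9 and 0.5 are illustrative assumptions, not values given in the source.

```python
import math
from typing import Sequence


def _check_lengths(a: Sequence[float], b: Sequence[float]) -> None:
    # Both template vectors must come from the same (hypothetical) encoder.
    if len(a) != len(b):
        raise ValueError("template vectors must have the same length")


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two template feature vectors (1.0 = same direction)."""
    _check_lengths(a, b)
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-zero vector carries no template information
    return dot / (norm_a * norm_b)


def deviation(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance, used here as the degree of deviation (0.0 = identical)."""
    _check_lengths(a, b)
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def similar_by_similarity(a, b, threshold: float = 0.9) -> bool:
    # First approach: templates are similar when similarity >= threshold.
    return cosine_similarity(a, b) >= threshold


def similar_by_deviation(a, b, threshold: float = 0.5) -> bool:
    # Second approach: templates are similar when deviation < threshold.
    return deviation(a, b) < threshold
```

For example, `[1.0, 0.0, 1.0]` and `[1.0, 0.1, 1.0]` are judged similar under both rules, while `[1.0, 0.0, 1.0]` and `[0.0, 1.0, 0.0]` are judged dissimilar under both. The boundary handling mirrors the text: a similarity equal to its threshold counts as similar, whereas a deviation equal to its threshold counts as dissimilar.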
Fig. 4 illustrates a risk group identification architecture 400 based on image identification and graph computation in accordance with an embodiment of the present disclosure.
As shown in fig. 4, the risk group identification architecture 400 of the present disclosure includes an image acquisition phase, an information extraction phase, a historical material acquisition phase, a template similarity determination phase, a relationship graph construction phase, and a group identification phase.
In the image acquisition phase, an image of the material to be detected may be acquired.
For example, a user may submit the material to be detected, and an image of the submitted material is then acquired. As described above, the image may be captured by an image capture device or uploaded by the user through an interactive interface.
In the information extraction stage, the issuer and the template of the material to be detected can be extracted from the image.
Specifically, the information in the image may be extracted by optical character recognition (OCR).
OCR is the process of analyzing, recognizing, and processing images of textual material to obtain their text and layout information. Conventional OCR comprises image preprocessing, text detection, and text recognition. Image preprocessing corrects imaging problems in the image; common preprocessing steps include geometric transformation, distortion correction, deblurring, image enhancement, and illumination correction. Text detection locates text regions, their extents, and their layout, and usually also includes layout analysis and text-line detection; in short, it determines where the text is and how large each text region is. Text recognition then recognizes the content of each detected text region and converts it into text. The recognized text typically needs to be checked again to ensure its correctness.
OCR is well known in the field of machine learning and is not described in detail here.
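As one concrete example of the image preprocessing and enhancement step mentioned above, a scanned material is often binarized before text detection. The sketch below uses Otsu's method, a standard global-thresholding technique chosen here purely for illustration; the source does not say which preprocessing algorithms are used. It assumes an 8-bit grayscale image given as a list of rows with pixel values in 0..255.

```python
from typing import List


def otsu_threshold(gray: List[List[int]]) -> int:
    """Return the threshold t that maximizes between-class variance (Otsu's method)."""
    hist = [0] * 256
    for row in gray:
        for p in row:
            hist[p] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))

    best_t, best_var = 0, -1.0
    w_b = 0    # pixel count of the class with values <= t
    sum_b = 0  # intensity sum of that class
    for t in range(256):
        w_b += hist[t]
        if w_b == 0:
            continue  # lower class still empty
        w_f = total - w_b
        if w_f == 0:
            break  # upper class empty; no further split is possible
        sum_b += t * hist[t]
        mean_b = sum_b / w_b
        mean_f = (sum_all - sum_b) / w_f
        var_between = w_b * w_f * (mean_b - mean_f) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(gray: List[List[int]], t: int) -> List[List[int]]:
    # Pixels brighter than the threshold map to 1, the rest to 0.
    return [[1 if p > t else 0 for p in row] for row in gray]
```

For dark text on light paper, the text strokes end up as 0 and the background as 1. Libraries such as OpenCV provide optimized versions of this step; the pure-Python form is shown only to make the computation visible.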
Optionally, the material to be detected may also be given an initial classification (not shown) before its issuer and template are extracted.
Subsequently, in the historical material acquisition stage, historical materials from the extracted issuer may be acquired.
Then, in the template similarity determination stage, the template similarity between the material to be detected and the historical material can be judged.
If the template of the material to be detected is similar to the template of the historical material, the material to be detected is considered normal (i.e., it has not been tampered with or counterfeited).
If the template of the material to be detected is dissimilar to the template of the historical material, the material to be detected is considered abnormal (i.e., it has likely been tampered with or counterfeited). In this case, the relationship graph construction stage is entered, in which a relationship graph may be constructed from the user who submitted the abnormal material and that user's relationships with other users.
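A minimal sketch of the relationship graph construction stage follows. It assumes that user-to-user relations (for example, a shared device or bank card) are already available as pairs, and that the graph is limited to the users who submitted abnormal materials plus their neighbors within a fixed number of hops. The relation types, the function name, and the hop limit are illustrative assumptions, not details from the source.

```python
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple


def build_relationship_graph(seed_users: Iterable[str],
                             relations: Iterable[Tuple[str, str]],
                             hops: int = 1) -> Dict[str, Set[str]]:
    """Adjacency sets for the seed users and every user within `hops` of them."""
    full: Dict[str, Set[str]] = defaultdict(set)
    for a, b in relations:
        if a != b:  # ignore self-relations
            full[a].add(b)
            full[b].add(a)

    keep: Set[str] = set(seed_users)
    frontier = set(keep)
    for _ in range(hops):
        # Breadth-first expansion by one hop from the current frontier.
        reached: Set[str] = set()
        for u in frontier:
            reached |= full.get(u, set())
        frontier = reached - keep
        keep |= frontier

    # Keep only edges whose endpoints are both inside the retained node set.
    return {u: {v for v in full.get(u, set()) if v in keep} for u in keep}
```

Seeding the graph only with users of abnormal materials keeps it much smaller than a graph over all users, which is consistent with the later statement that the approach reduces the scale of the relationship network.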
Thereafter, in the group identification phase, the relationship graph may be clustered to identify risk groups.
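The source does not name a clustering algorithm. As the simplest stand-in, the sketch below treats each connected component of the relationship graph as a candidate group (found with union-find) and keeps components with at least `min_size` users; a community-detection method such as Louvain could be substituted. The adjacency-set input format and the size cutoff of 3 are assumptions for illustration.

```python
from collections import defaultdict
from typing import Dict, List, Set


def connected_components(graph: Dict[str, Set[str]]) -> List[Set[str]]:
    """Group the nodes of an undirected adjacency-set graph into connected components."""
    parent: Dict[str, str] = {u: u for u in graph}

    def find(u: str) -> str:
        # Follow parent links to the root, halving the path as we go.
        while parent[u] != u:
            parent[u] = parent[parent[u]]
            u = parent[u]
        return u

    for u, neighbors in graph.items():
        for v in neighbors:
            parent.setdefault(v, v)  # tolerate neighbors that are missing as keys
            root_u, root_v = find(u), find(v)
            if root_u != root_v:
                parent[root_u] = root_v  # union the two sets

    groups: Dict[str, Set[str]] = defaultdict(set)
    for u in list(parent):
        groups[find(u)].add(u)
    return list(groups.values())


def risk_groups(graph: Dict[str, Set[str]], min_size: int = 3) -> List[Set[str]]:
    """Keep only components large enough to be treated as candidate risk groups."""
    return [g for g in connected_components(graph) if len(g) >= min_size]
```

A seed user with no relations forms a component of size 1 and is dropped by the default cutoff, in line with the goal of finding groups rather than isolated individuals.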
Optionally, after a risk group is identified, it may be passed to human audit experts for cross-auditing (not shown in the figure), which can further improve the accuracy of risk group identification.
Finally, the risk group identification result is output.
Although FIG. 4 shows particular stages of the risk group identification architecture, this division into stages is merely exemplary and not limiting.
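The stages of FIG. 4 can be wired together as in the following sketch, where each stage is passed in as a function so that the OCR, similarity, graph, and clustering implementations can be swapped independently. The class name and stage signatures are hypothetical. Two decision rules are also assumptions, since the source does not specify them: a material is treated as abnormal only when its template is dissimilar to every historical template of the same issuer, and a material whose issuer has no history is not flagged.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Set, Tuple

Graph = Dict[str, Set[str]]


@dataclass
class RiskGroupPipeline:
    # Each field is one stage of the architecture; all signatures are hypothetical.
    extract: Callable[[bytes], Tuple[str, object]]     # image -> (issuer, template)
    history_templates: Callable[[str], List[object]]  # issuer -> historical templates
    is_similar: Callable[[object, object], bool]      # template similarity judgment
    build_graph: Callable[[Set[str]], Graph]           # abnormal users -> relationship graph
    cluster: Callable[[Graph], List[Set[str]]]         # relationship graph -> risk groups

    def run(self, submissions: Iterable[Tuple[str, bytes]]) -> List[Set[str]]:
        """submissions: (user_id, material_image) pairs; returns identified risk groups."""
        abnormal_users: Set[str] = set()
        for user, image in submissions:
            issuer, template = self.extract(image)
            history = self.history_templates(issuer)
            # Assumption: no history means no baseline, so the material is not flagged.
            if not history:
                continue
            # Assumption: abnormal only if dissimilar to every historical template.
            if not any(self.is_similar(template, h) for h in history):
                abnormal_users.add(user)
        if not abnormal_users:
            return []  # nothing abnormal, so no graph needs to be built
        return self.cluster(self.build_graph(abnormal_users))
```

Because the graph is built only when abnormal materials exist, and only from their submitters, the graph stages are skipped entirely for batches in which every material matches its issuer's history.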
Fig. 5 illustrates a block diagram of a risk group identification system 500 based on image identification and graph computation in accordance with an embodiment of the present disclosure.
Referring to FIG. 5, the system 500 may include an image acquisition module 502, an extraction module 504, a historical material acquisition module 506, a template similarity determination module 508, a relationship graph construction module 510, and a risk group identification module 512. These modules may be directly or indirectly connected to, or in communication with, one another over one or more buses 514.
The image acquisition module 502 may acquire an image of the material to be detected.
The extraction module 504 may extract the issuer and the template of the material to be detected from the image.
In an embodiment of the present disclosure, the issuer and the template of the material to be detected are extracted from the image by optical character recognition (OCR).
In one embodiment of the present disclosure, the template includes the body content, the official seal, and the layout of the material to be detected.
The historical material acquisition module 506 may acquire historical materials of the extracted issuer.
In an embodiment of the present disclosure, acquiring historical materials of the extracted issuer includes at least one of: acquiring all historical materials of the extracted issuer; or acquiring historical materials of the extracted issuer within a predetermined time period.
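To make the three template components concrete, the following sketch scores two templates component by component: token overlap (Jaccard) for the body content, exact match of the text recognized in the official seal, and bounding-box overlap (intersection over union) for the layout. The `Template` structure, the scoring functions, and the weights are hypothetical choices for illustration; the source states only that the template comprises body content, official seal, and layout, and claim 6 notes that similarity may instead be judged with deep learning.

```python
from dataclasses import dataclass, field
from typing import List, Sequence, Tuple

# (x0, y0, x1, y1) with coordinates normalized to the page size.
Box = Tuple[float, float, float, float]


@dataclass
class Template:
    body_tokens: List[str]  # fixed wording of the body content
    seal_text: str          # text recognized inside the official seal ("" if none)
    layout: List[Box] = field(default_factory=list)  # text-block bounding boxes


def jaccard(a: Sequence[str], b: Sequence[str]) -> float:
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)


def iou(a: Box, b: Box) -> float:
    # Intersection over union of two axis-aligned boxes.
    iw = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    ih = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = iw * ih
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def _directed_layout(a: List[Box], b: List[Box]) -> float:
    # Average, over the boxes in a, of the best-matching IoU in b.
    return sum(max(iou(x, y) for y in b) for x in a) / len(a)


def layout_similarity(a: List[Box], b: List[Box]) -> float:
    if not a and not b:
        return 1.0
    if not a or not b:
        return 0.0
    return (_directed_layout(a, b) + _directed_layout(b, a)) / 2


def template_similarity(t1: Template, t2: Template,
                        weights: Tuple[float, float, float] = (0.4, 0.3, 0.3)) -> float:
    """Weighted body/seal/layout similarity; lies in [0, 1] if the weights sum to 1."""
    w_body, w_seal, w_layout = weights
    body = jaccard(t1.body_tokens, t2.body_tokens)
    seal = 1.0 if t1.seal_text == t2.seal_text else 0.0
    layout = layout_similarity(t1.layout, t2.layout)
    return w_body * body + w_seal * seal + w_layout * layout
```

The resulting score can then be compared with a preset threshold in the same way as the first approach described for FIG. 3. With these weights, a material whose body text and layout match the issuer's history but whose seal text differs scores about 0.7.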
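The two retrieval options above (all historical materials of the issuer, or only those within a predetermined period) can be sketched as a single filter. The `Material` record, the in-memory list standing in for the material store, and the use of submission time to define the period are assumptions for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Iterable, List, Optional


@dataclass
class Material:
    material_id: str
    issuer: str             # issuer extracted from the material image
    submitted_at: datetime


def historical_materials(store: Iterable[Material], issuer: str,
                         window: Optional[timedelta] = None,
                         now: Optional[datetime] = None) -> List[Material]:
    """Return the issuer's historical materials.

    window=None -> all historical materials of the issuer;
    otherwise   -> only those submitted within [now - window, now].
    """
    same_issuer = [m for m in store if m.issuer == issuer]
    if window is None:
        return same_issuer
    if now is None:
        now = datetime.now()
    start = now - window
    return [m for m in same_issuer if start <= m.submitted_at <= now]
```

A bounded window keeps the comparison set small and reflects the issuer's current template, at the cost of missing older templates that may still be legitimate.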
The template similarity determination module 508 may determine whether the templates of the material to be detected are similar to the templates of the historical materials.
In an embodiment of the present disclosure, the template similarity determination module 508 may be further configured to: determine a similarity between the template of the material to be detected and the template of the historical material; determine that the template of the material to be detected is similar to the template of the historical material if the similarity satisfies a preset threshold; and determine that it is dissimilar if the similarity does not satisfy the preset threshold.
The relationship graph construction module 510 may be configured to construct a relationship graph of the user of the material to be detected if the template of the material to be detected is dissimilar to the template of the historical material.
The risk group identification module 512 may cluster the relationship graph to identify the risk group.
While specific modules of the system 500 are shown in FIG. 5, it should be understood that these modules are merely exemplary and not limiting. In different implementations, one or more of these modules may be combined, split, or removed, or additional modules may be added. For example, in some implementations, the image acquisition module 502 and the extraction module 504 may be combined into a single module. In some implementations, the system 500 may also include additional modules. For example, the system 500 may include an initial classification module (not shown) configured to initially classify the material to be detected before its issuer and template are extracted. In still other implementations, the system 500 may further include a cross-auditing module configured to cross-audit the identified risk groups.
Fig. 6 illustrates a block diagram of an apparatus 600 including a risk group identification system based on image identification and graph computation in accordance with an embodiment of the present disclosure.
The apparatus illustrates a general hardware environment in which the present disclosure may be applied in accordance with exemplary embodiments of the present disclosure.
An apparatus 600, which is an exemplary embodiment of a hardware apparatus that may be applied to aspects of the present disclosure, will now be described with reference to fig. 6. Device 600 may be any machine configured to perform processes and/or calculations and may be, but is not limited to, a workstation, a server, a desktop computer, a laptop computer, a tablet computer, a Personal Digital Assistant (PDA), a smart phone, or any combination thereof.
Device 600 may include components that may be connected to bus 612 or communicate with bus 612 via one or more interfaces. For example, device 600 may include bus 612, processor 602, memory 604, input devices 608, and output devices 610, among others.
The processor 602 may be any type of processor and may include, but is not limited to, a general purpose processor and/or a special purpose processor (e.g., a special purpose processing chip), or an intelligent hardware device (e.g., a general purpose processor, DSP, CPU, microcontroller, ASIC, FPGA, programmable logic device, discrete gate or transistor logic, discrete hardware components, or any combination thereof). In some cases, the processor 602 may be configured to operate a memory array using a memory controller. In other cases, a memory controller (not shown) may be integrated into the processor 602. The processor 602 may be responsible for managing the bus and for general processing, including the execution of software stored in the memory. The processor 602 may also be configured to perform the various functions described herein in connection with risk group identification based on image identification and graph calculation. For example, the processor 602 may be configured to: acquire an image of a material to be detected; extract the issuer and the template of the material to be detected from the image; acquire historical materials of the extracted issuer; determine whether the template of the material to be detected is similar to the template of the historical material; if the template of the material to be detected is dissimilar to the template of the historical material, construct a relationship graph of the user of the material to be detected; and cluster the relationship graph to identify risk groups.
Memory 604 may be any storage device that may enable data storage. Memory 604 may include, but is not limited to, a magnetic disk drive, an optical storage device, a solid state memory, a floppy disk, a hard disk, magnetic tape, or any other magnetic medium, an optical disk, or any other optical medium, a ROM (read only memory), a RAM (random access memory), a cache memory, and/or any other memory chip or cartridge, and/or any other medium from which a computer may read data, instructions, and/or code. Memory 604 may store computer-executable software 606 comprising computer-readable instructions that, when executed, cause the processor to perform various functions described herein in connection with image recognition and graph-calculation-based risk group recognition.
Input device 608 may be any type of device that may be used to input information.
Output device 610 may be any type of device for outputting information. In one case, the output device 610 may be any type of output device that can display information.
The technical solution of the present disclosure takes material anomaly identification as its starting point and layers mining of user behavior and relationship structure on top of it, so that the behavior pattern of a risk group can be fully reconstructed; it thereby combines the anomaly-mining capability of deep learning on images with the behavior-characterization capability of relationship structures. The solution is also more robust: because it combines image anomaly judgment with relationship judgment and handles cases at the group level, a risk group cannot infer the prevention and control strategy by probing any single point, which weakens its ability to adapt to and evade risk controls. In addition, because a relationship graph is constructed only for users who submitted abnormal materials, the technical solution of the present disclosure can greatly reduce the scale of the relationship network and the storage and computation cost of graph computation.
The detailed description set forth above in connection with the appended drawings describes examples and is not intended to represent all examples that may be implemented or that fall within the scope of the claims. The terms "example" and "exemplary", when used in this specification, mean "serving as an example, instance, or illustration," and not "preferred or advantageous over other examples."
Reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present disclosure. Thus, appearances of such phrases in various places do not necessarily all refer to the same embodiment. Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
The previous description is provided to enable any person skilled in the art to practice the various aspects described herein. Various modifications to these aspects will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other aspects. Thus, the claims are not intended to be limited to the aspects shown herein, but are to be accorded the full scope consistent with the language of the claims, wherein reference to an element in the singular is not intended to mean "one and only one" unless specifically so stated, but rather "one or more". Unless specifically stated otherwise, the term "some" means one or more. All structural and functional equivalents to the elements of the various aspects described throughout this disclosure that are known, or later come to be known, to those of ordinary skill in the art are expressly incorporated herein by reference and are intended to be encompassed by the claims.
It is also noted that the embodiments may be described as a process which is depicted as a flowchart, a flow diagram, a structure diagram, or a block diagram. Although a flowchart may describe the operations as a sequential process, many of the operations can be performed in parallel or concurrently. Additionally, the order of the operations may be rearranged.
While various embodiments have been illustrated and described, it is to be understood that the embodiments are not limited to the precise arrangements and instrumentalities described above. Various modifications, substitutions, and improvements apparent to those skilled in the art may be made in the arrangement, operation, and details of the apparatus disclosed herein without departing from the scope of the claims.

Claims (13)

1. A risk group identification method based on image identification and graph calculation, comprising:
acquiring an image of a material to be detected;
extracting an issuer and a template of the material to be detected from the image;
acquiring historical materials of the extracted issuer;
determining whether the template of the material to be detected is similar to the template of the historical material;
if the template of the material to be detected is dissimilar to the template of the historical material, constructing a relationship graph of a user of the material to be detected; and
clustering the relationship graph to identify risk groups.
2. The method of claim 1, further comprising:
initially classifying the material to be detected before extracting the issuer and the template of the material to be detected.
3. The method of claim 1, wherein extracting the issuer and the template of the material to be detected from the image is performed by optical character recognition (OCR).
4. The method of claim 1, wherein acquiring historical materials of the extracted issuer comprises at least one of:
acquiring all historical materials of the extracted issuer; or
acquiring historical materials of the extracted issuer within a predetermined time period.
5. The method of claim 1, wherein determining whether the template of the material to be detected is similar to the template of the historical material further comprises:
determining a similarity between the template of the material to be detected and the template of the historical material;
if the similarity satisfies a preset threshold, determining that the template of the material to be detected is similar to the template of the historical material; and
if the similarity does not satisfy the preset threshold, determining that the template of the material to be detected is dissimilar to the template of the historical material.
6. The method of claim 1, wherein determining whether the template of the material to be detected is similar to the template of the historical material is performed based on deep learning.
7. The method of claim 1, further comprising: cross-auditing the identified risk group.
8. The method of claim 1, wherein the template comprises the body content, the official seal, and the layout of the material to be detected.
9. A risk group identification system based on image identification and graph computation, comprising:
an image acquisition module configured to acquire an image of a material to be detected;
an extraction module configured to extract an issuer and a template of the material to be detected from the image;
a historical material acquisition module configured to acquire historical materials of the extracted issuer;
a template similarity determination module configured to determine whether the template of the material to be detected is similar to the template of the historical material;
a relationship graph construction module configured to: if the template of the material to be detected is dissimilar to the template of the historical material, construct a relationship graph of a user of the material to be detected; and
a risk group identification module configured to cluster the relationship graph to identify a risk group.
10. The system of claim 9, further comprising an initial classification module configured to initially classify the material to be detected before the issuer and the template of the material to be detected are extracted.
11. The system of claim 9, wherein the template similarity determination module is further configured to:
determine a similarity between the template of the material to be detected and the template of the historical material;
if the similarity satisfies a preset threshold, determine that the template of the material to be detected is similar to the template of the historical material; and
if the similarity does not satisfy the preset threshold, determine that the template of the material to be detected is dissimilar to the template of the historical material.
12. The system of claim 9, further comprising a cross-auditing module configured to cross-audit the identified risk group.
13. A computer readable storage medium storing a computer program executable by a processor to perform the method of any one of claims 1-8.
CN202310666659.XA 2023-06-06 2023-06-06 Risk group identification method and system based on image identification and graph calculation Pending CN116612486A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310666659.XA CN116612486A (en) 2023-06-06 2023-06-06 Risk group identification method and system based on image identification and graph calculation

Publications (1)

Publication Number Publication Date
CN116612486A true CN116612486A (en) 2023-08-18

Family

ID=87685328

Country Status (1)

Country Link
CN (1) CN116612486A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117114420A (en) * 2023-10-17 2023-11-24 南京启泰控股集团有限公司 Image recognition-based industrial and trade safety accident risk management and control system and method
CN117114420B (en) * 2023-10-17 2024-01-05 南京启泰控股集团有限公司 Image recognition-based industrial and trade safety accident risk management and control system and method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination