WO2023078044A1 - Method, system, device and medium for verifying the authenticity of declaration information - Google Patents

Method, system, device and medium for verifying the authenticity of declaration information

Info

Publication number
WO2023078044A1
WO2023078044A1 (PCT/CN2022/124815)
Authority
WO
WIPO (PCT)
Prior art keywords
image
information
declaration
item
text
Prior art date
Application number
PCT/CN2022/124815
Other languages
English (en)
French (fr)
Inventor
张丽
邢宇翔
唐虎
孙运达
傅罡
李强
张
吴武斌
Original Assignee
同方威视技术股份有限公司
清华大学
Priority date
Filing date
Publication date
Application filed by 同方威视技术股份有限公司, 清华大学
Publication of WO2023078044A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/254Fusion techniques of classification results, e.g. of results related to same input data
    • G06F18/256Fusion techniques of classification results, e.g. of results related to same input data of results relating to different input data, e.g. multimodal recognition
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N23/00Investigating or analysing materials by the use of wave or particle radiation, e.g. X-rays or neutrons, not covered by groups G01N3/00 – G01N17/00, G01N21/00 or G01N22/00
    • G01N23/02Investigating or analysing materials by the use of wave or particle radiation, e.g. X-rays or neutrons, not covered by groups G01N3/00 – G01N17/00, G01N21/00 or G01N22/00 by transmitting the radiation through the material
    • G01N23/04Investigating or analysing materials by the use of wave or particle radiation, e.g. X-rays or neutrons, not covered by groups G01N3/00 – G01N17/00, G01N21/00 or G01N22/00 by transmitting the radiation through the material and forming images of the material
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295Named entity recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • G06N3/0455Auto-encoder networks; Encoder-decoder networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0464Convolutional networks [CNN, ConvNet]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/0475Generative networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/806Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of extracted features
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/60Type of objects
    • G06V20/62Text, e.g. of license plates, overlay texts or captions on TV images
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V30/00Character recognition; Recognising digital ink; Document-oriented image-based pattern recognition
    • G06V30/40Document-oriented image-based pattern recognition
    • G06V30/41Analysis of document content
    • G06V30/413Classification of content, e.g. text, photographs or tables
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2223/00Investigating materials by wave or particle radiation
    • G01N2223/40Imaging
    • G01N2223/401Imaging image processing
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2223/00Investigating materials by wave or particle radiation
    • G01N2223/60Specific applications or type of materials
    • G01N2223/639Specific applications or type of materials material in a container
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N2223/00Investigating materials by wave or particle radiation
    • G01N2223/60Specific applications or type of materials
    • G01N2223/643Specific applications or type of materials object on conveyor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Definitions

  • the present disclosure relates to the technical field of artificial intelligence, and more specifically, to a method, device, device and medium for verifying the authenticity of declared information.
  • the inventor found that at least the following problem exists in the prior art: when the X-ray detection device scans articles for inspection, the items under the same declaration category (for example, the declared tax-number category, i.e., the product number in the customs declaration form) may include many goods with different names and specifications, and these items appear quite differently in X-ray images (hereinafter referred to as machine-inspection radiation images), which increases the difficulty of inspection.
  • the embodiments of the present disclosure provide a method, system, device, and medium for verifying the authenticity of declared information that combine machine-inspection radiation images with the declared information for intelligent verification, which can improve both verification accuracy and verification efficiency.
  • An aspect of the embodiments of the present disclosure provides a method for verifying the authenticity of declaration information.
  • the method includes: acquiring a machine-inspection radiation image obtained by scanning a container loaded with items; acquiring declaration information for declaring the items in the container; identifying the image information of the items in the machine-inspection radiation image to obtain image features corresponding to the machine-inspection radiation image; identifying the text information of the items in the declaration information to obtain text features corresponding to the declaration information, wherein the text features are used to characterize the declaration category to which the items in the declaration information belong; identifying the declaration category of the items in the container using the image features as input information and the text features as externally introduced features; and when the declaration category of at least one item in the container does not belong to the declaration category in the declaration information, determining that the declaration information is doubtful.
  • the image features include N1 first image feature vectors respectively corresponding to image information of N1 items in the machine inspection radiation image, where N1 is an integer greater than or equal to 1.
  • the identifying of the image information of the items in the machine-inspection radiation image to obtain the image features corresponding to the machine-inspection radiation image includes: using a target detection algorithm to divide the different items in the machine-inspection radiation image into independent image blocks to obtain N1 image blocks; extracting a second image feature vector corresponding to each of the image blocks; and obtaining, based on the second image feature vector corresponding to each image block, the first image feature vector corresponding to the image information of the item represented by that image block.
  • the extracting of the second image feature vector corresponding to each of the image blocks includes: using an image feature extraction module to perform image recognition on each of the image blocks to obtain the corresponding second image feature vector, wherein the image feature extraction module includes a convolutional neural network.
  • the image feature extraction module includes a network structure in which resnet is used as a basic network and SE-block is added after the resnet pooling layer.
  • the obtaining of the first image feature vector corresponding to the image information of the item represented by the image block, based on the second image feature vector corresponding to each image block, includes: acquiring position information of each image block in the machine-inspection radiation image; and obtaining the first image feature vector based on the second image feature vector and the position information corresponding to the same image block.
  • the obtaining of the first image feature vector based on the second image feature vector and the position information corresponding to the same image block includes: processing the second image feature vector using the position information of the image block; inputting the processed second image feature vector into an encoder; and taking the output of the encoder as the first image feature vector corresponding to the image block.
  • using the image features as input information and the text features as externally introduced features to identify the declared category of the items in the container includes: using the image features as the input information of a cross-modal decoder, using the text features as the externally introduced features of the attention mechanism of the cross-modal decoder, and screening the declared category of the items in the container by means of the cross-modal decoder.
  • the encoder is jointly trained with the cross-modal decoder.
  • the encoder adopts a transformer encoder model.
  • the cross-modal decoder adopts a transformer decoder model.
  • the text features include text feature vectors respectively corresponding to N2 items in the declaration information, where N2 is an integer greater than or equal to 1.
  • the identifying of the text information of the items in the declaration information to obtain the text features corresponding to the declaration information includes: extracting the name information and the specification-and-model information of each item in the declaration information; for each item, processing the name information into a first sentence and the specification-and-model information into a second sentence; taking the first sentence and the second sentence corresponding to the same item as the input of a text feature extraction and projection module, and using the module to classify the declaration category to which the item belongs; and taking the module's category output result for each item as that item's text feature vector, whereby N2 text feature vectors are correspondingly obtained for N2 items.
  • the text feature extraction and projection module uses a BERT model.
  • a second aspect of the embodiments of the present disclosure provides a system for verifying the authenticity of declared information.
  • the system includes an information acquisition subsystem, a feature extraction subsystem, a feature fusion subsystem and a conclusion judgment subsystem.
  • the information acquisition subsystem is used to acquire the machine-checked radiation image obtained by scanning the container loaded with items, and acquire the declaration information for declaring the items in the container.
  • the feature extraction subsystem is used to identify the image information of the items in the machine-inspection radiation image to obtain image features corresponding to the machine-inspection radiation image, and to identify the text information of the items in the declaration information to obtain text features corresponding to the declaration information, wherein the text features are used to characterize the declaration category to which the items in the declaration information belong.
  • the feature fusion subsystem is used to use the image feature as input information and the text feature as an externally introduced feature to identify the declared category of the item in the container.
  • the conclusion judgment subsystem is used to determine that the declared information is doubtful when the declared category of at least one item in the container does not belong to the declared category in the declared information.
  • the image features include N1 first image feature vectors respectively corresponding to image information of N1 items in the machine inspection radiation image, where N1 is an integer greater than or equal to 1.
  • the feature extraction subsystem includes an image preprocessing module, an image feature extraction module, and an image feature mapping module.
  • the image preprocessing module is used to divide different items in the machine-detected radiation image into independent image blocks by using a target detection algorithm to obtain N1 image blocks.
  • the image feature extraction module is used to extract the second image feature vector corresponding to each image block.
  • the image feature mapping module is configured to obtain the first image feature vector corresponding to the image information of the item represented by the image block based on the second image feature vector corresponding to each image block.
  • the text features include text feature vectors respectively corresponding to N2 items in the declaration information, where N2 is an integer greater than or equal to 1.
  • the feature extraction subsystem includes a declaration information preprocessing module and a text feature extraction and projection module.
  • the declaration information preprocessing module is used to extract the name information and the specification-and-model information of each item in the declaration information, and, for each item, to process the name information into the first sentence and the specification-and-model information into the second sentence.
  • the text feature extraction and projection module is used to take the first sentence and the second sentence corresponding to the same item as input to classify the declaration category to which the item belongs, and to take its category output result for each item as the text feature vector corresponding to that item, wherein N2 text feature vectors are correspondingly obtained for N2 items.
  • another aspect of the embodiments of the present disclosure provides an electronic device. The electronic device includes one or more memories and one or more processors.
  • the memory stores executable instructions.
  • the processor executes the executable instructions to implement the method as described above.
  • Another aspect of the embodiments of the present disclosure provides a computer-readable storage medium storing computer-executable instructions, and the instructions are used to implement the above method when executed.
  • Another aspect of the embodiments of the present disclosure provides a computer program, where the computer program includes computer-executable instructions, and the instructions are used to implement the above method when executed.
  • the above-mentioned one or more embodiments have the following advantages or beneficial effects: by combining the declaration information in text form with the machine-inspection radiation image, and using the cross-modal fuser based on the attention mechanism to fuse the image features with the text features of the declaration information, intelligent inspection of customs clearance items is realized, which improves the accuracy of declaration-information inspection.
  • Fig. 1 schematically shows an application scenario of a method, system, device, medium and program product for verifying the authenticity of declaration information according to an embodiment of the present disclosure
  • Fig. 2 schematically shows a conceptual diagram of a method for checking the authenticity of declared information according to an embodiment of the present disclosure
  • Fig. 3 schematically shows a flow chart of a method for checking the authenticity of declaration information according to an embodiment of the present disclosure
  • FIG. 4 schematically shows a flow chart of extracting image features in a method for verifying the authenticity of declaration information according to an embodiment of the present disclosure
  • Fig. 5 is a schematic flow chart of obtaining image features in combination with location information in a method for verifying the authenticity of declared information according to an embodiment of the present disclosure
  • Fig. 6 schematically shows a flow chart of extracting image features in a method for verifying the authenticity of declaration information according to an embodiment of the present disclosure
  • FIG. 7 schematically shows a flow chart of extracting text features in a method for verifying the authenticity of declared information according to an embodiment of the present disclosure
  • Fig. 8 schematically shows a flow chart of extracting text features in a method for verifying the authenticity of declaration information according to an embodiment of the present disclosure
  • FIG. 9 schematically shows a flow diagram of screening the declared categories of the items in the container in the method for verifying the authenticity of the declared information according to an embodiment of the present disclosure
  • Fig. 10 schematically shows a block diagram of a system for checking the authenticity of declaration information according to an embodiment of the present disclosure
  • Fig. 11 schematically shows a block diagram of a feature extraction submodule in a system for verifying the authenticity of declaration information according to an embodiment of the present disclosure
  • Fig. 12 schematically shows the overall structure of a system for checking the authenticity of declared information according to another embodiment of the present disclosure
  • Fig. 13 schematically shows the structure of the feature extraction subsystem in a system for verifying the authenticity of declaration information according to an embodiment of the present disclosure
  • Fig. 14 schematically shows the structure of a cross-modal decoder in a system for verifying the authenticity of declared information according to an embodiment of the present disclosure
  • Fig. 15 schematically shows a block diagram of an electronic device suitable for implementing the method for verifying the authenticity of declaration information according to an embodiment of the present disclosure.
  • embodiments of the present disclosure provide a method, system, device, medium, and program product for verifying the authenticity of the declaration information, which combine the declaration information in text form with machine-inspection radiation images and use a cross-modal fuser based on the attention mechanism to fuse the image features with the text features of the declaration information, realizing intelligent inspection of customs clearance items and determining whether there is false or concealed declaration.
  • embodiments of the present disclosure provide a method, system, device, medium and program product for verifying the authenticity of declared information.
  • the method first acquires the machine-inspection radiation image obtained by scanning the container loaded with articles, and acquires the declaration information for declaring the articles in the container; it then identifies the image information of the articles in the machine-inspection radiation image to obtain the image features corresponding to the machine-inspection radiation image, and identifies the text information of the items in the declaration information to obtain the text features corresponding to the declaration information, wherein the text features can map the declaration category to which the items in the declaration information belong; next, using the image features as input information and the text features as externally introduced features, the declared category of the items in the container is identified, and when the declared category of at least one item in the container is identified as not belonging to the declared categories in the declaration information, the declaration information is determined to be doubtful.
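As a high-level illustration, the sketch below wires these steps together. Every function name is a hypothetical placeholder (the disclosure names no API), and each step is detailed in the embodiments that follow.

```python
# Hypothetical end-to-end sketch of the verification pipeline described above.
# None of these names come from the disclosure; they only mirror its steps.
from typing import Any, List, Set

def detect_item_blocks(radiation_image: Any) -> List[Any]: ...       # target detection into image blocks
def extract_image_features(blocks: List[Any]) -> Any: ...            # N1 first image feature vectors
def extract_text_features(declaration: Any) -> Any: ...              # N2 text feature vectors
def fuse_and_classify(img_feats: Any, txt_feats: Any) -> List[Set[int]]: ...  # cross-modal decoder, top-N per block

def declaration_is_doubtful(radiation_image: Any, declaration: Any,
                            declared_categories: Set[int]) -> bool:
    blocks = detect_item_blocks(radiation_image)
    candidates = fuse_and_classify(extract_image_features(blocks),
                                   extract_text_features(declaration))
    # Doubtful if at least one item's candidate categories miss every declared category.
    return any(not cands & declared_categories for cands in candidates)
```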
  • FIG. 1 schematically shows an application scenario 100 of a method, system, device, medium and program product for checking the authenticity of declaration information according to an embodiment of the present disclosure. It should be noted that FIG. 1 is only an example of the system architecture to which the embodiments of the present disclosure can be applied, intended to help those skilled in the art understand the technical content of the present disclosure; it does not mean that the embodiments of the present disclosure cannot be used in other devices, systems, environments or scenarios.
  • an application scenario 100 may include an X-ray detection apparatus 101 , a server 102 and a terminal device 103 .
  • the X-ray detection device 101 and the terminal device 103 are respectively connected to the server 102 by communication.
  • when the articles loaded in the container 12 pass through the X-ray detection device 101, they may be scanned by the X-ray detection device 101 to obtain a machine-inspection radiation image.
  • the X-ray detection device 101 can upload the scanned radiation image to the server 102 .
  • the server 102 can also obtain the declaration information corresponding to the items loaded in the container 12 (for example, one or more declaration forms) by searching a database, searching in the cloud, or through an upload by the staff 11 via the terminal device 103, etc.
  • the server 102 can execute the method for verifying the authenticity of the declaration information according to the embodiments of the disclosure, jointly analyzing the machine-inspection radiation image and the declaration information and using the cross-modal fuser based on the attention mechanism to analyze whether the items in the container 12 are consistent with the declared information and whether there is any suspicion of concealment.
  • the server 102 can send the analysis result to the terminal device 103 and show it to the staff 11 .
  • based on the analysis result, if the declaration information of the articles in the container 12 is suspect, the staff 11 can carry out an unpacking inspection of the container 12; if it is not suspect, the staff 11 can choose to randomly inspect or release the container 12. In this way, while saving the manpower invested in the inspection process, the comprehensiveness and accuracy of the inspection can also be improved to a certain extent.
  • the items in the container 12 may correspond to one declaration form or to multiple declaration forms; that is, the items in the container 12 may be items from multiple declaration forms assembled together.
  • for example, x containers are loaded with the items declared in y declaration forms, where x and y are both positive integers and x may differ from y.
  • the items in x containers can be scanned continuously to obtain machine-checked radiation images, which can be processed correspondingly to the declaration information in y declaration forms.
  • the machine-checked radiation images obtained by scanning the items in each of the x containers can also be processed corresponding to the declaration information in the y customs declaration forms.
  • the method for verifying the authenticity of the declared information provided by the embodiments of the present disclosure may generally be executed by the server 102 .
  • the systems, devices, media and program products provided in the embodiments of the present disclosure for verifying the authenticity of declared information can generally be set in the server 102 .
  • the method for verifying the authenticity of declared information provided by the embodiments of the present disclosure may also be executed by a server or server cluster that is different from the server 102 and can communicate with the X-ray detection device 101 , the terminal device 103 and/or the server 102 .
  • the systems, devices, media, and program products used to verify the authenticity of declared information provided by the embodiments of the present disclosure may also be set in a server or server cluster that is different from the server 102 and can communicate with the X-ray detection device 101, the terminal device 103 and/or the server 102.
  • Fig. 2 schematically shows a conceptual diagram of a method for checking the authenticity of declaration information according to an embodiment of the present disclosure.
  • in order to effectively verify whether the items loaded in the container 20 are consistent with the categories of the items declared in the declaration information 21, descriptive information such as the name and the specification and model of the declared items is combined with the machine-inspection radiation image 22 obtained by scanning the container 20; the system 201 for checking the authenticity of declared information according to the embodiment of the present disclosure then realizes, through natural language processing, deep learning and other methods, intelligent inspection of the articles in the container 20 and determines whether there is concealment.
  • the machine-detected radiation image can be divided into different image blocks in units of items, and image features can be obtained by using image recognition and other visual coding techniques in the system 201.
  • the cross-modal classifier is then used to fuse the image features with the text features, the classification result of each image block is obtained through multi-category constraints, and the results are compared with the declared categories filled in the declaration information to obtain the final inspection result.
  • the system 201 for verifying the authenticity of declared information can be implemented, for example, as one or any combination of the system 1000, the system 1200, the electronic device 1500 described below, a computer storage medium, or a program product, in order to realize the method for verifying the authenticity of declared information according to the embodiments of the present disclosure.
  • Fig. 3 schematically shows a flowchart of a method for checking the authenticity of declared information according to an embodiment of the present disclosure.
  • the method for verifying the authenticity of declaration information may include operation S310 to operation S360.
  • the image information of the item in the machine-inspected radiation image is identified to obtain image features corresponding to the machine-inspected radiation image.
  • the text information of the item in the declaration information is recognized, and the text features corresponding to the declaration information are obtained.
  • the text feature can be mapped to the declaration category to which the item in the declaration information belongs. For example, according to the descriptive information (for example, name information, specifications and models, etc.) of the declared items in the declaration information, the tax code category or commodity category to which the declared items belong is classified.
  • the declared category of the article in the container is screened by using the image feature as input information and the text feature as an externally introduced feature.
  • for example, the image features can be used as the input information of a cross-modal decoder, the text features can be used as the externally introduced features of the attention mechanism of the cross-modal decoder, and the cross-modal decoder can be used to screen the declared categories of the items in the container.
  • for example, the text features can be introduced into the mutual attention module of the transformer decoder as external features, and the image features can be used as the input of the transformer decoder; the declaration category to which each item in the image features belongs is screened, thereby completing the cross-modal information fusion.
  • the embodiment of the present disclosure maps the real declaration category of the item through the text feature, and verifies the declaration information filled in by the user through the fusion output of the text feature and the image feature.
  • the image features may include N1 first image feature vectors respectively corresponding to image information of N1 items in the machine-detected radiation image, where N1 is an integer greater than or equal to 1.
  • the text features may also include N2 text feature vectors respectively corresponding to the N2 items in the declaration information, where N2 is an integer greater than or equal to 1.
  • the embodiment of the present disclosure identifies the declaration category to which the article belongs according to the description information of the article in the declaration information, so that when N2 kinds of articles are declared, N2 text feature vectors are obtained.
  • cross-modal information fusion can be achieved through the interactive operation between features.
  • after the image features interact with the text features, a corresponding result is obtained, namely the probability output of each category for the corresponding item in the container. That is, each item obtains a probability value for its predicted class.
  • the embodiments of the present disclosure comprehensively analyze the declared items through joint declaration information and machine-inspected radiation images, which improves the efficiency of information use and the accuracy of inspection.
  • Fig. 4 schematically shows a flowchart of operation S330 of extracting image features in the method for verifying the authenticity of declared information according to an embodiment of the present disclosure.
  • operation S330 may include operation S331 to operation S333.
  • a second image feature vector corresponding to each image block is extracted.
  • the image feature extraction module may be used to extract the second image feature vector corresponding to the image block.
  • the image feature extraction module may be a convolutional neural network.
  • the image feature extraction module may specifically use resnet as the basic network, and add a network structure of SE-block after the resnet pooling layer.
  • for example, resnet can be used as the basic network, with an SE-block (Squeeze-and-Excitation block) added after the pooling layer.
  • fully connected layer fc1 and fully connected layer fc2 can be added after the original resnet pooling layer.
  • the output result of the fully connected layer fc1 may be used as the second image feature vector, thereby obtaining a set of second image feature vectors.
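A minimal PyTorch sketch of such a backbone is shown below. It assumes a resnet50 trunk, a generic SE block applied to the pooled 2048-dimensional descriptor, fc1 as the feature output and fc2 as the classification head; the disclosure only names the components, so all sizes here are illustrative.

```python
# Minimal sketch (assumed layer sizes): resnet50 trunk up to global pooling,
# an SE-style recalibration on the pooled descriptor, then fc1 / fc2.
import torch
import torch.nn as nn
from torchvision.models import resnet50

class SEBlock(nn.Module):
    """Squeeze-and-Excitation style recalibration of a pooled descriptor."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.gate = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid())

    def forward(self, x: torch.Tensor) -> torch.Tensor:  # x: (B, C)
        return x * self.gate(x)

class ImageFeatureExtractor(nn.Module):
    def __init__(self, dim: int = 256, num_classes: int = 100):  # dim, num_classes: assumptions
        super().__init__()
        trunk = resnet50(weights=None)
        self.trunk = nn.Sequential(*list(trunk.children())[:-1])  # conv stages + global avg pool
        self.se = SEBlock(2048)
        self.fc1 = nn.Sequential(nn.Linear(2048, dim), nn.ReLU(inplace=True))
        self.fc2 = nn.Linear(dim, num_classes)  # trained with a cross-entropy constraint

    def forward(self, x: torch.Tensor):  # x: (B, 3, H, W) image blocks
        pooled = self.trunk(x).flatten(1)    # (B, 2048)
        feature = self.fc1(self.se(pooled))  # fc1 output = second image feature vector
        return feature, self.fc2(feature)
```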
  • the image feature extraction module can be trained using the image blocks in historically declared machine-inspection radiation images. For example, image blocks showing the same item in different forms are collected, and through pre-labeling the image feature extraction module can learn to recognize the possible forms of the same item in machine-inspection radiation images.
  • the second image feature vector may be directly used as the first image feature vector.
  • the second image feature vector may also be further processed to obtain the first image feature vector.
  • alternatively, the second image feature vector can be processed according to the position information of each image block in the machine-inspection radiation image to obtain a feature reflecting the spatial position relationship of the image block in the machine-inspection radiation image, which serves as the first image feature vector. In this way, the category identification in operation S350 can be more comprehensive, and missed or repeated identification of items in the container can be avoided to a certain extent, thereby improving identification efficiency.
  • Fig. 5 schematically shows a flow chart of obtaining image features in combination with location information in operation S333 in the method for verifying the authenticity of declared information according to an embodiment of the present disclosure.
  • operation S333 may include operation S501 and operation S502.
  • for example, the second image feature vector may be processed according to the position information of each image block to obtain the first image feature vector.
  • position information of each image block in the machine-examined radiation image is acquired.
  • the location information may be, for example, coordinates of anchor points in the outer contour of each image block.
  • the location information may be the coordinates of vertices or intersection points of the geometric figure.
  • the first image feature vector is obtained based on the second image feature vector and position information corresponding to the same image block.
  • for example, the second image feature vector can first be processed using the position information of the image block: the position information can be processed into a vector and then connected with the second image feature vector corresponding to the image block, or mapping transformations such as encoding can be applied to the second image feature vector using the position information of the image block. The second image feature vector processed with the position information can then be input into the encoder, and the output of the encoder can be taken as the first image feature vector corresponding to the image block.
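A minimal sketch of this mapping, assuming the position information is a normalized bounding box concatenated along the feature dimension and the encoder is a standard transformer encoder (dimensions and layer counts are assumptions):

```python
# Sketch: concatenate box coordinates to each block feature, project back to
# the model dimension, and encode with a transformer encoder.
import torch
import torch.nn as nn

dim, N1 = 256, 12
second_feats = torch.randn(1, N1, dim)   # second image feature vectors of N1 blocks
boxes = torch.rand(1, N1, 4)             # (x1, y1, x2, y2), normalized to [0, 1]

project = nn.Linear(dim + 4, dim)        # fold position info back to dim
enc_layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
encoder = nn.TransformerEncoder(enc_layer, num_layers=2)

first_feats = encoder(project(torch.cat([second_feats, boxes], dim=-1)))
# first_feats: (1, N1, dim) position-aware first image feature vectors
```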
  • the encoder and the cross-modal decoder used in operation S350 may be jointly trained.
  • during training, the encoder and the cross-modal decoder form an upstream-downstream relationship: the second image feature vectors of the image blocks of various historically declared items, processed with their position information, serve as the input of the encoder.
  • the output of the encoder is used as the input of the cross-modal decoder, and the text features corresponding to the historical declaration information are used as the input of the mutual attention module of the cross-modal decoder, to obtain the declared category of each image block output by the cross-modal decoder; the encoder and the cross-modal decoder are then trained repeatedly based on the error between the declared category output by the cross-modal decoder and the declared category labeled on the image block.
  • the encoder may adopt a transformer encoder model.
  • the cross-modal decoder can adopt the transformer decoder model.
  • Fig. 6 schematically shows a flow chart of extracting image features in the method for verifying the authenticity of declaration information according to an embodiment of the present disclosure.
  • the object detection method can be used to divide the objects into independent image blocks according to their appearance in the machine-inspection radiation image 22, and to extract the corresponding position coordinates.
  • the image feature extraction module 601 can use resnet (deep residual network, a classic convolutional neural network) as the basic network, with an SE-block (Squeeze-and-Excitation block) network structure unit, a fully connected layer fc1 followed by relu (a rectified linear unit), and a fully connected layer fc2 added after the original resnet pooling layer.
  • the number of SE-block channels can be set to the number of categories, introducing a channel attention mechanism in this way, and the output result of the fully connected layer fc1 is used as the second image feature vector of each image block, thereby obtaining a set of second image feature vectors. Assume its size is N1*dim, where N1 represents the number of image blocks and dim represents the feature dimension of each second image feature vector.
  • the N1 second image feature vectors (i.e., N1*dim) and the position information corresponding to each image block can be substituted into the transformer encoder 602 to obtain N1 first image feature vectors, and the result is then used as the input of the cross-modal decoder (e.g., transformer decoder) in operation S350.
  • Fig. 7 schematically shows a flow chart of extracting text features in operation S340 in the method for verifying the authenticity of declaration information according to an embodiment of the present disclosure.
  • operation S340 may include operation S341 to operation S344 according to an embodiment of the present disclosure.
  • the first sentence and the second sentence corresponding to the same item are used as the input of the text feature extraction and projection module, and the text feature extraction and projection module is used to classify the declaration category to which the item belongs.
  • the category output result of the text feature extraction and projection module for each item is used as the text feature vector corresponding to the item, wherein N2 text feature vectors are correspondingly obtained for the N2 items.
  • the text feature extraction and projection module may adopt a BERT (Bidirectional Encoder Representations from Transformers) model.
  • historical declaration information can be collected to train the BERT model.
  • for example, the name information of each item in the historical declaration information is processed into the first sentence and the specification-and-model information into the second sentence, forming a sentence sequence that is input into the BERT model, which outputs the category of each item.
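A minimal sketch of this sentence-pair classification with the Hugging Face Transformers library follows; the checkpoint name, label count and example strings are illustrative assumptions (the disclosure only specifies a BERT model taking the name as sentence 1 and the specification/model as sentence 2, with the category output reused as the text feature vector).

```python
# Sketch: BERT sentence-pair classification over (name, specification/model).
# "bert-base-chinese", num_labels=200 and the example item are assumptions.
import torch
from transformers import BertTokenizer, BertForSequenceClassification

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertForSequenceClassification.from_pretrained("bert-base-chinese",
                                                      num_labels=200)

enc = tokenizer("陶瓷马克杯",            # sentence 1: item name
                "350ml 施釉 纸盒装",     # sentence 2: specification and model
                return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**enc).logits         # (1, num_labels): declaration-category output
text_feature = logits.squeeze(0)         # reused as this item's 1*dim text feature vector
```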
  • Fig. 8 schematically shows a flowchart of extracting text features in a method for verifying the authenticity of declaration information according to an embodiment of the present disclosure.
  • the BERT model is used in this embodiment to extract the text features in the declaration information 21 .
  • the BERT model 801 is used to extract the category output as the text feature of the product, that is, the text feature vector with a dimension of 1*dim.
  • N2 represents the number of item names, and dim is the feature dimension. For multiple items, it is necessary to align the name information and specifications of the same item one by one, and then repeat the above steps to obtain N2 dim-dimensional text feature vectors to form a N2*dim text feature vector sequence.
  • the natural language processing method is used to obtain text features that can represent item name information and specification models.
  • the text features are introduced into the mutual attention module of the cross-modal decoder as externally introduced features.
  • Fig. 9 schematically shows a flow diagram of screening the declared category of the items in the container in operation S350 in the method for checking the authenticity of declared information according to an embodiment of the present disclosure.
  • a cross-modal decoder 901 may be used to identify the declared category of the items in the container.
  • the cross-modal decoder may adopt the transformer decoder 901 .
  • for example, the N1 first image feature vectors can be used as the input of the transformer decoder 901, and the N2 text feature vectors can be used as the input of the mutual attention module of the transformer decoder 901; the output value of the transformer decoder 901 can then be used as the final comparison result. That is, each item block obtains a probability value for its predicted declared category, and the top N categories are taken as the candidate categories of the image block (top-N screening).
  • if none of the candidate categories of an image block matches the declared categories in the declaration information, the risk is considered relatively large; otherwise, the image block can be considered unsuspicious. A sketch of this fusion and screening step follows.
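The following is a minimal PyTorch sketch of this step, assuming a standard transformer decoder where the image features enter as the decoder input (tgt) and the text features as the cross-attention memory; all dimensions, layer counts and the declared-category set are illustrative assumptions.

```python
# Sketch: cross-modal fusion with a transformer decoder plus top-N screening.
import torch
import torch.nn as nn

dim, N1, N2, num_classes, topn = 256, 12, 5, 200, 3
img = torch.randn(1, N1, dim)   # first image feature vectors (decoder input)
txt = torch.randn(1, N2, dim)   # text feature vectors (mutual-attention memory)

dec_layer = nn.TransformerDecoderLayer(d_model=dim, nhead=8, batch_first=True)
decoder = nn.TransformerDecoder(dec_layer, num_layers=2)
head = nn.Linear(dim, num_classes)

probs = head(decoder(tgt=img, memory=txt)).softmax(dim=-1)   # (1, N1, num_classes)
candidates = probs.topk(topn, dim=-1).indices                # top-N categories per block

declared = {17, 42}   # categories declared in the declaration form (illustrative)
doubtful = [i for i in range(N1)
            if not set(candidates[0, i].tolist()) & declared]  # blocks missing all declared categories
```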
  • the dimension dim of the vector in the image feature and the text feature can be set to be equal.
  • through the interaction between the machine-inspection radiation image and the declaration information, the natural language processing method is used to obtain text features that can represent the item name information and specification and model; the textual description information of the items can thus supplement the insufficiency of the visual image information and improve recognition accuracy.
  • the N2 text feature vectors are used as the external features of the mutual attention module of the transformer decoder 901 and are input as a whole.
  • the N1 first image feature vectors are identification objects of the transformer decoder 901, which can be input separately or combined into a sequence input, which can be set according to actual conditions.
  • the original entire machine inspection radiation image is divided into different image blocks by using the target detection method.
  • the feature extraction of blocks can obtain independent item features, which can improve the recognition accuracy.
  • the embodiment of the present disclosure can set the number of SE-block channels as the number of categories in the image feature extraction module to introduce a channel attention mechanism and improve the accuracy of image feature extraction.
  • the embodiment of the present disclosure can feed the position information and feature information of the image blocks together into the transformer encoder to obtain features with a spatial position relationship (that is, the first image feature vectors).
  • in the model training process, a combination of three training tasks is adopted: the image feature extraction module (e.g., module 601) is trained through image training, the text feature extraction and projection module (e.g., BERT model 801) is trained through text training, and the encoder and the cross-modal decoder (e.g., transformer encoder 602 and transformer decoder 901) are jointly trained through image-text training, so that the different tasks achieve a complementary effect.
  • Fig. 10 schematically shows a block diagram of a system 1000 for checking the authenticity of declared information according to an embodiment of the present disclosure.
  • a system 1000 for verifying the authenticity of declaration information may include an information acquisition subsystem 110 , a feature extraction subsystem 120 , a feature fusion subsystem 130 and a conclusion determination subsystem 140 .
  • the system 1000 can be used to implement the methods described with reference to FIG. 3 to FIG. 9 .
  • the information acquisition subsystem 110 may be used to acquire the machine-examined radiation image obtained by scanning the container loaded with items, and acquire declaration information for declaring the items in the container.
  • the information acquisition subsystem 110 may be used to perform operation S310 and operation S320.
  • the feature extraction subsystem 120 can be used to identify the image information of the items in the machine-inspection radiation image to obtain the image features corresponding to the machine-inspection radiation image, and to identify the text information of the items in the declaration information to obtain the text features corresponding to the declaration information, wherein the text features are used to represent the declaration category to which the items in the declaration information belong.
  • the feature extraction subsystem 120 may be used to perform operation S330 and operation S340.
  • the feature fusion subsystem 130 can be used to use the image feature as input information and the text feature as an externally introduced feature to identify the declared category of the item in the container.
  • for example, the image features are used as the input information of the cross-modal decoder, the text features are used as the externally introduced features of the attention mechanism of the cross-modal decoder, and the cross-modal decoder is used to identify the declared categories of the items in the container.
  • the feature fusion subsystem 130 may be used to perform operation S350.
  • the conclusion judgment subsystem 140 can be used to determine that the declared information is doubtful when the declared category of at least one item in the container does not belong to the declared category in the declared information. In one embodiment, the conclusion determination subsystem 140 may be used to perform operation S360.
  • Fig. 11 schematically shows a block diagram of the feature extraction subsystem 120 in the system for verifying the authenticity of declaration information according to an embodiment of the present disclosure.
  • the feature extraction subsystem 120 may include an image preprocessing module 100', an image feature extraction module 100, and an image feature mapping module 101.
  • the image features include N1 first image feature vectors respectively corresponding to the image information of N1 items in the machine-detected radiation image, where N1 is an integer greater than or equal to 1.
  • the image preprocessing module 100' can be used to divide different items in the machine-detected radiation image into independent image blocks by using a target detection algorithm to obtain N1 image blocks. In one embodiment, the image preprocessing module 100' may perform operation S331.
  • the image feature extraction module 100 may be used to extract a second image feature vector corresponding to each image block. In one embodiment, the image feature extraction module 100 may perform operation S332.
  • the image feature mapping module 101 may be configured to obtain a first image feature vector corresponding to the image information of the item represented by the image block based on the second image feature vector corresponding to each image block. In one embodiment, the image feature mapping module 101 may perform operation S333.
  • the feature extraction subsystem 120 may further include a declaration information preprocessing module 102' and a text feature extraction projection module 102.
  • the text features may include text feature vectors respectively corresponding to N2 items in the declaration information, where N2 is an integer greater than or equal to 1.
  • the declaration information preprocessing module 102' is used to extract the name information and specification model information of each item in the declaration information, and for each item, process the name information into a first sentence, and process the specification model information into a second sentence .
  • the declaration information preprocessing module 102' may perform operation S341 and operation S342.
  • the text feature extraction and projection module 102 is used to take the first sentence and the second sentence corresponding to the same item as input, to classify the declared category to which the item belongs, and to take its category output result for each item as the text feature vector corresponding to that item, wherein N2 text feature vectors are correspondingly obtained for N2 items.
  • the text feature extraction projection module 102 may perform operation S343 and operation S344.
  • Fig. 12 schematically shows the overall structure of a system for checking the authenticity of declared information according to another embodiment of the present disclosure.
  • a system 1200 for verifying the authenticity of declared information may include a feature extraction subsystem 120 and a cross-modal decoder 200 .
  • the system 1200 will output the overall conclusion 3 and the block conclusion 4 .
  • Refer to FIG. 13 for the feature extraction subsystem 120 and refer to FIG. 14 for the cross-modal decoder 200 .
  • Fig. 13 schematically shows the structure of the feature extraction subsystem 120 in the system 1200 for verifying the authenticity of declaration information according to an embodiment of the present disclosure.
  • the feature extraction subsystem 120 may include two branches: image feature extraction and declaration information extraction.
  • the image feature extraction branch includes an image feature extraction module 100 and an image feature mapping module 101 , which output image feature 1 .
  • the declaration information extraction branch includes a text feature extraction and projection module 102 , which outputs text feature 2 .
  • Fig. 14 schematically shows the structure of a cross-modal decoder 200 in a system 1200 for verifying the authenticity of declared information according to an embodiment of the present disclosure.
  • the cross-modal decoder 200 is composed of a multi-head self-attention module 201 , a feature summation and normalization module 202 , a multi-head mutual attention module 203 and a feedforward network 204 .
  • the input is the image feature 1 and the text feature 2 extracted by the feature extraction subsystem 120.
  • the image feature 1 is used as the main input information of the multi-head self-attention module 201, and the text feature 2 is introduced into the multi-head mutual attention module 203 as enhancement information.
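A minimal PyTorch sketch of one such decoder layer, assembled from the four named blocks, is given below; the residual connections, normalization placement and layer sizes are assumptions, since the disclosure only names the modules.

```python
# Sketch of a cross-modal decoder layer mirroring modules 201-204.
import torch
import torch.nn as nn

class CrossModalDecoderLayer(nn.Module):
    def __init__(self, dim: int = 256, heads: int = 8):
        super().__init__()
        self.self_attn = nn.MultiheadAttention(dim, heads, batch_first=True)   # module 201
        self.cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)  # module 203
        self.norm1 = nn.LayerNorm(dim)   # feature summation and normalization (202)
        self.norm2 = nn.LayerNorm(dim)
        self.norm3 = nn.LayerNorm(dim)
        self.ffn = nn.Sequential(nn.Linear(dim, 4 * dim), nn.ReLU(),
                                 nn.Linear(4 * dim, dim))                      # feedforward (204)

    def forward(self, img: torch.Tensor, txt: torch.Tensor) -> torch.Tensor:
        # img: (B, N1, dim) main input; txt: (B, N2, dim) enhancement information
        x = self.norm1(img + self.self_attn(img, img, img)[0])
        x = self.norm2(x + self.cross_attn(x, txt, txt)[0])  # text features as key/value
        return self.norm3(x + self.ffn(x))
```

Note that `nn.MultiheadAttention` returns a pair (output, attention weights); only the output is used here.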
  • when the system 1200 is working, for the machine-inspection radiation image, the target detection method is used to obtain the different items in the image (distinguishing the object blocks in the image by their boundaries and textures), and the image blocks are extracted according to their coordinate positions as the input of the image feature extraction module 100.
  • before input, data augmentation processing can be performed, including a series of operations such as rotation, resizing to a fixed size, mean subtraction, and standardization, for example as sketched below.
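With torchvision, such a preprocessing chain might look as follows; the rotation angle, target size and normalization statistics are assumed values, not taken from the disclosure.

```python
# Sketch of the mentioned preprocessing/augmentation for image blocks.
from torchvision import transforms

preprocess = transforms.Compose([
    transforms.RandomRotation(degrees=10),          # rotation augmentation
    transforms.Resize((224, 224)),                  # resize to a fixed size
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5],      # mean subtraction and standardization
                         std=[0.5, 0.5, 0.5]),
])
```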
  • when an image block of fixed size is input into the image feature extraction module 100, cross-entropy is used as the training constraint.
  • the combination of resnet50+SE-block can be used to extract the features of the image block, and the penultimate layer of the network (that is, the output result of the fully connected layer fc1) can be extracted as the image block features (ie, the second image feature vector).
  • the image feature mapping module 101 adopts a transformer encoder structure: the position coordinates corresponding to the image blocks can be processed as vectors and then connected with the image block features (along the feature dimension), and the result is input into the transformer encoder to obtain new image block features with a spatial position relationship (that is, the first image feature vectors).
  • for the declaration information, the following preprocessing can be performed first: remove the standalone numbers in the specification and model, delete some stop words and symbols, and/or uniformly lowercase the English letters in the text, as sketched below.
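An illustrative implementation of this cleanup follows; the concrete regular expressions and the stop-word list are assumptions, not taken from the disclosure.

```python
# Sketch of the described text cleanup (assumed rules).
import re

STOP = {"规格", "型号", "brand"}   # illustrative stop words, not from the patent

def clean(text: str) -> str:
    text = text.lower()                               # unify case of English letters
    text = re.sub(r"\b\d+\b", " ", text)              # remove standalone numbers
    text = re.sub(r"[^\w\u4e00-\u9fff]+", " ", text)  # drop symbols, keep CJK and word chars
    return " ".join(w for w in text.split() if w not in STOP)

sentence2 = clean("Model X-200, 350ml 规格")   # -> "model x 350ml"
```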
  • the processed name information of the item is used as sentence 1, and the specification and model of the declared item is used as sentence 2; both are input into the text feature extraction and projection module 102 (for example, a BERT model) for text feature extraction.
  • during training, multi-class cross-entropy can be used as the loss function to constrain the training process, and the pooling result of the last layer in the model is output as the category of the declared item.
  • the image feature 1 and the text feature 2 output by the feature extraction subsystem 120 are substituted into the cross-modal decoder 200. The transformer decoder module that the cross-modal decoder 200 can adopt uses the image feature 1 as the value input of the cross-modal decoder 200 and introduces the text feature 2 into the mutual attention module 203, completing the cross-modal information fusion through interactive operations between the features. After the image feature of each image block interacts with the text features in the declaration information, a corresponding result is obtained, indicating whether the image block is consistent with the declaration information.
  • any multiple of the modules, sub-modules, units and sub-units according to the embodiments of the present disclosure, or at least part of the functions of any multiple of them, may be implemented in one module. Any one or more of the modules, sub-modules, units and sub-units according to the embodiments of the present disclosure may be split into multiple modules for implementation.
  • the modules, sub-modules, units and sub-units may be at least partially implemented as hardware circuits, such as field programmable gate arrays (FPGAs), programmable logic arrays (PLAs), systems-on-chip, systems-on-substrate, systems-on-package, application-specific integrated circuits (ASICs), or by hardware or firmware in any other reasonable manner of integrating or packaging circuits, or by any one of the three implementations of software, hardware and firmware, or by an appropriate combination of any of them.
  • One or more of the modules, submodules, units and subunits according to the embodiments of the present disclosure may be at least partially implemented as computer program modules which, when run, can perform the corresponding functions.
  • Any multiple of the information acquisition subsystem 110, the feature extraction subsystem 120, the feature fusion subsystem 130, the conclusion determination subsystem 140, the image preprocessing module 100', the image feature extraction module 100, the image feature mapping module 101, the declaration information preprocessing module 102', the text feature extraction and projection module 102 or the cross-modal decoder 200 may be implemented in one module, or any one of them may be split into multiple modules. Alternatively, at least part of the functions of one or more of these modules may be combined with at least part of the functions of other modules and implemented in one module.
  • At least one of the information acquisition subsystem 110, the feature extraction subsystem 120, the feature fusion subsystem 130, the conclusion determination subsystem 140, the image preprocessing module 100', the image feature extraction module 100, the image feature mapping module 101, the declaration information preprocessing module 102', the text feature extraction and projection module 102 or the cross-modal decoder 200 may be at least partially implemented as a hardware circuit, such as a field programmable gate array (FPGA), a programmable logic array (PLA), a system on chip, a system on substrate, a system on package or an application-specific integrated circuit (ASIC), or by hardware or firmware in any other reasonable way of integrating or packaging circuits, or in any one of the three implementations of software, hardware and firmware, or in an appropriate combination of any of them.
  • At least one of the information acquisition subsystem 110, the feature extraction subsystem 120, the feature fusion subsystem 130, the conclusion determination subsystem 140, the image preprocessing module 100', the image feature extraction module 100, the image feature mapping module 101, the declaration information preprocessing module 102', the text feature extraction and projection module 102 or the cross-modal decoder 200 may be at least partially implemented as a computer program module which, when executed, can perform the corresponding functions.
  • Fig. 15 schematically shows a block diagram of an electronic device suitable for implementing the method for verifying the authenticity of declaration information according to an embodiment of the present disclosure.
  • The electronic device 1500 shown in Fig. 15 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
  • An electronic device 1500 includes a processor 1501 that can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 1502 or a program loaded from a storage section 1508 into a random access memory (RAM) 1503.
  • The processor 1501 may include, for example, a general-purpose microprocessor (e.g., a CPU), an instruction set processor and/or related chipsets and/or a special-purpose microprocessor (e.g., an application-specific integrated circuit (ASIC)), and the like.
  • Processor 1501 may also include on-board memory for caching purposes.
  • the processor 1501 may include a single processing unit or a plurality of processing units for executing different actions of the method flow according to the embodiments of the present disclosure.
  • the processor 1501, ROM 1502, and RAM 1503 are connected to each other through a bus 1504.
  • the processor 1501 executes various operations according to the method flow of the embodiment of the present disclosure by executing programs in the ROM 1502 and/or RAM 1503. It should be noted that the program may also be stored in one or more memories other than ROM 1502 and RAM 1503.
  • the processor 1501 may also perform various operations according to the method flow of the embodiments of the present disclosure by executing programs stored in the one or more memories.
  • The electronic device 1500 may further include an input/output (I/O) interface 1505, which is also connected to the bus 1504.
  • The electronic device 1500 may also include one or more of the following components connected to the I/O interface 1505: an input section 1506 including a keyboard, a mouse, etc.; an output section 1507 including a cathode ray tube (CRT), a liquid crystal display (LCD), etc., and a speaker; a storage section 1508 including a hard disk, etc.; and a communication section 1509 including a network interface card such as a LAN card or a modem.
  • the communication section 1509 performs communication processing via a network such as the Internet.
  • a drive 1510 is also connected to the I/O interface 1505 as needed.
  • A removable medium 1511, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 1510 as needed, so that a computer program read from it can be installed into the storage section 1508.
  • the method flow according to the embodiments of the present disclosure can be implemented as a computer software program.
  • the embodiments of the present disclosure include a computer program product, which includes a computer program carried on a computer-readable storage medium, where the computer program includes program codes for executing the methods shown in the flowcharts.
  • The computer program may be downloaded and installed from a network via the communication section 1509 and/or installed from the removable medium 1511.
  • When executed by the processor 1501, the computer program performs the above-mentioned functions defined in the system of the embodiments of the present disclosure.
  • The above-described systems, devices, apparatuses, modules, units, etc. may be implemented by computer program modules.
  • the present disclosure also provides a computer-readable storage medium.
  • The computer-readable storage medium may be included in the device/apparatus/system described in the above embodiments; it may also exist independently without being assembled into the device/apparatus/system.
  • the above-mentioned computer-readable storage medium carries one or more programs, and when the above-mentioned one or more programs are executed, the method according to the embodiment of the present disclosure is implemented.
  • The computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, but is not limited to: a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above.
  • a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in conjunction with an instruction execution system, apparatus, or device.
  • A computer-readable storage medium may include the above-described ROM 1502 and/or RAM 1503 and/or one or more memories other than the ROM 1502 and the RAM 1503.
  • Embodiments of the present disclosure also include a computer program product, which includes a computer program, and the computer program includes program codes for executing the method provided by the embodiments of the present disclosure.
  • When the computer program product runs on an electronic device, the program code causes the electronic device to implement the image recognition method provided by the embodiments of the present disclosure.
  • When the computer program is executed by the processor 1501, the above-mentioned functions defined in the system/apparatus of the embodiments of the present disclosure are performed.
  • the above-described systems, devices, modules, units, etc. may be implemented by computer program modules.
  • the computer program may rely on tangible storage media such as optical storage devices and magnetic storage devices.
  • The computer program can also be transmitted and distributed in the form of a signal on a network medium, downloaded and installed via the communication section 1509, and/or installed from the removable medium 1511.
  • the program code contained in the computer program can be transmitted by any appropriate network medium, including but not limited to: wireless, wired, etc., or any appropriate combination of the above.
  • The program code for executing the computer programs provided by the embodiments of the present disclosure can be written in any combination of one or more programming languages; specifically, these computing programs can be implemented in high-level procedural and/or object-oriented programming languages, and/or in assembly/machine language.
  • Programming languages include, but are not limited to, Java, C++, python, the "C" language or similar programming languages.
  • the program code can execute entirely on the user computing device, partly on the user device, partly on the remote computing device, or entirely on the remote computing device or server.
  • The remote computing device may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (for example, through the Internet using an Internet service provider).
  • Each block in a flowchart or block diagram may represent a module, program segment, or portion of code, which contains one or more executable instructions for implementing the specified logical functions.
  • the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or they may sometimes be executed in the reverse order, depending upon the functionality involved.
  • Each block in the block diagrams or flowcharts, and combinations of blocks in the block diagrams or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Image Analysis (AREA)
  • Analysing Materials By The Use Of Radiation (AREA)

Abstract

The present disclosure provides a method for verifying the authenticity of declaration information. The method includes: acquiring a machine-inspection radiation image obtained by scanning a container loaded with items; acquiring declaration information declaring the items in the container; recognizing image information of the items in the machine-inspection radiation image to obtain image features corresponding to the machine-inspection radiation image; recognizing text information of the items in the declaration information to obtain text features corresponding to the declaration information; screening the declared categories of the items in the container with the image features as input information and the text features as externally introduced features; and determining that the declaration information is questionable when the declared category of at least one item in the container does not belong to the declared categories in the declaration information. The present disclosure also provides an apparatus, a device and a storage medium for verifying the authenticity of declaration information.

Description

Method, system, device and medium for verifying the authenticity of declaration information

Technical Field
The present disclosure relates to the field of artificial intelligence, and more specifically to a method, apparatus, device and medium for verifying the authenticity of declaration information.
Background
In customs, air freight, logistics and other stages of goods transport or delivery, it is often necessary to check whether the goods declared by a declarant truly match the goods actually transported, so as to prevent incidents such as smuggling. During transport, however, the items are usually packed in sealed containers such as shipping containers and are hard to observe, which makes inspection highly inconvenient.
There are two main existing inspection methods. One is to open every package and verify the name, specification and model, quantity, weight and origin of each item one by one, which consumes a great deal of manpower. The other is to spot-check a certain proportion of the items and verify the customs declaration information, but this makes box-by-box opening impractical and key information may go unchecked. To improve clearance efficiency, the prior art has also introduced X-ray detection devices for inspection.
In the course of realizing the concept of the present disclosure, the inventors found at least the following problems in the prior art. When items are inspected by scanning with an X-ray detection device, a single declared category (for example, the declared tariff code category, which appears as the commodity code on a customs declaration) may cover goods with many different names and specifications, and these items can look very different in X-ray images (hereinafter referred to as machine-inspection radiation images), which increases the difficulty of inspection. In addition, consolidated containers, consolidated declarations, or mixed loading of items of different categories or names frequently occur in customs and similar businesses, so a declaration form and a container are not in a simple one-to-one mapping. Inspecting with machine-inspection radiation images alone therefore faces many complex situations in which names, specifications and images cannot be aligned, which greatly impairs the accuracy and convenience of inspection.
Summary
In view of this, embodiments of the present disclosure provide a method, apparatus, device and medium for verifying the authenticity of declaration information that perform intelligent inspection by combining machine-inspection radiation images with declaration information, which can improve inspection accuracy and efficiency.
One aspect of the embodiments of the present disclosure provides a method for verifying the authenticity of declaration information. The method includes: acquiring a machine-inspection radiation image obtained by scanning a container loaded with items; acquiring declaration information declaring the items in the container; recognizing image information of the items in the machine-inspection radiation image to obtain image features corresponding to the machine-inspection radiation image; recognizing text information of the items in the declaration information to obtain text features corresponding to the declaration information, where the text features characterize the declared categories to which the items in the declaration information belong; screening the declared categories of the items in the container with the image features as input information and the text features as externally introduced features; and determining that the declaration information is questionable when the declared category of at least one item in the container does not belong to the declared categories in the declaration information.
According to an embodiment of the present disclosure, the image features include N1 first image feature vectors respectively corresponding to the image information of N1 items in the machine-inspection radiation image, where N1 is an integer greater than or equal to 1.
According to an embodiment of the present disclosure, recognizing the image information of the items in the machine-inspection radiation image to obtain the image features corresponding to the machine-inspection radiation image includes: dividing the different items in the machine-inspection radiation image into independent image blocks using an object detection algorithm to obtain N1 image blocks; extracting a second image feature vector corresponding to each image block; and obtaining, based on the second image feature vector corresponding to each image block, the first image feature vector corresponding to the image information of the item represented by that image block.
According to an embodiment of the present disclosure, extracting the second image feature vector corresponding to each image block includes: performing image recognition on each image block with an image feature extraction module to obtain the second image feature vector corresponding to each image block, where the image feature extraction module includes a convolutional neural network.
According to an embodiment of the present disclosure, the image feature extraction module includes a network structure that uses resnet as the base network with an SE-block added after the resnet pooling layer.
According to an embodiment of the present disclosure, obtaining, based on the second image feature vector corresponding to each image block, the first image feature vector corresponding to the image information of the item represented by that image block includes: acquiring position information of each image block in the machine-inspection radiation image; and obtaining the first image feature vector based on the second image feature vector and the position information corresponding to the same image block.
According to an embodiment of the present disclosure, obtaining the first image feature vector based on the second image feature vector and the position information corresponding to the same image block includes: processing the second image feature vector with the position information of the image block; inputting the processed second image feature vector into an encoder; and obtaining the output of the encoder to obtain the first image feature vector corresponding to the image block.
According to an embodiment of the present disclosure, screening the declared categories of the items in the container with the image features as input information and the text features as externally introduced features includes: screening the declared categories of the items in the container with a cross-modal decoder, using the image features as the input information of the cross-modal decoder and the text features as externally introduced features of the attention mechanism of the cross-modal decoder.
According to an embodiment of the present disclosure, the encoder and the cross-modal decoder are trained jointly.
According to an embodiment of the present disclosure, the encoder adopts a transformer encoder model.
According to an embodiment of the present disclosure, the cross-modal decoder adopts a transformer decoder model.
According to an embodiment of the present disclosure, the text features include text feature vectors respectively corresponding to N2 kinds of items in the declaration information, where N2 is an integer greater than or equal to 1.
According to an embodiment of the present disclosure, recognizing the text information of the items in the declaration information to obtain the text features corresponding to the declaration information includes: extracting name information and specification information of each kind of item in the declaration information; for each kind of item, processing the name information into a first sentence and the specification information into a second sentence; feeding the first sentence and the second sentence corresponding to the same kind of item into a text feature extraction and projection module, which classifies the declared category to which that item belongs; and taking the category output of the text feature extraction and projection module for each kind of item as the text feature vector corresponding to that item, so that N2 text feature vectors are obtained for N2 kinds of items.
According to an embodiment of the present disclosure, the text feature extraction and projection module adopts a BERT model.
A second aspect of the embodiments of the present disclosure provides a system for verifying the authenticity of declaration information. The system includes an information acquisition subsystem, a feature extraction subsystem, a feature fusion subsystem and a conclusion determination subsystem. The information acquisition subsystem is configured to acquire a machine-inspection radiation image obtained by scanning a container loaded with items, and to acquire declaration information declaring the items in the container. The feature extraction subsystem is configured to recognize image information of the items in the machine-inspection radiation image to obtain image features corresponding to the machine-inspection radiation image, and to recognize text information of the items in the declaration information to obtain text features corresponding to the declaration information, where the text features characterize the declared categories to which the items in the declaration information belong. The feature fusion subsystem is configured to screen the declared categories of the items in the container with the image features as input information and the text features as externally introduced features. The conclusion determination subsystem is configured to determine that the declaration information is questionable when the declared category of at least one item in the container does not belong to the declared categories in the declaration information.
According to an embodiment of the present disclosure, the image features include N1 first image feature vectors respectively corresponding to the image information of N1 items in the machine-inspection radiation image, where N1 is an integer greater than or equal to 1. According to an embodiment of the present disclosure, the feature extraction subsystem includes an image preprocessing module, an image feature extraction module and an image feature mapping module. The image preprocessing module is configured to divide the different items in the machine-inspection radiation image into independent image blocks using an object detection algorithm to obtain N1 image blocks. The image feature extraction module is configured to extract the second image feature vector corresponding to each image block. The image feature mapping module is configured to obtain, based on the second image feature vector corresponding to each image block, the first image feature vector corresponding to the image information of the item represented by that image block.
According to an embodiment of the present disclosure, the text features include text feature vectors respectively corresponding to N2 kinds of items in the declaration information, where N2 is an integer greater than or equal to 1. According to an embodiment of the present disclosure, the feature extraction subsystem includes a declaration information preprocessing module and a text feature extraction and projection module. The declaration information preprocessing module is configured to extract name information and specification information of each kind of item in the declaration information and, for each kind of item, process the name information into a first sentence and the specification information into a second sentence. The text feature extraction and projection module is configured to take the first sentence and the second sentence corresponding to the same kind of item as input, classify the declared category to which that item belongs, and take its category output for each kind of item as the text feature vector corresponding to that item, so that N2 text feature vectors are obtained for N2 kinds of items.
Another aspect of the embodiments of the present disclosure provides an electronic device. The electronic device includes one or more memories and one or more processors. The memories store executable instructions, and the processors execute the executable instructions to implement the method described above.
Another aspect of the embodiments of the present disclosure provides a computer-readable storage medium storing computer-executable instructions which, when executed, implement the method described above.
Another aspect of the embodiments of the present disclosure provides a computer program including computer-executable instructions which, when executed, implement the method described above.
The one or more embodiments described above have the following advantage or beneficial effect: by combining declaration information in text form with machine-inspection radiation images and fusing the image features with the declaration text features through an attention-based cross-modal fuser, intelligent inspection of goods passing customs is achieved and the accuracy of declaration verification is improved.
Brief Description of the Drawings
The above and other objects, features and advantages of the present disclosure will become clearer from the following description of its embodiments with reference to the accompanying drawings, in which:
Fig. 1 schematically shows an application scenario of the method, system, device, medium and program product for verifying the authenticity of declaration information according to an embodiment of the present disclosure;
Fig. 2 schematically shows the concept of the method for verifying the authenticity of declaration information according to an embodiment of the present disclosure;
Fig. 3 schematically shows a flowchart of the method for verifying the authenticity of declaration information according to an embodiment of the present disclosure;
Fig. 4 schematically shows a flowchart of extracting image features in the method for verifying the authenticity of declaration information according to an embodiment of the present disclosure;
Fig. 5 schematically shows a flowchart of obtaining image features by incorporating position information in the method for verifying the authenticity of declaration information according to an embodiment of the present disclosure;
Fig. 6 schematically shows the flow of extracting image features in the method for verifying the authenticity of declaration information according to an embodiment of the present disclosure;
Fig. 7 schematically shows a flowchart of extracting text features in the method for verifying the authenticity of declaration information according to an embodiment of the present disclosure;
Fig. 8 schematically shows the flow of extracting text features in the method for verifying the authenticity of declaration information according to an embodiment of the present disclosure;
Fig. 9 schematically shows the flow of screening the declared categories of the items in the container in the method for verifying the authenticity of declaration information according to an embodiment of the present disclosure;
Fig. 10 schematically shows a block diagram of the system for verifying the authenticity of declaration information according to an embodiment of the present disclosure;
Fig. 11 schematically shows a block diagram of the feature extraction subsystem in the system for verifying the authenticity of declaration information according to an embodiment of the present disclosure;
Fig. 12 schematically shows the overall structure of the system for verifying the authenticity of declaration information according to another embodiment of the present disclosure;
Fig. 13 schematically shows the structure of the feature extraction subsystem in the system for verifying the authenticity of declaration information according to an embodiment of the present disclosure;
Fig. 14 schematically shows the structure of the cross-modal decoder in the system for verifying the authenticity of declaration information according to an embodiment of the present disclosure; and
Fig. 15 schematically shows a block diagram of an electronic device suitable for implementing the method for verifying the authenticity of declaration information according to an embodiment of the present disclosure.
Detailed Description
Embodiments of the present disclosure will be described below with reference to the accompanying drawings. It should be understood that the description is exemplary only and is not intended to limit the scope of the present disclosure. In the following detailed description, numerous specific details are set forth for ease of explanation in order to provide a thorough understanding of the embodiments of the present disclosure. Obviously, however, one or more embodiments may also be practised without these specific details. In addition, descriptions of well-known structures and techniques are omitted below to avoid unnecessarily obscuring the concepts of the present disclosure.
It should be understood herein that any number of elements in the specification and drawings is illustrative rather than limiting, and that any naming (for example, first, second) is used only for distinction and carries no limiting meaning.
To effectively verify whether the items sealed in a container are consistent with the declaration information, embodiments of the present disclosure provide a method, system, device, medium and program product for verifying the authenticity of declaration information that combine declaration information in text form with machine-inspection radiation images, fuse the image features with the declaration text features through an attention-based cross-modal fuser, achieve intelligent inspection of goods passing customs, and determine whether false or concealed declaration exists.
Specifically, embodiments of the present disclosure provide a method, system, device, medium and program product for verifying the authenticity of declaration information. The method includes: first acquiring a machine-inspection radiation image obtained by scanning a container loaded with items, and acquiring declaration information declaring the items in the container; then recognizing the image information of the items in the machine-inspection radiation image to obtain image features corresponding to the machine-inspection radiation image, and recognizing the text information of the items in the declaration information to obtain text features corresponding to the declaration information, where the text features can be mapped to the declared categories to which the items in the declaration information belong; next, screening the declared categories of the items in the container with the image features as input information and the text features as externally introduced features; and, when the declared category of at least one item in the container is found not to belong to the declared categories in the declaration information, determining that the declaration information is questionable.
Fig. 1 schematically shows an application scenario 100 of the method, system, device, medium and program product for verifying the authenticity of declaration information according to an embodiment of the present disclosure. Note that Fig. 1 is only an example of a system architecture to which embodiments of the present disclosure can be applied, intended to help those skilled in the art understand the technical content of the present disclosure; it does not mean that the embodiments cannot be used in other devices, systems, environments or scenarios.
As shown in Fig. 1, the application scenario 100 of this embodiment may include an X-ray detection device 101, a server 102 and a terminal device 103, where the X-ray detection device 101 and the terminal device 103 are each in communication connection with the server 102.
When the items loaded in the container 12 pass through the X-ray detection device 101, they can be scanned by the X-ray detection device 101 to obtain a machine-inspection radiation image.
The X-ray detection device 101 can upload the scanned machine-inspection radiation image to the server 102. At the same time, the server 102 can obtain the declaration information corresponding to the items loaded in the container 12 (for example, one or more declaration forms) by searching a database, searching the cloud, or receiving an upload from the staff member 11 via the terminal device 103. According to an embodiment of the present disclosure, the server 102 can execute the method for verifying the authenticity of declaration information according to the embodiments of the present disclosure, combining the machine-inspection radiation image with the declaration information and using an attention-based cross-modal fuser to analyse whether the items in the container 12 are consistent with the declaration information and whether concealment is suspected. The server 102 can then send the analysis result to the terminal device 103 for display to the staff member 11. Based on the result, if the declaration information for the items in the container 12 is suspect, the staff member 11 can open the container 12 for inspection; if not, the container 12 can be spot-checked or released. This saves manpower in the inspection stage while improving, to a certain extent, the comprehensiveness and accuracy of inspection.
In practice, because consolidated declarations or consolidated containers may occur, the items in the container 12 may correspond to one declaration form or to several, i.e. the items in the container 12 may be assembled from the items of multiple declaration forms. In other cases, x containers may be loaded with the items declared on y declaration forms, where x and y are integers and x ≠ y. In such cases, the machine-inspection radiation images obtained by scanning the x containers in succession can be processed against the declaration information in the y declaration forms; alternatively, the machine-inspection radiation image obtained by scanning each of the x containers can be processed against the declaration information in the y declaration forms.
It should be noted that the method for verifying the authenticity of declaration information provided by the embodiments of the present disclosure can generally be executed by the server 102. Accordingly, the system, device, medium and program product for verifying the authenticity of declaration information provided by the embodiments of the present disclosure can generally be deployed in the server 102. The method can also be executed by a server or server cluster different from the server 102 and capable of communicating with the X-ray detection device 101, the terminal device 103 and/or the server 102. Accordingly, the system, device, medium and program product can also be deployed in such a server or server cluster.
It should be understood that the number and kinds of devices in Fig. 1 are merely illustrative; there may be any number and kind of devices according to implementation needs.
Fig. 2 schematically shows the concept of the method for verifying the authenticity of declaration information according to an embodiment of the present disclosure.
As shown in Fig. 2, according to an embodiment of the present disclosure, in order to effectively verify whether the items loaded in the container 20 are consistent with the item categories declared in the declaration information 21, descriptive information such as the declared item names and specifications can be obtained from the declaration information 21 and combined with the machine-inspection radiation image 22 obtained by scanning the container 20. Using the system 201 for verifying the authenticity of declaration information according to an embodiment of the present disclosure, intelligent inspection of the items in the container 20 is achieved through natural language processing, deep learning and other methods, determining whether concealment exists.
According to some embodiments of the present disclosure, the machine-inspection radiation image can be divided into image blocks item by item; in the system 201, visual coding techniques such as image recognition obtain the image features, while the text features extracted from the declaration information serve as externally introduced features of the attention mechanism. A cross-modal classifier fuses the image features with the text, a multi-category constraint yields a classification result for each image block, and that result is compared with the declared category filled in the declaration information to obtain the final inspection result.
The system 201 for verifying the authenticity of declaration information can be implemented as, for example, any one or combination of the system 1000, the system 1200 or the electronic device 1500 described below, a computer storage medium, or a program product, and is used to implement the method for verifying the authenticity of declaration information according to embodiments of the present disclosure.
Fig. 3 schematically shows a flowchart of the method for verifying the authenticity of declaration information according to an embodiment of the present disclosure.
As shown in Fig. 3, the method for verifying the authenticity of declaration information according to this embodiment may include operations S310 to S360.
First, in operation S310, a machine-inspection radiation image obtained by scanning a container loaded with items is acquired.
In operation S320, declaration information declaring the items in the container is acquired.
Next, in operation S330, the image information of the items in the machine-inspection radiation image is recognized to obtain image features corresponding to the machine-inspection radiation image.
In operation S340, the text information of the items in the declaration information is recognized to obtain text features corresponding to the declaration information. The text features can be mapped to the declared categories to which the items in the declaration information belong. For example, the tariff code category or commodity category of a declared item is classified according to the descriptive information of the item in the declaration information (for example, name and specification).
Then, in operation S350, the declared categories of the items in the container are screened with the image features as input information and the text features as externally introduced features.
In one embodiment, the image features can serve as the input information of a cross-modal decoder and the text features as externally introduced features of the attention mechanism of the cross-modal decoder, and the cross-modal decoder screens the declared categories of the items in the container.
When the cross-modal decoder adopts, for example, a transformer decoder, the text features can be introduced as external features into the cross-attention module of the transformer decoder, and the image features serve as the decoder input, so that the declared category of each item in the image features is screened. Cross-modal information fusion is accomplished through the interaction between the text features and the image features.
Thereafter, in operation S360, when the declared category of at least one item in the container does not belong to the declared categories in the declaration information, the declaration information is determined to be questionable. Here the declared category in the declaration information may be the category filled in by the user on the declaration form. The category filled in by the user and the category mapped from the text features may be the same or different, since the user may commit fraud when filling in the declaration. Therefore, embodiments of the present disclosure map the true declared category of an item through the text features, and verify the declaration information filled in by the user through the fused output of the text and image features.
The image features may include N1 first image feature vectors respectively corresponding to the image information of N1 items in the machine-inspection radiation image, where N1 is an integer greater than or equal to 1. For example, when extracting image features, the machine-inspection radiation image is divided into different image blocks according to the contour boundaries of the items, and a first image feature vector is then identified for each image block.
The text features may likewise include N2 text feature vectors respectively corresponding to N2 kinds of items in the declaration information, where N2 is an integer greater than or equal to 1. Embodiments of the present disclosure identify the declared category of an item according to the item description in the declaration information, so that when N2 kinds of items are declared, N2 text feature vectors are obtained.
According to an embodiment of the present disclosure, when the N1 first image feature vectors are input into the cross-modal decoder, individually or combined into a sequence, and the N2 second feature vectors are combined into a sequence and input into the cross-attention module of the cross-modal decoder, cross-modal information fusion is achieved through the interaction between the features. After each first feature vector interacts with the N2 second feature vectors, a corresponding result is obtained: a probability output over the categories for the corresponding item in the container. That is, each item receives a probability value for its predicted category. A top-N approach (for example, N = 1 or 2) can take the N highest-ranked categories as candidate categories for the item; when none of the candidate categories is contained in the declared categories filled in the declaration information, the risk is considered high and the declaration information is suspect. Otherwise, the declaration information can be considered unsuspicious.
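By way of non-limiting illustration, this top-N screening rule can be sketched in Python as follows; the PyTorch framing and all names here are assumptions of the sketch, not part of the original disclosure:

```python
import torch

def is_declaration_suspicious(logits: torch.Tensor,
                              declared_categories: set,
                              top_n: int = 2) -> bool:
    """logits: (N1, num_classes) category scores, one row per item block;
    declared_categories: the category indices filled in on the declaration."""
    probs = logits.softmax(dim=-1)
    candidates = probs.topk(top_n, dim=-1).indices   # (N1, top_n)
    for block_candidates in candidates.tolist():
        # a block is suspect when none of its top-N candidates was declared
        if not set(block_candidates) & declared_categories:
            return True
    return False
```

Under this rule a single unexplained block is enough to flag the whole declaration, which matches the determination made in operation S360.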
It can thus be seen that embodiments of the present disclosure comprehensively analyse the declared items by combining the declaration information with the machine-inspection radiation image, improving information utilisation and inspection accuracy.
Fig. 4 schematically shows a flowchart of extracting image features in operation S330 of the method for verifying the authenticity of declaration information according to an embodiment of the present disclosure.
As shown in Fig. 4, operation S330 according to this embodiment may include operations S331 to S333.
In operation S331, the different items in the machine-inspection radiation image are divided into independent image blocks using an object detection algorithm, yielding N1 image blocks.
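As a minimal sketch of this step (the disclosure does not fix a particular detector, so the box format below is an assumption):

```python
from PIL import Image

def crop_blocks(image: Image.Image, boxes: list) -> list:
    """Cut the machine-inspection radiation image into independent item
    blocks, given (x1, y1, x2, y2) pixel boxes from any object detector."""
    return [image.crop(tuple(box)) for box in boxes]
```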
In operation S332, the second image feature vector corresponding to each image block is extracted. For example, an image feature extraction module can extract the second image feature vector corresponding to an image block. In one embodiment, the image feature extraction module may be a convolutional neural network.
According to an embodiment of the present disclosure, the image feature extraction module may specifically be a network structure that uses resnet as the base network with an SE-block added after the resnet pooling layer. For example, resnet can serve as the base network, with an SE-block (Squeeze-and-Excitation block), a fully connected layer fc1 and a fully connected layer fc2 added after the original resnet pooling layer. In one embodiment, the output of the fully connected layer fc1 can be taken as the second image feature vector, thereby obtaining a set of second image feature vectors.
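A non-limiting PyTorch sketch of such a backbone follows; the layer sizes and the exact placement of the SE re-weighting are assumptions of this sketch (the description fixes only resnet + SE-block + fc1/fc2 and the use of fc1's output as the feature):

```python
import torch.nn as nn
from torchvision.models import resnet50

class SEBlock(nn.Module):
    """Squeeze-and-Excitation re-weighting of the pooled feature vector."""
    def __init__(self, channels: int, hidden: int):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, hidden), nn.ReLU(inplace=True),
            nn.Linear(hidden, channels), nn.Sigmoid())

    def forward(self, x):                 # x: (B, C)
        return x * self.fc(x)             # channel attention weights

class BlockFeatureExtractor(nn.Module):
    def __init__(self, num_classes: int, feat_dim: int = 512):
        super().__init__()
        backbone = resnet50(weights=None)
        backbone.fc = nn.Identity()       # keep the 2048-d pooled output
        self.backbone = backbone
        # the description sets the SE channel number to the category count
        self.se = SEBlock(2048, hidden=num_classes)
        self.fc1 = nn.Linear(2048, feat_dim)
        self.relu = nn.ReLU(inplace=True)
        self.fc2 = nn.Linear(feat_dim, num_classes)

    def forward(self, x):                 # x: (B, 3, H, W) image blocks
        h = self.se(self.backbone(x))
        feat = self.fc1(h)                # the second image feature vector
        logits = self.fc2(self.relu(feat))  # trained with cross-entropy
        return feat, logits
```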
The image feature extraction module can be trained with image blocks from historically declared machine-inspection radiation images. For example, image blocks of the same item in different poses are collected and pre-annotated so that the image feature extraction module learns to recognize the possible appearances of the same item in a machine-inspection radiation image.
In operation S333, based on the second image feature vector corresponding to each image block, the first image feature vector corresponding to the image information of the item represented by that image block is obtained.
In one embodiment, the second image feature vector can be used directly as the first feature vector.
In other embodiments, the second image feature vector can be processed further to obtain the first image feature vector. For example, the second feature vector can be processed according to the position information of each image block in the machine-inspection radiation image to obtain a feature reflecting the spatial positional relationship of the image block in the machine-inspection radiation image, which serves as the first feature vector. In this way, the category recognition in operation S350 can be more comprehensive and, to a certain extent, missed or duplicate recognition of the items in the container can be avoided, improving recognition efficiency.
Fig. 5 schematically shows a flowchart of obtaining image features by incorporating position information in operation S333 of the method for verifying the authenticity of declaration information according to an embodiment of the present disclosure.
As shown in Fig. 5, operation S333 according to an embodiment of the present disclosure may include operations S501 and S502. In this embodiment the second feature vector can be processed according to the position information of each image block to obtain the first feature vector.
Specifically, in operation S501, the position information of each image block in the machine-inspection radiation image is acquired. The position information may be, for example, the coordinates of anchor points on the outer contour of each image block. For instance, when the image block is a regular geometric shape (in customs declaration and similar business, items are usually sealed in packing boxes), the position information may be the coordinates of the vertices or intersection points of the geometric shape.
Then, in operation S502, the first image feature vector is obtained based on the second image feature vector and the position information corresponding to the same image block.
In one embodiment, the second image feature vector can first be processed with the position information of the image block: for example, the position information of the image block can be processed into a vector and concatenated with the second image feature vector corresponding to the image block; or, for example, the second feature vector can be subjected to a mapping transformation such as encoding with the position information of the image block. The second image feature vector processed with the position information can then be input into an encoder, whose output serves as the first image feature vector corresponding to the image block.
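For illustration only, the concatenation variant can be sketched as follows; the dimensions, layer counts and the normalisation of the coordinates are assumptions of this sketch:

```python
import torch
import torch.nn as nn

class ImageFeatureMapper(nn.Module):
    def __init__(self, feat_dim: int = 512, pos_dim: int = 8,
                 nhead: int = 8, num_layers: int = 2):
        super().__init__()
        self.proj = nn.Linear(feat_dim + pos_dim, feat_dim)
        layer = nn.TransformerEncoderLayer(d_model=feat_dim, nhead=nhead,
                                           batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=num_layers)

    def forward(self, feats, pos):
        # feats: (B, N1, feat_dim) second image feature vectors;
        # pos:   (B, N1, pos_dim) e.g. corner coordinates scaled to [0, 1]
        x = self.proj(torch.cat([feats, pos], dim=-1))
        return self.encoder(x)  # (B, N1, feat_dim) first image feature vectors
```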
According to an embodiment of the present disclosure, when the encoder's output serves as the first image feature vector corresponding to an image block, the encoder and the cross-modal decoder used in operation S350 can be trained jointly.
During joint training, the encoder and the cross-modal decoder form an upstream-downstream relationship: vectors obtained by processing the second feature vectors with the position information of image blocks of historically declared items of various kinds serve as the encoder input; the encoder output serves as the cross-modal decoder input, while the text features corresponding to the historical declaration information serve as the input of the cross-attention module of the cross-modal decoder; the declared category of each image block output by the cross-modal decoder is obtained; and the encoder and cross-modal fusion decoder are then trained iteratively based on the error between the declared category output by the cross-modal decoder for an image block and the declared category annotated for that image block.
According to an embodiment of the present disclosure, the encoder may adopt a transformer encoder model; correspondingly, the cross-modal decoder may adopt a transformer decoder model.
Fig. 6 schematically shows the flow of extracting image features in the method for verifying the authenticity of declaration information according to an embodiment of the present disclosure.
As shown in Figs. 2 and 6, an object detection method can divide the items into independent image blocks according to their appearance in the machine-inspection radiation image 22, and extract the corresponding position coordinates.
The features of the different image blocks are then extracted with the image feature extraction module 601. In Fig. 6 the image feature extraction module 601 can adopt resnet (a deep residual network, a classic convolutional neural network) as the base network, with an SE-block (Squeeze-and-Excitation block) neural network unit, a fully connected layer fc1-relu (relu being a rectified linear unit) and a fully connected layer fc2 added after the original resnet pooling layer. The number of SE-block channels can be set to the number of categories, thereby introducing a channel attention mechanism, and the output of the fully connected layer fc1 is taken as the second image feature vector of each image block, yielding a set of second image feature vectors. Assume its size is N1*dim, where N1 denotes the number of image blocks and dim the feature dimension of each second image feature vector.
Next, the N1 second image feature vectors (i.e., N1*dim) and the position information corresponding to each image block can be fed into the transformer encoder 602 to obtain N1 first image feature vectors, and this result serves as the input of the cross-modal decoder (for example, a transformer decoder) in operation S350.
Fig. 7 schematically shows a flowchart of extracting text features in operation S340 of the method for verifying the authenticity of declaration information according to an embodiment of the present disclosure.
As shown in Fig. 7, operation S340 according to an embodiment of the present disclosure may include operations S341 to S344.
In operation S341, the name information and specification information of each kind of item in the declaration information are extracted.
In operation S342, for each kind of item, the name information is processed into a first sentence and the specification information into a second sentence.
In operation S343, the first sentence and the second sentence corresponding to the same kind of item serve as the input of the text feature extraction and projection module, which classifies the declared category to which that item belongs.
In operation S344, the category output of the text feature extraction and projection module for each kind of item is taken as the text feature vector corresponding to that item, so that N2 text feature vectors are obtained for N2 kinds of items.
In one embodiment of the present disclosure, the text feature extraction and projection module may adopt a BERT model (Bidirectional Encoder Representations from Transformers). For training, historical declaration information can be collected: the name information of each kind of item is processed into a first sentence and the specification information into a second sentence, forming a sentence sequence that is input into the BERT model, which outputs the category of each kind of item. The BERT model is then trained iteratively based on the error between the category corresponding to the text feature vector it outputs and the declared category to which each kind of item actually belongs.
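An illustrative sketch with the Hugging Face transformers library follows; the checkpoint name and category count are placeholder assumptions, not part of the disclosure:

```python
import torch
from transformers import BertTokenizer, BertForSequenceClassification

NUM_CATEGORIES = 100  # placeholder: the real number of declared categories

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-chinese", num_labels=NUM_CATEGORIES)

def classify_item(name: str, spec: str) -> torch.Tensor:
    # sentence pair: item name as the first sentence, specification as the second
    inputs = tokenizer(name, spec, return_tensors="pt",
                       truncation=True, max_length=128)
    with torch.no_grad():
        logits = model(**inputs).logits    # (1, NUM_CATEGORIES)
    return logits.softmax(dim=-1)          # category output for this item
```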
Fig. 8 schematically shows the flow of extracting text features in the method for verifying the authenticity of declaration information according to an embodiment of the present disclosure.
As shown in Figs. 2 and 8, this embodiment adopts a BERT model to extract the text features in the declaration information 21.
If the declaration information 21 contains only one kind of item, the item's name information and specification are extracted directly from the declaration information 21; after preprocessing, the BERT model 801 takes the category output as the text feature of that commodity, i.e. a text feature vector of dimension 1*dim.
If the declaration information 21 includes multiple kinds of items, assume N2 kinds, where N2 denotes the number of item names and dim the feature dimension. For multiple kinds of items, the name information and specification of the same kind of item must first be aligned one by one; the above steps are then repeated to obtain N2 dim-dimensional text feature vectors, forming an N2*dim sequence of text feature vectors.
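Reusing the hypothetical classify_item from the BERT sketch above, the alignment and stacking step could look like this:

```python
import torch

def text_feature_sequence(items: list) -> torch.Tensor:
    """items: (name, spec) pairs aligned one by one; returns an (N2, dim)
    sequence of per-item text feature vectors."""
    return torch.cat([classify_item(name, spec) for name, spec in items],
                     dim=0)
```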
In this way, natural language processing yields text features that can represent the item name information and specifications. The text features are introduced as externally introduced features into the cross-attention module of the cross-modal decoder.
Fig. 9 schematically shows the flow of screening the declared categories of the items in the container in operation S350 of the method for verifying the authenticity of declaration information according to an embodiment of the present disclosure.
As shown in Fig. 9, operation S350 can employ a cross-modal decoder to screen the declared categories of the items in the container, and the cross-modal decoder can adopt the transformer decoder 901. The N1 first image feature vectors can serve as the input of the transformer decoder 901 and the N2 text feature vectors as the input of its cross-attention module; the output of the transformer decoder 901 is taken as the final comparison result. That is, each item block receives a probability value for its predicted declared category; a top-N approach takes the N highest-ranked categories as the candidate categories of that image block, and when none of the candidate categories belongs to the declared categories, the risk is considered high; otherwise, no suspicion is considered to exist.
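A non-limiting sketch of this fusion step follows; the hyper-parameters and the use of PyTorch's stock decoder layers are assumptions (the disclosure fixes only that image features enter the decoder and text features enter its cross-attention):

```python
import torch
import torch.nn as nn

class CrossModalDecoder(nn.Module):
    def __init__(self, dim: int = 512, nhead: int = 8,
                 num_layers: int = 2, num_classes: int = 100):
        super().__init__()
        layer = nn.TransformerDecoderLayer(d_model=dim, nhead=nhead,
                                           batch_first=True)
        self.decoder = nn.TransformerDecoder(layer, num_layers=num_layers)
        self.head = nn.Linear(dim, num_classes)

    def forward(self, image_feats, text_feats):
        # image_feats: (B, N1, dim) decoder input;
        # text_feats:  (B, N2, dim) memory consumed by the cross-attention
        fused = self.decoder(tgt=image_feats, memory=text_feats)
        return self.head(fused)      # (B, N1, num_classes) per-block logits
```

The per-block logits can then be screened with the top-N rule sketched earlier.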
For ease of computation, the vector dimension dim of the image features and the text features can be set equal.
When multiple kinds of items fall under the same declared category, accurate judgement is difficult using image information alone. In the cross-modal decoder, embodiments of the present disclosure exploit the interaction between the machine-inspection radiation image and the declaration information: natural language processing yields text features representing the item name information and specifications, which can compensate for the inadequacy of the image information from the perspective of the item's textual description and improve recognition accuracy.
When the transformer decoder 901 performs cross-modal fusion, the N2 text feature vectors, as external features of its cross-attention module, must be input as a whole. The N1 first image feature vectors, as the recognition targets of the transformer decoder 901, can be input individually or combined into a sequence, which can be set according to actual needs.
According to an embodiment of the present disclosure, to handle accurate declaration inspection when one declared category corresponds to multiple kinds of items, the object detection method divides the original whole machine-inspection radiation image into different image blocks, and independent item features are obtained by feature extraction on the image blocks, which can improve recognition accuracy.
To extract item features effectively, embodiments of the present disclosure can set the number of SE-block channels in the image feature extraction module to the number of categories to introduce a channel attention mechanism, improving the accuracy of image feature extraction.
To reflect the spatial positional relationship of the item blocks, embodiments of the present disclosure can feed the position information of the image blocks together with the feature information into the transformer encoder to obtain features with spatial positional relationships (i.e., the first feature vectors).
During model training, embodiments of the present disclosure combine three training tasks: training the image feature extraction module (e.g., module 601) with images, training the text feature extraction and projection module (e.g., BERT model 801) with text, and jointly training the encoder and the cross-modal decoder (e.g., transformer encoder 602 and transformer decoder 901) with image-text pairs, so that the different tasks complement one another.
Fig. 10 schematically shows a block diagram of the system 1000 for verifying the authenticity of declaration information according to an embodiment of the present disclosure.
As shown in Fig. 10, the system 1000 for verifying the authenticity of declaration information according to an embodiment of the present disclosure may include an information acquisition subsystem 110, a feature extraction subsystem 120, a feature fusion subsystem 130 and a conclusion determination subsystem 140. The system 1000 can be used to implement the method described with reference to Figs. 3 to 9.
Specifically, the information acquisition subsystem 110 can be used to acquire a machine-inspection radiation image obtained by scanning a container loaded with items, and to acquire declaration information declaring the items in the container. In one embodiment, the information acquisition subsystem 110 can be used to perform operations S310 and S320.
The feature extraction subsystem 120 can be used to recognize the image information of the items in the machine-inspection radiation image to obtain image features corresponding to the machine-inspection radiation image, and to recognize the text information of the items in the declaration information to obtain text features corresponding to the declaration information, where the text features characterize the declared categories to which the items in the declaration information belong. In one embodiment, the feature extraction subsystem 120 can be used to perform operations S330 and S340.
The feature fusion subsystem 130 can be used to screen the declared categories of the items in the container with the image features as input information and the text features as externally introduced features. For example, with the image features as the input information of a cross-modal decoder and the text features as externally introduced features of its attention mechanism, the cross-modal decoder screens the declared categories of the items in the container. In one embodiment, the feature fusion subsystem 130 can be used to perform operation S350.
The conclusion determination subsystem 140 can be used to determine that the declaration information is questionable when the declared category of at least one item in the container does not belong to the declared categories in the declaration information. In one embodiment, the conclusion determination subsystem 140 can be used to perform operation S360.
Fig. 11 schematically shows a block diagram of the feature extraction subsystem 120 in the system for verifying the authenticity of declaration information according to an embodiment of the present disclosure.
As shown in Fig. 11, according to some embodiments of the present disclosure, the feature extraction subsystem 120 may include an image preprocessing module 100', an image feature extraction module 100 and an image feature mapping module 101, where the image features include N1 first image feature vectors respectively corresponding to the image information of N1 items in the machine-inspection radiation image, N1 being an integer greater than or equal to 1.
The image preprocessing module 100' can be used to divide the different items in the machine-inspection radiation image into independent image blocks using an object detection algorithm, yielding N1 image blocks. In one embodiment, the image preprocessing module 100' can perform operation S331.
The image feature extraction module 100 can be used to extract the second image feature vector corresponding to each image block. In one embodiment, the image feature extraction module 100 can perform operation S332.
The image feature mapping module 101 can be used to obtain, based on the second image feature vector corresponding to each image block, the first image feature vector corresponding to the image information of the item represented by that image block. In one embodiment, the image feature mapping module 101 can perform operation S333.
Continuing with Fig. 11, according to other embodiments of the present disclosure, the feature extraction subsystem 120 may further include a declaration information preprocessing module 102' and a text feature extraction and projection module 102, where the text features may include text feature vectors respectively corresponding to N2 kinds of items in the declaration information, N2 being an integer greater than or equal to 1.
The declaration information preprocessing module 102' is used to extract the name information and specification information of each kind of item in the declaration information and, for each kind of item, process the name information into a first sentence and the specification information into a second sentence. In one embodiment, the declaration information preprocessing module 102' can perform operations S341 and S342.
The text feature extraction and projection module 102 is used to take the first sentence and the second sentence corresponding to the same kind of item as input, classify the declared category to which that item belongs, and take its category output for each kind of item as the text feature vector corresponding to that item, so that N2 text feature vectors are obtained for N2 kinds of items. In some embodiments, the text feature extraction and projection module 102 can perform operations S343 and S344.
Fig. 12 schematically shows the overall structure of the system for verifying the authenticity of declaration information according to another embodiment of the present disclosure.
As shown in Fig. 12, the system 1200 for verifying the authenticity of declaration information according to this embodiment may include the feature extraction subsystem 120 and the cross-modal decoder 200. The system 1200 outputs an overall conclusion 3 and block-wise conclusions 4. The feature extraction subsystem 120 is shown in Fig. 13 and the cross-modal decoder 200 in Fig. 14.
Fig. 13 schematically shows the structure of the feature extraction subsystem 120 in the system 1200 for verifying the authenticity of declaration information according to an embodiment of the present disclosure.
As shown in Fig. 13, the feature extraction subsystem 120 according to this embodiment may include two branches: image feature extraction and declaration information extraction.
The image feature extraction branch includes the image feature extraction module 100 and the image feature mapping module 101, and outputs the image features 1.
The declaration information extraction branch includes the text feature extraction and projection module 102 and outputs the text features 2.
Fig. 14 schematically shows the structure of the cross-modal decoder 200 in the system 1200 for verifying the authenticity of declaration information according to an embodiment of the present disclosure.
As shown in Fig. 14, the cross-modal decoder 200 is composed of a multi-head self-attention module 201, a feature addition and normalisation module 202, a multi-head cross-attention module 203 and a feed-forward network 204. Its inputs are the image features 1 and the text features 2 extracted by the subsystem 120, where the image features 1 serve as the main input of the multi-head self-attention module 201 and the text features 2 are introduced into the multi-head cross-attention module 203 as enhancement information.
With reference to Figs. 12 to 14, according to an embodiment of the present disclosure, when the system 1200 operates, for a machine-inspection radiation image it uses an object detection method to obtain the coordinate positions of the different items in the image (the item blocks are distinguished by the boundaries and textures of the image) and extracts image blocks according to the coordinate positions as the input of the image feature extraction module 100.
Before entering the image feature extraction module 100, the image blocks can undergo data augmentation, including rotation, resizing to a fixed size, mean removal, standardisation and a series of similar operations.
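For example, an illustrative torchvision pipeline for these steps (the target size and statistics are assumptions of this sketch):

```python
from torchvision import transforms

augment = transforms.Compose([
    transforms.RandomRotation(degrees=10),      # rotation
    transforms.Resize((224, 224)),              # resize to a fixed size
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.5, 0.5, 0.5],  # mean removal and
                         std=[0.25, 0.25, 0.25]),  # standardisation
])
```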
When an image block of fixed size is input into the image feature extraction module 100, cross-entropy is used as the training constraint. To obtain good image features, a resnet50+SE-block combination can extract the features of the image block, and the network's penultimate layer (i.e., the output of the fully connected layer fc1) is taken as the image block feature (i.e., the second image feature vector).
The extracted image block features are then fed into the image feature mapping module 101. The image feature mapping module 101 adopts a transformer encoder structure: the position coordinates corresponding to an image block can be processed into a vector and concatenated with the image block's features (joined along the feature dimension), and the result is input into the transformer encoder to obtain new image block features with spatial positional relationships (i.e., the first image feature vectors).
The item declaration information in a declaration form can first be preprocessed as follows: remove standalone numbers from the specifications, delete certain stop words and symbols, and/or unify the English letters in the text to lowercase.
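One possible implementation of these preprocessing rules is sketched below; the stop-word list is a placeholder assumption:

```python
import re

STOP_WORDS = {"的", "及", "和"}  # placeholder; substitute a real stop-word list

def preprocess_spec(text: str) -> str:
    text = re.sub(r"\b\d+\b", " ", text)              # remove standalone numbers
    text = re.sub(r"[^\w\u4e00-\u9fff]+", " ", text)  # remove symbols
    tokens = [t for t in text.split() if t not in STOP_WORDS]
    return " ".join(tokens).lower()                   # unify English to lowercase
```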
Then the processed item name information serves as sentence one and the declared item's specification as sentence two, input into the text feature extraction and projection module 102 (for example, a BERT model) for text feature extraction. During the training of the text feature extraction and projection module 102, multi-class cross-entropy can serve as the loss function constraining the training process, and the pooled result of the model's last layer is output as the category of the declared item. When one commodity code in a declaration form covers items of multiple varieties, the declared item names and specifications must be matched one by one and handled as multiple kinds of items, repeating the processing of a single item's declaration information; through the BERT model's feed-forward pass, the text features of each kind of item's declaration information are obtained and concatenated.
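A minimal training-step sketch for this constraint, reusing the hypothetical model and tokenizer from the BERT sketch above (the optimiser and learning rate are assumptions):

```python
import torch
from torch.optim import AdamW

optimizer = AdamW(model.parameters(), lr=2e-5)
criterion = torch.nn.CrossEntropyLoss()  # the multi-class cross-entropy loss

def train_step(name: str, spec: str, label: int) -> float:
    inputs = tokenizer(name, spec, return_tensors="pt",
                       truncation=True, max_length=128)
    logits = model(**inputs).logits
    loss = criterion(logits, torch.tensor([label]))
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```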
The image features 1 and text features 2 output by the feature extraction subsystem 120 are then substituted into the cross-modal decoder 200. The cross-modal decoder 200 can adopt a transformer decoder module, with the image features 1 as the value input of the cross-modal decoder 200 and the text features 2 introduced into the cross-attention module 203; cross-modal information fusion is completed through the interaction between the features. After the image feature of each image block interacts with the text features in the declaration information, a corresponding result is obtained: a multi-category probability output, from which it is judged whether the goods loaded in the container are consistent with the declared categories in the declaration information.
Any multiple of the modules, submodules, units and subunits according to the embodiments of the present disclosure, or at least part of the functions of any multiple of them, may be implemented in one module. Any one or more of the modules, submodules, units and subunits according to the embodiments of the present disclosure may be split into multiple modules for implementation. Any one or more of the modules, submodules, units and subunits according to the embodiments of the present disclosure may be implemented at least partially as a hardware circuit, such as a field programmable gate array (FPGA), a programmable logic array (PLA), a system on chip, a system on substrate, a system on package or an application-specific integrated circuit (ASIC), or by hardware or firmware in any other reasonable way of integrating or packaging circuits, or in any one of the three implementations of software, hardware and firmware, or in an appropriate combination of any of them. Alternatively, one or more of the modules, submodules, units and subunits according to the embodiments of the present disclosure may be implemented at least partially as a computer program module which, when run, can perform the corresponding function.
For example, any multiple of the information acquisition subsystem 110, the feature extraction subsystem 120, the feature fusion subsystem 130, the conclusion determination subsystem 140, the image preprocessing module 100', the image feature extraction module 100, the image feature mapping module 101, the declaration information preprocessing module 102', the text feature extraction and projection module 102 and the cross-modal decoder 200 may be combined and implemented in one module, or any one of them may be split into multiple modules. Alternatively, at least part of the functions of one or more of these modules may be combined with at least part of the functions of other modules and implemented in one module. According to embodiments of the present disclosure, at least one of the information acquisition subsystem 110, the feature extraction subsystem 120, the feature fusion subsystem 130, the conclusion determination subsystem 140, the image preprocessing module 100', the image feature extraction module 100, the image feature mapping module 101, the declaration information preprocessing module 102', the text feature extraction and projection module 102 and the cross-modal decoder 200 may be implemented at least partially as a hardware circuit, such as a field programmable gate array (FPGA), a programmable logic array (PLA), a system on chip, a system on substrate, a system on package or an application-specific integrated circuit (ASIC), or by hardware or firmware in any other reasonable way of integrating or packaging circuits, or in any one of the three implementations of software, hardware and firmware, or in an appropriate combination of any of them. Alternatively, at least one of these subsystems and modules may be implemented at least partially as a computer program module which, when run, can perform the corresponding function.
Fig. 15 schematically shows a block diagram of an electronic device suitable for implementing the method for verifying the authenticity of declaration information according to an embodiment of the present disclosure. The electronic device 1500 shown in Fig. 15 is only an example and should not impose any limitation on the functions and scope of use of the embodiments of the present disclosure.
As shown in Fig. 15, the electronic device 1500 according to an embodiment of the present disclosure includes a processor 1501 that can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 1502 or a program loaded from a storage section 1508 into a random access memory (RAM) 1503. The processor 1501 may include, for example, a general-purpose microprocessor (e.g., a CPU), an instruction set processor and/or related chipsets, and/or a special-purpose microprocessor (e.g., an application-specific integrated circuit (ASIC)), and so on. The processor 1501 may also include onboard memory for caching. The processor 1501 may include a single processing unit or multiple processing units for performing the different actions of the method flow according to the embodiments of the present disclosure.
The RAM 1503 stores the various programs and data required for the operation of the electronic device 1500. The processor 1501, the ROM 1502 and the RAM 1503 are connected to one another through a bus 1504. The processor 1501 performs the various operations of the method flow according to the embodiments of the present disclosure by executing the programs in the ROM 1502 and/or the RAM 1503. Note that the programs may also be stored in one or more memories other than the ROM 1502 and the RAM 1503. The processor 1501 may also perform the various operations of the method flow according to the embodiments of the present disclosure by executing programs stored in the one or more memories.
According to an embodiment of the present disclosure, the electronic device 1500 may further include an input/output (I/O) interface 1505, which is also connected to the bus 1504. The electronic device 1500 may further include one or more of the following components connected to the I/O interface 1505: an input section 1506 including a keyboard, a mouse, etc.; an output section 1507 including a cathode ray tube (CRT), a liquid crystal display (LCD), etc., and a speaker; a storage section 1508 including a hard disk, etc.; and a communication section 1509 including a network interface card such as a LAN card or a modem. The communication section 1509 performs communication processing via a network such as the Internet. A drive 1510 is also connected to the I/O interface 1505 as needed. A removable medium 1511, such as a magnetic disk, an optical disk, a magneto-optical disk or a semiconductor memory, is mounted on the drive 1510 as needed, so that a computer program read from it can be installed into the storage section 1508 as needed.
According to an embodiment of the present disclosure, the method flow according to the embodiments of the present disclosure can be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program carried on a computer-readable storage medium, the computer program containing program code for executing the method shown in the flowcharts. In such an embodiment, the computer program can be downloaded and installed from a network via the communication section 1509 and/or installed from the removable medium 1511. When the computer program is executed by the processor 1501, the above functions defined in the system of the embodiments of the present disclosure are performed. According to embodiments of the present disclosure, the systems, devices, apparatuses, modules, units and the like described above can be implemented by computer program modules.
The present disclosure also provides a computer-readable storage medium, which may be included in the device/apparatus/system described in the above embodiments or may exist independently without being assembled into the device/apparatus/system. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to the embodiments of the present disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, but is not limited to: a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present disclosure, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in combination with an instruction execution system, apparatus or device. For example, according to embodiments of the present disclosure, the computer-readable storage medium may include the ROM 1502 and/or the RAM 1503 described above and/or one or more memories other than the ROM 1502 and the RAM 1503.
Embodiments of the present disclosure also include a computer program product comprising a computer program containing program code for executing the method provided by the embodiments of the present disclosure; when the computer program product runs on an electronic device, the program code causes the electronic device to implement the image recognition method provided by the embodiments of the present disclosure.
When the computer program is executed by the processor 1501, the above functions defined in the system/apparatus of the embodiments of the present disclosure are performed. According to embodiments of the present disclosure, the systems, apparatuses, modules, units and the like described above can be implemented by computer program modules.
In one embodiment, the computer program can rely on tangible storage media such as optical storage devices and magnetic storage devices. In another embodiment, the computer program can also be transmitted and distributed in the form of a signal on a network medium, downloaded and installed via the communication section 1509, and/or installed from the removable medium 1511. The program code contained in the computer program can be transmitted by any appropriate medium, including but not limited to wireless, wired, etc., or any suitable combination of the above.
According to embodiments of the present disclosure, the program code for executing the computer programs provided by the embodiments of the present disclosure can be written in any combination of one or more programming languages; specifically, these computing programs can be implemented in high-level procedural and/or object-oriented programming languages, and/or in assembly/machine language. Programming languages include, but are not limited to, Java, C++, python, the "C" language or similar programming languages. The program code can execute entirely on the user computing device, partly on the user device, partly on a remote computing device, or entirely on a remote computing device or server. Where a remote computing device is involved, the remote computing device can be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or can be connected to an external computing device (for example, through the Internet using an Internet service provider).
The flowcharts and block diagrams in the drawings illustrate the possible architectures, functions and operations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in a flowchart or block diagram can represent a module, program segment or portion of code containing one or more executable instructions for implementing the specified logical functions. It should also be noted that in some alternative implementations, the functions noted in the blocks can occur out of the order noted in the drawings. For example, two blocks shown in succession can in fact be executed substantially in parallel, and they can sometimes be executed in the reverse order, depending on the functions involved. It should also be noted that each block in the block diagrams or flowcharts, and combinations of blocks in the block diagrams or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The embodiments of the present disclosure have been described above. These embodiments, however, are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Although the embodiments have been described separately above, this does not mean that the measures in the respective embodiments cannot be advantageously used in combination. The scope of the present disclosure is defined by the appended claims and their equivalents. Without departing from the scope of the present disclosure, those skilled in the art can make various substitutions and modifications, which should all fall within the scope of the present disclosure.

Claims (17)

  1. A method for verifying the authenticity of declaration information, comprising:
    acquiring a machine-inspection radiation image obtained by scanning a container loaded with items;
    acquiring declaration information declaring the items in the container;
    recognizing image information of the items in the machine-inspection radiation image to obtain image features corresponding to the machine-inspection radiation image;
    recognizing text information of the items in the declaration information to obtain text features corresponding to the declaration information, wherein the text features are used to characterize the declared categories to which the items in the declaration information belong;
    screening the declared categories of the items in the container with the image features as input information and the text features as externally introduced features; and
    determining that the declaration information is questionable when the declared category of at least one item in the container does not belong to the declared categories in the declaration information.
  2. The method according to claim 1, wherein the image features comprise N1 first image feature vectors respectively corresponding to the image information of N1 items in the machine-inspection radiation image, wherein N1 is an integer greater than or equal to 1.
  3. The method according to claim 2, wherein recognizing the image information of the items in the machine-inspection radiation image to obtain the image features corresponding to the machine-inspection radiation image comprises:
    dividing the different items in the machine-inspection radiation image into independent image blocks using an object detection algorithm to obtain N1 image blocks;
    extracting a second image feature vector corresponding to each of the image blocks; and
    obtaining, based on the second image feature vector corresponding to each of the image blocks, the first image feature vector corresponding to the image information of the item represented by the image block.
  4. The method according to claim 3, wherein extracting the second image feature vector corresponding to each of the image blocks comprises:
    performing image recognition on each of the image blocks with an image feature extraction module to obtain the second image feature vector corresponding to each of the image blocks;
    wherein the image feature extraction module comprises a convolutional neural network.
  5. The method according to claim 4, wherein the image feature extraction module comprises a network structure that uses resnet as the base network with an SE-block added after the resnet pooling layer.
  6. The method according to claim 3, wherein obtaining, based on the second image feature vector corresponding to each of the image blocks, the first image feature vector corresponding to the image information of the item represented by the image block comprises:
    acquiring position information of each of the image blocks in the machine-inspection radiation image; and
    obtaining the first image feature vector based on the second image feature vector and the position information corresponding to the same image block.
  7. The method according to claim 6, wherein obtaining the first image feature vector based on the second image feature vector and the position information corresponding to the same image block comprises:
    processing the second image feature vector with the position information of the image block;
    inputting the processed second image feature vector into an encoder; and
    obtaining the output of the encoder to obtain the first image feature vector corresponding to the image block.
  8. The method according to claim 7, wherein screening the declared categories of the items in the container with the image features as input information and the text features as externally introduced features comprises:
    screening the declared categories of the items in the container with a cross-modal decoder, using the image features as the input information of the cross-modal decoder and the text features as externally introduced features of the attention mechanism of the cross-modal decoder.
  9. The method according to claim 8, wherein the encoder and the cross-modal decoder are trained jointly.
  10. The method according to claim 9, wherein the encoder adopts a transformer encoder model.
  11. The method according to claim 10, wherein the cross-modal decoder adopts a transformer decoder model.
  12. The method according to any one of claims 1 to 11, wherein the text features comprise text feature vectors respectively corresponding to N2 kinds of items in the declaration information, wherein N2 is an integer greater than or equal to 1.
  13. The method according to claim 12, wherein recognizing the text information of the items in the declaration information to obtain the text features corresponding to the declaration information comprises:
    extracting name information and specification information of each kind of item in the declaration information;
    for each kind of item, processing the name information into a first sentence and the specification information into a second sentence;
    feeding the first sentence and the second sentence corresponding to the same kind of item into a text feature extraction and projection module as input, and classifying, with the text feature extraction and projection module, the declared category to which the item belongs; and
    taking the category output of the text feature extraction and projection module for each kind of item as the text feature vector corresponding to the item; wherein N2 text feature vectors are obtained for N2 kinds of items.
  14. The method according to claim 13, wherein the text feature extraction and projection module adopts a BERT model.
  15. A system for verifying the authenticity of declaration information, comprising:
    an information acquisition subsystem configured to acquire a machine-inspection radiation image obtained by scanning a container loaded with items, and to acquire declaration information declaring the items in the container;
    a feature extraction subsystem configured to recognize image information of the items in the machine-inspection radiation image to obtain image features corresponding to the machine-inspection radiation image, and to recognize text information of the items in the declaration information to obtain text features corresponding to the declaration information, wherein the text features are used to characterize the declared categories to which the items in the declaration information belong;
    a feature fusion subsystem configured to screen the declared categories of the items in the container with the image features as input information and the text features as externally introduced features; and
    a conclusion determination subsystem configured to determine that the declaration information is questionable when the declared category of at least one item in the container does not belong to the declared categories in the declaration information.
  16. An electronic device, comprising:
    one or more memories storing executable instructions; and
    one or more processors executing the executable instructions to implement the method according to any one of claims 1 to 14.
  17. A computer-readable storage medium having executable instructions stored thereon which, when executed by a processor, cause the processor to perform the method according to any one of claims 1 to 14.
PCT/CN2022/124815 2021-11-05 2022-10-12 Method, system, device and medium for verifying the authenticity of declaration information WO2023078044A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111310339.8A CN116092096A (zh) 2021-11-05 2021-11-05 Method, system, device and medium for verifying the authenticity of declaration information
CN202111310339.8 2021-11-05

Publications (1)

Publication Number Publication Date
WO2023078044A1 true WO2023078044A1 (zh) 2023-05-11

Family

ID=84363233

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/124815 2021-11-05 2022-10-12 Method, system, device and medium for verifying the authenticity of declaration information WO2023078044A1 (zh)

Country Status (4)

Country Link
US (1) US20230144433A1 (zh)
EP (1) EP4177793A1 (zh)
CN (1) CN116092096A (zh)
WO (1) WO2023078044A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP4357765A4 (en) * 2022-08-30 2024-05-08 Contemporary Amperex Technology Co., Limited ERROR DETECTION METHOD AND APPARATUS AND COMPUTER-READABLE STORAGE MEDIUM
CN116664450A (zh) * 2023-07-26 2023-08-29 State Grid Zhejiang Electric Power Co., Ltd. Information and Telecommunication Branch Image enhancement method, apparatus, device and storage medium based on a diffusion model
CN117058473B (zh) * 2023-10-12 2024-01-16 Shenzhen Yixing Robot Co., Ltd. Warehouse material management method and system based on image recognition


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109522913B (zh) * 2017-09-18 2022-07-19 Nuctech Company Limited Inspection method and inspection device and computer-readable medium

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070276619A1 (en) * 2004-12-28 2007-11-29 Great American Lines, Inc. Container inspection system
CN105808555A (zh) * 2014-12-30 2016-07-27 Tsinghua University Method and system for inspecting goods
CN108108744A (zh) * 2016-11-25 2018-06-01 Nuctech Company Limited Method and system for aided analysis of radiation images
CN110133739A (zh) * 2019-04-04 2019-08-16 Nanjing Quanshe Intelligent Technology Co., Ltd. X-ray security inspection device and automatic image recognition method thereof
CN111753189A (zh) * 2020-05-29 2020-10-09 Sun Yat-sen University Few-shot cross-modal hash retrieval common representation learning method

Also Published As

Publication number Publication date
US20230144433A1 (en) 2023-05-11
EP4177793A1 (en) 2023-05-10
CN116092096A (zh) 2023-05-09

Similar Documents

Publication Publication Date Title
WO2023078044A1 (zh) Method, system, device and medium for verifying the authenticity of declaration information
US20200051017A1 (en) Systems and methods for image processing
US10685462B2 (en) Automatic data extraction from a digital image
EP3699579B1 (en) Inspection method and inspection device and computer-readable medium
US11170249B2 (en) Identification of fields in documents with neural networks using global document context
CN115004177A (zh) 图像检索系统
AU2020369152A1 (en) Docket analysis methods and systems
US20240045931A1 (en) Methods and apparatus for training a classification model based on images of non-bagged produce or images of bagged produce generated by a generative model
US10134123B2 (en) Methods, systems, and apparatuses for inspecting goods
Ferguson et al. A standardized representation of convolutional neural networks for reliable deployment of machine learning models in the manufacturing industry
Touati et al. Partly uncoupled siamese model for change detection from heterogeneous remote sensing imagery
US11599748B2 (en) Methods and apparatus for recognizing produce category, organic type, and bag type in an image using a concurrent neural network model
CN112966131B (zh) Customs data risk-control type identification method, intelligent customs risk deployment method, apparatus, computer device and storage medium
US11887579B1 (en) Synthetic utterance generation
US20230196738A1 (en) Methods and apparatus for recognizing produce category, organic type, and bag type in an image using a concurrent neural network model
CN116229195B (zh) Method for online training of a radiation image recognition model, and radiation image recognition method and apparatus
CHOI et al. Design and Implementation for BIC Code Recognition System of Containers using OCR and CRAFT in Smart Logistics
CN110795941B (zh) Named entity recognition method and system based on external knowledge, and electronic device
US20220036066A1 (en) System and method for supporting user to read x-ray image
Shen et al. Cargo segmentation in stream of commerce (SoC) x-ray images with deep learning algorithms
WO2023279186A1 (en) Methods and systems for extracting text and symbols from documents
Riegelnegg Automated Extraction of Complexity Measures from Engineering Drawings
CN117474479A (zh) Material review method, apparatus, computer device and storage medium
CN113850287A (zh) Dynamically analysed industrial product similarity calculation method and system
CN117975478A (zh) Image recognition method, apparatus, device and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22889070

Country of ref document: EP

Kind code of ref document: A1