WO2022022704A1 - Sequence recognition method and apparatus, image processing device and storage medium - Google Patents

Sequence recognition method and apparatus, image processing device and storage medium

Info

Publication number
WO2022022704A1
WO2022022704A1 (PCT/CN2021/109764)
Authority
WO
WIPO (PCT)
Prior art keywords
image
feature
feature map
image features
features
Prior art date
Application number
PCT/CN2021/109764
Other languages
English (en)
French (fr)
Inventor
许昀璐
Original Assignee
上海高德威智能交通系统有限公司
Priority date
Filing date
Publication date
Application filed by 上海高德威智能交通系统有限公司
Priority to US18/017,660 (published as US20230274566A1)
Priority to EP21851002.2A (published as EP4191471A4)
Publication of WO2022022704A1

Classifications

    • G06V 30/18: Character recognition; extraction of features or characteristics of the image
    • G06F 18/253: Pattern recognition; fusion techniques of extracted features
    • G06N 3/045: Neural networks; combinations of networks
    • G06N 3/08: Neural networks; learning methods
    • G06V 10/22: Image preprocessing by selection of a specific region containing or referencing a pattern; locating or processing of specific regions to guide the detection or recognition
    • G06V 10/454: Local feature extraction; integrating the filters into a hierarchical structure, e.g. convolutional neural networks [CNN]
    • G06V 10/82: Image or video recognition or understanding using pattern recognition or machine learning; using neural networks
    • G06V 20/625: Scenes; scene-specific elements; text, e.g. license plates
    • G06V 30/242: Character recognition; division of the character sequences into groups prior to recognition; selection of dictionaries
    • G06V 30/10: Character recognition

Definitions

  • the present application relates to the technical field of image recognition, and in particular, to a sequence recognition method, apparatus, image processing device and storage medium.
  • With the development of image recognition technology, sequence recognition using image recognition models has become increasingly widespread; for example, image recognition models are used to recognize sequences such as license plate numbers and barcodes. However, because a license plate number or barcode generally includes multiple characters that are recognized serially, recognition efficiency is low. Therefore, there is a need for a sequence recognition method that improves recognition efficiency.
  • Embodiments of the present application provide a sequence identification method, apparatus, image processing device, and storage medium, which can improve sequence identification efficiency.
  • the technical solution is as follows:
  • a sequence identification method comprising:
  • the first feature map is subjected to time-sequence relationship extraction to obtain a second feature map that fuses the context information included in the target image.
  • the second feature map includes a plurality of second image features
  • character recognition is performed on the target image in parallel to obtain a character sequence.
  • performing character recognition on the target image in parallel based on the multiple first image features and the multiple second image features to obtain a character sequence including:
  • determining multiple sets of image features based on the plurality of first image features and the plurality of second image features, each set of image features including the first image feature and the second image feature at the same feature position;
  • a sequence of characters is generated.
  • performing character recognition on the multiple groups of image features in parallel includes:
  • the size of the first image feature is B×C1×H×W; the size of the second image feature is B×T×H×W; the size of the third image feature is B×C1×T; where B is the batch parameter of the image recognition model, C1 is the number of feature channels of the image recognition model, H is the feature map height, W is the feature map width, and T is the prediction sequence length.
  • time-sequence relationship extraction is performed on the first feature map to obtain a second feature map that fuses the context information included in the target image, including:
  • the channels in the fourth feature map are mapped to a preset sequence length to obtain the second feature map.
  • the method further includes:
  • the image recognition model is trained through a convolutional neural network based on the plurality of sample images and the character sequence annotated for each sample image.
  • a sequence identification device comprising:
  • an extraction module configured to perform feature extraction on the target image to be recognized by the image recognition model to obtain a first feature map, where the first feature map includes a plurality of first image features
  • the processing module is configured to perform time-sequence relationship extraction on the first feature map based on the convolutional neural network layer and the fully connected layer in the image recognition model, to obtain a second feature map that fuses the context information included in the target image
  • the second feature map includes a plurality of second image features
  • the recognition module is configured to perform character recognition on the target image in parallel based on the plurality of first image features and the plurality of second image features to obtain a character sequence.
  • the identification module includes:
  • a determining unit configured to determine multiple sets of image features based on the multiple first image features and the multiple second image features, each set of image features including the first image feature and the second image feature at the same feature position;
  • a recognition unit for performing character recognition on the multiple groups of image features in parallel
  • the generating unit is used for generating a character sequence based on the recognized multiple characters.
  • the identification unit is used for:
  • the size of the first image feature is B×C1×H×W; the size of the second image feature is B×T×H×W; the size of the third image feature is B×C1×T; where B is the batch parameter of the image recognition model, C1 is the number of feature channels of the image recognition model, H is the feature map height, W is the feature map width, and T is the prediction sequence length.
  • the processing module includes:
  • a transformation unit configured to transform the number of channels of the first feature map through the convolutional neural network layer, to obtain a fourth feature map that fuses the context information included in the target image
  • a mapping unit configured to map the channels in the fourth feature map to a preset sequence length through the fully connected layer to obtain the second feature map.
  • the device further includes:
  • an acquisition module configured to acquire a plurality of sample images, each sample image is marked with a character sequence in the sample image
  • a training module configured to train the image recognition model through a convolutional neural network based on the plurality of sample images and the character sequence marked on each sample image.
  • an image processing device, in another aspect, includes a processor and a memory, where the memory stores at least one piece of program code, and the at least one piece of program code is loaded and executed by the processor to implement the following steps:
  • the first feature map is subjected to time-sequence relationship extraction to obtain a second feature map that fuses the context information included in the target image.
  • the second feature map includes a plurality of second image features
  • character recognition is performed on the target image in parallel to obtain a character sequence.
  • the at least one piece of program code is loaded and executed by the processor to implement the following steps:
  • determining multiple sets of image features based on the plurality of first image features and the plurality of second image features, each set of image features including the first image feature and the second image feature at the same feature position;
  • a sequence of characters is generated.
  • the at least one piece of program code is loaded and executed by the processor to implement the following steps:
  • the size of the first image feature is B×C1×H×W; the size of the second image feature is B×T×H×W; the size of the third image feature is B×C1×T; where B is the batch parameter of the image recognition model, C1 is the number of feature channels of the image recognition model, H is the feature map height, W is the feature map width, and T is the prediction sequence length.
  • the at least one piece of program code is loaded and executed by the processor to implement the following steps:
  • the channels in the fourth feature map are mapped to a preset sequence length to obtain the second feature map.
  • the at least one piece of program code is loaded and executed by the processor to implement the following steps:
  • the image recognition model is trained through a convolutional neural network based on the plurality of sample images and the character sequence annotated for each sample image.
  • a computer-readable storage medium, where at least one piece of program code is stored in the computer-readable storage medium, and the at least one piece of program code is loaded and executed by a processor to implement the sequence recognition method described in any of the above possible implementations.
  • a computer program product includes at least one computer program which, when executed by a processor, implements the sequence recognition method described in any of the above possible implementations.
  • in the process of performing sequence recognition on the target image, time-sequence relationship extraction is performed on the first feature map of the target image to obtain the second feature map that fuses the context information included in the target image; the second feature map therefore contains the time-sequence relationship between characters. In this way, character recognition can be performed in parallel based on the first feature map and the second feature map, which improves character recognition efficiency.
  • FIG. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application.
  • FIG. 4 is a schematic diagram of a sequence identification method provided by an embodiment of the present application.
  • FIG. 5 is a schematic diagram of another sequence identification method provided by an embodiment of the present application.
  • FIG. 6 is a schematic diagram of another sequence identification method provided by an embodiment of the present application.
  • FIG. 7 is a flowchart of a training method for an image recognition model provided by an embodiment of the present application.
  • FIG. 8 is a block diagram of a sequence identification device provided by an embodiment of the present application.
  • FIG. 9 is a block diagram of an image processing device provided by an embodiment of the present application.
  • FIG. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application.
  • the implementation environment includes an image capturing device 101 and an image processing device 102 ; the image capturing device 101 and the image processing device 102 are connected through a wireless or wired network.
  • the image acquisition device 101 is used to acquire a target image and transmit the target image to the image processing device 102 .
  • the image processing device 102 is used for sequence identification of the target image. And, the image processing device 102 performs sequence recognition on the target image through the image recognition model. Therefore, the image recognition model needs to be stored in the image processing device 102 in advance.
  • the image recognition model is obtained by training on the image processing device 102 , or the image recognition model is obtained by training on other devices, and then loaded onto the image processing device 102 .
  • the image acquisition device 101 is any device with an image acquisition function, such as a mobile phone, a tablet computer, a computer, a camera, or a video camera.
  • the image processing device 102 is any device with an image processing function, such as a terminal or a server.
  • in response to the image processing device 102 being a server, optionally, the image processing device 102 is a single server, a server cluster composed of multiple servers 103, a cloud server, or the like. In the embodiments of the present application, this is not specifically limited.
  • sequence identification method of the embodiment of the present application can be applied in various practical application scenarios, and the actual technical effect of the embodiment of the present application is described below in combination with three exemplary application scenarios:
  • in the scenario of license plate number recognition in a parking lot: in response to a vehicle entering the parking lot, the image acquisition device 101 captures a first target image including the license plate number of the vehicle and sends the first target image to the image processing device 102. The image processing device 102 receives the first target image, recognizes the license plate number from the first target image, and stores the license plate number in association with the entry time.
  • in response to the vehicle driving out of the parking lot, the image acquisition device 101 again captures a second target image including the license plate number of the vehicle, and sends the second target image to the image processing device 102.
  • the image processing device 102 receives the second target image, recognizes the license plate number from the second target image, and, according to the license plate number, looks up the entry time associated with that license plate number from the association between license plate numbers and entry times; the vehicle is then charged according to its entry time and exit time. This enables automatic charging of vehicles.
  • the image capture device 101 captures a target image including a barcode, and sends the target image to the image processing device 102 .
  • the image processing device 102 receives the target image, identifies the numbers in the barcode from the target image, obtains a character sequence, determines the price of the commodity according to the character sequence, and then charges.
  • the image capture device 101 is a cashier's POS machine or a self-service cashier device.
  • in addition to the above applications, the method also has other applications; for example, it can be applied in digit recognition scenarios and the like. In the embodiments of the present application, this is not specifically limited.
  • in the above implementation environment, the image acquisition device 101 and the image processing device 102 are described as different devices by way of example.
  • the image acquisition device 101 and the image processing device 102 are the same device, for example, both are referred to as the image processing device 102, and the image processing device 102 is used to acquire target images and perform sequence identification on the target images.
  • the image processing device 102 not only has an image processing function, but also has an image acquisition function.
  • FIG. 2 is a flowchart of a sequence identification method provided by an embodiment of the present application. Referring to Figure 2, this embodiment includes:
  • 201 Perform feature extraction on a target image to be recognized by an image recognition model to obtain a first feature map, where the first feature map includes a plurality of first image features.
  • character recognition is performed on the target image in parallel to obtain a character sequence, including:
  • a sequence of characters is generated.
  • performing character recognition on multiple sets of image features in parallel including:
  • a matrix operation is performed on the first image features and the second image features in the multiple sets of image features to obtain a third feature map; the third feature map includes a plurality of third image features, each third image feature being obtained by a matrix operation on the first image feature and the second image feature at the same feature position;
  • a plurality of third image features are decoded in parallel, and characters corresponding to each image feature are identified.
  • the size of the first image feature is B×C1×H×W; the size of the second image feature is B×T×H×W; the size of the third image feature is B×C1×T; where B is the batch parameter of the image recognition model, C1 is the number of feature channels of the image recognition model, H is the feature map height, W is the feature map width, and T is the prediction sequence length.
  • time-sequence relationship extraction is performed on the first feature map to obtain the second feature map that fuses the context information included in the target image.
  • the number of channels of the first feature map is transformed through the convolutional neural network layer to obtain a fourth feature map that fuses the context information included in the target image;
  • the channels in the fourth feature map are mapped to the preset sequence length to obtain the second feature map.
  • the method further includes:
  • an image recognition model is trained through a convolutional neural network.
  • in the process of performing sequence recognition on the target image, time-sequence relationship extraction is performed on the first feature map of the target image to obtain the second feature map that fuses the context information included in the target image; the second feature map therefore contains the time-sequence relationship between characters. In this way, character recognition can be performed in parallel based on the first feature map and the second feature map, which improves character recognition efficiency.
  • FIG. 3 is a flowchart of a sequence identification method provided by an embodiment of the present application. Referring to Figure 3, this embodiment includes:
  • An image processing device acquires a target image to be recognized.
  • the target image is any image that includes a character sequence; the character sequence includes a sequence of one or more of digits, letters, and text characters.
  • the image processing device acquires the target image.
  • the sequence recognition method is applied in the scene of license plate recognition in a parking lot; then, in response to a vehicle entering or exiting the parking lot, the image acquisition device collects a target image including the license plate.
  • the sequence recognition method is applied in the scene of barcode recognition; then when the user checks out the commodity, the image acquisition device captures the target image including the barcode of the commodity.
  • the sequence recognition method is applied in the scene of text recognition; when the user sees a text of interest, an image acquisition device is used for image acquisition; correspondingly, the image acquisition device acquires a target image including the text.
  • in response to the image processing device not having an image capture function, in this step the image processing device receives the target image sent by the image capture device.
  • the scenarios in which the image acquisition device captures images are the same as the scenarios in which the image processing device captures images described above, and details are not repeated here.
  • the target image is an image that includes the character sequence FLASH.
  • the target image is pre-stored in the image library in the image processing device.
  • the step of acquiring the target image to be recognized by the image processing device includes: the image acquisition device displays an image selection interface, the image selection interface includes an image index of each image in the image library; the user can select the image index to select the image.
  • the image processing device acquires the selected image index, and based on the image index, acquires the target image corresponding to the image index from the image library.
  • the image processing device performs feature extraction on the target image by using the image recognition model to obtain a first feature map, where the first feature map includes a plurality of first image features.
  • the image recognition model includes a feature extraction module; after the image acquisition device acquires the target image, the target image is input into the image recognition model, and the feature extraction module in the image recognition model performs feature extraction on the target image to obtain a first feature map.
  • the feature extraction module is obtained by training through a CNN (Convolutional Neural Network).
  • a CNN is a feed-forward artificial neural network whose neurons can respond to surrounding units within a limited coverage area, and it can effectively extract the structural information of images through weight sharing and feature aggregation.
  • for example, the feature extraction module in the image recognition model is a first CNN model; then, referring to FIG. 4, the image processing device inputs the target image including FLASH into the first CNN model and outputs the first feature map. The first feature map includes a plurality of first image features, and the size of the first image features is B×C1×H×W, where B is the batch parameter (batch size) of the image recognition model, C1 is the number of feature channels of the image recognition model, H is the height of the first feature map, and W is the width of the first feature map.
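  • To make the shapes concrete, the following is a minimal PyTorch sketch of a feature extraction module of this kind; the layer count, the channel width C1, and the strides are illustrative assumptions, not values fixed by this application.

```python
import torch
import torch.nn as nn

class FeatureExtractor(nn.Module):
    # Sketch of a CNN feature extraction module: maps a batch of images
    # (B x 3 x H0 x W0) to a first feature map of size B x C1 x H x W.
    def __init__(self, c1: int = 256):
        super().__init__()
        self.layers = nn.Sequential(
            nn.Conv2d(3, 64, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(64, 128, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(128, c1, kernel_size=3, stride=2, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, image: torch.Tensor) -> torch.Tensor:
        return self.layers(image)  # first feature map: B x C1 x H x W
```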
  • it should be noted that the image processing device can perform feature extraction on the entire target image through the image recognition model, or it can skip the entire image and first identify the image area where the sequence is located, performing feature extraction only on that image area to obtain the first feature map, thereby reducing the time required for feature extraction and improving sequence recognition efficiency.
  • the process by which the image processing device performs feature extraction on the image area where the sequence is located through the image recognition model is as follows: the image processing device crops a partial image, corresponding to the image area where the sequence is located, from the target image, inputs the partial image into the image recognition model, and performs feature extraction on the partial image through the image recognition model to obtain the first feature map.
  • based on the convolutional neural network layer and the fully connected layer in the image recognition model, the image processing device performs time-sequence relationship extraction on the first feature map to obtain a second feature map that fuses the context information included in the target image; the second feature map includes a plurality of second image features.
  • the context information included in the target image refers to the time-sequence relationship of the sequence. Time-sequence relationship extraction includes at least channel-count change processing and also includes sequence length-change processing. Channel-count change processing increases or decreases the number of channels, while sequence length-change processing refers to increasing the number of feature channels of the first feature map. Correspondingly, the second feature map and the first feature map include different numbers of feature channels, and the number of feature channels of the second feature map is greater than that of the first feature map.
  • the image recognition model includes an encoding module, which is a neural network model trained with a CNN; the image processing device performs time-sequence relationship extraction on the first feature map through a second CNN model and converts the number of channels of the first feature map to the preset sequence length to obtain the second feature map. For example, continuing to refer to FIG. 4, FIG. 4 takes the encoding module being an encoder as an example. The size of the second feature map is B×T×H×W, where T is the preset sequence length.
  • in one possible implementation, the encoding module includes a fully connected (FC) layer and at least one convolutional neural network layer. For example, the encoding module includes two convolutional neural network layers, each being a convolution kernel with kernel size 3 and stride 3.
  • this step is realized through the following steps (1) and (2), including:
  • (1) the image processing device transforms the number of channels of the first feature map through the convolutional neural network layer to obtain a fourth feature map that fuses the context information included in the target image.
  • in response to the encoding module including one convolutional neural network layer, the image processing module transforms the number of channels of the first feature map through that convolutional neural network layer to obtain the fourth feature map.
  • in response to the encoding module including multiple convolutional neural network layers, the image processing device first transforms the number of channels of the first feature map through one convolutional neural network layer, inputs the result to the next convolutional neural network layer, and transforms the result through that layer, and so on until processing through the multiple convolutional neural network layers is complete, obtaining the fourth feature map.
  • for example, the encoding module includes two convolutional neural network layers, namely convolutional neural network layer 1 and convolutional neural network layer 2; the image processing device transforms the number of channels of the first feature map through convolutional neural network layer 1 to obtain a fifth feature map, and transforms the number of channels of the fifth feature map through convolutional neural network layer 2 to obtain the fourth feature map.
  • for example, referring to FIG. 5, the image processing device transforms the number of channels of the first feature map through the convolutional neural network layer to obtain the fourth feature map; the number of channels of the fourth feature map is C2, and correspondingly the size of the image features included in the fourth feature map is B×C2×H×W.
  • (2) the image processing device maps the channels in the fourth feature map onto the preset sequence length through the fully connected layer to obtain the second feature map.
  • the preset sequence length can be set and changed as required, and it is the maximum number of characters that the image recognition model can recognize. For example, if the preset sequence length is 5, the image recognition model can recognize character sequences including up to 5 characters; as another example, if the preset sequence length is 10, the image recognition model can recognize character sequences including up to 10 characters.
  • the image processing device maps the channels in the fourth feature map onto the preset sequence length through the fully connected layer to obtain the second feature map; the number of channels of the second feature map is T, and correspondingly the size of the second image features included in the second feature map is B×T×H×W.
  • the width and height of the second image feature of the second feature map are respectively the same as or different from the width and height of the first image feature of the first feature map.
  • the preset sequence length and the number of channels of the first feature map are the same or different.
  • in the embodiments of the present application, the description takes as an example that the width and height of the second feature map are respectively the same as those of the first feature map, and that the preset sequence length is different from the number of channels of the first feature map.
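  • The following PyTorch sketch shows one way such an encoding module could look, under stated assumptions: two convolutional layers transform the channel count (C1 to C2), and a fully connected layer maps the C2 channels onto the preset sequence length T, giving a second feature map of size B×T×H×W. Stride 1 with padding is used here so that H and W stay equal to those of the first feature map, matching the example above; the text also mentions kernel-3, stride-3 convolutions as one option.

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    # Sketch of the encoding module: convolutional layers change the channel
    # count of the first feature map (C1 -> C2), then an FC layer maps the
    # channels onto the preset sequence length T.
    def __init__(self, c1: int = 256, c2: int = 512, t: int = 25):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(c1, c2, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(c2, c2, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.fc = nn.Linear(c2, t)  # maps channels to the sequence length T

    def forward(self, first_fm: torch.Tensor) -> torch.Tensor:
        x = self.conv(first_fm)       # fourth feature map: B x C2 x H x W
        x = x.permute(0, 2, 3, 1)     # B x H x W x C2
        x = self.fc(x)                # B x H x W x T
        return x.permute(0, 3, 1, 2)  # second feature map: B x T x H x W
```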
  • the image processing device determines multiple sets of image features based on multiple first image features and multiple second image features, where each set of image features includes first image features and second image features at the same feature location.
  • for any first image feature, the image processing device determines the feature position of the first image feature in the first feature map and, according to that feature position, determines from the second feature map the second image feature at the same feature position; the first image feature and this second image feature form a set of image features. Likewise, the image processing device searches sequentially in this way until every first image feature in the first feature map has been matched to a second image feature, obtaining multiple sets of image features.
  • it should be noted that the above description takes matching from the first image features in the first feature map to the second feature map as an example; the electronic device can also match from the second image features in the second feature map to the first feature map. The implementation process is similar and is not repeated here.
  • for example, the first feature map includes N first image features, namely first image feature 1, first image feature 2, first image feature 3, ..., first image feature N; the second feature map includes N second image features, namely second image feature 1, second image feature 2, second image feature 3, ..., second image feature N. The image processing device combines first image feature 1 and second image feature 1 into a set of image features, first image feature 2 and second image feature 2 into a set of image features, first image feature 3 and second image feature 3 into a set of image features, ..., and first image feature N and second image feature N into a set of image features.
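  • As a hypothetical illustration of this per-position pairing (the tensor shapes follow the sizes given above; the concrete values are assumptions):

```python
import torch

B, C1, T, H, W = 2, 256, 25, 8, 32   # illustrative sizes
first_fm = torch.randn(B, C1, H, W)  # first feature map
second_fm = torch.randn(B, T, H, W)  # second feature map

# One set of image features per feature position (h, w): the first image
# feature (B x C1) and the second image feature (B x T) at that position.
groups = [
    (first_fm[..., h, w], second_fm[..., h, w])
    for h in range(H)
    for w in range(W)
]
```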
  • the image processing device performs character recognition on multiple sets of image features in parallel.
  • This step is achieved through the following steps (1) and (2), including:
  • (1) the image processing device performs a matrix operation on the first image features and the second image features in the multiple sets of image features to obtain a third feature map; the third feature map includes a plurality of third image features, each obtained by a matrix operation on the first image feature and the second image feature at the same feature position.
  • the size of the first image feature is B×C1×H×W; the size of the second image feature is B×T×H×W; the size of the third image feature is B×C1×T; where B is the batch parameter of the image recognition model, C1 is the number of feature channels of the image recognition model, H is the feature map height, W is the feature map width, and T is the prediction sequence length.
  • for example, the visualization of the third feature map is shown in FIG. 6. The T output feature maps focus in turn on the character positions of "A", "R", and "T", with high response values at the feature positions corresponding to the three letters. After this response is obtained, it is multiplied with the output of the visual feature module to obtain a feature map of size B×C1×T, which can then be treated directly as an ordinary classification task using FC+Softmax classification.
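  • A minimal sketch of this matrix operation and the FC+Softmax classification, assuming the sizes above (the batch size, channel count, sequence length, and number of character classes are illustrative):

```python
import torch
import torch.nn as nn

def fuse_features(first_fm: torch.Tensor, second_fm: torch.Tensor) -> torch.Tensor:
    # Contract the first feature map (B x C1 x H x W) with the second feature
    # map (B x T x H x W) over the spatial dimensions H and W, giving the
    # third feature map (B x C1 x T). Each of the T maps in second_fm acts as
    # a spatial weighting of first_fm, as in the FIG. 6 visualization.
    return torch.einsum('bchw,bthw->bct', first_fm, second_fm)

B, C1, T, H, W = 2, 256, 25, 8, 32  # illustrative sizes
num_classes = 37                    # hypothetical alphabet size incl. [EOS]/[PAD]

third_fm = fuse_features(torch.randn(B, C1, H, W), torch.randn(B, T, H, W))

# FC + Softmax classify all T positions in parallel as ordinary classification.
classifier = nn.Linear(C1, num_classes)
logits = classifier(third_fm.transpose(1, 2))  # B x T x num_classes
probs = logits.softmax(dim=-1)                 # per-position character distribution
chars = probs.argmax(dim=-1)                   # B x T predicted character indices
```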
  • (2) the image processing device decodes the plurality of third image features in parallel, identifying the character corresponding to each image feature.
  • the image recognition model includes a decoding module; the image processing device performs parallel character recognition on multiple groups of image features through the decoding module to obtain multiple characters.
  • the decoding module adopts a single-character classification prediction method and performs parallel prediction based on the multiple third image features; therefore, the recognized characters can be predicted and output in parallel, without waiting for the output and state change of the previous moment.
  • in FIG. 4, the decoding module being a decoder is taken as an example for description.
  • the image processing device decodes a plurality of third image features in parallel, and obtains characters corresponding to each feature position as F, L, A, S, H, [EOS]...[PAD].
  • [EOS] is the end identifier, used to indicate the end of character sequence recognition; that is, the characters before [EOS] form the character sequence. [PAD] is the padding bit, used to indicate the end of character recognition; that is, all character recognition is complete.
  • in the embodiments of the present application, CNNs and matrix operations are used to replace the mainstream sequence-to-sequence technology (such as RNNs), in which sequence recognition relies on the context dependence of time-series modeling to recognize variable-length sequences; this enables parallel character recognition and improves efficiency.
  • the image processing device generates a character sequence based on the recognized multiple characters.
  • the preset sequence lengths in different image recognition models are different. In response to the number of characters included in the target image being the same as the preset sequence length, the image processing device forms a character sequence from the recognized characters. In response to the number of characters included in the target image being different from the preset sequence length, the image processing device forms a character sequence from the characters located before the end identifier among the plurality of characters.
  • the plurality of characters are F, l, a, s, h, [EOS]...[PAD].
  • the image processing device forms a character sequence with the characters before [EOS]; correspondingly, the character sequence recognized by the image processing device is Flash.
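  • A sketch of this assembly step, trimming the per-position predictions at the end identifier (the alphabet mapping is a hypothetical example):

```python
def assemble_sequence(indices, alphabet, eos_token='[EOS]'):
    # Keep only the characters located before the end identifier [EOS].
    chars = []
    for idx in indices:
        symbol = alphabet[idx]
        if symbol == eos_token:
            break
        chars.append(symbol)
    return ''.join(chars)

# Hypothetical alphabet; a length-8 prediction decodes to "FLASH".
alphabet = ['F', 'L', 'A', 'S', 'H', '[EOS]', '[PAD]']
print(assemble_sequence([0, 1, 2, 3, 4, 5, 6, 6], alphabet))  # -> FLASH
```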
  • in the process of performing sequence recognition on the target image, time-sequence relationship extraction is performed on the first feature map of the target image to obtain the second feature map that fuses the context information included in the target image; the second feature map therefore contains the time-sequence relationship between characters. In this way, character recognition can be performed in parallel based on the first feature map and the second feature map, which improves character recognition efficiency.
  • before recognizing the target image through the image recognition model, the image processing device trains the image recognition model; the process by which the image processing device trains the image recognition model is shown in FIG. 7 and includes:
  • the image processing device acquires a plurality of sample images, each sample image being annotated with the character sequence in the sample image.
  • the image processing device trains the image recognition model through a convolutional neural network based on the multiple sample images and the character sequence annotated for each sample image.
  • this step is realized through the following steps (1) to (4), including:
  • the image processing device performs feature extraction on each sample image based on the initial model to obtain a sixth feature map of each sample image, where the sixth feature map includes a plurality of sixth image features.
  • based on the convolutional neural network layer and the fully connected layer in the initial model, the image processing device performs time-sequence relationship extraction on the sixth feature map of each sample image to obtain a seventh feature map of each sample image; the seventh feature map includes a plurality of seventh image features, and the seventh feature map of each sample image fuses the context information included in the sample image.
  • the image processing device performs character recognition on each sample image in parallel based on the sixth feature map and the seventh feature map of each sample image, and obtains the predicted character sequence of each sample image;
  • the image processing device updates the initial model according to the predicted character sequence and the labeled character sequence of each sample image to obtain an image recognition model.
  • steps (1) to (4) are similar to steps 302-306, and will not be repeated here.
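  • A minimal training sketch under stated assumptions: the model maps a batch of sample images to per-position logits (B×T×num_classes), the annotated character sequences are encoded as length-T index tensors padded with the [PAD] index after [EOS], and each of the T positions is treated as an ordinary classification task, so no RNN is involved. The optimizer, loss, and hyperparameters are illustrative.

```python
import torch
import torch.nn as nn

def train_image_recognition_model(model: nn.Module, loader,
                                  epochs: int = 10, lr: float = 1e-4) -> None:
    # loader yields (images, labels): images B x 3 x H0 x W0, and labels
    # B x T integer character indices (padded with [PAD] after [EOS]).
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        for images, labels in loader:
            logits = model(images)                # B x T x num_classes
            loss = loss_fn(logits.flatten(0, 1),  # (B*T) x num_classes
                           labels.flatten())      # (B*T,)
            optimizer.zero_grad()
            loss.backward()
            optimizer.step()
```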
  • in the related art, an RNN encoder-decoder is generally used when training image recognition models; given the limitations of the RNN encoder-decoder framework (such as poor parallelism, slow training and testing, training being strongly affected by initialization and difficult to fit to a good parameter model, and unfriendliness to hardware platforms), this solution builds a sequence recognition framework that does not rely on RNNs.
  • in the embodiments of the present application, the image recognition model is trained through a convolutional neural network, which is friendly to hardware platforms; the fully parallel encoding and decoding of string features improves efficiency, and the performance is stable and easy to use.
  • FIG. 8 is a block diagram of a sequence identification apparatus provided by an embodiment of the present application. Referring to Figure 8, the device includes:
  • the extraction module 801 is configured to perform feature extraction on the target image to be recognized by an image recognition model to obtain a first feature map, where the first feature map includes a plurality of first image features;
  • the processing module 802 is configured to perform time-sequence relationship extraction on the first feature map based on the convolutional neural network layer and the fully connected layer in the image recognition model, to obtain a second feature map that fuses the context information included in the target image; the second feature map includes a plurality of second image features
  • the recognition module 803 is configured to perform character recognition on the target image in parallel based on the plurality of first image features and the plurality of second image features to obtain a character sequence.
  • the identification module 803 includes:
  • a determining unit for determining multiple groups of image features based on multiple first image features and multiple second image features, each group of image features including first image features and second image features at the same feature position;
  • a recognition unit for performing character recognition on multiple groups of image features in parallel
  • the generating unit is used for generating a character sequence based on the recognized multiple characters.
  • the identification unit is used to:
  • perform a matrix operation on the first image features and the second image features in the multiple sets of image features to obtain a third feature map; the third feature map includes a plurality of third image features, each third image feature being obtained by a matrix operation on the first image feature and the second image feature at the same feature position;
  • a plurality of third image features are decoded in parallel, and characters corresponding to each image feature are identified.
  • the size of the first image feature is B×C1×H×W; the size of the second image feature is B×T×H×W; the size of the third image feature is B×C1×T; where B is the batch parameter of the image recognition model, C1 is the number of feature channels of the image recognition model, H is the feature map height, W is the feature map width, and T is the prediction sequence length.
  • the processing module 802 includes:
  • a transformation unit configured to transform the number of channels of the first feature map through a convolutional neural network layer to obtain a fourth feature map that fuses the context information included in the target image
  • the mapping unit is used to map the channels in the fourth feature map to the preset sequence length through the fully connected layer to obtain the second feature map.
  • the apparatus further includes:
  • the acquisition module is used to acquire multiple sample images, each sample image being annotated with the character sequence in the sample image;
  • the training module is used to train the image recognition model through a convolutional neural network based on the multiple sample images and the character sequence annotated for each sample image.
  • in the process of performing sequence recognition on the target image, time-sequence relationship extraction is performed on the first feature map of the target image to obtain the second feature map that fuses the context information included in the target image; the second feature map therefore contains the time-sequence relationship between characters. In this way, character recognition can be performed in parallel based on the first feature map and the second feature map, which improves character recognition efficiency.
  • it should be noted that the sequence identification device provided by the above embodiment is illustrated only by the division of the above functional modules as an example. In practical applications, the above functions can be allocated to different functional modules as required; that is, the internal structure of the device is divided into different functional modules to complete all or part of the functions described above. In addition, the sequence identification device provided by the above embodiment and the sequence identification method embodiments belong to the same concept; for the specific implementation process, refer to the method embodiments, which are not repeated here.
  • FIG. 9 is a block diagram of an image processing device provided by an embodiment of the present application.
  • the image processing device may vary greatly due to different configurations or performance, and may include one or more processors (Central Processing Unit, CPU) 901 and one or more memories 902, where at least one instruction is stored in the memory 902 and is loaded and executed by the processor 901 to implement the sequence identification methods provided by the above method embodiments.
  • the image processing device may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and the image processing device may also include other components for realizing device functions, which will not be repeated here.
  • in the embodiments of the present application, the first feature map of the target image is extracted first, and then time-sequence relationship extraction is performed on the first feature map to obtain the second feature map that fuses the context information included in the target image.
  • in an exemplary embodiment, a computer-readable storage medium is also provided; at least one piece of program code is stored in the computer-readable storage medium, and the at least one piece of program code can be executed by a processor in an image processing device to complete the sequence recognition method in the above embodiments. For example, the computer-readable storage medium may be a ROM (Read-Only Memory), a RAM (Random Access Memory), a CD-ROM (Compact Disc Read-Only Memory), a magnetic tape, a floppy disk, an optical data storage device, or the like.
  • the above at least one piece of program code can be executed by the processor in the image processing device to perform the following steps:
  • the first feature map is subjected to time-sequence relationship extraction to obtain the second feature map that fuses the context information included in the target image; the second feature map includes a plurality of second image features;
  • character recognition is performed on the target image in parallel to obtain a character sequence.
  • At least one piece of program code is loaded and executed by the processor to implement the following steps:
  • a sequence of characters is generated.
  • At least one piece of program code is loaded and executed by the processor to implement the following steps:
  • a matrix operation is performed on the first image features and the second image features in the multiple sets of image features to obtain a third feature map; the third feature map includes a plurality of third image features, each third image feature being obtained by a matrix operation on the first image feature and the second image feature at the same feature position;
  • a plurality of third image features are decoded in parallel, and characters corresponding to each image feature are identified.
  • the size of the first image feature is B×C1×H×W; the size of the second image feature is B×T×H×W; the size of the third image feature is B×C1×T; where B is the batch parameter of the image recognition model, C1 is the number of feature channels of the image recognition model, H is the feature map height, W is the feature map width, and T is the prediction sequence length.
  • At least one piece of program code is loaded and executed by the processor to implement the following steps:
  • the channels in the fourth feature map are mapped to the preset sequence length to obtain the second feature map.
  • At least one piece of program code is loaded and executed by the processor to implement the following steps:
  • an image recognition model is trained through a convolutional neural network.
  • the present application also provides a computer program product.
  • the computer program product includes at least one computer program.
  • when the computer program is executed by a processor, it implements the sequence identification method provided by each of the above method embodiments.
  • in the process of performing sequence recognition on the target image, time-sequence relationship extraction is performed on the first feature map of the target image to obtain the second feature map that fuses the context information included in the target image; the second feature map therefore contains the time-sequence relationship between characters. In this way, character recognition can be performed in parallel based on the first feature map and the second feature map, which improves character recognition efficiency.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Artificial Intelligence (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Databases & Information Systems (AREA)
  • Medical Informatics (AREA)
  • Biodiversity & Conservation Biology (AREA)
  • Image Analysis (AREA)
  • Character Discrimination (AREA)

Abstract

A sequence recognition method and apparatus, an image processing device, and a storage medium, belonging to the technical field of image recognition. The method includes: performing feature extraction on a target image to be recognized through an image recognition model to obtain a first feature map, the first feature map including a plurality of first image features; performing time-sequence relationship extraction on the first feature map based on a convolutional neural network layer and a fully connected layer in the image recognition model to obtain a second feature map fusing the context information included in the target image, the second feature map including a plurality of second image features; and performing character recognition on the target image in parallel based on the plurality of first image features and the plurality of second image features to obtain a character sequence. Since the second feature map contains the time-sequence relationship between characters, character recognition can be performed in parallel based on the first feature map and the second feature map, which improves character recognition efficiency.

Description

Sequence recognition method and apparatus, image processing device, and storage medium
This application claims priority to Chinese patent application No. 202010751330.X, filed on July 30, 2020 and entitled "Sequence recognition method and apparatus, image processing device, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the technical field of image recognition, and in particular to a sequence recognition method and apparatus, an image processing device, and a storage medium.
Background
With the development of image recognition technology, sequence recognition using image recognition models is increasingly widely applied; for example, sequences such as license plate numbers and barcodes are recognized through an image recognition model. However, since a license plate number or barcode generally includes multiple characters and is recognized serially, recognition efficiency is low. Therefore, a sequence recognition method is needed to improve recognition efficiency.
Summary
Embodiments of the present application provide a sequence recognition method and apparatus, an image processing device, and a storage medium, which can improve sequence recognition efficiency. The technical solution is as follows:
In one aspect, a sequence recognition method is provided, the method including:
performing feature extraction on a target image to be recognized through an image recognition model to obtain a first feature map, the first feature map including a plurality of first image features;
performing time-sequence relationship extraction on the first feature map based on a convolutional neural network layer and a fully connected layer in the image recognition model to obtain a second feature map fusing the context information included in the target image, the second feature map including a plurality of second image features;
performing character recognition on the target image in parallel based on the plurality of first image features and the plurality of second image features to obtain a character sequence.
Optionally, performing character recognition on the target image in parallel based on the plurality of first image features and the plurality of second image features to obtain a character sequence includes:
determining multiple sets of image features based on the plurality of first image features and the plurality of second image features, each set of image features including a first image feature and a second image feature at the same feature position;
performing character recognition on the multiple sets of image features in parallel;
generating a character sequence based on the recognized characters.
Optionally, performing character recognition on the multiple sets of image features in parallel includes:
performing a matrix operation on the first image features and the second image features in the multiple sets of image features to obtain a third feature map, the third feature map including a plurality of third image features, each third image feature being obtained by a matrix operation on a first image feature and a second image feature at the same feature position;
decoding the plurality of third image features in parallel, and recognizing the character corresponding to each image feature.
Optionally, the size of the first image features is B×C1×H×W; the size of the second image features is B×T×H×W; the size of the third image features is B×C1×T;
where B is the batch parameter of the image recognition model, C1 is the number of feature channels of the image recognition model, H is the feature map height, W is the feature map width, and T is the prediction sequence length.
Optionally, performing time-sequence relationship extraction on the first feature map based on the convolutional neural network layer and the fully connected layer in the image recognition model to obtain a second feature map fusing the context information included in the target image includes:
transforming the number of channels of the first feature map through the convolutional neural network layer to obtain a fourth feature map fusing the context information included in the target image;
mapping the channels in the fourth feature map onto a preset sequence length through the fully connected layer to obtain the second feature map.
Optionally, the method further includes:
acquiring a plurality of sample images, each sample image being annotated with the character sequence in the sample image;
training the image recognition model through a convolutional neural network based on the plurality of sample images and the character sequence annotated for each sample image.
In another aspect, a sequence recognition apparatus is provided, the apparatus including:
an extraction module, configured to perform feature extraction on a target image to be recognized through an image recognition model to obtain a first feature map, the first feature map including a plurality of first image features;
a processing module, configured to perform time-sequence relationship extraction on the first feature map based on a convolutional neural network layer and a fully connected layer in the image recognition model to obtain a second feature map fusing the context information included in the target image, the second feature map including a plurality of second image features;
a recognition module, configured to perform character recognition on the target image in parallel based on the plurality of first image features and the plurality of second image features to obtain a character sequence.
Optionally, the recognition module includes:
a determining unit, configured to determine multiple sets of image features based on the plurality of first image features and the plurality of second image features, each set of image features including a first image feature and a second image feature at the same feature position;
a recognition unit, configured to perform character recognition on the multiple sets of image features in parallel;
a generating unit, configured to generate a character sequence based on the recognized characters.
Optionally, the recognition unit is configured to:
perform a matrix operation on the first image features and the second image features in the multiple sets of image features to obtain a third feature map, the third feature map including a plurality of third image features, each third image feature being obtained by a matrix operation on a first image feature and a second image feature at the same feature position;
decode the plurality of third image features in parallel, and recognize the character corresponding to each image feature.
Optionally, the size of the first image features is B×C1×H×W; the size of the second image features is B×T×H×W; the size of the third image features is B×C1×T;
where B is the batch parameter of the image recognition model, C1 is the number of feature channels of the image recognition model, H is the feature map height, W is the feature map width, and T is the prediction sequence length.
Optionally, the processing module includes:
a transformation unit, configured to transform the number of channels of the first feature map through the convolutional neural network layer to obtain a fourth feature map fusing the context information included in the target image;
a mapping unit, configured to map the channels in the fourth feature map onto a preset sequence length through the fully connected layer to obtain the second feature map.
Optionally, the apparatus further includes:
an acquisition module, configured to acquire a plurality of sample images, each sample image being annotated with the character sequence in the sample image;
a training module, configured to train the image recognition model through a convolutional neural network based on the plurality of sample images and the character sequence annotated for each sample image.
In another aspect, an image processing device is provided, the image processing device including a processor and a memory, where at least one piece of program code is stored in the memory and is loaded and executed by the processor to implement the following steps:
performing feature extraction on a target image to be recognized through an image recognition model to obtain a first feature map, the first feature map including a plurality of first image features;
performing time-sequence relationship extraction on the first feature map based on a convolutional neural network layer and a fully connected layer in the image recognition model to obtain a second feature map fusing the context information included in the target image, the second feature map including a plurality of second image features;
performing character recognition on the target image in parallel based on the plurality of first image features and the plurality of second image features to obtain a character sequence.
Optionally, the at least one piece of program code is loaded and executed by the processor to implement the following steps:
determining multiple sets of image features based on the plurality of first image features and the plurality of second image features, each set of image features including a first image feature and a second image feature at the same feature position;
performing character recognition on the multiple sets of image features in parallel;
generating a character sequence based on the recognized characters.
Optionally, the at least one piece of program code is loaded and executed by the processor to implement the following steps:
performing a matrix operation on the first image features and the second image features in the multiple sets of image features to obtain a third feature map, the third feature map including a plurality of third image features, each third image feature being obtained by a matrix operation on a first image feature and a second image feature at the same feature position;
decoding the plurality of third image features in parallel, and recognizing the character corresponding to each image feature.
Optionally, the size of the first image features is B×C1×H×W; the size of the second image features is B×T×H×W; the size of the third image features is B×C1×T;
where B is the batch parameter of the image recognition model, C1 is the number of feature channels of the image recognition model, H is the feature map height, W is the feature map width, and T is the prediction sequence length.
Optionally, the at least one piece of program code is loaded and executed by the processor to implement the following steps:
transforming the number of channels of the first feature map through the convolutional neural network layer to obtain a fourth feature map fusing the context information included in the target image;
mapping the channels in the fourth feature map onto a preset sequence length through the fully connected layer to obtain the second feature map.
Optionally, the at least one piece of program code is loaded and executed by the processor to implement the following steps:
acquiring a plurality of sample images, each sample image being annotated with the character sequence in the sample image;
training the image recognition model through a convolutional neural network based on the plurality of sample images and the character sequence annotated for each sample image.
In another aspect, a computer-readable storage medium is provided, where at least one piece of program code is stored in the computer-readable storage medium and is loaded and executed by a processor to implement the sequence recognition method described in any of the above possible implementations.
In another aspect, a computer program product is provided, the computer program product including at least one computer program which, when executed by a processor, implements the sequence recognition method described in any of the above possible implementations.
In the embodiments of the present application, in the process of performing sequence recognition on the target image, time-sequence relationship extraction is performed on the first feature map of the target image to obtain the second feature map fusing the context information included in the target image; the second feature map thus contains the time-sequence relationship between characters. In this way, character recognition can be performed in parallel based on the first feature map and the second feature map, which improves character recognition efficiency.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application more clearly, the drawings required in the description of the embodiments are briefly introduced below. Obviously, the drawings in the following description show only some embodiments of the present application, and a person of ordinary skill in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application;
FIG. 2 is a flowchart of a sequence recognition method provided by an embodiment of the present application;
FIG. 3 is a flowchart of another sequence recognition method provided by an embodiment of the present application;
FIG. 4 is a schematic diagram of a sequence recognition method provided by an embodiment of the present application;
FIG. 5 is a schematic diagram of another sequence recognition method provided by an embodiment of the present application;
FIG. 6 is a schematic diagram of another sequence recognition method provided by an embodiment of the present application;
FIG. 7 is a flowchart of a training method for an image recognition model provided by an embodiment of the present application;
FIG. 8 is a block diagram of a sequence recognition apparatus provided by an embodiment of the present application;
FIG. 9 is a block diagram of an image processing device provided by an embodiment of the present application.
Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the embodiments of the present application are described in further detail below with reference to the drawings.
The terms "first", "second", "third", and "fourth" in the specification, claims, and drawings of the present application are used to distinguish different objects rather than to describe a specific order. In addition, the terms "include" and "have" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally further includes steps or units that are not listed, or optionally further includes other steps or units inherent to such a process, method, product, or device.
FIG. 1 is a schematic diagram of an implementation environment provided by an embodiment of the present application. Referring to FIG. 1, the implementation environment includes an image acquisition device 101 and an image processing device 102; the image acquisition device 101 and the image processing device 102 are connected through a wireless or wired network.
The image acquisition device 101 is used to acquire a target image and transmit the target image to the image processing device 102. The image processing device 102 is used to perform sequence recognition on the target image, and it does so through an image recognition model. Therefore, the image recognition model needs to be stored in the image processing device 102 in advance. Optionally, the image recognition model is trained on the image processing device 102, or it is trained on another device and then loaded onto the image processing device 102.
In the embodiments of the present application, the image acquisition device 101 is any device with an image acquisition function, such as a mobile phone, a tablet computer, a computer, a camera, or a video camera. The image processing device 102 is any device with an image processing function, such as a terminal or a server. In response to the image processing device 102 being a server, optionally, the image processing device 102 is a single server, a server cluster composed of multiple servers 103, a cloud server, or the like. In the embodiments of the present application, this is not specifically limited.
The sequence recognition method of the embodiments of the present application can be applied in various practical application scenarios. The actual technical effects of the embodiments of the present application are described below in combination with three exemplary application scenarios:
(1) License plate number recognition in a parking lot: in the license plate number recognition scenario, in response to a vehicle entering the parking lot, the image acquisition device 101 captures a first target image including the license plate number of the vehicle and sends the first target image to the image processing device 102. The image processing device 102 receives the first target image, recognizes the license plate number from the first target image, and stores the license plate number in association with the entry time.
In response to the vehicle driving out of the parking lot, the image acquisition device 101 again captures a second target image including the license plate number of the vehicle and sends the second target image to the image processing device 102. The image processing device 102 receives the second target image, recognizes the license plate number from the second target image, looks up the entry time associated with that license plate number from the association between license plate numbers and entry times, and charges the vehicle according to its entry time and exit time. Automatic charging of vehicles can thus be realized.
(2) Barcode recognition: in the barcode recognition scenario, when a user purchases a commodity and checks out, the image acquisition device 101 captures a target image including the barcode and sends the target image to the image processing device 102. The image processing device 102 receives the target image, recognizes the digits in the barcode from the target image to obtain a character sequence, determines the price of the commodity according to the character sequence, and then charges accordingly.
In the barcode recognition scenario, the image acquisition device 101 is a cashier's POS machine or a self-service checkout device.
(3) Text recognition: in the text recognition scenario, when a user sees a piece of text of interest, the user captures a target image including the text through the image acquisition device 101 and sends the target image to the image processing device 102. The image processing device 102 receives the target image, recognizes the text from the image, and returns the text to the image acquisition device 101, thereby automatically recognizing text from images without requiring the user to enter the text manually, which improves efficiency.
It should be noted that, in addition to the above three applications, the method also has other applications; for example, it can be applied in digit recognition scenarios and the like. In the embodiments of the present application, this is not specifically limited.
It should also be noted that, in the above implementation environment, the image acquisition device 101 and the image processing device 102 are described as different devices by way of example. Optionally, the image acquisition device 101 and the image processing device 102 are the same device, for example both referred to as the image processing device 102, in which case the image processing device 102 is used to acquire the target image and perform sequence recognition on the target image. Correspondingly, the image processing device 102 has not only an image processing function but also an image acquisition function.
FIG. 2 is a flowchart of a sequence recognition method according to an embodiment of this application. Referring to FIG. 2, this embodiment includes the following steps.
201. Perform feature extraction on a to-be-recognized target image through an image recognition model to obtain a first feature map, the first feature map including a plurality of first image features.
202. Perform temporal relation extraction on the first feature map based on a convolutional neural network layer and a fully connected layer in the image recognition model to obtain a second feature map fused with the context information included in the target image, the second feature map including a plurality of second image features.
203. Perform character recognition on the target image in parallel based on the plurality of first image features and the plurality of second image features to obtain a character sequence.
Optionally, performing character recognition on the target image in parallel based on the plurality of first image features and the plurality of second image features to obtain a character sequence includes:
determining a plurality of groups of image features based on the plurality of first image features and the plurality of second image features, each group of image features including a first image feature and a second image feature at the same feature position;
performing character recognition on the plurality of groups of image features in parallel;
generating a character sequence based on the plurality of recognized characters.
Optionally, performing character recognition on the plurality of groups of image features in parallel includes:
performing a matrix operation on the first image features and the second image features in the plurality of groups of image features to obtain a third feature map, the third feature map including a plurality of third image features, each third image feature being obtained by a matrix operation on a first image feature and a second image feature at the same feature position;
decoding the plurality of third image features in parallel to recognize the character corresponding to each image feature.
Optionally, the size of the first image features is B×C1×H×W, the size of the second image features is B×T×H×W, and the size of the third image features is B×C1×T;
where B is the batch parameter of the image recognition model, C1 is the number of feature channels of the image recognition model, H is the feature map height, W is the feature map width, and T is the predicted sequence length.
Optionally, performing temporal relation extraction on the first feature map based on the convolutional neural network layer and the fully connected layer in the image recognition model to obtain the second feature map fused with the context information included in the target image includes:
transforming the number of channels of the first feature map through the convolutional neural network layer to obtain a fourth feature map fused with the context information included in the target image;
mapping, through the fully connected layer, the channels in the fourth feature map onto a preset sequence length to obtain the second feature map.
Optionally, the method further includes:
acquiring a plurality of sample images, each sample image being annotated with the character sequence in the sample image;
training the image recognition model through a convolutional neural network based on the plurality of sample images and the character sequence annotated for each sample image.
In the embodiments of this application, during sequence recognition of a target image, temporal relation extraction is performed on the first feature map of the target image to obtain a second feature map fused with the context information included in the target image, so that the second feature map contains the temporal relations between characters. Character recognition can therefore be performed in parallel based on the first feature map and the second feature map, which improves character recognition efficiency.
FIG. 3 is a flowchart of a sequence recognition method according to an embodiment of this application. Referring to FIG. 3, this embodiment includes the following steps.
301. The image processing device acquires a to-be-recognized target image.
The target image is any image that includes a character sequence, the character sequence being a sequence of one or more of digits, letters, and text.
In one possible implementation, in response to the image processing device having an image acquisition function, in this step the image processing device acquires the target image. For example, when the sequence recognition method is applied in the license plate recognition scenario of a parking lot, a target image including the license plate is acquired in response to a vehicle entering or leaving the parking lot. For another example, when the sequence recognition method is applied in the barcode recognition scenario, a target image including the product's barcode is acquired when a user checks out the product. For another example, when the sequence recognition method is applied in the text recognition scenario, a user who sees text of interest performs image acquisition, and a target image including that text is acquired accordingly.
In another possible implementation, in response to the image processing device not having an image acquisition function, in this step the image processing device receives the target image sent by the image acquisition device. The scenarios in which the image acquisition device acquires images are the same as those described above and are not repeated here. For example, the target image is an image including the character sequence FLASH.
In another possible implementation, the target image is stored in advance in an image library on the image processing device. Correspondingly, the step of acquiring the to-be-recognized target image includes: the image processing device displays an image selection interface that includes an image index for each image in the image library, and the user selects an image by selecting its image index; the image processing device obtains the selected image index and, based on it, retrieves the corresponding target image from the image library.
302. The image processing device performs feature extraction on the target image through the image recognition model to obtain a first feature map, the first feature map including a plurality of first image features.
The image recognition model includes a feature extraction module. After obtaining the target image, the image processing device inputs the target image into the image recognition model and performs feature extraction on it through the feature extraction module to obtain the first feature map. The feature extraction module is obtained by training a CNN (Convolutional Neural Network).
A CNN is a feed-forward artificial neural network whose neurons respond to surrounding units within a limited coverage range; through weight sharing and feature pooling, it can effectively extract the structural information of an image.
For example, the feature extraction module in the image recognition model is a first CNN model. Referring to FIG. 4, the image processing device inputs the target image including FLASH into the first CNN model, which outputs the first feature map. The first feature map includes a plurality of first image features whose size is B×C1×H×W, where B is the batch parameter (batch size) of the image recognition model, C1 is the number of feature channels of the image recognition model, H is the height of the first feature map, and W is the width of the first feature map.
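The disclosure does not pin down a concrete backbone for the feature extraction module, so the following is only a minimal sketch, assuming a small PyTorch convolutional stack; the class name FeatureExtractor and the layer widths are illustrative, not part of the original description.

```python
import torch.nn as nn

# A minimal sketch of a feature extraction module (assumption: the backbone
# architecture is not specified, so a small conv stack stands in for it).
class FeatureExtractor(nn.Module):
    def __init__(self, in_channels=3, c1=512):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(in_channels, 64, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.MaxPool2d(2),  # halve the spatial resolution
            nn.Conv2d(64, c1, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )

    def forward(self, image):          # image: B x 3 x H0 x W0
        return self.backbone(image)    # first feature map: B x C1 x H x W
```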
It should be noted that the image processing device may perform feature extraction on the entire target image through the image recognition model, or may instead first identify the image region where the sequence is located and perform feature extraction only on that region to obtain the first feature map, thereby reducing the time required for feature extraction and improving sequence recognition efficiency.
The process of performing feature extraction on the image region where the sequence is located is as follows: the image processing device crops a partial image from the target image, the partial image corresponding to the region where the sequence is located, inputs the partial image into the image recognition model, and performs feature extraction on it through the image recognition model to obtain the first feature map.
303. The image processing device performs temporal relation extraction on the first feature map based on the convolutional neural network layer and the fully connected layer in the image recognition model to obtain a second feature map fused with the context information included in the target image, the second feature map including a plurality of second image features.
The context information included in the target image refers to the temporal relations of the sequence. Temporal relation extraction includes at least channel-number transformation, and further includes sequence-lengthening processing. Channel-number transformation may increase or decrease the number of channels, whereas sequence-lengthening processing specifically increases the number of feature channels of the first feature map. Correspondingly, the second feature map and the first feature map have different numbers of feature channels, and the number of feature channels of the second feature map is greater than that of the first feature map.
The image recognition model includes an encoding module, which is a neural network model obtained by training a CNN. The image processing device performs temporal relation extraction on the first feature map through a second CNN model, converting the number of channels of the first feature map into the preset sequence length to obtain the second feature map. For example, referring further to FIG. 4, the encoding module is illustrated there as an encoder. The size of the second feature map is B×T×H×W, where T is the preset sequence length.
In one possible implementation, the encoding module includes a fully connected (FC) layer and at least one convolutional neural network layer. For example, the encoding module includes two convolutional neural network layers, each with a convolution kernel of 3 and a stride of 3.
This step is implemented through the following steps (1) and (2):
(1) The image processing device transforms the number of channels of the first feature map through the convolutional neural network layer to obtain a fourth feature map fused with the context information included in the target image.
In response to the encoding module including one convolutional neural network layer, the image processing device transforms the number of channels of the first feature map through that layer to obtain the fourth feature map. In response to the encoding module including multiple convolutional neural network layers, the image processing device first transforms the number of channels of the first feature map through one convolutional neural network layer, inputs the result into the next convolutional neural network layer for further transformation, and continues until processing through all the convolutional neural network layers is complete, obtaining the fourth feature map.
For example, the encoding module includes two convolutional neural network layers, convolutional neural network layer 1 and convolutional neural network layer 2. The image processing device transforms the number of channels of the first feature map through convolutional neural network layer 1 to obtain a fifth feature map, and transforms the number of channels of the fifth feature map through convolutional neural network layer 2 to obtain the fourth feature map.
For example, referring to FIG. 5, the image processing device transforms the number of channels of the first feature map through the convolutional neural network layers to obtain the fourth feature map. The number of channels of the fourth feature map is C2; correspondingly, the size of the image features included in the fourth feature map is B×C2×H×W.
(2) The image processing device maps, through the fully connected layer, the channels in the fourth feature map onto the preset sequence length to obtain the second feature map.
Optionally, the preset sequence length can be set and changed as needed; it is the maximum number of characters the image recognition model can recognize. For example, with a preset sequence length of 5, the image recognition model can recognize character sequences of at most 5 characters; with a preset sequence length of 10, it can recognize character sequences of at most 10 characters.
For example, referring further to FIG. 5, the image processing device maps, through the fully connected layer, the channels in the fourth feature map onto the preset sequence length to obtain the second feature map. The number of channels of the second feature map is T; correspondingly, the size of the second image features included in the second feature map is B×T×H×W.
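As a hedged illustration of steps (1) and (2), the sketch below chains convolutional layers that re-map the channel count (C1 to C2) with a fully connected layer that maps the channels onto the preset sequence length T. The channel sizes are assumptions, and stride-1 convolutions with padding are used here so that H and W are preserved, matching the B×T×H×W shape above; the kernel and stride values mentioned earlier are configuration details that would change the spatial size.

```python
import torch.nn as nn

# A sketch of the encoding module under the assumptions stated above.
class Encoder(nn.Module):
    def __init__(self, c1=512, c2=256, seq_len=25):
        super().__init__()
        self.conv = nn.Sequential(        # channel transform: C1 -> C2
            nn.Conv2d(c1, c2, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(c2, c2, kernel_size=3, padding=1),
            nn.ReLU(inplace=True),
        )
        self.fc = nn.Linear(c2, seq_len)  # map channels onto T

    def forward(self, feat1):             # first feature map: B x C1 x H x W
        feat4 = self.conv(feat1)          # fourth feature map: B x C2 x H x W
        x = feat4.permute(0, 2, 3, 1)     # B x H x W x C2
        x = self.fc(x)                    # B x H x W x T
        return x.permute(0, 3, 1, 2)      # second feature map: B x T x H x W
```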
It should be noted that the width and height of the second image features in the second feature map may be the same as or different from those of the first image features in the first feature map, and the preset sequence length may be the same as or different from the number of channels of the first feature map. In the embodiments of this application, the description takes as an example the case where the width and height of the second feature map are the same as those of the first feature map and the preset sequence length differs from the number of channels of the first feature map.
304. The image processing device determines a plurality of groups of image features based on the plurality of first image features and the plurality of second image features, each group of image features including a first image feature and a second image feature at the same feature position.
For any first image feature in the first feature map, the image processing device determines its feature position in the first feature map, determines from the second feature map the second image feature located at that feature position, and groups the first image feature and the second image feature together. The image processing device proceeds in this way until every first image feature in the first feature map has been matched to a second image feature, obtaining the plurality of groups of image features.
It should be noted that the above description takes as an example matching second image features in the second feature map starting from the first image features in the first feature map. The image processing device can also match the first feature map starting from the second image features in the second feature map; the implementation process is similar and is not repeated here.
For example, referring further to FIG. 4, the first feature map includes N first image features, namely first image feature 1, first image feature 2, first image feature 3, ..., first image feature N, and the second feature map includes N second image features, namely second image feature 1, second image feature 2, second image feature 3, ..., second image feature N. The image processing device groups first image feature 1 with second image feature 1, first image feature 2 with second image feature 2, first image feature 3 with second image feature 3, ..., and first image feature N with second image feature N.
305. The image processing device performs character recognition on the plurality of groups of image features in parallel.
This step is implemented through the following steps (1) and (2):
(1) The image processing device performs a matrix operation on the first image features and the second image features in the plurality of groups of image features to obtain a third feature map, the third feature map including a plurality of third image features, each third image feature being obtained by a matrix operation on a first image feature and a second image feature at the same feature position.
The size of the first image features is B×C1×H×W, the size of the second image features is B×T×H×W, and the size of the third image features is B×C1×T,
where B is the batch parameter of the image recognition model, C1 is the number of feature channels of the image recognition model, H is the feature map height, W is the feature map width, and T is the predicted sequence length.
For example, the third feature map can be visualized as in FIG. 6. Setting the batch parameter of the image recognition model aside and recognizing a single string "ART", with W=10 and H=1, an output roughly as shown can be obtained. The T output feature maps attend in turn to the character positions of "A", "R", and "T"; that is, they have high response values at the feature positions corresponding to those three letters. Once this response is obtained, it is matrix-multiplied with the result of the visual feature module to obtain a B×C1×T feature map, which can subsequently be treated as an ordinary classification task and classified directly with FC+Softmax.
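One plausible reading of this matrix operation, sketched below under the shapes stated above: both feature maps are flattened over the H×W positions and multiplied per batch element, so each of the T response maps pools the visual features at its character's position. The function name fuse_features is illustrative, not from the original disclosure.

```python
import torch

# A sketch of the matrix operation in step (1), under the stated shapes.
def fuse_features(feat1, feat2):
    # feat1: B x C1 x H x W  (visual features)
    # feat2: B x T  x H x W  (temporal-relation responses)
    b, c1, h, w = feat1.shape
    t = feat2.shape[1]
    v = feat1.reshape(b, c1, h * w)            # B x C1 x (H*W)
    a = feat2.reshape(b, t, h * w)             # B x T  x (H*W)
    return torch.bmm(v, a.transpose(1, 2))     # third feature map: B x C1 x T
```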
(2) The image processing device decodes the plurality of third image features in parallel to recognize the character corresponding to each image feature.
The image recognition model includes a decoding module, and the image processing device performs parallel character recognition on the plurality of groups of image features through the decoding module to obtain a plurality of characters. The decoding module adopts single-character classification prediction and predicts in parallel based on the plurality of third image features; the recognized characters can therefore be predicted and output in parallel, without waiting for the output and state change of the previous time step.
For example, referring further to FIG. 4, where the decoding module is illustrated as a decoder, the image processing device decodes the plurality of third image features in parallel, and the characters obtained for the respective feature positions are F, L, A, S, H, [EOS], ..., [PAD]. [EOS] is the end identifier, indicating that recognition of the character sequence has ended, that is, the characters before [EOS] form the character sequence; [PAD] is the end padding, indicating that character recognition has ended, that is, all characters have been recognized.
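Since the decoding module is described as single-character classification over the T fused feature vectors, the sketch below assumes a plain linear classifier; the class count (a character set plus [EOS] and [PAD]) is an assumption.

```python
import torch.nn as nn

# A sketch of a parallel single-character classification decoder.
class Decoder(nn.Module):
    def __init__(self, c1=512, num_classes=38):   # e.g. 36 characters + [EOS] + [PAD]
        super().__init__()
        self.classifier = nn.Linear(c1, num_classes)

    def forward(self, feat3):                     # third feature map: B x C1 x T
        # every one of the T positions is classified independently, in parallel
        return self.classifier(feat3.transpose(1, 2))  # logits: B x T x num_classes

# At inference, a softmax over the last dimension gives each position's character
# distribution: decoder(feat3).softmax(dim=-1).argmax(dim=-1)
```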
In the embodiments of this application, CNNs and matrix operations replace the mainstream Seq-to-Seq techniques (such as RNNs), in which sequence recognition relies on the context dependence of temporal modeling to recognize variable-length sequences. This enables parallel character recognition and improves efficiency.
306. The image processing device generates a character sequence based on the plurality of recognized characters.
Different image recognition models have different preset sequence lengths. In response to the number of characters included in the target image being the same as the preset sequence length, the image processing device forms the recognized characters into a character sequence. In response to the number of characters included in the target image differing from the preset sequence length, the image processing device forms the characters preceding the end identifier into a character sequence.
For example, referring further to FIG. 4, the recognized characters are F, L, A, S, H, [EOS], ..., [PAD]. The image processing device then forms the characters before [EOS] into a character sequence; correspondingly, the character sequence recognized by the image processing device is FLASH.
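A small sketch of this trimming rule, assuming predictions arrive as class indices over a known character set and that [EOS] has a known index:

```python
# Keep the characters before the first [EOS]; everything after it
# (including [PAD] positions) is discarded.
def to_sequence(pred_ids, charset, eos_id):
    chars = []
    for idx in pred_ids:        # pred_ids: the length-T prediction for one image
        if idx == eos_id:
            break
        chars.append(charset[idx])
    return "".join(chars)
```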
In the embodiments of this application, during sequence recognition of a target image, temporal relation extraction is performed on the first feature map of the target image to obtain a second feature map fused with the context information included in the target image, so that the second feature map contains the temporal relations between characters. Character recognition can therefore be performed in parallel based on the first feature map and the second feature map, which improves character recognition efficiency.
Before recognizing the target image through the image recognition model, the image processing device trains the image recognition model. The training process, shown in FIG. 7, includes the following steps.
701. The image processing device acquires a plurality of sample images, each sample image being annotated with the character sequence in the sample image.
702. The image processing device trains the image recognition model through a convolutional neural network based on the plurality of sample images and the character sequence annotated for each sample image.
This step is implemented through the following steps (1) to (4):
(1) The image processing device performs feature extraction on each sample image based on an initial model to obtain a sixth feature map of each sample image, the sixth feature map including a plurality of sixth image features.
(2) The image processing device performs temporal relation extraction on the sixth feature map of each sample image based on the convolutional neural network layer and the fully connected layer in the initial model to obtain a seventh feature map of each sample image, the seventh feature map including a plurality of seventh image features, the seventh feature map of each sample image being fused with the context information included in that sample image.
(3) The image processing device performs character recognition on each sample image in parallel based on the sixth feature map and the seventh feature map of the sample image to obtain a predicted character sequence for each sample image.
(4) The image processing device updates the initial model according to the predicted character sequence and the annotated character sequence of each sample image to obtain the image recognition model.
It should be noted that the implementation of steps (1) to (4) is similar to that of steps 302-306 and is not repeated here.
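The disclosure does not spell out the loss used in step (4); a common choice consistent with parallel per-position prediction is a per-position cross-entropy, sketched below. The optimizer and the padding/ignore scheme are assumptions.

```python
import torch.nn.functional as F

# A minimal training step under the assumptions stated above.
def train_step(model, optimizer, images, target_ids, pad_id):
    # images: B x 3 x H0 x W0; target_ids: B x T integer labels padded with pad_id
    logits = model(images)                       # B x T x num_classes
    loss = F.cross_entropy(
        logits.reshape(-1, logits.shape[-1]),    # (B*T) x num_classes
        target_ids.reshape(-1),                  # (B*T)
        ignore_index=pad_id,                     # skip padded positions
    )
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```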
In the related art, RNN-based encoding and decoding is generally used when training image recognition models, but the RNN encoder-decoder framework brings limitations such as poor parallelism, slow training and testing, strong sensitivity to initialization, difficulty in fitting a well-optimized parameter model, and unfriendliness to hardware platforms. This solution therefore builds a sequence recognition framework that does not depend on RNNs: the image recognition model is trained through convolutional neural networks, which is friendly to hardware platforms, and the fully parallel encoding and decoding module over character features improves efficiency, steadily improves performance, and is flexible and easy to use.
FIG. 8 is a block diagram of a sequence recognition apparatus according to an embodiment of this application. Referring to FIG. 8, the apparatus includes:
an extraction module 801, configured to perform feature extraction on a to-be-recognized target image through an image recognition model to obtain a first feature map, the first feature map including a plurality of first image features;
a processing module 802, configured to perform temporal relation extraction on the first feature map based on a convolutional neural network layer and a fully connected layer in the image recognition model to obtain a second feature map fused with the context information included in the target image, the second feature map including a plurality of second image features;
a recognition module 803, configured to perform character recognition on the target image in parallel based on the plurality of first image features and the plurality of second image features to obtain a character sequence.
In one possible implementation, the recognition module 803 includes:
a determining unit, configured to determine a plurality of groups of image features based on the plurality of first image features and the plurality of second image features, each group of image features including a first image feature and a second image feature at the same feature position;
a recognition unit, configured to perform character recognition on the plurality of groups of image features in parallel;
a generation unit, configured to generate a character sequence based on the plurality of recognized characters.
In another possible implementation, the recognition unit is configured to:
perform a matrix operation on the first image features and the second image features in the plurality of groups of image features to obtain a third feature map, the third feature map including a plurality of third image features, each third image feature being obtained by a matrix operation on a first image feature and a second image feature at the same feature position; and
decode the plurality of third image features in parallel to recognize the character corresponding to each image feature.
In another possible implementation, the size of the first image features is B×C1×H×W, the size of the second image features is B×T×H×W, and the size of the third image features is B×C1×T;
where B is the batch parameter of the image recognition model, C1 is the number of feature channels of the image recognition model, H is the feature map height, W is the feature map width, and T is the predicted sequence length.
In another possible implementation, the processing module 802 includes:
a transformation unit, configured to transform the number of channels of the first feature map through the convolutional neural network layer to obtain a fourth feature map fused with the context information included in the target image;
a mapping unit, configured to map, through the fully connected layer, the channels in the fourth feature map onto a preset sequence length to obtain the second feature map.
In another possible implementation, the apparatus further includes:
an acquisition module, configured to acquire a plurality of sample images, each sample image being annotated with the character sequence in the sample image;
a training module, configured to train the image recognition model through a convolutional neural network based on the plurality of sample images and the character sequence annotated for each sample image.
In the embodiments of this application, during sequence recognition of a target image, temporal relation extraction is performed on the first feature map of the target image to obtain a second feature map fused with the context information included in the target image, so that the second feature map contains the temporal relations between characters. Character recognition can therefore be performed in parallel based on the first feature map and the second feature map, which improves character recognition efficiency.
It should be noted that when the sequence recognition apparatus provided in the above embodiment performs sequence recognition, the division into the above functional modules is merely an example. In practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the terminal is divided into different functional modules to complete all or some of the functions described above. In addition, the sequence recognition apparatus provided in the above embodiment belongs to the same concept as the sequence recognition method embodiments; for its specific implementation process, refer to the method embodiments, which are not repeated here.
In response to the image processing device being a server, FIG. 9 is a block diagram of an image processing device according to an embodiment of this application. The image processing device may vary greatly due to differences in configuration or performance, and may include one or more processors (Central Processing Units, CPU) 901 and one or more memories 902, where the memory 902 stores at least one instruction, and the at least one instruction is loaded and executed by the processor 901 to implement the sequence recognition method provided in each of the foregoing method embodiments. Certainly, the image processing device may also have components such as a wired or wireless network interface, a keyboard, and an input/output interface for input and output, and may further include other components for implementing device functions, which are not described in detail here.
In the embodiments of this application, during sequence recognition of a target image, the first feature map of the target image is extracted first, and temporal relation extraction is then performed on the first feature map to obtain a second feature map fused with the context information included in the target image, so that the second feature map contains the temporal relations between characters. Character recognition can therefore be performed in parallel based on the first feature map and the second feature map, which improves character recognition efficiency.
In an exemplary embodiment, a computer-readable storage medium is further provided. The computer-readable storage medium stores at least one piece of program code, and the at least one piece of program code can be executed by a processor in an image processing device to complete the sequence recognition method in the foregoing embodiments. For example, the computer-readable storage medium may be a ROM (Read-Only Memory), a RAM (Random Access Memory), a CD-ROM (Compact Disc Read-Only Memory), a magnetic tape, a floppy disk, an optical data storage device, or the like.
Optionally, the at least one piece of program code can be executed by the processor in the image processing device to perform the following steps:
performing feature extraction on a to-be-recognized target image through an image recognition model to obtain a first feature map, the first feature map including a plurality of first image features;
performing temporal relation extraction on the first feature map based on a convolutional neural network layer and a fully connected layer in the image recognition model to obtain a second feature map fused with the context information included in the target image, the second feature map including a plurality of second image features;
performing character recognition on the target image in parallel based on the plurality of first image features and the plurality of second image features to obtain a character sequence.
Optionally, the at least one piece of program code is loaded and executed by the processor to implement the following steps:
determining a plurality of groups of image features based on the plurality of first image features and the plurality of second image features, each group of image features including a first image feature and a second image feature at the same feature position;
performing character recognition on the plurality of groups of image features in parallel;
generating a character sequence based on the plurality of recognized characters.
Optionally, the at least one piece of program code is loaded and executed by the processor to implement the following steps:
performing a matrix operation on the first image features and the second image features in the plurality of groups of image features to obtain a third feature map, the third feature map including a plurality of third image features, each third image feature being obtained by a matrix operation on a first image feature and a second image feature at the same feature position;
decoding the plurality of third image features in parallel to recognize the character corresponding to each image feature.
Optionally, the size of the first image features is B×C1×H×W, the size of the second image features is B×T×H×W, and the size of the third image features is B×C1×T;
where B is the batch parameter of the image recognition model, C1 is the number of feature channels of the image recognition model, H is the feature map height, W is the feature map width, and T is the predicted sequence length.
Optionally, the at least one piece of program code is loaded and executed by the processor to implement the following steps:
transforming the number of channels of the first feature map through the convolutional neural network layer to obtain a fourth feature map fused with the context information included in the target image;
mapping, through the fully connected layer, the channels in the fourth feature map onto a preset sequence length to obtain the second feature map.
Optionally, the at least one piece of program code is loaded and executed by the processor to implement the following steps:
acquiring a plurality of sample images, each sample image being annotated with the character sequence in the sample image;
training the image recognition model through a convolutional neural network based on the plurality of sample images and the character sequence annotated for each sample image.
This application further provides a computer program product. The computer program product includes at least one computer program, and the computer program, when executed by a processor, is used to implement the sequence recognition method provided in each of the foregoing method embodiments.
In the embodiments of this application, during sequence recognition of a target image, temporal relation extraction is performed on the first feature map of the target image to obtain a second feature map fused with the context information included in the target image, so that the second feature map contains the temporal relations between characters. Character recognition can therefore be performed in parallel based on the first feature map and the second feature map, which improves character recognition efficiency.
A person of ordinary skill in the art may understand that all or some of the steps of the foregoing embodiments may be implemented by hardware, or by a program instructing related hardware. The program may be stored in a computer-readable storage medium, and the storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing descriptions are merely optional embodiments of this application and are not intended to limit this application. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of this application shall fall within the protection scope of this application.

Claims (19)

  1. A sequence recognition method, characterized in that the method comprises:
    performing feature extraction on a to-be-recognized target image through an image recognition model to obtain a first feature map, the first feature map comprising a plurality of first image features;
    performing temporal relation extraction on the first feature map based on a convolutional neural network layer and a fully connected layer in the image recognition model to obtain a second feature map fused with context information comprised in the target image, the second feature map comprising a plurality of second image features; and
    performing character recognition on the target image in parallel based on the plurality of first image features and the plurality of second image features to obtain a character sequence.
  2. The method according to claim 1, characterized in that the performing character recognition on the target image in parallel based on the plurality of first image features and the plurality of second image features to obtain a character sequence comprises:
    determining a plurality of groups of image features based on the plurality of first image features and the plurality of second image features, each group of image features comprising a first image feature and a second image feature at a same feature position;
    performing character recognition on the plurality of groups of image features in parallel; and
    generating the character sequence based on a plurality of recognized characters.
  3. The method according to claim 2, characterized in that the performing character recognition on the plurality of groups of image features in parallel comprises:
    performing a matrix operation on the first image features and the second image features in the plurality of groups of image features to obtain a third feature map, the third feature map comprising a plurality of third image features, each third image feature being obtained by a matrix operation on a first image feature and a second image feature at a same feature position; and
    decoding the plurality of third image features in parallel to recognize a character corresponding to each image feature.
  4. The method according to claim 3, characterized in that a size of the first image features is B×C1×H×W, a size of the second image features is B×T×H×W, and a size of the third image features is B×C1×T;
    wherein B is a batch parameter of the image recognition model, C1 is a number of feature channels of the image recognition model, H is a feature map height, W is a feature map width, and T is a predicted sequence length.
  5. The method according to claim 1, characterized in that the performing temporal relation extraction on the first feature map based on the convolutional neural network layer and the fully connected layer in the image recognition model to obtain the second feature map fused with the context information comprised in the target image comprises:
    transforming a number of channels of the first feature map through the convolutional neural network layer to obtain a fourth feature map fused with the context information comprised in the target image; and
    mapping, through the fully connected layer, channels in the fourth feature map onto a preset sequence length to obtain the second feature map.
  6. The method according to any one of claims 1 to 5, characterized in that the method further comprises:
    acquiring a plurality of sample images, each sample image being annotated with a character sequence in the sample image; and
    training the image recognition model through a convolutional neural network based on the plurality of sample images and the character sequence annotated for each sample image.
  7. A sequence recognition apparatus, characterized in that the apparatus comprises:
    an extraction module, configured to perform feature extraction on a to-be-recognized target image through an image recognition model to obtain a first feature map, the first feature map comprising a plurality of first image features;
    a processing module, configured to perform temporal relation extraction on the first feature map based on a convolutional neural network layer and a fully connected layer in the image recognition model to obtain a second feature map fused with context information comprised in the target image, the second feature map comprising a plurality of second image features; and
    a recognition module, configured to perform character recognition on the target image in parallel based on the plurality of first image features and the plurality of second image features to obtain a character sequence.
  8. The apparatus according to claim 7, characterized in that the recognition module comprises:
    a determining unit, configured to determine a plurality of groups of image features based on the plurality of first image features and the plurality of second image features, each group of image features comprising a first image feature and a second image feature at a same feature position;
    a recognition unit, configured to perform character recognition on the plurality of groups of image features in parallel; and
    a generation unit, configured to generate the character sequence based on a plurality of recognized characters.
  9. The apparatus according to claim 8, characterized in that the recognition unit is configured to:
    perform a matrix operation on the first image features and the second image features in the plurality of groups of image features to obtain a third feature map, the third feature map comprising a plurality of third image features, each third image feature being obtained by a matrix operation on a first image feature and a second image feature at a same feature position; and
    decode the plurality of third image features in parallel to recognize a character corresponding to each image feature.
  10. The apparatus according to claim 9, characterized in that a size of the first image features is B×C1×H×W, a size of the second image features is B×T×H×W, and a size of the third image features is B×C1×T;
    wherein B is a batch parameter of the image recognition model, C1 is a number of feature channels of the image recognition model, H is a feature map height, W is a feature map width, and T is a predicted sequence length.
  11. The apparatus according to claim 7, characterized in that the processing module comprises:
    a transformation unit, configured to transform a number of channels of the first feature map through the convolutional neural network layer to obtain a fourth feature map fused with the context information comprised in the target image; and
    a mapping unit, configured to map, through the fully connected layer, channels in the fourth feature map onto a preset sequence length to obtain the second feature map.
  12. The apparatus according to any one of claims 7 to 11, characterized in that the apparatus further comprises:
    an acquisition module, configured to acquire a plurality of sample images, each sample image being annotated with a character sequence in the sample image; and
    a training module, configured to train the image recognition model through a convolutional neural network based on the plurality of sample images and the character sequence annotated for each sample image.
  13. An image processing device, characterized in that the image processing device comprises a processor and a memory, the memory stores at least one piece of program code, and the at least one piece of program code is loaded and executed by the processor to implement the following steps:
    performing feature extraction on a to-be-recognized target image through an image recognition model to obtain a first feature map, the first feature map comprising a plurality of first image features;
    performing temporal relation extraction on the first feature map based on a convolutional neural network layer and a fully connected layer in the image recognition model to obtain a second feature map fused with context information comprised in the target image, the second feature map comprising a plurality of second image features; and
    performing character recognition on the target image in parallel based on the plurality of first image features and the plurality of second image features to obtain a character sequence.
  14. The image processing device according to claim 13, characterized in that the at least one piece of program code is loaded and executed by the processor to implement the following steps:
    determining a plurality of groups of image features based on the plurality of first image features and the plurality of second image features, each group of image features comprising a first image feature and a second image feature at a same feature position;
    performing character recognition on the plurality of groups of image features in parallel; and
    generating the character sequence based on a plurality of recognized characters.
  15. The image processing device according to claim 14, characterized in that the at least one piece of program code is loaded and executed by the processor to implement the following steps:
    performing a matrix operation on the first image features and the second image features in the plurality of groups of image features to obtain a third feature map, the third feature map comprising a plurality of third image features, each third image feature being obtained by a matrix operation on a first image feature and a second image feature at a same feature position; and
    decoding the plurality of third image features in parallel to recognize a character corresponding to each image feature.
  16. The image processing device according to claim 15, characterized in that a size of the first image features is B×C1×H×W, a size of the second image features is B×T×H×W, and a size of the third image features is B×C1×T;
    wherein B is a batch parameter of the image recognition model, C1 is a number of feature channels of the image recognition model, H is a feature map height, W is a feature map width, and T is a predicted sequence length.
  17. The image processing device according to claim 13, characterized in that the at least one piece of program code is loaded and executed by the processor to implement the following steps:
    transforming a number of channels of the first feature map through the convolutional neural network layer to obtain a fourth feature map fused with the context information comprised in the target image; and
    mapping, through the fully connected layer, channels in the fourth feature map onto a preset sequence length to obtain the second feature map.
  18. The image processing device according to any one of claims 13 to 17, characterized in that the at least one piece of program code is loaded and executed by the processor to implement the following steps:
    acquiring a plurality of sample images, each sample image being annotated with a character sequence in the sample image; and
    training the image recognition model through a convolutional neural network based on the plurality of sample images and the character sequence annotated for each sample image.
  19. A computer-readable storage medium, characterized in that the computer-readable storage medium stores at least one piece of program code, and the at least one piece of program code is loaded and executed by a processor to implement the sequence recognition method according to any one of claims 1 to 6.
PCT/CN2021/109764 2020-07-30 2021-07-30 Sequence recognition method and apparatus, image processing device, and storage medium WO2022022704A1 (zh)

Priority Applications (2)

Application Number Priority Date Filing Date Title
US18/017,660 US20230274566A1 (en) 2020-07-30 2021-07-30 Sequence recognition method and apparatus, image processing device, and storage medium
EP21851002.2A EP4191471A4 (en) 2020-07-30 2021-07-30 SEQUENCE RECOGNITION METHOD AND DEVICE, IMAGE PROCESSING DEVICE AND STORAGE MEDIUM

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010751330.X 2020-07-30
CN202010751330.XA 2020-07-30 2020-10-30 Sequence recognition method and apparatus, image processing device, and storage medium

Publications (1)

Publication Number Publication Date
WO2022022704A1 true WO2022022704A1 (zh) 2022-02-03

Family

ID=72946111

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/109764 WO2022022704A1 (zh) 2020-07-30 2021-07-30 Sequence recognition method and apparatus, image processing device, and storage medium

Country Status (4)

Country Link
US (1) US20230274566A1 (zh)
EP (1) EP4191471A4 (zh)
CN (1) CN111860682A (zh)
WO (1) WO2022022704A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114863439A (zh) * 2022-05-19 2022-08-05 北京百度网讯科技有限公司 Information extraction method and apparatus, electronic device, and medium

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111860682A (zh) * 2020-07-30 2020-10-30 上海高德威智能交通系统有限公司 Sequence recognition method and apparatus, image processing device, and storage medium
CN113033543B (zh) * 2021-04-27 2024-04-05 中国平安人寿保险股份有限公司 Curved text recognition method, apparatus, device, and medium
CN114207673A (zh) * 2021-12-20 2022-03-18 商汤国际私人有限公司 Sequence recognition method and apparatus, electronic device, and storage medium
CN116883913B (zh) * 2023-09-05 2023-11-21 长江信达软件技术(武汉)有限责任公司 Ship recognition method and system based on adjacent frames of a video stream

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108388896A (zh) * 2018-02-09 2018-08-10 杭州雄迈集成电路技术有限公司 License plate recognition method based on a dynamic temporal convolutional neural network
CN110084172A (zh) * 2019-04-23 2019-08-02 北京字节跳动网络技术有限公司 Character recognition method and apparatus, and electronic device
CN111126410A (zh) * 2019-12-31 2020-05-08 讯飞智元信息科技有限公司 Character recognition method, apparatus, device, and readable storage medium
CN111191663A (zh) * 2019-12-31 2020-05-22 深圳云天励飞技术有限公司 License plate number recognition method and apparatus, electronic device, and storage medium
CN111860682A (zh) * 2020-07-30 2020-10-30 上海高德威智能交通系统有限公司 Sequence recognition method and apparatus, image processing device, and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2016197381A1 (en) * 2015-06-12 2016-12-15 Sensetime Group Limited Methods and apparatus for recognizing text in an image
CN106407976B (zh) * 2016-08-30 2019-11-05 百度在线网络技术(北京)有限公司 Image character recognition model generation and vertical-column character image recognition method and apparatus
CN107798327A (zh) * 2017-10-31 2018-03-13 北京小米移动软件有限公司 Character recognition method and apparatus
CN109102037B (zh) * 2018-06-04 2024-03-05 平安科技(深圳)有限公司 Chinese model training and Chinese image recognition method, apparatus, device, and medium
CN110782395B (zh) * 2019-10-28 2024-02-09 西安电子科技大学 Image processing method and apparatus, electronic device, and computer-readable storage medium
CN111275046B (zh) * 2020-01-10 2024-04-16 鼎富智能科技有限公司 Character image recognition method and apparatus, electronic device, and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
See also references of EP4191471A4

Also Published As

Publication number Publication date
EP4191471A1 (en) 2023-06-07
US20230274566A1 (en) 2023-08-31
EP4191471A4 (en) 2024-01-17
CN111860682A (zh) 2020-10-30

Similar Documents

Publication Publication Date Title
WO2022022704A1 (zh) Sequence recognition method and apparatus, image processing device, and storage medium
JP7265003B2 (ja) Target detection method, model training method, apparatus, device, and computer program
CN110163059B (zh) Multi-person pose recognition method and apparatus, and electronic device
JP2022515620A (ja) Image region recognition method using artificial intelligence, model training method, image processing device, terminal device, server, computer device, and computer program
CN106897372B (zh) Voice query method and apparatus
CN112200041B (zh) Video action recognition method and apparatus, storage medium, and electronic device
KR20210080291A (ko) License plate recognition method, and license plate recognition model training method and apparatus
CN110765294B (zh) Image search method and apparatus, terminal device, and storage medium
CN111401238B (zh) Method and apparatus for detecting close-up segments of persons in a video
CN112101329A (zh) Video-based text recognition method, and model training method and apparatus
CN111950570B (zh) Target image extraction method, and neural network training method and apparatus
CN111881849A (zh) Image scene detection method and apparatus, electronic device, and storage medium
WO2023173646A1 (zh) Expression recognition method and apparatus
US20220207913A1 (en) Method and device for training multi-task recognition model and computer-readable storage medium
CN114419509A (zh) Multimodal sentiment analysis method and apparatus, and electronic device
KR102468309B1 (ko) Image-based building search method and apparatus
CN111950700A (zh) Neural network optimization method and related device
CN110059212A (zh) Image retrieval method, apparatus, device, and computer-readable storage medium
CN114330565A (zh) Face recognition method and apparatus
CN110610131B (zh) Facial action unit detection method and apparatus, electronic device, and storage medium
CN113946719A (zh) Word completion method and apparatus
CN115187456A (zh) Text recognition method, apparatus, device, and medium based on image enhancement processing
CN109635706B (zh) Neural network-based gesture recognition method, device, storage medium, and apparatus
CN114168768A (zh) Image retrieval method and related device
CN115222047A (zh) Model training method, apparatus, device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21851002

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

ENP Entry into the national phase

Ref document number: 2021851002

Country of ref document: EP

Effective date: 20230228