WO2024077785A1 - Image recognition method and apparatus based on convolutional neural network model, and terminal device - Google Patents
- Publication number
- WO2024077785A1 (PCT/CN2022/142412)
- Authority
- WO
- WIPO (PCT)
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/0464—Convolutional networks [CNN, ConvNet]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/764—Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V10/00—Arrangements for image or video recognition or understanding
- G06V10/70—Arrangements for image or video recognition or understanding using pattern recognition or machine learning
- G06V10/77—Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V40/00—Recognition of biometric, human-related or animal-related patterns in image or video data
- G06V40/10—Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
- G06V40/16—Human faces, e.g. facial parts, sketches or expressions
Definitions
- the present application belongs to the field of image recognition technology, and in particular, relates to an image recognition method, apparatus, terminal device, and computer-readable storage medium based on a convolutional neural network model.
- image features include global features and local features: global features describe the overall properties of an image, while local features are extracted from local regions of the image.
- convolutional neural networks are widely used to extract features of images because convolution operations have good hardware support.
- however, a convolutional neural network cannot capture global information at once; multiple convolutional layers need to be stacked to increase the receptive field, which increases the number of model parameters and the amount of calculation.
- the embodiments of the present application provide an image recognition method, apparatus, and terminal device based on a convolutional neural network model, which can reduce the amount of computation required when the convolutional neural network model performs global feature extraction, thereby improving model efficiency.
- an embodiment of the present application provides an image recognition method based on a convolutional neural network model, wherein the convolutional neural network model performs frequency domain global convolution on an image to be recognized based on a fast Fourier transform, and the image recognition method comprises:
- the image to be recognized is input into the trained convolutional neural network model, which sequentially extracts features from and recognizes the image to obtain a recognition result.
- an embodiment of the present application provides a convolutional neural network model training method, comprising:
- the above convolutional neural network performs frequency domain global convolution on the sample image based on fast Fourier transform.
- an image recognition device including:
- an input module and a trained convolutional neural network model, wherein the convolutional neural network model performs frequency-domain global convolution on the image to be recognized based on the fast Fourier transform;
- the input module is used to input the image to be recognized into the convolutional neural network model;
- the convolutional neural network model is used to sequentially extract features from and recognize the image to be recognized to obtain a recognition result.
- an embodiment of the present application provides a terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the computer program, the steps of the image recognition method based on the convolutional neural network model described in the first aspect or the convolutional neural network model training method described in the second aspect are implemented.
- an embodiment of the present application provides a computer-readable storage medium, which stores a computer program.
- when the computer program is executed by a processor, it implements the steps of the image recognition method based on the convolutional neural network model described in the first aspect or the convolutional neural network model training method described in the second aspect.
- an embodiment of the present application provides a computer program product which, when run on a terminal device, causes the terminal device to execute the image recognition method based on the convolutional neural network model described in any one of the first aspect or the convolutional neural network model training method described in the second aspect.
- the image to be recognized is input into the trained convolutional neural network model, which sequentially extracts features from and recognizes the image to obtain the recognition result. Since the frequency-domain global convolution is performed on the image based on the fast Fourier transform, the convolution operation in the spatial domain is converted into a multiplication operation in the frequency domain, thus reducing the amount of calculation when the convolutional neural network extracts global features, improving the recognition efficiency of the convolutional neural network model, and facilitating deployment on devices with lower computing power.
- FIG1 is a schematic diagram of a flow chart of an image recognition method based on a convolutional neural network model provided by an embodiment of the present application;
- FIG2 is a schematic diagram of the structure of a convolutional neural network model provided in an embodiment of the present application.
- FIG3 is a schematic diagram of the structure of a second convolution module provided in an embodiment of the present application.
- FIG4 is a flow chart of a convolutional neural network model training method provided in an embodiment of the present application.
- FIG5 is a schematic diagram of the structure of an image recognition device provided in an embodiment of the present application.
- FIG6 is a schematic diagram of the structure of a convolutional neural network model training device provided in an embodiment of the present application.
- FIG7 is a schematic diagram of the structure of a terminal device provided in an embodiment of the present application.
- references to "one embodiment" or "some embodiments" etc. in the specification of the present application mean that the specific features, structures or characteristics described in conjunction with the embodiment are included in one or more embodiments of the present application. Therefore, the phrases "in one embodiment", "in some embodiments", "in some other embodiments", etc. appearing in different places in the specification do not necessarily refer to the same embodiment, but mean "one or more but not all embodiments", unless otherwise specifically emphasized.
- Embodiment 1:
- FIG1 shows a schematic flow chart of an image recognition method based on a convolutional neural network model provided by an embodiment of the present application, which is described in detail as follows:
- the image to be recognized is input into the trained convolutional neural network model, and the convolutional neural network model sequentially extracts features from and recognizes the image to obtain a recognition result.
- the above convolutional neural network model performs frequency domain global convolution processing on the image to be recognized based on fast Fourier transform.
- when extracting global features of an image, a convolutional neural network usually needs to stack multiple convolutional layers to increase the receptive field, so that global features can be extracted through a large receptive field.
- as more convolutional layers are stacked, the number of parameters and the amount of calculation of the convolutional neural network model increase accordingly, making its computational complexity too large.
- in particular, when the size of the image to be recognized is large (such as 112×112), the computational complexity of the convolutional neural network model quickly exceeds that of a 7×7 convolution, which is inconvenient for practical application.
- when the convolutional neural network model extracts features from the input image to be recognized, the image is processed by the fast Fourier transform and converted into its frequency-domain representation, so that the convolution operation in the spatial domain is converted into a multiplication operation in the frequency domain, thereby reducing the amount of calculation in the global feature extraction process.
- the image to be recognized is input into the trained convolutional neural network model, which sequentially extracts features from and recognizes the image to obtain a recognition result. Since the frequency-domain global convolution is performed on the image based on the fast Fourier transform, the convolution operation in the spatial domain is converted into a multiplication operation in the frequency domain. Therefore, when extracting global features of the image to be recognized, the amount of calculation involved in extracting global features from large images can be effectively reduced, improving the recognition efficiency of the convolutional neural network model and facilitating deployment on devices with lower computing power.
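The equivalence this relies on can be checked numerically. The following is an illustrative NumPy sketch (not the patent's implementation): by the convolution theorem, circular convolution in the spatial domain equals point-wise multiplication in the frequency domain.

```python
import numpy as np

# Illustrative sketch (not the patent's implementation): the convolution
# theorem says circular convolution in the spatial domain equals
# point-wise multiplication in the frequency domain.
rng = np.random.default_rng(0)
x = rng.standard_normal(8)   # one row of an "image"
w = rng.standard_normal(8)   # a global (full-length) convolution kernel

# Direct circular convolution: y[n] = sum_k x[k] * w[(n - k) mod N]
N = len(x)
y_direct = np.array([sum(x[k] * w[(n - k) % N] for k in range(N))
                     for n in range(N)])

# FFT route: transform both operands, multiply point by point, invert
y_fft = np.fft.ifft(np.fft.fft(x) * np.fft.fft(w)).real

assert np.allclose(y_direct, y_fft)
```

For a full-length kernel the direct route costs O(N²) multiplications, while the FFT route costs O(N log N), which is the source of the saving described above.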
- the above-mentioned image recognition method based on the convolutional neural network model further includes:
- the image to be identified may be an image captured by a camera device, or may be an image frame in a video stream captured by a camera device.
- in different application fields, the camera equipment adopted and the rules for collecting images may differ; therefore, the images to be recognized are acquired according to the acquisition methods and rules of the corresponding application field.
- for example, a face recognition task requires collecting a face image as the image to be recognized and recognizing the facial features in it.
- the convolutional neural network model includes a feature extraction module and a recognition module.
- the steps of extracting features and recognizing the image to be recognized in sequence through the convolutional neural network model to obtain a recognition result include:
- the extracted features are recognized to obtain recognition results.
- the feature extraction module is used to extract features of the input image to be recognized, and the extracted features are used as inputs of the recognition module. According to the image recognition task, corresponding recognition is performed to obtain a recognition result.
- the recognition module may include one or more recognition units, with different recognition units performing different recognition tasks. For example, it may include a pedestrian detection unit and a target detection unit: the extracted feature maps are input into the pedestrian detection unit and the target detection unit to perform the pedestrian detection and target detection tasks respectively.
- features are extracted from the image to be recognized by the feature extraction module, and the extracted features are then recognized by the recognition module to obtain the corresponding recognition results, improving the recognition efficiency of each image recognition task.
- the feature extraction module includes a first convolution module and a second convolution module
- step A1 includes:
- the convolutional neural network model can be constructed based on an existing convolutional neural network, with the shallow convolutional layers serving as the first convolution module and the deep convolutional layers or self-attention layers replaced by the second convolution module. The first convolution module applies ordinary convolution to the image to be recognized, extracts its local features, and outputs a local feature map, which serves as the input of the second convolution module. The second convolution module applies the fast Fourier transform to the local feature map to obtain a frequency-domain local feature map, and extracts global features from it to obtain a global feature map.
- for example, in the structure shown in FIG2, the first three convolutional layers form the first convolution module (ordinary convolution), and the three deeper convolutional layers form the second convolution module, which is connected to the recognition module.
- the image to be recognized is used as the input of the first convolution module for local feature extraction, the output features are input to the second convolution module for global feature extraction, and the global feature map output by the second convolution module is used as the input of the recognition module, which outputs the corresponding recognition result.
- the first convolution module and the second convolution module in the convolutional neural network model can also be arranged in an alternating structure, that is, a first convolution module is connected to a second convolution module, whose output is connected to another first convolution module, and so on.
- the structure shown in FIG2 can also be stacked multiple times, so that the feature map output by the feature extraction module is the global feature map extracted by the second convolution module (that is, the recognition module performs recognition based on the global features extracted by the second convolution module). The specific structures of the first convolution module (ordinary convolutional layers) and the second convolution module are not limited in the embodiments of the present application.
- the local features of the image to be recognized are extracted by the first convolution module of the convolutional neural network model, and the resulting local feature map is used as the input of the second convolution module, which applies the fast Fourier transform and then extracts global features from the frequency-domain local feature map. This reduces the amount of calculation in the global feature extraction process, and the resulting features contain both local and global information, improving the recognition accuracy of the convolutional neural network model.
- the second convolution module includes a first branch and a second branch
- the step A12 includes:
- the above-mentioned local feature map is split along the channel direction to obtain a first local feature map and a second local feature map, and the above-mentioned first local feature map and the above-mentioned second local feature map are respectively input into the above-mentioned first branch and the above-mentioned second branch.
- the local feature map is first split along the channel direction (such as evenly split into two parts along the channel direction) to obtain a first local feature map and a second local feature map, and the first local feature map is input into the first branch, and the second local feature map is input into the second branch, so as to extract global features from the first local feature map and the second local feature map, respectively.
- the first branch and the second branch respectively use fast Fourier transform to perform frequency domain global convolution on the input local feature map to obtain a first global feature map and a second global feature map.
- the first branch performs fast Fourier transform processing on the first local feature map to obtain a first local feature map in the frequency domain, and then performs global convolution processing on the first local feature map in the frequency domain to obtain a first global feature map;
- the second branch performs fast Fourier transform processing on the second local feature map to obtain a second local feature map in the frequency domain, and then performs global convolution processing on the second local feature map in the frequency domain to obtain a second global feature map.
- the first global feature map and the second global feature map are concatenated along the channel direction to obtain a global feature map.
- the first local feature map and the second local feature map are obtained after the local feature map is split along the channel direction
- the first branch and the second branch respectively extract the features of the first local feature map and the second local feature map
- the first global feature map and the second global feature map are concatenated along the channel direction to obtain a complete global feature map, so that subsequent processing can be performed on the complete global feature map of the input image to be recognized.
- since the local feature map is split into two parts along the channel direction and input into the first branch and the second branch, the number of channels processed by each branch is halved, and during global feature extraction, the local feature map is globally convolved in the frequency domain based on the fast Fourier transform. Extracting global features from half-channel feature maps via the fast Fourier transform reduces the computational complexity of global feature extraction, lowering the computing-power requirements on the device and facilitating deployment on devices with lower computing power.
- the method when the first branch and the second branch perform frequency domain global convolution according to the input local feature map, the method includes:
- the first branch performs fast Fourier transform processing on the first local feature map and the corresponding weight matrix based on the column dimension to obtain the first local feature map and the weight matrix in the frequency domain;
- the first local feature map and its weight matrix are fast Fourier transformed along the column dimension to convert them into a frequency-domain first local feature map and weight matrix. When the weight matrix is used to perform global convolution on the first local feature map, the frequency-domain feature map and weight matrix are multiplied point by point; that is, the convolution operation in the spatial domain is converted into a multiplication operation in the frequency domain, yielding a first frequency-domain feature map (global features). The first frequency-domain feature map is then inverse fast Fourier transformed to obtain a first global feature map in the spatial domain, so that the amount of calculation is effectively reduced when a large convolution kernel is used to extract global features.
- the second branch performs fast Fourier transform processing on the second local feature map and the corresponding weight matrix based on the row dimension to obtain the second local feature map and the weight matrix in the frequency domain;
- the second frequency-domain feature map is processed by the inverse fast Fourier transform to obtain a second global feature map.
- the second local feature map and its weight matrix are fast Fourier transformed along the row dimension to convert them into a frequency-domain second local feature map and weight matrix. When the weight matrix is used to perform global convolution on the second local feature map, the frequency-domain feature map and weight matrix are multiplied point by point; that is, the convolution operation in the spatial domain is converted into a multiplication operation in the frequency domain, yielding a second frequency-domain feature map (global features). The second frequency-domain feature map is then inverse fast Fourier transformed to obtain a second global feature map in the spatial domain, so that the amount of calculation is effectively reduced when a large convolution kernel is used to extract global features.
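A rough NumPy sketch of one such branch follows; the (C, H, W) shapes and variable names are assumptions for illustration, not taken from the patent. The feature map and a same-shaped weight matrix are transformed along the row (W) dimension, multiplied point by point, and transformed back.

```python
import numpy as np

# Hedged sketch of one branch (shapes and names are assumptions): a
# (C, H, W) local feature map and a same-shaped weight matrix are
# transformed along the row (W) dimension, multiplied point by point,
# and transformed back to the spatial domain.
rng = np.random.default_rng(1)
fmap = rng.standard_normal((4, 8, 8))    # (C, H, W) local feature map
weight = rng.standard_normal((4, 8, 8))  # learnable weights, same shape

fmap_f = np.fft.fft(fmap, axis=-1)       # 1-D FFT along the row dimension
weight_f = np.fft.fft(weight, axis=-1)

# Point-wise multiplication in the frequency domain = circular
# convolution of each row in the spatial domain
freq_fmap = fmap_f * weight_f

# Inverse FFT returns the global feature map in the spatial domain
global_fmap = np.fft.ifft(freq_fmap, axis=-1).real
assert global_fmap.shape == fmap.shape
```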
- the computational complexity of the second convolution module in the process of extracting global features is analyzed as follows:
- a one-dimensional fast Fourier transform is performed on the first local feature map and its weight matrix to obtain the first local feature map and its weight matrix in one-dimensional (array) form.
- for example, if the first local feature map is represented as (C, H, W), performing a one-dimensional fast Fourier transform along the column dimension yields, for each column, an array of H elements.
- the convolution operation in the spatial domain is converted into a simple multiplication operation, thereby reducing the computational complexity of the convolutional neural network model.
- since the local feature map is fast Fourier transformed along different dimensions and global features are extracted from each, global features of different dimensions are obtained; the resulting global features are then concatenated to obtain global features covering both the width and height directions. This reduces the computational complexity of extracting the global features of the local feature map and facilitates the deployment of the convolutional neural network model on devices with lower computing power.
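A back-of-the-envelope count illustrates the saving. The figures below are illustrative and per channel, with FFT operations counted loosely; they are not the patent's own complexity analysis.

```python
import math

# Back-of-the-envelope multiply count (illustrative, per channel) for a
# 112x112 feature map: direct 1-D global convolution along each column
# versus the FFT route, with FFT operations counted loosely.
H = W = 112

# Direct route: H multiplies per output element, H outputs per column,
# W columns.
direct_1d = H * H * W

# FFT route: two forward transforms and one inverse (~(3/2)*H*log2(H)
# complex multiplies per column, loosely) plus H point-wise products.
fft_1d = int(W * (3 * H * math.log2(H) / 2 + H))

assert fft_1d < direct_1d
```

Even with generous accounting, the FFT route is roughly an order of magnitude cheaper at this resolution.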
- the second convolution module further includes a position embedding module, and the above steps further include:
- the position embedding module is used to perform feature extraction on the local feature map to obtain a position feature map, and the position feature map is added to the local feature map to obtain a local feature map containing position features.
- convolution processing is performed on the local feature map through a position embedding module to extract the position features in the local feature map, generate a position feature map, and add the position feature map to the local feature map according to the pixel position to obtain a local feature map embedded with the position features.
- the position embedding module is a two-dimensional convolution module, which performs convolution processing on the input local feature map to generate a two-dimensional position feature map whose size is consistent with the resolution of the input local feature map, so that the position feature map can be directly added to the local feature map to obtain a local feature map containing position features.
- the position embedding module adopts a two-layer lightweight convolutional structure, that is, a simple "convolution + normalization + activation function + convolution" structure. Since a two-dimensional position feature map needs to be generated, the convolutional layers can use 3×3 depthwise separable convolutions to process the local feature map and generate the position feature map.
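A minimal sketch of such a two-layer path is shown below; the weights are random placeholders, the depthwise convolution is a naive loop implementation, and all names and sizes are assumptions rather than the patent's trained module.

```python
import numpy as np

def depthwise_conv3x3(x, kernels):
    """Naive 3x3 depthwise convolution with zero padding.
    x: (C, H, W) feature map; kernels: (C, 3, 3), one kernel per channel."""
    c, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for ch in range(c):
        for i in range(h):
            for j in range(w):
                out[ch, i, j] = np.sum(padded[ch, i:i + 3, j:j + 3] * kernels[ch])
    return out

# Hedged sketch of the "convolution + normalization + activation +
# convolution" path; weights are random placeholders, not trained values.
rng = np.random.default_rng(2)
local_fmap = rng.standard_normal((4, 7, 7))
k1 = rng.standard_normal((4, 3, 3)) * 0.1
k2 = rng.standard_normal((4, 3, 3)) * 0.1

h = depthwise_conv3x3(local_fmap, k1)
h = (h - h.mean()) / (h.std() + 1e-5)   # normalization
h = np.maximum(h, 0.0)                  # activation (ReLU)
pos_fmap = depthwise_conv3x3(h, k2)     # same resolution as the input

# The position feature map is added to the local feature map pixel by pixel
fmap_with_pos = local_fmap + pos_fmap
assert fmap_with_pos.shape == local_fmap.shape
```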
- the position features of an image can enhance the ability to describe and distinguish the image content
- the position features of the local feature map are extracted based on the weight matrix of the local feature map, and the position features are embedded into the local feature map so that the local feature map contains the position features. Therefore, the recognition accuracy can be improved when image recognition is subsequently performed based on the global feature map containing the position features.
- the network structure of the second convolution module is shown in FIG3, and may include a position embedding module, a first branch, and a second branch. After the position feature of the local feature map is embedded in the local feature map by the position embedding module, the local feature map embedded with the position feature is split to obtain the first local feature map and the second local feature map and input them into the first branch and the second branch respectively.
- the first branch and the second branch perform a one-dimensional fast Fourier transform on the input local feature map and its corresponding weight matrix along the column dimension and the row dimension respectively, obtaining one-dimensional frequency-domain local feature maps and weight matrices. Each frequency-domain local feature map is then multiplied point by point with its corresponding weight matrix to obtain the first frequency-domain feature map and the second frequency-domain feature map, which are converted back to the spatial domain by the inverse fast Fourier transform to obtain the first global feature map and the second global feature map. Finally, the two global feature maps are concatenated to obtain the global feature map of the image to be recognized.
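The flow above can be sketched end to end with assumed shapes and random placeholder weights (the position embedding step is omitted for brevity); here the first branch is taken to transform along the column (H) dimension and the second along the row (W) dimension.

```python
import numpy as np

# End-to-end sketch (assumed shapes, random placeholder weights):
# channel split -> per-branch 1-D FFT global convolution
# (column / row dimensions) -> inverse FFT -> channel concatenation.
rng = np.random.default_rng(3)
local_fmap = rng.standard_normal((8, 16, 16))   # (C, H, W)

x1, x2 = np.split(local_fmap, 2, axis=0)        # split along channels
w1 = rng.standard_normal(x1.shape)              # branch weight matrices
w2 = rng.standard_normal(x2.shape)

def fft_global_conv(x, w, axis):
    """Frequency-domain global convolution along one axis: FFT both
    operands, multiply point by point, inverse-FFT back to space."""
    return np.fft.ifft(np.fft.fft(x, axis=axis) * np.fft.fft(w, axis=axis),
                       axis=axis).real

g1 = fft_global_conv(x1, w1, axis=1)  # first branch: column (H) dimension
g2 = fft_global_conv(x2, w2, axis=2)  # second branch: row (W) dimension

global_fmap = np.concatenate([g1, g2], axis=0)  # concat along channels
assert global_fmap.shape == local_fmap.shape
```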
- the solid arrows indicate that the data flow (feature data) is in real-number form;
- the dotted arrows indicate that the data flow is in complex-number form, that is, feature data in the frequency domain.
- FIG4 shows a flow chart of a convolutional neural network model training method provided in an embodiment of the present application, which is described in detail as follows:
- the constructed convolutional neural network model is obtained, and the sample image is input into the convolutional neural network for training until the convolutional neural network meets the preset requirements to obtain the convolutional neural network model.
- the above convolutional neural network performs global convolution on the sample image in the frequency domain based on fast Fourier transform.
- a convolutional neural network is pre-built according to user needs, that is, the network structure of the convolutional neural network is set according to the needs of the user's image recognition task (such as being built based on the existing ResNet and VGGNet networks) to achieve the corresponding image recognition task.
- the user needs to train a convolutional neural network model for target detection.
- a convolutional neural network can be built based on the SSD (Single Shot MultiBox Detector) network structure to detect targets of different scales by detecting at different feature scales.
- a pre-constructed convolutional neural network is obtained, and the corresponding sample images are used as the input of the convolutional neural network for training until the network meets the preset requirements (for example, its recognition accuracy reaches a preset threshold such as 0.99), at which point training stops and the trained convolutional neural network model is obtained.
- the sample image and the weight matrix are converted into the frequency domain by fast Fourier transform, so that the global features of the sample image can be extracted in the frequency domain using the weight matrix; that is, the input image in the spatial domain is converted into frequency domain form and multiplied with the transformed weight matrix. This realizes rapid extraction of global features: when the resolution of the image is large, the amount of calculation required to extract its global features is effectively reduced.
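The equivalence relied on here is the convolution theorem: circular convolution in the spatial domain equals point-by-point multiplication in the frequency domain. A small NumPy check (illustrative, not taken from the patent):

```python
import numpy as np

x = np.random.randn(16)  # a 1-D signal (e.g. one row of a feature map)
w = np.random.randn(16)  # a weight vector of the same length

# Direct circular convolution: O(N^2) multiplications.
direct = np.array([sum(x[j] * w[(i - j) % 16] for j in range(16))
                   for i in range(16)])

# FFT route: O(N log N) — transform, multiply point by point, transform back.
via_fft = np.fft.ifft(np.fft.fft(x) * np.fft.fft(w)).real

print(np.allclose(direct, via_fft))  # True: the two routes agree
```

The larger the signal (or image), the larger the gap between the quadratic cost of the direct route and the N log N cost of the FFT route, which is exactly the saving the text describes for high-resolution inputs.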
- the sample images are pre-labeled images corresponding to the image recognition task to be performed, so they can be used directly for training without further annotation.
- part of the sample images can be used as the training set, and the remainder as validation and test sets, in order to tune the convolutional neural network and obtain a well-performing model.
- the Market1501 dataset can be used as a training set for training.
- Market1501 contains 32,217 images of 1,501 pedestrians captured by 6 cameras. Each pedestrian is captured by at least 2 cameras and may appear in multiple images from a single camera; the images are divided into a training set and a test set.
- the sample image input for a single training can be one or more, such as 100.
- the number of sample images input at a time is referred to as the batch size (Batch Size).
- the feature shape of the sample image extracted by the convolutional neural network can be represented in a four-dimensional format (B, H, W, C), where B is the batch size, H the height, W the width, and C the number of channels.
- a convolutional neural network is pre-constructed according to the user's needs, and the labeled sample images are used as its input for training until it meets the preset requirements, yielding a trained convolutional neural network model. Because the input image is converted from the spatial domain to the frequency domain by the fast Fourier transform, the convolution operation in the spatial domain becomes a multiplication operation in the frequency domain. This effectively reduces the amount of calculation during global feature extraction, lowers the computational complexity of the model for large input images, facilitates deployment on devices with limited computing power, and improves the model's computing speed.
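The training procedure above can be sketched as a simple loop with the stopping rule from the text. Everything except the stopping rule is a placeholder: `train_one_epoch` and `evaluate_accuracy` stand in for the user's own routines and are hypothetical names, not from the patent.

```python
ACCURACY_THRESHOLD = 0.99  # preset requirement from the text (e.g. 0.99)
MAX_EPOCHS = 100           # safety cap, an assumption for this sketch

def train_until_converged(model, train_set, val_set,
                          train_one_epoch, evaluate_accuracy):
    """Train until the network meets the preset requirement, then stop."""
    for epoch in range(MAX_EPOCHS):
        train_one_epoch(model, train_set)        # one pass over the sample images
        acc = evaluate_accuracy(model, val_set)  # check the preset requirement
        if acc >= ACCURACY_THRESHOLD:            # requirement met: stop training
            break
    return model
```

A validation set (as described above for Market1501-style splits) would drive `evaluate_accuracy`, so the stopping decision is made on held-out data rather than the training set.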
- the image recognition method based on the convolutional neural network model is introduced below based on some application scenarios.
- High-resolution remote sensing images are characterized by rich information content and complex natural scenes.
- a remote sensing image often contains a large number of buildings, sites, vegetation, farmland and other types of ground objects and geomorphic elements.
- Target detection of remote sensing images has always been a hot research topic.
- Most existing remote sensing target detection models have deep structures and complex connection paths, and remote sensing images are both larger and more numerous than natural images.
- extracting global features from remote sensing images by ordinary convolution with large kernels is computationally expensive and yields low detection efficiency, which limits the deployment of such models in scenarios with limited computing resources.
- the image recognition method based on the convolutional neural network model provided in this application is aimed precisely at the large amount of computation required to extract global features of large images by convolution.
- by converting the convolution operation in the spatial domain into a multiplication operation in the frequency domain, the global features of the image to be detected are extracted, which effectively reduces the computation of global feature extraction and thereby improves the detection efficiency of the convolutional neural network model.
- the collected image to be detected is input into the convolutional neural network model (the target detection model). The first convolution module extracts local features such as edges, corners, and lines to produce a local feature map, which is then used as the input of the second convolution module.
- the position features in the image to be detected are embedded into the local feature map through the position embedding module so that it contains more position information.
- the local feature map is then split along the channel direction to obtain the first local feature map and the second local feature map, which are input into the first branch and the second branch respectively.
- the first local feature map and the second local feature map are processed by a one-dimensional fast Fourier transform along the column dimension and the row dimension respectively, to obtain the first local feature map and the second local feature map in the frequency domain.
- the weight matrices corresponding to the first and second local feature maps are likewise processed by a one-dimensional fast Fourier transform; each weight matrix in the frequency domain is then multiplied point by point with the corresponding local feature map in the frequency domain to obtain the first and second frequency domain feature maps, which are inverse fast Fourier transformed and spliced to obtain the complete global feature map.
- the global feature map is detected by the detection (recognition) module to output the corresponding detection result.
- splitting the local feature map reduces the amount of calculation, and performing a separate one-dimensional fast Fourier transform on each half converts the convolution operation in the spatial domain into a multiplication operation in the frequency domain, which greatly reduces the computation for global feature extraction and effectively improves target detection efficiency on remote sensing images.
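A back-of-the-envelope count shows where the saving comes from. The formulas below are order-of-magnitude estimates with constants omitted, written for this explanation rather than taken from the patent:

```python
import math

def direct_global_conv_mults(h, w):
    # An image-sized spatial kernel needs H*W multiplications for each of
    # the H*W output positions, per channel: O((H*W)^2).
    return (h * w) ** 2

def fft_branch_mults(h, w):
    # One 1-D FFT per column (length H) costs on the order of H*log2(H)
    # multiplies, applied to W columns for both the forward and inverse
    # transforms, plus an H*W point-by-point product in between.
    return 2 * w * h * math.log2(h) + h * w

for n in (64, 256, 1024):
    speedup = direct_global_conv_mults(n, n) / fft_branch_mults(n, n)
    print(f"{n}x{n}: roughly {speedup:,.0f}x fewer multiplications via FFT")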
- Face recognition technology is currently widely used in smart access control, security monitoring and other fields. Since face recognition extracts facial features from images, images with high clarity, that is, high resolution, are usually collected. For example, face recognition used in smart access control requires more accurate recognition to improve security, so a higher camera resolution is required; however, the computing resources of a smart access control system are limited, and extracting global features from high-resolution images for recognition is inefficient, which is not conducive to practical application.
- the image recognition method based on the convolutional neural network model provided in this application can be deployed in the smart access control system, effectively reducing the amount of computation required to extract global features from high-resolution images and improving face recognition efficiency.
- Embodiment 2:
- Figure 5 shows a structural block diagram of the image recognition device based on the convolutional neural network model provided in the embodiment of the present application. For the sake of convenience of explanation, only the parts related to the embodiment of the present application are shown.
- the device includes: an input module 51 and a convolutional neural network model 52.
- the convolutional neural network model performs frequency domain global convolution on the image to be identified based on fast Fourier transform.
- An input module 51 used to input the image to be recognized into the above-mentioned convolutional neural network model
- the convolutional neural network model 52 is used to extract features and recognize the above-mentioned images to be recognized in sequence to obtain recognition results.
- the image to be identified is input into the trained convolutional neural network model, which sequentially performs feature extraction and recognition on it to obtain the recognition result. Because the global convolution in the frequency domain is performed on the image based on the fast Fourier transform, the convolution operation in the spatial domain is converted into a multiplication operation in the frequency domain. The amount of calculation for extracting global features of large images is therefore effectively reduced, improving the recognition efficiency of the convolutional neural network model and facilitating deployment on devices with limited computing power.
- the image recognition device further includes:
- the module for acquiring the image to be identified is used to acquire the image to be identified.
- the convolutional neural network model 52 includes:
- a feature extraction unit used to extract features from the above-mentioned image to be identified
- the recognition unit is used to recognize the extracted features and obtain recognition results.
- the feature extraction unit includes:
- a first convolution unit is used to extract local features of the image to be identified to obtain a local feature map
- the second convolution unit is used to perform frequency domain global convolution on the local feature map by using fast Fourier transform to obtain a global feature map.
- the second convolution unit includes:
- a splitting unit used for splitting the local feature map along the channel direction to obtain a first local feature map and a second local feature map
- a first branch unit is used to perform frequency domain global convolution on the first local feature map by using a fast Fourier transform to obtain a first global feature map;
- the second branch unit is used to perform frequency domain global convolution on the second local feature map by using fast Fourier transform to obtain a second global feature map;
- the splicing unit is used to splice the first global feature map and the second global feature map along the channel direction to obtain a global feature map.
- the first branch unit comprises:
- a first transformation unit is used to perform fast Fourier transform processing on the first local feature map and the corresponding weight matrix based on the column dimension to obtain the first local feature map and the weight matrix in the frequency domain;
- a first convolution unit is used to multiply the first local feature map in the frequency domain and the weight matrix point by point to obtain a first frequency domain feature map
- the first inverse transform unit is used to perform an inverse fast Fourier transform on the first frequency domain feature map to obtain the first global feature map.
- the second branch unit comprises:
- a second transformation unit is used to perform fast Fourier transform processing on the second local feature map and the corresponding weight matrix based on the row dimension to obtain the second local feature map and the weight matrix in the frequency domain;
- a second convolution unit is used to multiply the second local feature map in the frequency domain and the weight matrix point by point to obtain a second frequency domain feature map
- the second inverse transform unit is used to perform an inverse fast Fourier transform on the second frequency domain feature map to obtain the second global feature map.
- the second convolution unit further includes:
- the position embedding unit is used to extract features from the local feature map through the position embedding module to obtain a position feature map, and add the position feature map to the local feature map to obtain a local feature map containing position features.
- FIG6 shows a structural block diagram of a convolutional neural network model training device provided in an embodiment of the present application.
- the device includes:
- the training module 61 is used to obtain the constructed convolutional neural network model and input the sample image into the convolutional neural network for training until the convolutional neural network meets the preset requirements to obtain the convolutional neural network model, wherein the convolutional neural network performs frequency domain global convolution on the sample image based on fast Fourier transform.
- FIG7 is a schematic diagram of the structure of a terminal device provided in an embodiment of the present application.
- the terminal device 7 of this embodiment includes: at least one processor 70 (only one processor is shown in FIG7 ), a memory 71, and a computer program 72 stored in the memory 71 and executable on the at least one processor 70, and when the processor 70 executes the computer program 72, the steps in any of the above-mentioned method embodiments are implemented.
- the computer program 72 can be divided into an input module 51 and a convolutional neural network model 52.
- the convolutional neural network model performs frequency domain global convolution on the image to be identified based on fast Fourier transform.
- the specific functions of each module are as follows:
- An input module 51 used to input the image to be recognized into the above-mentioned convolutional neural network model
- the convolutional neural network model 52 is used to extract features and recognize the above-mentioned images to be recognized in sequence to obtain recognition results.
- the computer program 72 may be divided into training modules 61, and the specific functions of the modules are as follows:
- the training module 61 is used to obtain the constructed convolutional neural network model and input the sample image into the convolutional neural network for training until the convolutional neural network meets the preset requirements to obtain the convolutional neural network model, wherein the convolutional neural network performs frequency domain global convolution on the sample image based on fast Fourier transform.
- the terminal device 7 may be a computing device such as a desktop computer, a notebook, a PDA, a cloud server, etc.
- the terminal device may include, but not limited to, a processor 70 and a memory 71.
- FIG. 7 is merely an example of the terminal device 7 and does not constitute a limitation on the terminal device 7.
- the terminal device 7 may include more or fewer components than shown in the figure, or may combine certain components, or different components, and may also include, for example, input and output devices, network access devices, etc.
- the processor 70 may be a central processing unit (CPU), or other general-purpose processors, digital signal processors (DSP), application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc.
- a general-purpose processor may be a microprocessor or any conventional processor, etc.
- the memory 71 may be an internal storage unit of the terminal device 7, such as a hard disk or memory of the terminal device 7. In other embodiments, the memory 71 may also be an external storage device of the terminal device 7, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card (Flash Card), etc. equipped on the terminal device 7. Further, the memory 71 may also include both an internal storage unit and an external storage device of the terminal device 7. The memory 71 is used to store an operating system, an application program, a boot loader (BootLoader), data, and other programs, such as the program code of the computer program. The memory 71 may also be used to temporarily store data that has been output or is to be output.
- those skilled in the art will clearly understand that, for convenience and brevity of description, the division into the above-mentioned functional units and modules is merely used as an example for illustration.
- the above-mentioned function allocation can be completed by different functional units and modules as needed, that is, the internal structure of the device can be divided into different functional units or modules to complete all or part of the functions described above.
- the functional units and modules in the embodiment can be integrated in a processing unit, or each unit can exist physically separately, or two or more units can be integrated in one unit.
- the above-mentioned integrated unit can be implemented in the form of hardware or in the form of software functional units.
- An embodiment of the present application also provides a network device, which includes: at least one processor, a memory, and a computer program stored in the memory and executable on the at least one processor, wherein the processor implements the steps in any of the above-mentioned method embodiments when executing the computer program.
- An embodiment of the present application further provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the steps in the above-mentioned method embodiments can be implemented.
- An embodiment of the present application provides a computer program product which, when run on a terminal device, causes the terminal device to implement the steps in the above-mentioned method embodiments.
- the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium.
- all or part of the processes in the methods of the above embodiments of the present application can be completed by instructing the relevant hardware through a computer program.
- the computer program can be stored in a computer-readable storage medium.
- the computer program is executed by the processor, the steps of the above-mentioned various method embodiments can be implemented.
- the computer program includes computer program code, which can be in source code form, object code form, executable file or some intermediate form.
- the computer-readable medium may at least include: any entity or device capable of carrying the computer program code to the camera/terminal device, a recording medium, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electric carrier signal, a telecommunication signal, and a software distribution medium, for example a USB flash drive, a mobile hard disk, a magnetic disk, or an optical disk.
- in some jurisdictions, according to legislation and patent practice, computer-readable media cannot be electric carrier signals and telecommunication signals.
- the disclosed devices/network equipment and methods can be implemented in other ways.
- the device/network equipment embodiments described above are merely schematic.
- the division of the modules or units is only a logical function division. There may be other division methods in actual implementation, such as multiple units or components can be combined or integrated into another system, or some features can be ignored or not executed.
- in addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, devices, or units, and may be in electrical, mechanical, or other forms.
- the units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
Abstract
The present application is applicable to the technical field of image recognition. Provided are an image recognition method and apparatus based on a convolutional neural network model, and a terminal device. The convolutional neural network model performs, on the basis of fast Fourier transform, frequency-domain global convolution on an image to be subjected to recognition. The image recognition method comprises: inputting an image to be subjected to recognition into a trained convolutional neural network model, and sequentially performing feature extraction and recognition on said image by means of the convolutional neural network model, so as to obtain a recognition result. The present application can reduce the calculation amount of the convolutional neural network model during global feature extraction, thereby improving the model efficiency.
Description
This application claims priority to the Chinese patent application filed with the China Patent Office on October 13, 2022, with application number 202211255315.1 and invention name "Image recognition method, device and terminal device based on convolutional neural network model", the entire contents of which are incorporated herein by reference.

The present application belongs to the field of image recognition technology, and in particular relates to an image recognition method, apparatus, terminal device, and computer-readable storage medium based on a convolutional neural network model.

Feature extraction and matching is an important task in many computer vision applications and is widely used in image recognition tasks such as image retrieval and target detection. When extracting features from an image, image features include global features and local features: global features refer to the overall properties of the image, while local features are extracted from local areas of the image.

In the prior art, convolutional neural networks are widely used to extract global features of images because convolution operations have good hardware support. However, a convolutional neural network cannot capture global information in one pass; multiple convolutional layers must be stacked to enlarge the receptive field, which increases the number of model parameters and the amount of calculation.
Summary of the Invention
The embodiments of the present application provide an image recognition method, apparatus, and terminal device based on a convolutional neural network model, which can reduce the amount of computation required when the convolutional neural network model performs global feature extraction, thereby improving model efficiency.

In a first aspect, an embodiment of the present application provides an image recognition method based on a convolutional neural network model, wherein the convolutional neural network model performs frequency domain global convolution on an image to be recognized based on a fast Fourier transform, and the image recognition method comprises:

inputting the image to be identified into the trained convolutional neural network model, and sequentially performing feature extraction and recognition on the image through the convolutional neural network model to obtain a recognition result.

In a second aspect, an embodiment of the present application provides a convolutional neural network model training method, comprising:

obtaining a constructed convolutional neural network, and inputting sample images into the convolutional neural network for training until the convolutional neural network meets preset requirements, thereby obtaining a convolutional neural network model;

wherein the convolutional neural network performs frequency domain global convolution on the sample images based on a fast Fourier transform.
In a third aspect, an embodiment of the present application provides an image recognition device, comprising:

an input module and a trained convolutional neural network model, wherein the convolutional neural network model performs frequency domain global convolution on the image to be identified based on a fast Fourier transform;

the input module is used to input the image to be recognized into the convolutional neural network model;

the convolutional neural network model is used to sequentially perform feature extraction and recognition on the image to be recognized to obtain a recognition result.

In a fourth aspect, an embodiment of the present application provides a terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the computer program, the steps of the image recognition method based on the convolutional neural network model described in the first aspect or the convolutional neural network model training method described in the second aspect are implemented.

In a fifth aspect, an embodiment of the present application provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of the image recognition method based on the convolutional neural network model described in the first aspect or the convolutional neural network model training method described in the second aspect.

In a sixth aspect, an embodiment of the present application provides a computer program product which, when run on a terminal device, causes the terminal device to execute the image recognition method based on the convolutional neural network model described in any one of the first aspect or the convolutional neural network model training method described in the second aspect.
Compared with the prior art, the embodiments of the present application have the following beneficial effects:

In the embodiments of the present application, the image to be identified is input into the trained convolutional neural network model, which sequentially performs feature extraction and recognition on it to obtain the recognition result. Because the global convolution in the frequency domain is performed on the image based on the fast Fourier transform, the convolution operation in the spatial domain is converted into a multiplication operation in the frequency domain, which reduces the amount of calculation when the convolutional neural network extracts global features, improves the recognition efficiency of the convolutional neural network model, and facilitates deployment on devices with limited computing power.
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings required in the description of the embodiments or the prior art are briefly introduced below.

FIG. 1 is a schematic flow chart of an image recognition method based on a convolutional neural network model provided by an embodiment of the present application;

FIG. 2 is a schematic diagram of the structure of the convolutional neural network model provided in an embodiment of the present application;

FIG. 3 is a schematic diagram of the structure of the second convolution module provided in an embodiment of the present application;

FIG. 4 is a schematic flow chart of the convolutional neural network model training method provided in an embodiment of the present application;

FIG. 5 is a schematic diagram of the structure of the image recognition device provided in an embodiment of the present application;

FIG. 6 is a schematic diagram of the structure of the convolutional neural network model training device provided in an embodiment of the present application;

FIG. 7 is a schematic diagram of the structure of the terminal device provided in an embodiment of the present application.
In the following description, specific details such as particular system structures and technologies are provided for the purpose of illustration rather than limitation, so as to provide a thorough understanding of the embodiments of the present application. However, it should be clear to those skilled in the art that the present application may also be implemented in other embodiments without these specific details. In other cases, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so that unnecessary details do not obstruct the description of the present application.

It should be understood that when used in the present specification and the appended claims, the term "comprising" indicates the presence of the described features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or combinations thereof.

It should also be understood that the term "and/or" used in the specification and appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.

In addition, in the description of the present specification and the appended claims, the terms "first", "second", "third", etc. are only used to distinguish the descriptions and cannot be understood as indicating or implying relative importance.

References to "one embodiment" or "some embodiments" in the specification of the present application mean that a specific feature, structure or characteristic described in conjunction with that embodiment is included in one or more embodiments of the present application. Therefore, the phrases "in one embodiment", "in some embodiments", "in some other embodiments", "in other embodiments", etc. appearing in different places in the specification do not necessarily all refer to the same embodiment, but mean "one or more but not all embodiments", unless otherwise specifically emphasized.
Embodiment 1:
FIG. 1 shows a schematic flowchart of an image recognition method based on a convolutional neural network model provided in an embodiment of the present application, detailed as follows:
The image to be recognized is input into the trained convolutional neural network model, which performs feature extraction and recognition on it in sequence to obtain a recognition result.
Here, the convolutional neural network model performs frequency-domain global convolution on the image to be recognized based on the fast Fourier transform (FFT).
Specifically, when a convolutional neural network extracts global features from an image, multiple convolutional layers usually need to be stacked to enlarge the receptive field so that global features can be captured. However, the parameter count and the amount of computation of the model grow accordingly, making its computational complexity excessive. Moreover, as the size of the image to be recognized increases (e.g., to 112×112), the computational complexity of such a stacked model quickly exceeds that of a 7×7 convolution, which is inconvenient for practical applications. Therefore, when the convolutional neural network model extracts features from the input image, the image is first processed with a fast Fourier transform and converted into its frequency-domain representation, so that the convolution operation in the spatial domain becomes a multiplication operation in the frequency domain, thereby reducing the amount of computation in the global feature extraction process.
In an embodiment of the present application, the image to be recognized is input into the trained convolutional neural network model, which performs feature extraction and recognition in sequence to obtain a recognition result. Because the global convolution is carried out in the frequency domain based on the fast Fourier transform, the spatial-domain convolution is converted into a frequency-domain multiplication. Consequently, when global features are extracted from the image to be recognized, the amount of computation required by the model for large images is effectively reduced, which improves the recognition efficiency of the convolutional neural network model and facilitates deployment on devices with limited computing power.
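The spatial-to-frequency conversion described above rests on the convolution theorem: circular convolution in the spatial domain equals point-wise multiplication in the frequency domain. A minimal NumPy sketch of the equivalence (the signal and kernel values are made-up illustrations, not part of the claimed model):

```python
import numpy as np

# Signal and same-length kernel (think: one row of a feature map and a
# "global" filter spanning the whole row).
x = np.array([1.0, 2.0, 3.0, 4.0])
w = np.array([0.5, -1.0, 0.25, 0.0])

# Direct circular convolution in the spatial domain: O(n^2) multiply-adds.
n = len(x)
direct = np.array([sum(x[(i - j) % n] * w[j] for j in range(n)) for i in range(n)])

# FFT route: transform, multiply point by point, inverse-transform: O(n log n).
via_fft = np.fft.ifft(np.fft.fft(x) * np.fft.fft(w)).real

assert np.allclose(direct, via_fft)
```

The same identity applies per row or per column of a 2-D feature map, which is what makes the frequency-domain multiplication a drop-in replacement for a convolution kernel spanning the whole spatial extent.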
In some embodiments, the image recognition method based on the convolutional neural network model further includes:
Acquiring the image to be recognized.
Optionally, the image to be recognized may be a single image captured by a camera device, or an image frame in a video stream captured by a camera device.
Optionally, since different image recognition tasks may require different images to be recognized, the camera equipment used and the rules for collecting the images may also differ. The corresponding image to be recognized is therefore acquired according to the acquisition method and acquisition rules of each application field. For example, a face recognition task requires a face image to be collected as the image to be recognized, so that the facial features in it can be recognized.
In an embodiment of the present application, according to the images required by the image recognition task in each application field, the corresponding acquisition method and acquisition rules are used to obtain an image to be recognized that meets the requirements of that task, so that the image recognition task can be performed.
In some embodiments, the convolutional neural network model includes a feature extraction module and a recognition module, and performing feature extraction and recognition on the image to be recognized in sequence through the convolutional neural network model to obtain a recognition result includes:
A1. Extracting features from the image to be recognized through the feature extraction module;
A2. Recognizing the extracted features based on the recognition module to obtain a recognition result.
Optionally, image recognition covers different tasks such as image classification and object detection, and different tasks apply different recognition methods to the same features. Therefore, the feature extraction module extracts features from the input image to be recognized, and the extracted features serve as the input of the recognition module, which performs the recognition corresponding to the task and outputs a recognition result. The recognition module may include one or more recognition units, each performing a different recognition task; for example, it may include a pedestrian detection unit and an object detection unit, so that the pedestrian detection unit performs pedestrian detection on the extracted features, or the extracted feature maps are input into both the pedestrian detection unit and the object detection unit to perform pedestrian detection and object detection.
In an embodiment of the present application, the feature extraction module extracts features from the image to be recognized, and the recognition module receives the extracted features and performs the corresponding recognition to obtain the corresponding result, improving the recognition efficiency of each image recognition task.
In some embodiments, the feature extraction module includes a first convolution module and a second convolution module, and step A1 includes:
A11. Extracting local features from the image to be recognized based on the first convolution module to obtain a local feature map;
A12. Performing frequency-domain global convolution on the local feature map using the fast Fourier transform based on the second convolution module to obtain a global feature map.
Optionally, the convolutional neural network model may be built on an existing convolutional neural network, with the shallow convolutional layers serving as the first convolution module and the deep convolutional layers or self-attention replaced by the second convolution module. The first convolution module applies ordinary convolution to the image to be recognized, extracts its local features, and outputs a local feature map, which is then used as the input of the second convolution module. The second convolution module applies a fast Fourier transform to the local feature map to obtain its frequency-domain representation and extracts global features from it to obtain a global feature map.
For example, in the convolutional neural network model shown in FIG. 2, the first three convolutional layers form the first convolution module of ordinary convolutions, the three deeper layers form the second convolution module, and the second convolution module is connected to the recognition module. The image to be recognized is fed into the first convolution module for local feature extraction, its output is fed into the second convolution module for global feature extraction, and the global feature map output by the second convolution module is fed into the recognition module, which outputs the corresponding recognition result.
It should be noted that the first and second convolution modules in the convolutional neural network model may also be arranged in an alternating structure, i.e., a first convolution module connected to a second convolution module whose output is connected to another first convolution module, or the structure shown in FIG. 2 may be stacked multiple times. It suffices that the feature map output by the feature extraction module is the global feature map extracted by the second convolution module (i.e., that the recognition module performs recognition based on the global features extracted by the second convolution module); the specific arrangement of the first convolution module (ordinary convolutional layers) and the second convolution module provided in the embodiments of the present application is not limited.
In an embodiment of the present application, the first convolution module of the convolutional neural network model extracts the local features of the image to be recognized, and the resulting local feature map is used as the input of the second convolution module, which applies a fast Fourier transform to it and then extracts global features from the resulting frequency-domain local feature map. This reduces the amount of computation in the global feature extraction process, and the resulting global features contain both local and global information, improving the recognition accuracy of the convolutional neural network model.
In some embodiments, the second convolution module includes a first branch and a second branch, and step A12 includes:
Splitting the local feature map along the channel direction to obtain a first local feature map and a second local feature map, and inputting the first local feature map and the second local feature map into the first branch and the second branch, respectively.
Optionally, when the second convolution module performs global feature extraction on the input local feature map, the local feature map is first split along the channel direction (e.g., evenly into two halves) to obtain a first local feature map and a second local feature map, which are then input into the first branch and the second branch, respectively, so that global features are extracted from each of them.
The first branch and the second branch each use the fast Fourier transform to perform frequency-domain global convolution on their input local feature map, obtaining a first global feature map and a second global feature map.
Optionally, the first branch applies a fast Fourier transform to the first local feature map to obtain its frequency-domain representation and then performs global convolution on it to obtain the first global feature map; the second branch applies a fast Fourier transform to the second local feature map to obtain its frequency-domain representation and then performs global convolution on it to obtain the second global feature map.
The first global feature map and the second global feature map are concatenated along the channel direction to obtain the global feature map.
Optionally, since the first and second local feature maps are obtained by splitting the local feature map along the channel direction, after the first and second branches extract their respective features, the resulting first and second global feature maps are concatenated along the channel direction to obtain the complete global feature map of the input image to be recognized, on which subsequent processing is based.
In an embodiment of the present application, because the local feature map is split along the channel direction into two halves fed into the first and second branches, the number of channels processed by each branch is halved, and during global feature extraction the local feature map undergoes FFT-based frequency-domain global convolution. Extracting global features from half-channel feature maps via the fast Fourier transform thus lowers the computational complexity of global feature extraction, reducing the demands on device computing power and facilitating deployment on devices with limited computing capability.
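The split/process/concatenate flow can be sketched as follows, assuming a channel-first (C, H, W) layout; the identity "branches" are placeholders standing in for the FFT-based branches, used only to show that shapes survive the round trip:

```python
import numpy as np

def split_conv_concat(x, first_branch, second_branch):
    """Halve a (C, H, W) feature map along the channel axis, run each half
    through its own branch, and concatenate the results along channels."""
    c = x.shape[0]
    first_half, second_half = x[: c // 2], x[c // 2 :]  # even split along channels
    return np.concatenate(
        [first_branch(first_half), second_branch(second_half)], axis=0
    )

# Placeholder identity branches: the output keeps the input's shape.
x = np.arange(8 * 4 * 4, dtype=float).reshape(8, 4, 4)
y = split_conv_concat(x, lambda t: t, lambda t: t)
assert y.shape == (8, 4, 4)
assert np.allclose(y, x)
```

Each real branch then only touches C/2 channels, which is where the halving of the per-branch cost comes from.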
In some embodiments, the frequency-domain global convolution performed by the first branch and the second branch on their input local feature maps includes:
The first branch performs a fast Fourier transform on the first local feature map and the corresponding weight matrix along the column dimension to obtain the frequency-domain first local feature map and weight matrix;
the frequency-domain first local feature map and weight matrix are multiplied point by point to obtain a first frequency-domain feature map;
an inverse fast Fourier transform is applied to the first frequency-domain feature map to obtain the first global feature map.
Optionally, in the process in which the first branch extracts global features from the first local feature map, the first local feature map and its weight matrix are fast-Fourier-transformed along the column dimension and converted into their frequency-domain forms, so that when the weight matrix is used to perform global convolution on the first local feature map, the frequency-domain feature map and weight matrix are multiplied point by point; that is, the spatial-domain convolution is converted into a frequency-domain multiplication, yielding the first frequency-domain feature map (the global features). An inverse fast Fourier transform is then applied to the first frequency-domain feature map to obtain the first global feature map in the spatial domain, effectively reducing the amount of computation when a large convolution kernel is used to extract global features.
The second branch performs a fast Fourier transform on the second local feature map and the corresponding weight matrix along the row dimension to obtain the frequency-domain second local feature map and weight matrix;
the frequency-domain second local feature map and weight matrix are multiplied point by point to obtain a second frequency-domain feature map;
an inverse fast Fourier transform is applied to the second frequency-domain feature map to obtain the second global feature map.
Optionally, in the process in which the second branch extracts global features from the second local feature map, the second local feature map and its weight matrix are fast-Fourier-transformed along the row dimension and converted into their frequency-domain forms, so that when the weight matrix is used to perform global convolution on the second local feature map, the frequency-domain feature map and weight matrix are multiplied point by point; that is, the spatial-domain convolution is converted into a frequency-domain multiplication, yielding the second frequency-domain feature map (the global features). An inverse fast Fourier transform is then applied to the second frequency-domain feature map to obtain the second global feature map in the spatial domain, effectively reducing the amount of computation when a large convolution kernel is used to extract global features.
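A hedged NumPy sketch of one such branch, assuming circular (periodic) boundary handling and a weight tensor of the same shape as the feature map (the text does not fix these details); the first branch transforms along the column/height axis and the second along the row/width axis:

```python
import numpy as np

def branch_global_conv(x, w, axis):
    """FFT-based global circular convolution of feature map x with weights w
    along one spatial axis: transform, multiply point by point, inverse-transform."""
    return np.fft.ifft(np.fft.fft(x, axis=axis) * np.fft.fft(w, axis=axis),
                       axis=axis).real

c, h, wd = 2, 4, 4
feat = np.random.default_rng(0).standard_normal((c, h, wd))
weight = np.random.default_rng(1).standard_normal((c, h, wd))  # assumed shape

g1 = branch_global_conv(feat, weight, axis=1)  # first branch: column (height) dim
g2 = branch_global_conv(feat, weight, axis=2)  # second branch: row (width) dim
assert g1.shape == feat.shape and g2.shape == feat.shape
```

For each fixed channel and column index, `g1` equals the direct circular convolution of that column with the corresponding weight column, which is exactly the point-wise-multiply-then-inverse-transform step described above.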
In the process of extracting global features, the computational complexity of the second convolution module can be expressed as O(CHW(log₂H + log₂W)); that is, the computational complexity of the first branch and the second branch each performing FFT-based global convolution on the local feature map is O(CHW(log₂H + log₂W)).
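As a rough sanity check of this expression against an ordinary k×k spatial convolution, whose cost is on the order of CHW·k² multiply-adds (a standard back-of-the-envelope count; the numbers below are illustrative, not measurements):

```python
import math

def fft_cost(c, h, w):
    # O(C*H*W*(log2 H + log2 W)) for the FFT-based global convolution.
    return c * h * w * (math.log2(h) + math.log2(w))

def conv_cost(c, h, w, k):
    # O(C*H*W*k^2) multiply-adds for an ordinary k x k convolution.
    return c * h * w * k * k

# At the 112x112 resolution cited above, log2(112) + log2(112) ~ 13.6 < 49 = 7^2,
# so in this rough count the FFT route is already cheaper than a 7x7 convolution.
assert fft_cost(64, 112, 112) < conv_cost(64, 112, 112, 7)
```

The gap widens with resolution, since the FFT term grows only logarithmically in H and W while a kernel large enough to stay "global" would have to grow with the image.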
Optionally, in order to reduce the computational complexity of the convolutional neural network model, when the fast Fourier transform is applied to the first local feature map and its weight matrix along the column dimension, a one-dimensional fast Fourier transform is used, yielding the first local feature map and its weight matrix in one-dimensional form (e.g., as arrays). For example, if the first local feature map is denoted (C, H, W), a one-dimensional fast Fourier transform along the column dimension yields a transformed first local feature map in numerical form with H elements per column.
In an embodiment of the present application, the spatial-domain local feature map and its weight matrix are converted into frequency-domain form via the fast Fourier transform for global feature extraction, so that the spatial-domain convolution becomes a simple multiplication, reducing the amount of computation of the convolutional neural network model. At the same time, because the local feature maps are fast-Fourier-transformed and their global features extracted along different dimensions, global features of different dimensions are obtained and then concatenated into global features covering both the width and height directions. This reduces the amount of computation when extracting the global features of the local feature map and facilitates deploying and applying the convolutional neural network model on devices with limited computing power.
In some embodiments, the second convolution module further includes a position embedding module, and before the local feature map is split along the channel direction, the method further includes:
Performing feature extraction on the local feature map through the position embedding module to obtain a position feature map, and adding the position feature map to the local feature map to obtain a local feature map containing position features.
Optionally, before global feature extraction is performed on the local feature map, the position embedding module performs convolution on the local feature map to extract its position features and generate a position feature map, which is then added to the local feature map pixel-position-wise to obtain a local feature map with the position features embedded.
Optionally, the position embedding module is a two-dimensional convolution module that convolves its input (the local feature map) to generate a two-dimensional position feature map, i.e., a position feature map whose size matches the resolution of the input local feature map, so that the position feature map can be added directly to the local feature map to obtain a local feature map containing position features. For example, the position embedding module may adopt a two-layer lightweight convolutional structure, i.e., the simple arrangement "convolution + normalization + activation function + convolution"; since a two-dimensional position feature map needs to be generated, the convolutional layers may use 3×3 depthwise separable convolutions to convolve the local feature map and generate the position feature map.
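A hedged sketch of such a two-layer "convolution + normalization + activation + convolution" position embedding using 3×3 depthwise convolutions; the specific normalization (global standardization) and activation (ReLU) are assumptions, since the text does not fix them:

```python
import numpy as np

def depthwise3x3(x, k):
    """3x3 depthwise convolution with zero padding: x is (C, H, W), k is (C, 3, 3),
    each channel convolved with its own 3x3 kernel."""
    c, h, w = x.shape
    padded = np.pad(x, ((0, 0), (1, 1), (1, 1)))
    out = np.zeros_like(x)
    for ch in range(c):
        for i in range(h):
            for j in range(w):
                out[ch, i, j] = np.sum(padded[ch, i:i + 3, j:j + 3] * k[ch])
    return out

def position_embed(x, k1, k2, eps=1e-5):
    """conv -> normalization -> activation -> conv; the output keeps the
    (C, H, W) shape of x, so it can be added to x pixel-position-wise."""
    t = depthwise3x3(x, k1)
    t = (t - t.mean()) / (t.std() + eps)  # placeholder normalization (assumed)
    t = np.maximum(t, 0.0)                # ReLU activation (assumed)
    pos = depthwise3x3(t, k2)             # two-dimensional position feature map
    return x + pos                        # embed position features by addition

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
y = position_embed(x, rng.standard_normal((4, 3, 3)), rng.standard_normal((4, 3, 3)))
assert y.shape == x.shape
```

Because the position feature map has the same resolution as the input, the embedding is a plain element-wise addition, which is what lets the subsequent split and FFT branches proceed unchanged.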
In an embodiment of the present application, since the position features of an image strengthen the ability to describe and distinguish the image content, before global features are extracted from the local feature map, the position features of the local feature map are extracted and embedded into it so that the local feature map contains position features. Subsequent image recognition based on a global feature map containing position features can therefore achieve higher recognition accuracy.
In some embodiments, the network structure of the second convolution module is as shown in FIG. 3 and may include the position embedding module, the first branch, and the second branch. After the position embedding module embeds the position features into the local feature map, the local feature map with embedded position features is split to obtain the first and second local feature maps, which are input into the first and second branches, respectively. Based on the column dimension and the row dimension respectively, the two branches apply a one-dimensional fast Fourier transform to their input local feature maps and the corresponding weight matrices, obtaining frequency-domain local feature maps and weight matrices in one-dimensional form; each frequency-domain local feature map is then multiplied point by point with its corresponding weight matrix to obtain the first and second frequency-domain feature maps, which are converted back to the spatial domain by inverse fast Fourier transform to obtain the first and second global feature maps. Finally, the first and second global feature maps are concatenated to obtain the global feature map of the image to be recognized. In the figure, solid arrows indicate that the data flow (feature data) is in real-valued form, and dashed arrows indicate that the data flow is in complex-valued form, i.e., frequency-domain feature data.
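The data flow just described can be strung together as one end-to-end sketch (NumPy; the circular boundary behavior, the weight shapes, and a precomputed position feature map are all assumptions made for illustration):

```python
import numpy as np

def fft_branch(x, w, axis):
    # 1-D FFT along one spatial axis, point-wise multiply, inverse FFT.
    return np.fft.ifft(np.fft.fft(x, axis=axis) * np.fft.fft(w, axis=axis),
                       axis=axis).real

def second_conv_module(x, pos, w_first, w_second):
    """End-to-end sketch of the FIG. 3 data flow: embed position features,
    split along channels, run the two FFT branches, concatenate."""
    x = x + pos                                # position embedding (precomputed here)
    c = x.shape[0]
    first, second = x[: c // 2], x[c // 2 :]   # split along the channel direction
    g1 = fft_branch(first, w_first, axis=1)    # first branch: column/height dimension
    g2 = fft_branch(second, w_second, axis=2)  # second branch: row/width dimension
    return np.concatenate([g1, g2], axis=0)    # global feature map

rng = np.random.default_rng(0)
x = rng.standard_normal((4, 8, 8))
out = second_conv_module(
    x,
    rng.standard_normal((4, 8, 8)),            # position feature map (stand-in)
    rng.standard_normal((2, 8, 8)),            # first-branch weights (assumed shape)
    rng.standard_normal((2, 8, 8)),            # second-branch weights (assumed shape)
)
assert out.shape == x.shape
```

Note that only the branch interiors carry complex values (the dashed arrows in FIG. 3); the module's input and output are both real-valued.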
对应于上述基于卷积神经网络模型的图像识别方法,图4示出了本申请实施例提供的一种卷积神经网络模型训练方法的流程示意图,详述如下:Corresponding to the above-mentioned image recognition method based on the convolutional neural network model, FIG4 shows a flow chart of a convolutional neural network model training method provided in an embodiment of the present application, which is described in detail as follows:
获取构建的卷积神经网络模型,并将样本图像输入到上述卷积神经网络进行训练,直至上述卷积神经网络满足预设要求,得到卷积神经网络模型。The constructed convolutional neural network model is obtained, and the sample image is input into the convolutional neural network for training until the convolutional neural network meets the preset requirements to obtain the convolutional neural network model.
其中,上述卷积神经网络基于快速傅里叶变换对样本图像进行频域的全局卷积。Among them, the above convolutional neural network performs global convolution on the sample image in the frequency domain based on fast Fourier transform.
可选地,在训练卷积神经网络之前,预先根据用户需求构建卷积神经网络,即根据用户图像识别任务的需求设置卷积神经网络的网络结构(如可以基于现有的ResNet、VGGNet网络进行构建),以实现相应的图像识别任务。 例如,用户需要训练用于进行目标检测的卷积神经网络模型,为了实现对图像中不同大小的目标的检测,达到较好的检测效果,可以基于SSD(Single Shot MultiBox Detector,单次多目标检测器)网络结构构建卷积神经网络,通过在不同特征尺度上检测实现对不同尺度的目标的检测。Optionally, before training the convolutional neural network, a convolutional neural network is pre-built according to user needs, that is, the network structure of the convolutional neural network is set according to the needs of the user's image recognition task (such as being built based on the existing ResNet and VGGNet networks) to achieve the corresponding image recognition task. For example, the user needs to train a convolutional neural network model for target detection. In order to detect targets of different sizes in an image and achieve better detection results, a convolutional neural network can be built based on the SSD (Single Shot MultiBox Detector) network structure to detect targets of different scales by detecting at different feature scales.
具体地,获取预先构建的卷积神经网络,将相应的样本图像作为上述卷积神经网络的输入进行训练,直至上述卷积神经网络满足预设的要求(如卷积神经网络的识别准确度到达预设阈值,如0.99),则停止训练卷积神经网络,得到训练完成的卷积神经网络模型。其中,在进行卷积神经网络的训练时,对样本图像以及权重矩阵进行快速傅里叶变换处理,使其转换为频域形式的样本图像和权重矩阵,从而从频域采用权重矩阵对该样本图像进行全局特征提取,即基于快速傅里叶变换将空间域的输入图像转换为频域的形式,再对转换后的频域的样本图像进行乘法运算,从而实现快速的全局特征的提取,使得在提取全局特征的过程中,以及图像的分辨率较大时,可以有效减小提取图像的全局特征的计算量。Specifically, a pre-constructed convolutional neural network is obtained, and the corresponding sample image is used as the input of the convolutional neural network for training until the convolutional neural network meets the preset requirements (such as the recognition accuracy of the convolutional neural network reaches a preset threshold, such as 0.99), then the training of the convolutional neural network is stopped to obtain a trained convolutional neural network model. Wherein, when training the convolutional neural network, the sample image and the weight matrix are processed by fast Fourier transform to convert them into sample images and weight matrices in the frequency domain, so as to extract global features of the sample image using the weight matrix from the frequency domain, that is, the input image in the spatial domain is converted into the form of the frequency domain based on the fast Fourier transform, and then the converted sample image in the frequency domain is multiplied, so as to realize the rapid extraction of global features, so that in the process of extracting global features, and when the resolution of the image is large, the amount of calculation of extracting the global features of the image can be effectively reduced.
Optionally, the sample images are labeled sample images corresponding to the user's image recognition task, so that they can be used for training directly, without further annotation. During training, part of the sample images can be used as the training set, and the remainder as the validation set and test set, so as to tune the convolutional neural network and obtain a good model. For example, when training a convolutional neural network model for pedestrian re-identification, the Market1501 dataset can be used as the training set. Market1501 contains 32,217 images of 1,501 pedestrians captured by 6 cameras; each pedestrian is captured by at least 2 cameras, a single camera may contribute multiple images of the same pedestrian, and the dataset is divided into a training set and a test set.
Optionally, when sample images are fed to the convolutional neural network for training, a single training step may take one image or several, e.g. 100. When sample images are input in batches, the number of images per input is the batch size (Batch Size); correspondingly, the shape of the features the network extracts from the sample images can be expressed in the four-dimensional format (B, H, W, C), where B is the batch size, H the height, W the width, and C the number of channels.
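For illustration only, the (B, H, W, C) layout just described can be sketched as follows, assuming a batch of 100 RGB images at an arbitrary 224×224 resolution (the resolution is our example, not the patent's):

```python
import numpy as np

# A batch of 100 RGB sample images of size 224x224 in (B, H, W, C) layout.
batch_size, height, width, channels = 100, 224, 224, 3
batch = np.zeros((batch_size, height, width, channels), dtype=np.float32)
print(batch.shape)  # (100, 224, 224, 3)
```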
In the embodiments of the present application, a convolutional neural network is constructed in advance according to the user's needs, and labeled sample images are fed to it as input for training until it meets the preset requirement, yielding the trained convolutional neural network model. Because the input image is converted from the spatial domain to the frequency domain by the fast Fourier transform, the spatial-domain convolution becomes a frequency-domain multiplication. This effectively reduces the amount of computation during global feature extraction, lowers the model's computational complexity on large input images, makes it easier to deploy and run the model on devices with limited computing power, and also increases the model's running speed.
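As a rough back-of-the-envelope illustration of the saving claimed above (constant factors ignored; the operation counts and function names are our own approximation, not figures from the patent): a direct spatial-domain global convolution multiplies every output pixel against an image-sized kernel, while the FFT route costs on the order of n·log n for an image of n pixels.

```python
import math

def direct_global_conv_mults(h, w):
    # every one of the h*w output pixels multiplies against an h*w kernel
    return (h * w) ** 2

def fft_global_conv_mults(h, w):
    n = h * w
    # forward FFT + pointwise product + inverse FFT, roughly 2*n*log2(n) + n
    return int(2 * n * math.log2(n) + n)

for size in (64, 256, 1024):
    d = direct_global_conv_mults(size, size)
    f = fft_global_conv_mults(size, size)
    print(size, d // f)  # the advantage grows quickly with image size
```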
对应于上述基于卷积神经网络模型的图像识别方法或卷积神经网络模型训练方法,以下基于部分应用场景对基于卷积神经网络模型的图像识别方法进行介绍。Corresponding to the above-mentioned image recognition method based on the convolutional neural network model or the convolutional neural network model training method, the image recognition method based on the convolutional neural network model is introduced below based on some application scenarios.
(1) Remote sensing detection
High-resolution remote sensing images carry a large amount of information and depict complex natural scenes: a single remote sensing image often contains many categories of ground objects and geomorphic elements such as buildings, sites, vegetation, and farmland, and target detection in remote sensing images has long been a hot research topic. Most existing remote sensing target detection models have deep structures and complex connection channels, and remote sensing image data is both more plentiful and larger in extent than natural images. Extracting global features from remote sensing images with ordinary convolutions and large kernels is too computationally complex and its detection efficiency is low, which also limits deployment and use of such models in many scenarios with limited computing resources. The image recognition method based on a convolutional neural network model provided in this application targets exactly this problem, the heavy cost of extracting global features from large images by convolution: by converting the spatial-domain convolution into a frequency-domain multiplication, the global features of the image to be detected are extracted with a much smaller amount of computation, thereby improving the detection efficiency of the convolutional neural network model.
First, the collected image to be detected is input into the convolutional neural network model (the target detection model). The first convolution module extracts local features of the image, capturing edge, corner, line, and similar feature information, and produces a local feature map, which then serves as the input of the second convolution module. In the second convolution module, the position embedding module embeds the position features of the image to be detected into the local feature map so that it contains more position information. The local feature map is then split along the channel direction into a first local feature map and a second local feature map, which are input into the first branch and the second branch. A one-dimensional fast Fourier transform is applied to the first local feature map along the column dimension and to the second local feature map along the row dimension, yielding their frequency-domain forms. At the same time, the weight matrices corresponding to the first and second local feature maps are processed by the same one-dimensional fast Fourier transform, and each frequency-domain weight matrix is multiplied pointwise with its corresponding frequency-domain local feature map to obtain the first frequency-domain feature map and the second frequency-domain feature map. These are inverse fast Fourier transformed and concatenated to obtain the complete global feature map; finally, the detection (recognition) module processes the global feature map and outputs the corresponding detection result. Because the local feature map is split before computation, reducing the amount of work, and each half undergoes only a one-dimensional fast Fourier transform, the spatial-domain convolution is converted into a frequency-domain multiplication, which greatly reduces the computation needed for global feature extraction and effectively improves target detection efficiency on remote sensing images.
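The split/transform/multiply/merge pipeline just described can be sketched in numpy as follows. This is a simplified single-image sketch under our own assumptions (real-valued frequency-domain weights of matching shape, a full complex FFT rather than any optimized variant, and no position embedding); it is not the patent's implementation.

```python
import numpy as np

def branch(feat, weight, axis):
    """1D FFT along `axis`, pointwise multiply in the frequency domain, inverse FFT."""
    f = np.fft.fft(feat, axis=axis)
    return np.real(np.fft.ifft(f * weight, axis=axis))

def second_conv_module(local_feat, w_col, w_row):
    """Split (H, W, C) along channels, run the two branches, concatenate back."""
    c = local_feat.shape[-1] // 2
    first, second = local_feat[..., :c], local_feat[..., c:]
    g1 = branch(first, w_col, axis=0)   # first branch: column dimension
    g2 = branch(second, w_row, axis=1)  # second branch: row dimension
    return np.concatenate([g1, g2], axis=-1)

h, w, c = 16, 16, 8
feat = np.random.rand(h, w, c)
w_col = np.random.rand(h, w, c // 2)  # assumed frequency-domain weights
w_row = np.random.rand(h, w, c // 2)
out = second_conv_module(feat, w_col, w_row)
assert out.shape == (h, w, c)  # global feature map keeps the input shape
```

Note that with all-ones weights each branch reduces to the identity (FFT followed by its inverse), a quick sanity check on the transform pair.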
(2) Face recognition
Face recognition technology is now widely used in smart access control, security monitoring, and similar fields. Because face recognition must extract facial features from an image, the images collected for recognition are usually of high clarity, i.e. high resolution. For example, face recognition for smart access control must be reasonably accurate to ensure security, which requires a high-resolution camera; yet the computing resources of a smart access control system are limited, so extracting global features from high-resolution images for recognition is inefficient and ill-suited to practical use. The image recognition method based on a convolutional neural network model provided in this application can be deployed in a smart access control system to effectively reduce the computation needed to extract global features from high-resolution images and thereby improve face recognition efficiency.
应理解,上述实施例中各步骤的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。It should be understood that the size of the serial numbers of the steps in the above embodiments does not mean the order of execution. The execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.
实施例二:Embodiment 2:
对应于上文实施例所述的基于卷积神经网络模型的图像识别方法,图5示出了本申请实施例提供的基于卷积神经网络模型的图像识别装置的结构框图,为了便于说明,仅示出了与本申请实施例相关的部分。Corresponding to the image recognition method based on the convolutional neural network model described in the above embodiment, Figure 5 shows a structural block diagram of the image recognition device based on the convolutional neural network model provided in the embodiment of the present application. For the sake of convenience of explanation, only the parts related to the embodiment of the present application are shown.
Referring to FIG. 5, the device includes: an input module 51 and a convolutional neural network model 52, the convolutional neural network model performing frequency-domain global convolution on the image to be recognized based on the fast Fourier transform. Wherein:
输入模块51,用于将待识别图像输入到上述卷积神经网络模型;An input module 51, used to input the image to be recognized into the above-mentioned convolutional neural network model;
卷积神经网络模型52,用于对上述待识别图像依次进行特征提取和识别,得到识别结果。The convolutional neural network model 52 is used to extract features and recognize the above-mentioned images to be recognized in sequence to obtain recognition results.
In the embodiments of the present application, the image to be recognized is input into the trained convolutional neural network model, which sequentially performs feature extraction and recognition on it to obtain the recognition result. Because frequency-domain global convolution is performed on the image to be recognized based on the fast Fourier transform, the spatial-domain convolution is converted into a frequency-domain multiplication. Therefore, when global features of the image to be recognized are extracted, the computation the convolutional neural network model spends extracting global features from large images can be effectively reduced, which improves the model's recognition efficiency and also facilitates deployment on devices with limited computing power.
在一些实施例中,上述图像识别装置还包括:In some embodiments, the image recognition device further includes:
待识别图像获取模块,用于获取待识别图像。The module for acquiring the image to be identified is used to acquire the image to be identified.
在一些实施例中,上述卷积神经网络模型52包括:In some embodiments, the convolutional neural network model 52 includes:
特征提取单元,用于对上述待识别图像进行特征提取;A feature extraction unit, used to extract features from the above-mentioned image to be identified;
识别单元,用于对提取到的特征进行识别,得到识别结果。The recognition unit is used to recognize the extracted features and obtain recognition results.
在一些实施例中,上述特征提取单元包括:In some embodiments, the feature extraction unit includes:
第一卷积单元,用于对上述待识别图像进行局部特征提取,得到局部特征图;A first convolution unit is used to extract local features of the image to be identified to obtain a local feature map;
第二卷积单元,用于采用快速傅里叶变换对上述局部特征图进行频域全局卷积,得到全局特征图。The second convolution unit is used to perform frequency domain global convolution on the local feature map by using fast Fourier transform to obtain a global feature map.
在一些实施例中,上述第二卷积单元包括:In some embodiments, the second convolution unit includes:
拆分单元,用于将上述局部特征图沿通道方向拆分,得到第一局部特征图和第二局部特征图;A splitting unit, used for splitting the local feature map along the channel direction to obtain a first local feature map and a second local feature map;
第一分支单元,用于对采用快速傅里叶变换对第一局部特征图进行频域全局卷积,得到第一全局特征图;A first branch unit is used to perform frequency domain global convolution on the first local feature map by using a fast Fourier transform to obtain a first global feature map;
第二分支单元,用于对采用快速傅里叶变换对第二局部特征图进行频域全局卷积,得到第二全局特征图;The second branch unit is used to perform frequency domain global convolution on the second local feature map by using fast Fourier transform to obtain a second global feature map;
拼接单元,用于将上述第一全局特征图和上述第二全局特征图沿通道方向进行拼接,得到全局特征图。The splicing unit is used to splice the first global feature map and the second global feature map along the channel direction to obtain a global feature map.
在一些实施例中,上述第一分支单元包括:In some embodiments, the first branch unit comprises:
第一变换单元,用于基于列维度对上述第一局部特征图和相应的权重矩阵分别进行快速傅里叶变换处理,得到频域的第一局部特征图和权重矩阵;A first transformation unit is used to perform fast Fourier transform processing on the first local feature map and the corresponding weight matrix based on the column dimension to obtain the first local feature map and the weight matrix in the frequency domain;
第一卷积单元,用于将上述频域的第一局部特征图和权重矩阵逐点相乘,得到第一频域特征图;A first convolution unit is used to multiply the first local feature map in the frequency domain and the weight matrix point by point to obtain a first frequency domain feature map;
第一逆变换单元,用于对上述频域的第一特征图进行快速傅里叶逆变换 处理,得到第一全局特征图。The first inverse transform unit is used to perform a fast Fourier inverse transform on the first feature map in the frequency domain to obtain a first global feature map.
上述第二分支单元包括:The second branch unit comprises:
第二变换单元,用于基于行维度对上述第二局部特征图和相应的权重矩阵分别进行快速傅里叶变换处理,得到频域的第二局部特征图和权重矩阵;A second transformation unit is used to perform fast Fourier transform processing on the second local feature map and the corresponding weight matrix based on the row dimension to obtain the second local feature map and the weight matrix in the frequency domain;
第二卷积单元,用于将上述频域的第二局部特征图和权重矩阵逐点相乘,得到第二频域特征图;A second convolution unit is used to multiply the second local feature map in the frequency domain and the weight matrix point by point to obtain a second frequency domain feature map;
第二逆变换单元,用于对上述频域的第二特征图进行快速傅里叶逆变换处理,得到第二全局特征图。The second inverse transform unit is used to perform inverse fast Fourier transform processing on the second feature map in the frequency domain to obtain a second global feature map.
在一些实施例中,上述第二卷积单元还包括:In some embodiments, the second convolution unit further includes:
位置嵌入单元,用于通过上述位置嵌入模块对上述局部特征图进行特征提取,得到位置特征图,并将上述位置特征图与上述局部特征图相加,得到包含位置特征的局部特征图。The position embedding unit is used to extract features from the local feature map through the position embedding module to obtain a position feature map, and add the position feature map to the local feature map to obtain a local feature map containing position features.
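As a loose sketch of the position embedding unit described above: a position feature map is extracted from the local feature map and added back elementwise. The 3×3 circular neighborhood average below is only a placeholder for whatever extraction layers the module actually uses, which the text does not specify.

```python
import numpy as np

def extract_position_features(feat):
    """Placeholder 'extraction': a 3x3 circular neighborhood average over (H, W)."""
    acc = np.zeros_like(feat)
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            acc += np.roll(np.roll(feat, dy, axis=0), dx, axis=1)
    return acc / 9.0

def add_position_features(local_feat):
    """Add the position feature map to the local feature map, keeping its shape."""
    return local_feat + extract_position_features(local_feat)

feat = np.full((4, 4, 2), 3.0)
out = add_position_features(feat)
assert out.shape == feat.shape
```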
对应于上文实施例所述的卷积神经网络模型的训练方法,图6示出了本申请实施例提供的一种卷积神经网络模型训练装置的结构框图,参照图6,该装置包括:Corresponding to the training method of the convolutional neural network model described in the above embodiment, FIG6 shows a structural block diagram of a convolutional neural network model training device provided in an embodiment of the present application. Referring to FIG6 , the device includes:
训练模块61,用于获取构建的卷积神经网络模型,并将样本图像输入到上述卷积神经网络进行训练,直至上述卷积神经网络满足预设要求,得到卷积神经网络模型,其中,上述卷积神经网络基于快速傅里叶变换对样本图像进行频域全局卷积。The training module 61 is used to obtain the constructed convolutional neural network model and input the sample image into the convolutional neural network for training until the convolutional neural network meets the preset requirements to obtain the convolutional neural network model, wherein the convolutional neural network performs frequency domain global convolution on the sample image based on fast Fourier transform.
需要说明的是,上述装置/单元之间的信息交互、执行过程等内容,由于与本申请方法实施例基于同一构思,其具体功能及带来的技术效果,具体可参见方法实施例部分,此处不再赘述。It should be noted that the information interaction, execution process, etc. between the above-mentioned devices/units are based on the same concept as the method embodiment of the present application. Their specific functions and technical effects can be found in the method embodiment part and will not be repeated here.
实施例三:Embodiment three:
图7为本申请一实施例提供的终端设备的结构示意图。如图7所示,该实施例的终端设备7包括:至少一个处理器70(图7中仅示出一个处理器)、存储器71以及存储在所述存储器71中并可在所述至少一个处理器70上运行的计算机程序72,所述处理器70执行所述计算机程序72时实现上述任意各个方法实施例中的步骤。FIG7 is a schematic diagram of the structure of a terminal device provided in an embodiment of the present application. As shown in FIG7 , the terminal device 7 of this embodiment includes: at least one processor 70 (only one processor is shown in FIG7 ), a memory 71, and a computer program 72 stored in the memory 71 and executable on the at least one processor 70, and when the processor 70 executes the computer program 72, the steps in any of the above-mentioned method embodiments are implemented.
例如,所述计算机程序72可以被分割成输入模块51和卷积神经网络模型52,上述卷积神经网络模型基于快速傅里叶变换对待识别图像进行频域全局卷积,各模块之间具体功能如下:For example, the computer program 72 can be divided into an input module 51 and a convolutional neural network model 52. The convolutional neural network model performs frequency domain global convolution on the image to be identified based on fast Fourier transform. The specific functions of each module are as follows:
输入模块51,用于将待识别图像输入到上述卷积神经网络模型;An input module 51, used to input the image to be recognized into the above-mentioned convolutional neural network model;
卷积神经网络模型52,用于对上述待识别图像依次进行特征提取和识别,得到识别结果。The convolutional neural network model 52 is used to extract features and recognize the above-mentioned images to be recognized in sequence to obtain recognition results.
或者,上述计算机程序72可以被分割成训练模块61,模块具体功能如下:Alternatively, the computer program 72 may be divided into training modules 61, and the specific functions of the modules are as follows:
训练模块61,用于获取构建的卷积神经网络模型,并将样本图像输入到上述卷积神经网络进行训练,直至上述卷积神经网络满足预设要求,得到卷积神经网络模型,其中,上述卷积神经网络基于快速傅里叶变换对样本图像进行频域全局卷积。The training module 61 is used to obtain the constructed convolutional neural network model and input the sample image into the convolutional neural network for training until the convolutional neural network meets the preset requirements to obtain the convolutional neural network model, wherein the convolutional neural network performs frequency domain global convolution on the sample image based on fast Fourier transform.
所述终端设备7可以是桌上型计算机、笔记本、掌上电脑及云端服务器等计算设备。该终端设备可包括,但不仅限于,处理器70、存储器71。本领域技术人员可以理解,图7仅仅是终端设备7的举例,并不构成对终端设备7的限定,可以包括比图示更多或更少的部件,或者组合某些部件,或者不同的部件,例如还可以包括输入输出设备、网络接入设备等。The terminal device 7 may be a computing device such as a desktop computer, a notebook, a PDA, a cloud server, etc. The terminal device may include, but not limited to, a processor 70 and a memory 71. Those skilled in the art will appreciate that FIG. 7 is merely an example of the terminal device 7 and does not constitute a limitation on the terminal device 7. The terminal device 7 may include more or fewer components than shown in the figure, or may combine certain components, or different components, and may also include, for example, input and output devices, network access devices, etc.
所称处理器70可以是中央处理单元(Central Processing Unit,CPU),该处理器70还可以是其他通用处理器、数字信号处理器(Digital Signal Processor,DSP)、专用集成电路(Application Specific Integrated Circuit,ASIC)、现场可编程门阵列(Field-Programmable Gate Array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件等。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。The processor 70 may be a central processing unit (CPU), or other general-purpose processors, digital signal processors (DSP), application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), or other programmable logic devices, discrete gate or transistor logic devices, discrete hardware components, etc. A general-purpose processor may be a microprocessor or any conventional processor, etc.
所述存储器71在一些实施例中可以是所述终端设备7的内部存储单元,例如终端设备7的硬盘或内存。所述存储器71在另一些实施例中也可以是所述终端设备7的外部存储设备,例如所述终端设备7上配备的插接式硬盘,智能存储卡(Smart Media Card,SMC),安全数字(Secure Digital,SD)卡,闪存卡(Flash Card)等。进一步地,所述存储器71还可以既包括所述终端设备7的内部存储单元也包括外部存储设备。所述存储器71用于存储操作系统、应用程序、引导装载程序(BootLoader)、数据以及其他程序等,例如所述 计算机程序的程序代码等。所述存储器71还可以用于暂时地存储已经输出或者将要输出的数据。In some embodiments, the memory 71 may be an internal storage unit of the terminal device 7, such as a hard disk or memory of the terminal device 7. In other embodiments, the memory 71 may also be an external storage device of the terminal device 7, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, a flash card (Flash Card), etc. equipped on the terminal device 7. Further, the memory 71 may also include both an internal storage unit and an external storage device of the terminal device 7. The memory 71 is used to store an operating system, an application program, a boot loader (BootLoader), data, and other programs, such as the program code of the computer program. The memory 71 may also be used to temporarily store data that has been output or is to be output.
所属领域的技术人员可以清楚地了解到,为了描述的方便和简洁,仅以上述各功能单元、模块的划分进行举例说明,实际应用中,可以根据需要而将上述功能分配由不同的功能单元、模块完成,即将所述装置的内部结构划分成不同的功能单元或模块,以完成以上描述的全部或者部分功能。实施例中的各功能单元、模块可以集成在一个处理单元中,也可以是各个单元单独物理存在,也可以两个或两个以上单元集成在一个单元中,上述集成的单元既可以采用硬件的形式实现,也可以采用软件功能单元的形式实现。另外,各功能单元、模块的具体名称也只是为了便于相互区分,并不用于限制本申请的保护范围。上述系统中单元、模块的具体工作过程,可以参考前述方法实施例中的对应过程,在此不再赘述。The technicians in the relevant field can clearly understand that for the convenience and simplicity of description, only the division of the above-mentioned functional units and modules is used as an example for illustration. In practical applications, the above-mentioned function allocation can be completed by different functional units and modules as needed, that is, the internal structure of the device can be divided into different functional units or modules to complete all or part of the functions described above. The functional units and modules in the embodiment can be integrated in a processing unit, or each unit can exist physically separately, or two or more units can be integrated in one unit. The above-mentioned integrated unit can be implemented in the form of hardware or in the form of software functional units. In addition, the specific names of the functional units and modules are only for the convenience of distinguishing each other, and are not used to limit the scope of protection of this application. The specific working process of the units and modules in the above-mentioned system can refer to the corresponding process in the aforementioned method embodiment, which will not be repeated here.
本申请实施例还提供了一种网络设备,该网络设备包括:至少一个处理器、存储器以及存储在所述存储器中并可在所述至少一个处理器上运行的计算机程序,所述处理器执行所述计算机程序时实现上述任意各个方法实施例中的步骤。An embodiment of the present application also provides a network device, which includes: at least one processor, a memory, and a computer program stored in the memory and executable on the at least one processor, wherein the processor implements the steps in any of the above-mentioned method embodiments when executing the computer program.
本申请实施例还提供了一种计算机可读存储介质,所述计算机可读存储介质存储有计算机程序,所述计算机程序被处理器执行时实现可实现上述各个方法实施例中的步骤。An embodiment of the present application further provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program, and when the computer program is executed by a processor, the steps in the above-mentioned method embodiments can be implemented.
本申请实施例提供了一种计算机程序产品,当计算机程序产品在终端设备上运行时,使得终端设备执行时实现可实现上述各个方法实施例中的步骤。An embodiment of the present application provides a computer program product. When the computer program product is run on a terminal device, the terminal device can implement the steps in the above-mentioned method embodiments when executing the computer program product.
所述集成的单元如果以软件功能单元的形式实现并作为独立的产品销售或使用时,可以存储在一个计算机可读取存储介质中。基于这样的理解,本申请实现上述实施例方法中的全部或部分流程,可以通过计算机程序来指令相关的硬件来完成,所述的计算机程序可存储于一计算机可读存储介质中,该计算机程序在被处理器执行时,可实现上述各个方法实施例的步骤。其中,所述计算机程序包括计算机程序代码,所述计算机程序代码可以为源代码形式、对象代码形式、可执行文件或某些中间形式等。所述计算机可读介质至少可以包括:能够将计算机程序代码携带到拍照装置/终端设备的任何实体或 装置、记录介质、计算机存储器、只读存储器(ROM,Read-Only Memory)、随机存取存储器(RAM,Random Access Memory)、电载波信号、电信信号以及软件分发介质。例如U盘、移动硬盘、磁碟或者光盘等。在某些司法管辖区,根据立法和专利实践,计算机可读介质不可以是电载波信号和电信信号。If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the present application implements all or part of the processes in the above-mentioned embodiment method, which can be completed by instructing the relevant hardware through a computer program. The computer program can be stored in a computer-readable storage medium. When the computer program is executed by the processor, the steps of the above-mentioned various method embodiments can be implemented. Among them, the computer program includes computer program code, which can be in source code form, object code form, executable file or some intermediate form. The computer-readable medium may at least include: any entity or device that can carry the computer program code to the camera/terminal device, recording medium, computer memory, read-only memory (ROM, Read-Only Memory), random access memory (RAM, Random Access Memory), electric carrier signal, telecommunication signal and software distribution medium. For example, USB flash drive, mobile hard disk, disk or optical disk. In some jurisdictions, according to legislation and patent practice, computer-readable media cannot be electric carrier signals and telecommunication signals.
在上述实施例中,对各个实施例的描述都各有侧重,某个实施例中没有详述或记载的部分,可以参见其它实施例的相关描述。In the above embodiments, the description of each embodiment has its own emphasis. For parts that are not described or recorded in detail in a certain embodiment, reference can be made to the relevant descriptions of other embodiments.
本领域普通技术人员可以意识到,结合本文中所公开的实施例描述的各示例的单元及算法步骤,能够以电子硬件、或者计算机软件和电子硬件的结合来实现。这些功能究竟以硬件还是软件方式来执行,取决于技术方案的特定应用和设计约束条件。专业技术人员可以对每个特定的应用来使用不同方法来实现所描述的功能,但是这种实现不应认为超出本申请的范围。Those of ordinary skill in the art will appreciate that the units and algorithm steps of each example described in conjunction with the embodiments disclosed herein can be implemented in electronic hardware, or a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and design constraints of the technical solution. Professional and technical personnel can use different methods to implement the described functions for each specific application, but such implementation should not be considered to be beyond the scope of this application.
在本申请所提供的实施例中,应该理解到,所揭露的装置/网络设备和方法,可以通过其它的方式实现。例如,以上所描述的装置/网络设备实施例仅仅是示意性的,例如,所述模块或单元的划分,仅仅为一种逻辑功能划分,实际实现时可以有另外的划分方式,例如多个单元或组件可以结合或者可以集成到另一个系统,或一些特征可以忽略,或不执行。另一点,所显示或讨论的相互之间的耦合或直接耦合或通讯连接可以是通过一些接口,装置或单元的间接耦合或通讯连接,可以是电性,机械或其它的形式。In the embodiments provided in the present application, it should be understood that the disclosed devices/network equipment and methods can be implemented in other ways. For example, the device/network equipment embodiments described above are merely schematic. For example, the division of the modules or units is only a logical function division. There may be other division methods in actual implementation, such as multiple units or components can be combined or integrated into another system, or some features can be ignored or not executed. Another point is that the mutual coupling or direct coupling or communication connection shown or discussed can be through some interfaces, indirect coupling or communication connection of devices or units, which can be electrical, mechanical or other forms.
所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部单元来实现本实施例方案的目的。The units described as separate components may or may not be physically separated, and the components shown as units may or may not be physical units, that is, they may be located in one place or distributed on multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
以上所述实施例仅用以说明本申请的技术方案,而非对其限制;尽管参照前述实施例对本申请进行了详细的说明,本领域的普通技术人员应当理解:其依然可以对前述各实施例所记载的技术方案进行修改,或者对其中部分技术特征进行等同替换;而这些修改或者替换,并不使相应技术方案的本质脱离本申请各实施例技术方案的精神和范围,均应包含在本申请的保护范围之内。The embodiments described above are only used to illustrate the technical solutions of the present application, rather than to limit them. Although the present application has been described in detail with reference to the aforementioned embodiments, a person skilled in the art should understand that the technical solutions described in the aforementioned embodiments may still be modified, or some of the technical features may be replaced by equivalents. Such modifications or replacements do not deviate the essence of the corresponding technical solutions from the spirit and scope of the technical solutions of the embodiments of the present application, and should all be included in the protection scope of the present application.
Claims (10)
- 一种基于卷积神经网络模型的图像识别方法,其特征在于,所述卷积神经网络模型基于快速傅里叶变换对待识别图像进行频域全局卷积;An image recognition method based on a convolutional neural network model, characterized in that the convolutional neural network model performs frequency domain global convolution on the image to be recognized based on fast Fourier transform;所述图像识别方法包括:The image recognition method comprises:将待识别图像输入到经训练的所述卷积神经网络模型,通过所述卷积神经网络模型对所述待识别图像依次进行特征提取和识别,得到识别结果。The image to be identified is input into the trained convolutional neural network model, and the convolutional neural network model is used to extract features and identify the image to be identified in turn to obtain a recognition result.
- 如权利要求1所述的图像识别方法,其特征在于,所述卷积神经网络模型包括特征提取模块和识别模块,所述通过所述卷积神经网络模型对所述待识别图像依次进行特征提取和识别,得到识别结果,包括:The image recognition method according to claim 1, characterized in that the convolutional neural network model includes a feature extraction module and a recognition module, and the convolutional neural network model is used to extract features and recognize the image to be recognized in sequence to obtain a recognition result, comprising:通过所述特征提取模块对所述待识别图像进行特征提取;Extracting features of the image to be identified by the feature extraction module;基于所述识别模块对提取到的特征进行识别,得到识别结果。The extracted features are identified based on the identification module to obtain an identification result.
- 如权利要求2所述的图像识别方法,其特征在于,所述特征提取模块包括第一卷积模块和第二卷积模块,所述第二卷积模块基于快速傅里叶变换对所述待识别图像进行频域全局卷积,所述通过所述特征提取模块对所述待识别图像进行特征提取,包括:The image recognition method according to claim 2, characterized in that the feature extraction module includes a first convolution module and a second convolution module, the second convolution module performs frequency domain global convolution on the image to be recognized based on fast Fourier transform, and the feature extraction of the image to be recognized by the feature extraction module includes:基于所述第一卷积模块对所述待识别图像进行局部特征提取,得到局部特征图;Extracting local features of the image to be identified based on the first convolution module to obtain a local feature map;基于所述第二卷积模块采用快速傅里叶变换对所述局部特征图进行频域全局卷积,得到全局特征图。Based on the second convolution module, a fast Fourier transform is used to perform frequency domain global convolution on the local feature map to obtain a global feature map.
- 4. The image recognition method according to claim 3, wherein the second convolution module comprises a first branch and a second branch, and performing frequency-domain global convolution on the local feature map through the second convolution module using the fast Fourier transform to obtain a global feature map comprises: splitting the local feature map along the channel direction to obtain a first local feature map and a second local feature map, and inputting the first local feature map and the second local feature map into the first branch and the second branch respectively; performing frequency-domain global convolution on the respective input local feature maps in the first branch and the second branch using the fast Fourier transform to obtain a first global feature map and a second global feature map; and concatenating the first global feature map and the second global feature map along the channel direction to obtain the global feature map.
- 5. The image recognition method according to claim 4, wherein obtaining the first global feature map and the second global feature map comprises: in the first branch, performing the fast Fourier transform along the column dimension on the first local feature map and its corresponding weight matrix to obtain a frequency-domain first local feature map and weight matrix; multiplying the frequency-domain first local feature map and weight matrix point by point to obtain a first frequency-domain feature map; performing the inverse fast Fourier transform on the first frequency-domain feature map to obtain the first global feature map; in the second branch, performing the fast Fourier transform along the row dimension on the second local feature map and its corresponding weight matrix to obtain a frequency-domain second local feature map and weight matrix; multiplying the frequency-domain second local feature map and weight matrix point by point to obtain a second frequency-domain feature map; and performing the inverse fast Fourier transform on the second frequency-domain feature map to obtain the second global feature map.
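Claims 4 and 5 together describe a two-branch frequency-domain global convolution. As a rough illustration only (the function and variable names below are hypothetical, and the learned weight matrices of the claims are stood in for by plain arrays), the steps can be sketched with NumPy's FFT, relying on the convolution theorem: point-by-point multiplication in the frequency domain followed by an inverse FFT is a circular global convolution in the spatial domain.

```python
import numpy as np

def fft_global_conv(x, w, axis):
    """Global (circular) convolution along one axis: FFT both the
    feature map and the weight matrix, multiply point by point,
    then apply the inverse FFT."""
    n = x.shape[axis]
    xf = np.fft.rfft(x, axis=axis)
    wf = np.fft.rfft(w, axis=axis)
    return np.fft.irfft(xf * wf, n=n, axis=axis)

def two_branch_global_conv(feat, w_col, w_row):
    """feat: (C, H, W) local feature map. Split along the channel
    direction; the first branch convolves along the column
    dimension (H), the second along the row dimension (W); the
    results are concatenated back along the channel direction."""
    c = feat.shape[0] // 2
    first, second = feat[:c], feat[c:]
    g1 = fft_global_conv(first, w_col, axis=1)   # column dimension
    g2 = fft_global_conv(second, w_row, axis=2)  # row dimension
    return np.concatenate([g1, g2], axis=0)
```

Here `w_col` and `w_row` stand in for the per-branch weight matrices (learned parameters in the patent's model); the output global feature map has the same shape as the input local feature map.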
- 6. The image recognition method according to claim 4, wherein the second convolution module further comprises a position embedding module, and before splitting the local feature map along the channel direction, the method further comprises: extracting features of the local feature map through the position embedding module to obtain a position feature map, and adding the position feature map to the local feature map to obtain a local feature map containing position features.
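Read minimally, claim 6's position embedding step is a residual addition: a small convolution over the local feature map produces a position feature map, which is added back to the input. A sketch under that assumption follows; the fixed averaging kernel is a hypothetical placeholder for whatever learned filter the position embedding module actually uses.

```python
import numpy as np

def add_position_features(feat, k=3):
    """feat: (C, H, W). Extract a position feature map with a
    per-channel circular k-by-k convolution (fixed averaging
    kernel as a placeholder), then add it to the input so the
    result contains position features."""
    C, H, W = feat.shape
    kernel = np.full((k, k), 1.0 / (k * k))
    kf = np.fft.rfft2(kernel, s=(H, W))  # shared frequency response
    pos = np.stack([np.fft.irfft2(np.fft.rfft2(ch) * kf, s=(H, W))
                    for ch in feat])
    return feat + pos                    # residual addition
```

Because the placeholder kernel sums to one, a constant input simply doubles, which makes the residual structure easy to verify.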
- 7. A convolutional neural network model training method, comprising: acquiring a constructed convolutional neural network, and inputting sample images into the convolutional neural network for training until the convolutional neural network meets a preset requirement, thereby obtaining a convolutional neural network model; wherein the convolutional neural network performs frequency-domain global convolution on the sample images based on the fast Fourier transform.
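Claim 7's "train until a preset requirement is met" is the standard iterative scheme. A generic sketch with plain gradient descent, where "loss below a tolerance" stands in as one possible reading of the claim's preset requirement (the helper names and the toy least-squares problem are illustrative, not from the patent):

```python
import numpy as np

def train_until(params, loss_fn, grad_fn, data, labels,
                lr=0.1, tol=1e-6, max_steps=10_000):
    """Update parameters by gradient descent until the loss on the
    sample data meets the preset requirement (loss < tol) or the
    step budget runs out; return the trained parameters."""
    for _ in range(max_steps):
        if loss_fn(params, data, labels) < tol:
            break
        params = params - lr * grad_fn(params, data, labels)
    return params

# Tiny least-squares problem standing in for the network's loss:
X = np.array([[1.0, 0.0], [0.0, 1.0]])
y = np.array([1.0, 2.0])
mse = lambda w, X, y: np.mean((X @ w - y) ** 2)
mse_grad = lambda w, X, y: 2.0 * X.T @ (X @ w - y) / len(y)
w = train_until(np.zeros(2), mse, mse_grad, X, y)
```

In the patent's setting the parameters would be the convolutional network's weights (including the frequency-domain weight matrices) and the gradients would come from backpropagation; the stopping logic is the same.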
- 8. An image recognition apparatus, comprising: an input module and a trained convolutional neural network model, the convolutional neural network model performing frequency-domain global convolution on an image to be recognized based on the fast Fourier transform; the input module being configured to input the image to be recognized into the convolutional neural network model; and the convolutional neural network model being configured to perform feature extraction and then recognition on the image to be recognized to obtain a recognition result.
- 9. A terminal device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein when the processor executes the computer program, the image recognition method according to any one of claims 1 to 6 or the convolutional neural network model training method according to claim 7 is implemented.
- 10. A computer-readable storage medium storing a computer program, wherein when the computer program is executed by a processor, the image recognition method according to any one of claims 1 to 6 or the convolutional neural network model training method according to claim 7 is implemented.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202211255315.1 | 2022-10-13 | ||
CN202211255315.1A CN115690488A (en) | 2022-10-13 | 2022-10-13 | Image identification method and device based on convolutional neural network model and terminal equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2024077785A1 true WO2024077785A1 (en) | 2024-04-18 |
Family
ID=85065349
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/CN2022/142412 WO2024077785A1 (en) | 2022-10-13 | 2022-12-27 | Image recognition method and apparatus based on convolutional neural network model, and terminal device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN115690488A (en) |
WO (1) | WO2024077785A1 (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN118587706A (en) * | 2024-08-01 | 2024-09-03 | 苏州宝丽迪材料科技股份有限公司 | Fiber color master batch aggregation and dispersion ultrastructural detection method |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116070695B (en) * | 2023-04-03 | 2023-07-18 | 中国科学技术大学 | Training method of image detection model, image detection method and electronic equipment |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20150036946A1 (en) * | 2013-07-30 | 2015-02-05 | Hewlett-Packard Indigo B.V. | Metrics to identify image smoothness |
CN113627472A (en) * | 2021-07-05 | 2021-11-09 | 南京邮电大学 | Intelligent garden defoliating pest identification method based on layered deep learning model |
CN113869330A (en) * | 2021-10-12 | 2021-12-31 | 大连智慧渔业科技有限公司 | Underwater fish target detection method and device and storage medium |
CN115100301A (en) * | 2022-07-19 | 2022-09-23 | 重庆七腾科技有限公司 | Image compression sensing method and system based on fast Fourier convolution and convolution filtering flow |
- 2022-10-13: Chinese application CN202211255315.1A filed (published as CN115690488A, status: pending)
- 2022-12-27: PCT application PCT/CN2022/142412 filed (published as WO2024077785A1, status: unknown)
Also Published As
Publication number | Publication date |
---|---|
CN115690488A (en) | 2023-02-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
WO2024077785A1 (en) | Image recognition method and apparatus based on convolutional neural network model, and terminal device | |
WO2024001123A1 (en) | Image recognition method and apparatus based on neural network model, and terminal device | |
US10534957B2 (en) | Eyeball movement analysis method and device, and storage medium | |
US9208608B2 (en) | Systems and methods for feature tracking | |
US8792722B2 (en) | Hand gesture detection | |
US8750573B2 (en) | Hand gesture detection | |
CN110503076B (en) | Video classification method, device, equipment and medium based on artificial intelligence | |
US9996755B2 (en) | Method and image processing apparatus for image-based object feature description | |
WO2024077781A1 (en) | Convolutional neural network model-based image recognition method and apparatus, and terminal device | |
CN114612987B (en) | Expression recognition method and device | |
CN109815823B (en) | Data processing method and related product | |
CN110852311A (en) | Three-dimensional human hand key point positioning method and device | |
CN113158773B (en) | Training method and training device for living body detection model | |
CN107330387B (en) | Pedestrian detection method based on image data | |
WO2022166258A1 (en) | Behavior recognition method and apparatus, terminal device, and computer-readable storage medium | |
CN112507897A (en) | Cross-modal face recognition method, device, equipment and storage medium | |
WO2017156864A1 (en) | Method, apparatus, and device for image recognition, and nonvolatile computer storage medium | |
CN113673584A (en) | Image detection method and related device | |
CN114330565A (en) | Face recognition method and device | |
CN112580480A (en) | Hyperspectral remote sensing image classification method and device | |
Prasad et al. | Mobile plant species classification: a low computational approach | |
Ma et al. | ApLeafis: an android-based plant leaf identification system | |
Deng et al. | Attention-aware dual-stream network for multimodal face anti-spoofing | |
Chandra et al. | A novel method for cnn training using existing color datasets for classifying hand postures in bayer images | |
CN108764289B (en) | Method and system for classifying UI (user interface) abnormal pictures based on convolutional neural network |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 22961963 Country of ref document: EP Kind code of ref document: A1 |