WO2021147366A1 - Image processing method and related device - Google Patents

Image processing method and related device

Info

Publication number
WO2021147366A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
data distribution
processed
training
feature
Prior art date
Application number
PCT/CN2020/118076
Other languages
English (en)
French (fr)
Inventor
魏龙辉 (Longhui Wei)
谢凌曦 (Lingxi Xie)
田奇 (Qi Tian)
Original Assignee
华为技术有限公司 (Huawei Technologies Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co., Ltd. (华为技术有限公司)
Publication of WO2021147366A1

Classifications

    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06V — IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 — Arrangements for image or video recognition or understanding
    • G06V10/70 — Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74 — Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75 — Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/757 — Matching configurations of points or features
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 — Machine learning
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/04 — Architecture, e.g. interconnection topology
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/04 — Architecture, e.g. interconnection topology
    • G06N3/045 — Combinations of networks
    • G — PHYSICS
    • G06 — COMPUTING; CALCULATING OR COUNTING
    • G06N — COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 — Computing arrangements based on biological models
    • G06N3/02 — Neural networks
    • G06N3/08 — Learning methods

Definitions

  • This application relates to the field of artificial intelligence, and in particular to an image processing method and related equipment.
  • Artificial Intelligence is a theory, method, technology and application system that uses digital computers or machines controlled by digital computers to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
  • Artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and produce a new kind of intelligent machine that can react in a way similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines, so that the machines have the functions of perception, reasoning, and decision-making. The use of artificial intelligence for image processing is a common application of artificial intelligence.
  • However, the widespread data-domain-gap problem makes the generalization ability of image feature extraction very low: a trained neural network can only be deployed on application data from the same scene as its training data; otherwise its performance is very poor, or the network is even unusable.
  • The embodiments of the present application provide an image processing method and related equipment, which use a first data distribution characteristic to perform data distribution alignment on a feature map of an image to be processed. The first data distribution characteristic is obtained by computing data distribution statistics over the feature maps of the images in an image set that follows the same data distribution law as the image to be processed. This ensures that the images processed by the neural network have similar data distributions, and draws the data distribution of the feature map of the first image to be processed closer, across a large span, to the data region to which the neural network is sensitive. This reduces the difficulty of image processing for the neural network and improves its feature extraction performance across scenes.
  • In a first aspect, an embodiment of the present application provides an image processing method, which can be used in the field of image processing within artificial intelligence.
  • The execution device obtains the first image to be processed, and obtains the first data distribution characteristic corresponding to the first image to be processed. The first image to be processed follows the same data distribution law as the first image set. The first data distribution characteristic includes the data distribution characteristics of the feature maps corresponding to the images in the first image set in at least one feature dimension; the at least one feature dimension may include color features, texture features, brightness features, and resolution features. Further, the first data distribution characteristic is obtained by computing statistics over the data distribution of the feature maps corresponding to the images in the first image set, and it may be based on the feature maps of some or all of the images in the first image set.
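As a concrete illustration of how such a characteristic could be computed, the sketch below takes per-feature-dimension statistics over a batch of feature maps. The array layout and the function name are assumptions for illustration, not taken from the patent.

```python
import numpy as np

def distribution_statistics(feature_maps):
    """Compute the per-dimension mean and variance over a set of feature maps.

    feature_maps: array of shape (N, C, H, W) -- the feature maps of N
    images from one image set, with C feature dimensions (channels).
    Returns one mean and one variance per feature dimension, matching the
    description that the number of means/variances equals the number of
    feature dimensions.
    """
    maps = np.asarray(feature_maps, dtype=np.float64)
    # Statistics are taken over images and spatial positions, per dimension.
    mean = maps.mean(axis=(0, 2, 3))
    var = maps.var(axis=(0, 2, 3))
    return mean, var
```

These statistics can be computed offline over the image set and cached, so that processing a new image only requires a lookup.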
  • The execution device performs feature extraction on the first image to be processed and, according to the first data distribution characteristic, performs data distribution alignment on the first feature map during the feature extraction process. The first feature map is generated during feature extraction of the first image to be processed and includes the feature map in the at least one feature dimension. Aligning the data distribution of the first feature map means drawing its data distribution toward the sensitive value region of the nonlinear function, i.e., weakening the first data distribution characteristic carried in the data distribution of the first feature map. Because the first data distribution characteristic is the data distribution characteristic of the feature maps corresponding to the images in the first image set, and the images in that set follow the same data distribution law as the first image to be processed, the data distribution of the feature map of the first image to be processed can be drawn, across a large span, closer to the data region to which the neural network is sensitive. This further reduces the difficulty of image processing for the neural network and further improves its feature extraction performance across scenes.
  • In a possible implementation, the method may further include: the execution device acquires a second data distribution characteristic corresponding to the first image to be processed. The second data distribution characteristic is the data distribution characteristic of the images in the first image set, obtained by computing statistics over the data distribution of some or all of the images in that set. The execution device then aligns the data distribution of the first image to be processed according to the second data distribution characteristic. Aligning the data distribution of the first image to be processed means drawing its data distribution toward the sensitive value region of the nonlinear function, i.e., weakening the second data distribution characteristic carried in the data distribution of the first image to be processed. Specifically, the execution device may normalize the first image to be processed according to the second data distribution characteristic to realize this alignment, and then performs feature extraction on the first image to be processed on which data distribution alignment has been performed. In this way, not only is the feature map aligned during feature extraction, but the image to be processed is also aligned before feature extraction, so the images processed by the neural network have similar data distributions. This further improves the similarity between different images across scenes, further reduces the difficulty of image processing for the neural network, and thereby further improves its feature extraction performance across scenes.
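The input-level alignment just described can be sketched as a normalization of the raw image with the statistics of its image set. The function name and per-channel layout are assumptions; dividing by the square root of the variance (with an epsilon for numerical safety) is the usual way to normalize, rather than the patent's fixed prescription.

```python
import numpy as np

def align_input_distribution(image, set_mean, set_var, eps=1e-5):
    """Normalize an input image with statistics of its image set.

    set_mean / set_var stand in for the second data distribution
    characteristic: per-channel statistics computed over some or all of
    the raw images in the first image set.
    """
    image = np.asarray(image, dtype=np.float64)        # (C, H, W)
    mean = np.asarray(set_mean).reshape(-1, 1, 1)
    std = np.sqrt(np.asarray(set_var).reshape(-1, 1, 1) + eps)
    return (image - mean) / std
```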
  • In a possible implementation, because the convolutional neural network generates a feature map in at least one feature dimension during feature extraction of an image, data distribution statistics are computed over the feature map of each feature dimension corresponding to the images in the first image set, yielding one mean and one variance per feature dimension. The first data distribution characteristic generated from the feature maps corresponding to the images in the first image set therefore includes at least one mean and at least one variance, and the number of means and variances equals the number of feature dimensions. The execution device performing feature extraction on the first image to be processed and aligning the data distribution of the first feature map according to the first data distribution characteristic may include: the execution device performs feature extraction on the first image to be processed and, according to the at least one mean and at least one variance, standardizes the at least one feature map included in the first feature map during the feature extraction process.
  • Specifically, the first feature map includes the feature map of a target feature dimension. The execution device obtains the target mean and target variance corresponding to the target feature dimension from the first data distribution characteristic, subtracts the target mean from the feature map of the first image to be processed in the target feature dimension, and then divides by the target variance to obtain the standardized feature map of the target feature dimension. The target feature dimension is any one of the at least one feature dimension.
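A minimal sketch of this standardization step, assuming channel-first feature maps. Following the text literally, each dimension's map has the target mean subtracted and is then divided by the target variance; note that standard normalization layers divide by the standard deviation (the square root of the variance) instead, and an epsilon is added here for numerical safety.

```python
import numpy as np

def standardize_feature_map(first_feature_map, means, variances, eps=1e-5):
    """Standardize each feature dimension of the first feature map.

    first_feature_map: (C, H, W) -- one map per feature dimension.
    means / variances: the first data distribution characteristic, one
    value per feature dimension.
    """
    fmap = np.asarray(first_feature_map, dtype=np.float64)
    mean = np.asarray(means).reshape(-1, 1, 1)
    var = np.asarray(variances).reshape(-1, 1, 1)
    # Subtract the target mean, then divide by the target variance,
    # as described in the text above.
    return (fmap - mean) / (var + eps)
```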
  • In a possible implementation, the first image to be processed and the images in the first image set originate from the same target image acquisition device; or the image acquisition moments of the first image to be processed and of the images in the first image set all fall within the same target time period; or the first image to be processed and the images in the first image set originate from the same image acquisition location; or the shooting object in the first image to be processed and the objects in the images of the first image set belong to the same object type. The aforementioned image acquisition device includes, but is not limited to, cameras, radars, or other types of image acquisition devices; the aforementioned time period refers to different periods within a day; the aforementioned image acquisition location may be divided at the granularity of provinces, cities, or counties; and the granularity of the object type of the shooting object may be divided into kingdom, phylum, class, order, family, genus, or species, etc., which is not limited here.
  • In this implementation manner, multiple ways of acquiring a first image set that shares the data distribution law of the first image to be processed are provided, which expands the application scenarios of the solution and improves its implementation flexibility.
  • In a possible implementation, before the execution device acquires the first data distribution characteristic corresponding to the first image to be processed, the method further includes: acquiring identification information of the target image acquisition device that captured the first image to be processed, and obtaining, from the at least two image subsets included in the second image set, the first image set corresponding to that identification information. The first image set is one of the at least two image subsets included in the second image set and contains the images collected by the target image acquisition device; that is, the first image to be processed and the images in the first image set originate from the same target image acquisition device. The data distribution of the feature maps of images acquired by the same image acquisition device carries the unique style of that device. By using the source image acquisition device as the classification standard and aligning the data distribution of the feature map of the first image to be processed, the unique device style carried in that feature map is weakened, which improves the similarity between the feature maps of images from different image acquisition devices and thus reduces the difficulty of feature extraction for the neural network.
  • In a possible implementation, before acquiring the first data distribution characteristic corresponding to the first image to be processed, the method further includes: acquiring the image acquisition moment of the first image to be processed, and obtaining, from the at least two image subsets included in the second image set, the first image set corresponding to that acquisition moment. The first image set is one of the at least two image subsets included in the second image set and contains the images acquired within the target time period, and the image acquisition moment of the first image to be processed falls within that target time period; that is, the acquisition moments of the first image to be processed and of the images in the first image set all fall within the same target time period. The data distribution of the feature maps of images collected within the same time period carries the unique style of that period. By using the time period as the classification standard and aligning the data distribution of the feature map of the first image to be processed according to the data distribution characteristic of the feature maps of the images in the first image set to which it belongs, the unique period style carried in that feature map is weakened, which improves the similarity between the feature maps of images from different time periods and thus reduces the difficulty of feature extraction for the neural network.
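The two routing rules above (by acquisition device, or by acquisition time period) can be sketched as a lookup over the image subsets of the second image set. The dictionary keys and layout are illustrative assumptions, not from the patent.

```python
from datetime import time

def select_image_subset(subsets, device_id=None, capture_time=None):
    """Pick the first image set for an incoming image to be processed.

    subsets: list of dicts describing the image subsets of the second
    image set, e.g. {"device_id": "cam-3", "images": [...]} or
    {"period": (time(8), time(12)), "images": [...]}.
    Routing by device identification information is tried first;
    otherwise the capture moment is matched against each subset's
    time period.  Returns None if no subset matches.
    """
    for subset in subsets:
        if device_id is not None and subset.get("device_id") == device_id:
            return subset
        period = subset.get("period")
        if capture_time is not None and period and period[0] <= capture_time < period[1]:
            return subset
    return None
```

The selected subset's cached feature-map statistics (the first data distribution characteristic) are then used for the alignment steps described above.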
  • In a possible implementation, the execution device performing feature extraction on the first image to be processed and aligning the data distribution of the first feature map according to the first data distribution characteristic includes: the execution device performs feature extraction on the first image to be processed and, according to the first data distribution characteristic, aligns the data distribution of the first feature map during feature extraction to obtain feature information of the first image to be processed. The execution device then matches the first image to be processed against the images in the second image set according to this feature information to obtain a matching result. The first image set is one of the at least two image subsets included in the second image set. The matching result includes at least one target image, where each target image and the first image to be processed contain the same shooting object; the matching result may also include the image acquisition location and acquisition moment of each of the matched images. In this implementation, the improved feature extraction performance of the convolutional neural network allows image matching to be performed on more accurate feature information, which improves the accuracy of image matching, i.e., the accuracy of the image matching process of a surveillance system.
  • In a possible implementation, the execution device performing feature extraction on the first image to be processed and aligning the data distribution of the first feature map according to the first data distribution characteristic includes: the execution device performs feature extraction on the first image to be processed and, according to the first data distribution characteristic, aligns the data distribution of the first feature map during feature extraction to obtain feature information of the first image to be processed. The execution device then recognizes the first image to be processed according to this feature information and obtains description information of the shooting object in the first image to be processed. In this implementation, the improved feature extraction performance of the convolutional neural network helps improve the accuracy of image recognition.
  • In a possible implementation, before the execution device matches the first image to be processed against the images in the second image set according to its feature information, the method further includes: the execution device acquires a second image to be processed and a third data distribution characteristic. The second image to be processed is any image in the second image set; the third data distribution characteristic is the data distribution characteristic of the feature maps corresponding to the images in the third image set; and the second image to be processed follows the same data distribution law as the images in the third image set. The execution device performs feature extraction on the second image to be processed and, according to the third data distribution characteristic, aligns the data distribution of the second feature map during feature extraction to obtain feature information of the second image to be processed, where the second feature map is generated during feature extraction of the second image to be processed. The execution device repeats the above steps until the feature information of every image in the second image set has been obtained, and then matches the feature information of the first image to be processed against the feature information of each image in the second image set to obtain the matching result. In this implementation, the data distribution alignment operation is not performed according to the data distribution characteristics of the feature maps of all images in the second image set; instead, the second image set is divided into subsets according to the data distribution law of the images, and the alignment operation is performed based on the data distribution characteristics of the feature maps within each image subset. This avoids mutual interference of the data distribution characteristics between different image subsets, which helps draw the data distribution of the feature map of the image to be processed, across a large span, toward the sensitive region of the neural network, improving feature extraction performance. As the accuracy of the feature information of the image to be processed and of each image in the second image set improves, so does the accuracy of the image matching process.
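The matching step can be sketched as a nearest-neighbor search over feature information. Cosine similarity and the function names are assumptions for illustration; the patent does not fix a particular similarity measure.

```python
import numpy as np

def match_images(query_feature, gallery_features, top_k=1):
    """Match a query image's feature information against a gallery.

    query_feature: feature information of the image to be processed.
    gallery_features: one feature vector per image in the second image
    set (each extracted with its own subset's distribution alignment).
    Returns indices of the top_k most similar gallery images.
    """
    q = np.asarray(query_feature, dtype=np.float64)
    g = np.asarray(gallery_features, dtype=np.float64)
    # L2-normalize so the dot product equals cosine similarity.
    q = q / (np.linalg.norm(q) + 1e-12)
    g = g / (np.linalg.norm(g, axis=1, keepdims=True) + 1e-12)
    scores = g @ q
    return np.argsort(scores)[::-1][:top_k].tolist()
```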
  • A second aspect of the embodiments of the present application provides an image processing method. An execution device obtains a first image to be processed and a first data distribution characteristic corresponding to it, where the first data distribution characteristic includes the data distribution characteristics of the feature maps corresponding to the images in the first image set, and the first image to be processed follows the same data distribution law as the first image set. The execution device inputs the first image to be processed and the first data distribution characteristic into the feature extraction network, so that the feature extraction network performs data distribution alignment on the first feature map, where the first feature map is generated by the feature extraction network during feature extraction of the first image to be processed.
  • In a possible implementation, before the execution device inputs the first image to be processed and the first data distribution characteristic into the feature extraction network, the method further includes: the execution device obtains a second data distribution characteristic corresponding to the first image to be processed, which is the data distribution characteristic of the images in the first image set, and aligns the data distribution of the first image to be processed according to it. The execution device inputting the first image to be processed and the first data distribution characteristic into the feature extraction network then includes: the execution device inputs the first image to be processed on which data distribution alignment has been performed into the feature extraction network.
  • In a possible implementation, the first data distribution characteristic includes a mean and a variance obtained by computing data distribution statistics over the feature maps corresponding to the images in the first image set. The execution device inputting the first image to be processed and the first data distribution characteristic into the feature extraction network, so that the feature extraction network performs feature extraction on the first image to be processed and aligns the data distribution of the first feature map according to the first data distribution characteristic, includes: the execution device inputs the first image to be processed and the first data distribution characteristic into the feature extraction network, so that the feature extraction network performs feature extraction on the first image to be processed and standardizes the first feature map according to the mean and variance.
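One stage of such a feature extraction network can be sketched as: convolution, then standardization with the externally supplied set statistics, then the nonlinearity. This is a minimal illustration, not the patent's network; a 1x1 convolution stands in for feature extraction, and ReLU for the nonlinear function.

```python
import numpy as np

def aligned_feature_extraction(image, kernels, set_means, set_vars, eps=1e-5):
    """One feature-extraction stage with data distribution alignment.

    image:   (C_in, H, W) input.
    kernels: (C_out, C_in) weights of a 1x1 convolution.
    set_means / set_vars: the first data distribution characteristic,
    one value per output feature dimension, supplied alongside the image.
    """
    x = np.asarray(image, dtype=np.float64)
    w = np.asarray(kernels, dtype=np.float64)
    fmap = np.einsum("oc,chw->ohw", w, x)            # 1x1 convolution
    mean = np.asarray(set_means).reshape(-1, 1, 1)
    std = np.sqrt(np.asarray(set_vars).reshape(-1, 1, 1) + eps)
    aligned = (fmap - mean) / std                    # distribution alignment
    return np.maximum(aligned, 0.0)                  # nonlinear function (ReLU)
```

Unlike batch normalization, the statistics here are not computed from the current batch but injected per image set, which is what lets the same network serve images from different scenes.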
  • In a possible implementation, the first image to be processed and the images in the first image set originate from the same target image acquisition device; or the image acquisition moments of the first image to be processed and of the images in the first image set all fall within the same target time period; or the first image to be processed and the images in the first image set originate from the same image acquisition location; or the shooting object in the first image to be processed and the objects in the images of the first image set belong to the same object type.
  • In a possible implementation, before the execution device acquires the first data distribution characteristic corresponding to the first image to be processed, the method further includes: the execution device identifies the target image acquisition device that captured the first image to be processed, and acquires, from the at least two image subsets included in the second image set, the first image set corresponding to that target image acquisition device, where the first image set is one of the at least two image subsets included in the second image set and includes the images collected by the target image acquisition device.
  • In a possible implementation, before the execution device acquires the first data distribution characteristic corresponding to the first image to be processed, the method further includes: the execution device acquires the image acquisition moment of the first image to be processed and acquires, from the at least two image subsets included in the second image set, the first image set corresponding to that acquisition moment, where the first image set is one of the at least two image subsets included in the second image set, includes the images acquired within the target time period, and the image acquisition moment of the first image to be processed falls within that target time period.
  • In a possible implementation, the execution device inputting the first image to be processed and the first data distribution characteristic into the feature extraction network, so that the feature extraction network performs feature extraction on the first image to be processed and aligns the data distribution of the first feature map, includes: the execution device inputs the first image to be processed and the first data distribution characteristic into the feature extraction network, so that the feature extraction network performs feature extraction on the first image to be processed and aligns the data distribution of the first feature map according to the first data distribution characteristic, to obtain the feature information of the first image to be processed output by the feature extraction network.
  • In a possible implementation, the method further includes: the execution device inputs the feature information of the first image to be processed into the image matching network, so that the image matching network matches the first image to be processed against the images in the second image set, to obtain the matching result output by the image matching network. Here the feature extraction network and the image matching network are included in the same convolutional neural network, the first image set is one of the at least two image subsets of the second image set, and the matching result includes at least one target image, where each target image and the first image to be processed contain the same shooting object.
  • In a possible implementation, the execution device inputs the feature information of the first image to be processed into the image recognition network, so that the image recognition network recognizes the first image to be processed, obtaining the description information of the shooting object in the first image to be processed output by the image recognition network, where the feature extraction network and the image recognition network are included in the same convolutional neural network.
  • In a possible implementation, before the execution device inputs the first image to be processed and the first data distribution characteristic into the feature extraction network, the method further includes: the execution device obtains a second image to be processed and a third data distribution characteristic, where the second image to be processed is any image in the second image set, the third data distribution characteristic is the data distribution characteristic of the feature maps corresponding to the images in the third image set, and the second image to be processed follows the same data distribution law as the images in the third image set. The execution device inputs the second image to be processed and the third data distribution characteristic into the feature extraction network, so that the feature extraction network performs feature extraction on the second image to be processed and aligns the data distribution of the second feature map according to the third data distribution characteristic, to obtain the feature information of the second image to be processed, where the second feature map is generated by the feature extraction network during feature extraction of the second image to be processed. The execution device repeats the above steps until the feature information of every image in the second image set has been obtained. The execution device inputting the feature information of the first image to be processed into the image matching network then includes: the execution device inputs the feature information of the first image to be processed and the feature information of each image in the second image set into the image matching network, so that the image matching network matches the first image to be processed against the images in the second image set to obtain the matching result output by the image matching network.
  • The embodiments of the present application further provide an image processing method on the training side, which can be used in the field of image processing within artificial intelligence.
  • The training device obtains at least two training images from the training image set, the at least two training images including a first training image and a second training image that contain the same shooting object. The training device acquires the data distribution characteristic corresponding to the feature map of the first training image, which is the data distribution characteristic of the feature maps corresponding to the images in the training image subset to which the first training image belongs; the first training image follows the same data distribution law as that subset. The training device performs feature extraction on the first training image through the convolutional neural network and, according to the data distribution characteristic corresponding to the feature map of the first training image, aligns the data distribution of the third feature map during feature extraction to obtain the feature information of the first training image. Likewise, the training device acquires the data distribution characteristic corresponding to the feature map of the second training image, which is the data distribution characteristic of the feature maps corresponding to the images in the training image subset to which the second training image belongs; the second training image follows the same data distribution law as that subset. The training device performs feature extraction on the second training image through the convolutional neural network and, according to the data distribution characteristic corresponding to the feature map of the second training image, aligns the data distribution of the fourth feature map during feature extraction to obtain the feature information of the second training image. The training device then trains the convolutional neural network through the loss function according to the feature information of the first and second training images until the convergence condition is met, and outputs the convolutional neural network on which the iterative training operation has been performed.
  • the loss function is used to indicate the similarity between the feature information of the first training image and the feature information of the second training image
  • the loss function can be one or more of the following: a pairwise (two-tuple) loss function, a triplet (three-tuple) loss function, a quadruplet (four-tuple) loss function, or other loss functions.
  • the convergence condition can be that the loss function converges, or that the number of iterations reaches a preset number.
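As an illustrative sketch of the metric-learning objective above (the Euclidean distance measure and the margin value are assumptions for illustration; the patent does not fix a specific formulation), a triplet (three-tuple) loss over extracted feature vectors can be written as:

```python
import math

def euclidean(u, v):
    # Euclidean distance between two feature vectors
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def triplet_loss(anchor, positive, negative, margin=0.3):
    """Triplet loss: feature information of the same shooting object
    (anchor/positive) should be closer than that of a different
    shooting object (negative) by at least `margin`."""
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(0.0, d_pos - d_neg + margin)

# toy feature vectors standing in for CNN feature information
anchor = [1.0, 0.0]
positive = [0.9, 0.1]   # same subject, seen by another camera
negative = [0.0, 1.0]   # different subject
loss = triplet_loss(anchor, positive, negative)  # 0.0: margin already satisfied
```

During training, the parameters of the convolutional neural network are adjusted to drive such losses toward zero, so that the feature information of the same shooting object stays similar across cameras.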
  • this implementation provides a specific training-side implementation for the case where the general capability is image re-identification, and provides a convolutional neural network that can still maintain good feature extraction capability across scenes, which improves the flexibility of the solution; only the feature extraction capability needs to be trained, which improves the efficiency of the training stage. In addition, when incremental learning is used in the training process, the method provided in the embodiment of this application can remove from the feature map the data distribution characteristics of a particular training image subset, thereby avoiding over-fitting of the convolutional neural network to a small training data set and solving the catastrophic forgetting problem of the incremental learning process.
  • the training device can also be used to perform the steps performed by the training device in each possible implementation of the first aspect; for the specific implementation steps, reference may be made to the descriptions in the first aspect and its various possible implementations, which will not be repeated here.
  • an embodiment of the present application provides an image processing method, which can be used in the image processing field of the artificial intelligence field.
  • the training device obtains a third training image from the training image collection.
  • the third training image is an image in the training image collection.
  • the training image collection also stores the true description information of each image.
  • the training device obtains the data distribution characteristic corresponding to the feature map of the third training image, where this data distribution characteristic is the data distribution characteristic of the feature maps corresponding to the images in the training image subset to which the third training image belongs.
  • the training device performs feature extraction on the third training image through the convolutional neural network, and according to the data distribution characteristics corresponding to the feature map of the third training image, performs data distribution alignment on the third feature map during the feature extraction process to obtain the feature information of the third training image, where the third feature map is obtained during the feature extraction process of the third training image.
  • the training device performs image recognition according to the feature information of the third training image, and obtains the predicted description information of the shooting object in the third training image.
  • the training device calculates the value of the loss function according to the predicted description information of the shooting object in the third training image and the true description information of the shooting object in the third training image, and performs back-propagation according to the value of the loss function to adjust the parameters of the convolutional neural network.
  • the training device repeats the foregoing operations to perform iterative training on the convolutional neural network until the convergence condition is met, and outputs the convolutional neural network that has performed the iterative training operation.
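The supervised loop above compares predicted description information with the true description information through a loss function. A minimal sketch follows (the class-score representation and the softmax/cross-entropy choice are illustrative assumptions, not the patent's prescribed loss):

```python
import math

def softmax(logits):
    # convert raw class scores into predicted probabilities
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(logits, true_class):
    """Loss between the predicted description information (class scores)
    and the true description information (the true class index)."""
    probs = softmax(logits)
    return -math.log(probs[true_class])

# a confident, correct prediction yields a small loss
low = cross_entropy([5.0, 0.0, 0.0], true_class=0)
# a wrong prediction yields a larger loss, driving back-propagation
high = cross_entropy([5.0, 0.0, 0.0], true_class=1)
```

Back-propagating this loss value adjusts the convolutional neural network, and the loop repeats until the convergence condition is met.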
  • this implementation provides a specific training-side implementation for the case where the general capability is image recognition, and provides a convolutional neural network that can still maintain good feature extraction capability across scenes, which improves the flexibility of the solution; only the feature extraction capability needs to be trained, which improves the efficiency of the training stage. In addition, when incremental learning is used in the training process, the method provided in the embodiment of this application can remove from the feature map the data distribution characteristics of a particular training image subset, thereby avoiding over-fitting of the convolutional neural network to a small training data set and solving the catastrophic forgetting problem of the incremental learning process.
  • the training device can also be used to perform the steps performed by the training device in each possible implementation of the first aspect; for the specific implementation steps, reference may be made to the descriptions in the first aspect and its various possible implementations, which will not be repeated here.
  • an embodiment of the present application provides an image processing device that can be used in the image processing field of the artificial intelligence field.
  • the image processing device includes: an acquisition module for acquiring a first image to be processed.
  • the acquiring module is further configured to acquire the first data distribution characteristic corresponding to the first image to be processed, where the first data distribution characteristic includes the data distribution characteristic of the feature maps corresponding to the images in the first image set, and the first image to be processed has the same data distribution law as the first image set.
  • the feature extraction module is used to perform feature extraction on the first image to be processed, and according to the first data distribution characteristic, perform data distribution alignment on the first feature map during the feature extraction process, where the first feature map is generated during the feature extraction process of the first image to be processed.
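A minimal sketch of what the acquisition and feature extraction modules do together (the mean/variance form of the data distribution characteristic and the normalization formula are assumptions for illustration; the patent leaves the concrete alignment operation open):

```python
import math

def align_feature_map(feature_map, dist_mean, dist_var, eps=1e-5):
    """Align the data distribution of a feature map using a first data
    distribution characteristic (here assumed to be mean/variance statistics
    gathered from the feature maps of the image set that shares the
    image's data distribution law)."""
    scale = 1.0 / math.sqrt(dist_var + eps)
    return [[(v - dist_mean) * scale for v in row] for row in feature_map]

def process_image(image, conv_extract, dist_characteristic):
    # feature extraction, with data distribution alignment applied to the
    # first feature map produced during the extraction process
    fmap = conv_extract(image)
    return align_feature_map(fmap, *dist_characteristic)

# toy "convolution" and illustrative statistics (mean=5.0, variance=4.0)
fake_extract = lambda img: [[v * 2.0 for v in row] for row in img]
aligned = process_image([[1.0, 2.0], [3.0, 4.0]], fake_extract, (5.0, 4.0))
```

The aligned feature map is what subsequent layers consume, so images from differently distributed sources land in a similar numeric range.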
  • an embodiment of the present application provides an image processing device that can be used in the image processing field of the artificial intelligence field.
  • the image processing device includes: an acquisition module for acquiring at least two training images from a training image set, where the at least two training images include a first training image and a second training image, and the first training image and the second training image include the same shooting object.
  • the acquisition module is also used to acquire the data distribution characteristics corresponding to the feature map of the first training image, where these data distribution characteristics are the data distribution characteristics of the feature maps corresponding to the images in the training image subset to which the first training image belongs; the first training image has the same data distribution law as the training image subset to which it belongs.
  • the feature extraction module is used for feature extraction of the first training image through the convolutional neural network, and according to the data distribution characteristics corresponding to the feature map of the first training image, perform data distribution on the third feature map during the feature extraction process Align to obtain the feature information of the first training image, where the third feature map is obtained during the feature extraction process of the first training image.
  • the acquisition module is also used to acquire the data distribution characteristics corresponding to the feature map of the second training image, where these data distribution characteristics are the data distribution characteristics of the feature maps corresponding to the images in the training image subset to which the second training image belongs; the second training image has the same data distribution law as the training image subset to which it belongs.
  • the feature extraction module is also used to perform feature extraction on the second training image through the convolutional neural network, and perform data on the fourth feature map during the feature extraction process according to the data distribution characteristics corresponding to the feature map of the second training image The distribution is aligned to obtain the feature information of the second training image, where the fourth feature map is obtained during the feature extraction process of the second training image.
  • the training module is used to train the convolutional neural network through the loss function according to the feature information of the first training image and the feature information of the second training image, until the convergence condition is met, and output the convolutional neural network that has performed iterative training operations,
  • the loss function is used to indicate the similarity between the feature information of the first training image and the feature information of the second training image.
  • the component modules of the execution device provided by the sixth aspect of this application can also be used to execute the steps executed by the execution device in each possible implementation manner of the third aspect; for the specific implementation steps performed by the component modules of the execution device in the sixth aspect and its various possible implementation manners, reference may be made to the descriptions in the third aspect and its various possible implementation manners, which will not be repeated here.
  • an embodiment of the present application provides an image processing device that can be used in the image processing field of the artificial intelligence field.
  • the image processing device includes: an acquisition module for acquiring a fourth training image from a training image set, where the fourth training image is an image in the training image set.
  • the acquisition module is also used to acquire the data distribution characteristics corresponding to the feature map of the fourth training image, where these data distribution characteristics are the data distribution characteristics of the feature maps corresponding to the images in the training image subset to which the fourth training image belongs.
  • the feature extraction module is used for feature extraction of the fourth training image through the convolutional neural network, and according to the data distribution characteristics corresponding to the feature map of the fourth training image, perform data distribution on the fourth feature map during the feature extraction process Align to obtain the feature information of the fourth training image, where the fourth feature map is obtained during the feature extraction process of the fourth training image.
  • the recognition module is used to perform image recognition according to the feature information of the fourth training image to obtain the description information of the shooting object in the fourth training image.
  • the training module is used to train the convolutional neural network through the loss function according to the description information.
  • the component modules of the execution device provided in the seventh aspect of the present application can also be used to execute the steps executed by the execution device in each possible implementation manner of the fourth aspect; for the specific implementation steps performed by the component modules of the execution device in the seventh aspect and its various possible implementation manners, reference may be made to the descriptions in the fourth aspect and its various possible implementation manners, and details are not repeated here.
  • an embodiment of the present application provides an execution device, including a processor coupled to a memory; the memory is used to store a program; the processor is used to execute the program in the memory, so that the execution device executes the steps executed by the execution device in each possible implementation manner of the first aspect or the second aspect.
  • an embodiment of the present application provides a training device, including a processor coupled to a memory; the memory is used to store programs; the processor is used to execute the programs in the memory, so that the training device executes the steps executed by the training device in each possible implementation manner of the third aspect or the fourth aspect.
  • an embodiment of the present application provides a computer-readable storage medium in which a computer program is stored; when the computer program runs on a computer, the computer executes the image processing method described in the first, second, third, or fourth aspect above.
  • an embodiment of the present application provides a computer program that, when run on a computer, causes the computer to execute the image processing method described in the first, second, third, or fourth aspect.
  • this application provides a chip system that includes a processor for supporting the execution device or the training device to implement the functions involved in the above aspects, for example, sending or processing the data and/or information involved in the above methods.
  • the chip system further includes a memory for storing program instructions and data necessary for the execution device or the training device.
  • the chip system can be composed of chips, and can also include chips and other discrete devices.
  • FIG. 1 is a schematic diagram of a structure of an artificial intelligence main frame provided by an embodiment of this application;
  • FIG. 2 is a system architecture diagram of an image processing system provided by an embodiment of the application
  • FIG. 3 is a schematic diagram of a scene of an image processing method provided by an embodiment of the application.
  • FIG. 4 is a schematic diagram of another scene of the image processing method provided by an embodiment of the application.
  • FIG. 5 is a schematic flowchart of an image processing method provided by an embodiment of this application.
  • FIG. 6 is a schematic diagram of data distribution characteristics in an image processing method provided by an embodiment of the application.
  • FIG. 7 is a schematic diagram of data distribution alignment in the image processing method provided by an embodiment of the application.
  • FIG. 8 is a schematic diagram of a convolutional neural network in an image processing method provided by an embodiment of the application.
  • FIG. 9 is a schematic diagram of the data distribution of the feature map in the image processing method provided by the embodiment of the application.
  • FIG. 10 is a schematic flowchart of another image processing method provided by an embodiment of the application.
  • FIG. 11 is a schematic flowchart of another image processing method provided by an embodiment of this application.
  • FIG. 12 is a schematic flowchart of still another image processing method provided by an embodiment of this application.
  • FIG. 13 is a schematic structural diagram of an image processing device provided by an embodiment of the application.
  • FIG. 14 is a schematic diagram of another structure of an image processing apparatus provided by an embodiment of the application.
  • FIG. 15 is a schematic diagram of another structure of an image processing device provided by an embodiment of the application.
  • FIG. 16 is a schematic diagram of still another structure of the image processing device provided by an embodiment of the application.
  • FIG. 17 is a schematic structural diagram of an execution device provided by an embodiment of this application.
  • FIG. 18 is a schematic structural diagram of a training device provided by an embodiment of the present application.
  • FIG. 19 is a schematic diagram of a structure of a chip provided by an embodiment of the application.
  • the embodiments of the present application provide an image processing method and related equipment, which use a first data distribution characteristic to perform data distribution alignment on a feature map of an image to be processed. The first data distribution characteristic is obtained by performing data distribution statistics on the feature maps of the images in an image collection that has the same data distribution law as the image to be processed. This ensures that the images processed by the neural network have similar data distributions, and narrows the data distribution of the feature map of the first image to be processed from a wide span into the data region to which the neural network is sensitive, which reduces the difficulty of image processing for the neural network and further improves the feature extraction performance of the neural network in cross-scene settings.
  • Figure 1 shows a schematic diagram of the main framework of artificial intelligence.
  • the following describes the above artificial intelligence framework from two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis).
  • the "intelligent information chain” reflects a series of processes from data acquisition to processing. For example, it can be the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, intelligent execution and output. In this process, the data has gone through the condensing process of "data-information-knowledge-wisdom”.
  • the "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of artificial intelligence and information (providing and processing technology realizations) up to the industrial ecology of the system.
  • the infrastructure provides computing power support for the artificial intelligence system, realizes communication with the outside world, and realizes support through the basic platform.
  • smart chips: hardware acceleration chips such as CPUs, NPUs, GPUs, ASICs, and FPGAs.
  • basic platforms include distributed computing frameworks and network-related platform guarantees and support, which can include cloud storage and computing, interconnection networks, etc.
  • sensors communicate with the outside to obtain data, and these data are provided to the smart chip in the distributed computing system provided by the basic platform for calculation.
  • the data at the layer above the infrastructure is used to represent the data sources in the field of artificial intelligence.
  • the data involves graphics, images, voice, and text, as well as the Internet of Things data of traditional devices, including business data of existing systems and sensory data such as force, displacement, liquid level, temperature, and humidity.
  • Data processing usually includes data training, machine learning, deep learning, search, reasoning, decision-making and other methods.
  • machine learning and deep learning can symbolize and formalize data for intelligent information modeling, extraction, preprocessing, and training.
  • Reasoning refers to the process of simulating human intelligent reasoning in a computer or intelligent system, using formal information to conduct machine thinking and solving problems based on reasoning control strategies.
  • the typical function is search and matching.
  • Decision-making refers to the process of making decisions after intelligent information is reasoned, and usually provides functions such as classification, ranking, and prediction.
  • some general capabilities can be formed based on the results of the data processing, such as an algorithm or a general system, for example, translation, text analysis, computer vision processing, speech recognition, image recognition, and so on.
  • Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields; they encapsulate the overall artificial intelligence solution, productizing intelligent information decision-making and realizing practical applications. The application fields mainly include intelligent terminals, intelligent manufacturing, intelligent transportation, smart home, smart medical care, smart security, autonomous driving, unmanned supermarkets, and so on.
  • the image processing system 200 includes an execution device 210, a training device 220, a database 230, a client device 240, and a data storage system 250, and the execution device 210 Include the calculation module 211.
  • a training image set is stored in the database 230, and the training device 220 generates a target model/rule 201 for processing images, and uses the training image set in the database to train the target model/rule 201 to obtain a mature target model/rule 201.
  • the target model/rule 201 is a convolutional neural network as an example for description.
  • the convolutional neural network obtained by the training device 220 can be applied to different systems or devices, such as mobile phones, tablets, laptops, VR devices, monitoring systems, radar data processing systems, and so on.
  • the execution device 210 can call data, codes, etc. in the data storage system 250, and can also store data, instructions, etc. in the data storage system 250.
  • the data storage system 250 may be placed in the execution device 210, or the data storage system 250 may be an external memory relative to the execution device 210.
  • the calculation module 211 may perform a convolution operation on the image to be processed obtained through the client device 240 through a convolutional neural network, and after extracting the feature map of the image to be processed, perform data distribution alignment on the feature map according to the data distribution characteristics obtained in advance, And generate the feature information of the image to be processed according to the feature map on which the data distribution alignment has been performed.
  • the pre-acquired data distribution characteristics are obtained after data distribution statistics are performed on the feature maps corresponding to the images in the image collection, and the data distribution law of the images to be processed is the same as that of the images in the aforementioned image collection.
  • the execution device 210 and the client device 240 may be independent devices; the execution device 210 is equipped with an I/O interface 212 to perform data interaction with the client device 240, and the "user" may input the image to be processed to the I/O interface 212 through the client device 240; the execution device 210 returns the processing result to the client device 240 through the I/O interface 212 and provides it to the user.
  • the client device 240 is a surveillance video processing device in a surveillance system
  • the client device 240 may be a terminal-side device in the surveillance system.
  • the execution device 210 receives an image to be processed from the client device 240 and responds to the image to be processed.
  • the execution device 210 may be specifically represented as a local device or a remote device.
  • Fig. 2 is only a schematic diagram of the architecture of the image processing system provided by the embodiment of the present invention, and the positional relationship among the devices, devices, modules, etc. shown in the figure does not constitute any limitation.
  • the execution device 210 may be configured in the client device 240.
  • the execution device 210 may be the main processor (Host CPU) of the desktop computer.
  • the execution device 210 can also be a graphics processing unit (GPU) or a neural network processor (NPU) in a desktop computer.
  • the GPU or NPU is mounted to the main processing unit as a coprocessor.
  • the main processor assigns tasks.
  • the execution device 210 may be configured in the training device 220, and the data storage system 250 and the database 230 may be integrated in the same storage device; after the training device 220 generates the mature convolutional neural network, the mature convolutional neural network is stored in the data storage system 250, so that the calculation module 211 can directly call the mature convolutional neural network.
  • the image processing method in the embodiments of this application can be used in fields such as smart security, unmanned supermarkets, or smart terminals (the actual applications are not limited to these typical application fields); the image processing method is divided into a training phase and an application phase. Based on the system architecture described in FIG. 2 above, the following introduces the application phase of the image processing method provided by the embodiment of the present application in multiple application scenarios.
  • the following first takes a re-identification scenario of a surveillance system in the field of intelligent security protection as an example to introduce four implementation modes of the application stage of the image processing method provided in the embodiments of the present application.
  • FIG. 3 is a schematic diagram of an image processing method provided by an embodiment of this application.
  • the monitoring system includes four cameras, the execution device is deployed on the server, and the case where the server uses the source camera as the classification criterion for different image subsets is taken as an example.
  • the server receives and stores the images sent by camera 1, camera 2, camera 3, and camera 4.
  • the images sent by camera 1, camera 2, camera 3, and camera 4 constitute an image collection in the server; the server may also store the source camera of each image in the image collection, the image acquisition location corresponding to the source camera, and the image acquisition time.
  • the server uses the source camera as the classification standard for the image sub-collection.
  • the aforementioned image collection can be divided into four image sub-collections: the image sub-collection obtained by camera 1, the image sub-collection obtained by camera 2, the image sub-collection obtained by camera 3, and the image sub-collection obtained by camera 4.
  • the server may generate the data distribution characteristics corresponding to each camera in advance through a mature convolutional neural network; since the server integrates both the training device and the execution device, after the training device in the server has trained a mature convolutional neural network, the execution device in the server can directly obtain the mature convolutional neural network from the storage system.
  • the data distribution characteristics corresponding to the camera include the data distribution characteristics of the image collected by the camera and the data distribution characteristics of the feature map corresponding to the image collected by the camera.
  • in this embodiment, the preset number being 500 is taken as an example.
  • the data distribution characteristics of the feature maps corresponding to the images collected by camera 1 may include data distribution characteristics of one or more feature dimensions; the number of data distribution characteristics is the same as the number of feature dimensions of the feature maps extracted from an image by the convolutional neural network.
  • in this embodiment, the case where the feature maps extracted from an image by the convolutional neural network include feature maps in three feature dimensions, namely the color feature, the texture feature, and the resolution feature, is taken as an example.
  • the server can directly perform statistics on the 500 images collected by camera 1 to obtain the data distribution characteristics of the images collected by camera 1.
  • the server can also use a mature convolutional neural network to perform feature extraction on the 500 images collected by camera 1, so as to obtain 1500 feature maps corresponding to the 500 images collected by camera 1.
  • the aforementioned 1500 feature maps include, for the 500 images collected by camera 1, 500 feature maps in the color feature dimension, 500 feature maps in the texture feature dimension, and 500 feature maps in the resolution feature dimension.
  • after the server uses the convolutional neural network to perform feature extraction on the 500 images collected by camera 1, the server can perform statistics on the 500 feature maps in the color feature dimension of those images to generate the data distribution characteristic, in the color feature, of the feature maps corresponding to the images collected by camera 1; perform statistics on the 500 feature maps in the texture feature dimension to generate the data distribution characteristic, in the texture feature, of the feature maps corresponding to the images collected by camera 1; and perform statistics on the 500 feature maps in the resolution feature dimension to generate the data distribution characteristic, in the resolution feature, of the feature maps corresponding to the images collected by camera 1.
  • the following uses Table 1 to show the correspondence between the feature maps and the data distribution characteristics in the three feature dimensions.
  • Table 1 shows the corresponding relationship between the feature map and the data distribution characteristics of the feature map in the three feature dimensions of color feature dimension, texture feature dimension, and resolution feature dimension. It should be understood that this example is only used to facilitate the understanding of the solution, and is not used to limit the solution.
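As a hedged sketch of how the per-camera statistics behind Table 1 could be gathered (per-dimension mean and variance are an assumption here; the patent does not prescribe which statistics constitute a data distribution characteristic):

```python
def distribution_characteristic(feature_maps):
    """Data distribution statistics (mean, variance) over every value in all
    feature maps of one feature dimension for one camera's image sub-collection."""
    values = [v for fmap in feature_maps for row in fmap for v in row]
    mean = sum(values) / len(values)
    var = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, var

# e.g. the color-dimension feature maps of camera 1 (two toy 2x2 maps here,
# standing in for the 500 maps of the embodiment)
color_maps_cam1 = [[[1.0, 2.0], [3.0, 4.0]],
                   [[5.0, 6.0], [7.0, 8.0]]]
mean, var = distribution_characteristic(color_maps_cam1)
```

Running the same routine once per feature dimension (color, texture, resolution) yields one data distribution characteristic per row of Table 1.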
  • after generating the data distribution characteristics corresponding to camera 1, the server performs feature extraction on the images in the image subset obtained by camera 1 to obtain the feature information of each image in that subset. Specifically, for a first image in the image subset acquired by camera 1, where the first image is any image in that subset, the server uses the data distribution characteristics of the images acquired by camera 1 to perform data distribution alignment on the first image.
  • after the server obtains the feature map of the first image in the color feature dimension, the mature convolutional neural network uses the data distribution characteristic, in the color feature dimension, of the feature maps corresponding to the images collected by camera 1 to perform data distribution alignment on the feature map of the first image in the color dimension; after the feature map of the first image in the texture dimension is obtained, the mature convolutional neural network uses the data distribution characteristic, in the texture feature dimension, of the feature maps corresponding to the images collected by camera 1 to perform data distribution alignment on the feature map of the first image in the texture feature dimension; and after the feature map of the first image in the resolution dimension is obtained, the mature convolutional neural network uses the data distribution characteristic, in the resolution feature dimension, of the feature maps corresponding to the images collected by camera 1 to perform data distribution alignment on the feature map of the first image in the resolution feature dimension.
  • The mature convolutional neural network then generates the feature information of the first image from the data-distribution-aligned feature maps in the color, texture, and resolution feature dimensions. The server performs the foregoing operations on each image in the image subset acquired by camera 1, thereby obtaining the feature information of each image in that subset.
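The per-dimension alignment described above can be sketched as follows. The embodiments do not fix the exact form of the statistics, so this sketch assumes the mean-and-variance form mentioned later in this description; `camera1_stats` and its per-dimension target values are hypothetical placeholders, not values from the patent.

```python
import numpy as np

def align_distribution(feature_map, target_mean, target_std, eps=1e-6):
    """Shift and scale a feature map so its statistics match the target
    data distribution characteristics (illustrative mean-variance form)."""
    fm = np.asarray(feature_map, dtype=np.float64)
    mean, std = fm.mean(), fm.std()
    return (fm - mean) / (std + eps) * target_std + target_mean

# Hypothetical per-dimension (mean, std) characteristics for camera 1
camera1_stats = {
    "color":      (0.45, 0.20),
    "texture":    (0.10, 0.05),
    "resolution": (0.30, 0.15),
}

def align_all_dimensions(feature_maps, stats):
    """Align each feature dimension's map against the camera's characteristics."""
    return {dim: align_distribution(fm, *stats[dim])
            for dim, fm in feature_maps.items()}
```

In a real system, the target statistics would be the pre-generated data distribution characteristics of the feature maps corresponding to the images collected by camera 1.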
  • For the specific generation methods of the data distribution characteristics corresponding to cameras 2 to 4, refer to the specific generation method of the data distribution characteristics corresponding to camera 1; for the specific generation methods of the feature information of each image in the image subsets acquired by cameras 2 to 4, refer to the specific generation method of the feature information of each image in the image subset acquired by camera 1. Details are not repeated here.
  • The user equipment may send a matching request to the server so as to receive, from the server, at least one image that matches the image to be processed. The at least one matched image and the image to be processed include the same photographed object, and the matching request carries the image to be processed and indicates its source camera. Here, the case where the image to be processed is derived from camera 1 is taken as an example.
  • After receiving the matching request, the server learns from it that at least one image matching the image to be processed needs to be obtained from the image set, and that the image to be processed was collected by camera 1.
  • The server obtains the data distribution characteristics corresponding to camera 1, and performs data distribution alignment on the image to be processed according to the data distribution characteristics of the images collected by camera 1.
  • The server uses the mature convolutional neural network to perform feature extraction on the data-distribution-aligned image to be processed. In the process of feature extraction, after the feature map of the image to be processed in the color feature dimension is obtained, the data distribution characteristics, in the color feature dimension, of the feature maps corresponding to the images collected by camera 1 are used to align the data distribution of that feature map in the color dimension; after the feature map of the image to be processed in the texture feature dimension is obtained, the data distribution characteristics of those feature maps in the texture feature dimension are used to align the feature map in the texture dimension; and after the feature map of the image to be processed in the resolution feature dimension is obtained, the data distribution characteristics of those feature maps in the resolution feature dimension are used to align the feature map in the resolution dimension. The feature information of the image to be processed is then generated from the data-distribution-aligned feature maps in the color, texture, and resolution feature dimensions.
  • After obtaining the feature information of the image to be processed, the server matches it against the feature information of each image in the image set, so as to obtain from the image set at least one image matching the image to be processed; the photographed object in each of the matched images is the same as that in the image to be processed, and a matching result is thereby obtained.
  • the matching result includes the at least one matched image, and may also include the image collection location and image collection time of each image in the at least one matched image.
  • After obtaining the matching result, the server sends it to the client device, and the client device displays the matching result to the user.
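The matching of feature information against the image set can be sketched as a nearest-neighbor search over feature vectors. The embodiments do not specify the similarity measure; cosine similarity and the `threshold` value below are assumptions for illustration only.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-12))

def match_images(query_feature, gallery, threshold=0.8):
    """Return (image_id, score) pairs whose feature information matches the
    query, sorted by descending similarity; `gallery` maps image ids of the
    image set to their feature vectors."""
    scores = {img_id: cosine_similarity(query_feature, feat)
              for img_id, feat in gallery.items()}
    return sorted(((i, s) for i, s in scores.items() if s >= threshold),
                  key=lambda x: -x[1])
```

The matching result returned to the client device could then pair each matched image id with its stored collection location and time.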
  • A client device may also be connected to one or more cameras, in which case the client device sends the images collected by those cameras to the server. The numbers of cameras connected to different client devices may be the same or different.
  • It should be understood that the examples of the number of cameras in the surveillance system, the value of the preset number, and the three feature dimensions in the above embodiments are given only to facilitate understanding of this solution. In practice, a surveillance system may include more or fewer cameras, the preset number may take a larger or smaller value, and the data distribution characteristics of the feature maps corresponding to the images collected by a given camera may also include other feature dimensions; this is not limited here.
  • This embodiment is described with reference to FIG. 3, taking as an example a monitoring system that includes 4 cameras, with the server using the image collection time as the classification standard for the different image subsets.
  • After cameras 1 to 4 capture video, they obtain images from the video and send the obtained images to the server.
  • The server receives the images sent by cameras 1 to 4; these images constitute the image set in the server, and the server may also store the source camera of each image in the image set and the image collection time corresponding to each image.
  • The server uses the image collection time as the classification standard for the image subsets. In this embodiment, the entire image set is divided into two image subsets as an example: the period from 7:00 to 18:00 is determined as the first time period, and the images collected in the first time period form one image subset; the period from 19:00 to 6:00 is determined as the second time period, and the images collected in the second time period form the other image subset.
  • The server may pre-generate the data distribution characteristics corresponding to the first time period, where the first time period refers to the period from 7:00 to 18:00. The data distribution characteristics corresponding to the first time period include the data distribution characteristics of the images collected in the first time period and the data distribution characteristics of the feature maps corresponding to those images.
  • The data distribution characteristics of the feature maps corresponding to the images collected in the first time period may include data distribution characteristics in one or more feature dimensions, and the one or more feature dimensions used in this embodiment may be the same as or different from those used in the first implementation of the re-identification scenario of the monitoring system.
  • In this embodiment, the feature maps extracted from an image by the convolutional neural network are taken to include the brightness, texture, and color feature dimensions as an example.
  • The server uses the data distribution characteristics corresponding to the first time period to perform data distribution alignment on a second image in the image subset collected in the first time period, and performs feature extraction on the data-distribution-aligned second image through the mature neural network. In the process of feature extraction on the second image, after the feature map of the second image in the brightness dimension is obtained, the mature convolutional neural network uses the data distribution characteristics, in the brightness feature dimension, of the feature maps corresponding to the images collected in the first time period to align the data distribution of the second image's feature map in the brightness dimension; after the feature map of the second image in the texture dimension is obtained, it uses the data distribution characteristics of those feature maps in the texture feature dimension to align the feature map in the texture dimension; and after the feature map of the second image in the color dimension is obtained, it uses the data distribution characteristics of those feature maps in the color feature dimension to align the feature map in the color dimension.
  • The mature convolutional neural network then generates the feature information of the second image based on the data-distribution-aligned feature maps in the brightness, texture, and color feature dimensions. The server performs the foregoing operations on each image in the image subset collected in the first time period to obtain the feature information of each image in that subset.
  • For the specific generation method of the data distribution characteristics corresponding to the second time period, refer to that of the data distribution characteristics corresponding to the first time period; for the specific generation method of the feature information of each image in the image subset collected in the second time period, refer to that of the feature information of each image in the image subset collected in the first time period. Details are not repeated here.
  • When the user equipment needs to re-identify an image to be processed, it sends a matching request to the server so as to receive, from the server, at least one image that matches the image to be processed. The at least one matched image and the image to be processed include the same photographed object, and the matching request carries the image to be processed and its image collection time. Here, the case where the image to be processed was collected in the first time period is taken as an example.
  • After receiving the matching request, the server learns from it that at least one image matching the image to be processed needs to be obtained from the image set, and that the image to be processed was collected in the first time period.
  • the server obtains the data distribution characteristics corresponding to the first time period, and performs data distribution alignment on the image to be processed according to the data distribution characteristics of the images collected in the first time period.
  • The server uses the mature convolutional neural network to perform feature extraction on the data-distribution-aligned image to be processed. In the process of feature extraction, after the feature map of the image to be processed in the brightness feature dimension is obtained, the data distribution characteristics, in the brightness feature dimension, of the feature maps corresponding to the images collected in the first time period are used to align the feature map in the brightness dimension; after the feature map of the image to be processed in the texture feature dimension is obtained, the data distribution characteristics of those feature maps in the texture feature dimension are used to align the feature map in the texture dimension; and after the feature map of the image to be processed in the color feature dimension is obtained, the data distribution characteristics of those feature maps in the color feature dimension are used to align the feature map in the color dimension. The feature information of the image to be processed is then generated from the data-distribution-aligned feature maps in the brightness, texture, and color feature dimensions.
  • After obtaining the feature information of the image to be processed, the server matches it against the feature information of each image in the image set to obtain a matching result, and then sends the matching result to the client device, which displays the matching result to the user.
  • This embodiment is also described with reference to FIG. 3. The difference from the first and second implementations of the surveillance scenario lies in the classification standard: the first implementation uses the source camera, the second uses the image collection time, and this implementation uses the image collection location.
  • After the server receives the images sent by cameras 1 to 4, these images constitute the image set in the server, and the image collection location is used as the classification criterion for the image subsets; as an example, the image set composed of the images collected by cameras 1 to 4 is divided into three image subsets.
  • The server generates the data distribution characteristics corresponding to the image collection location Beijing, which include the data distribution characteristics of the images in the image set collected in Beijing and the data distribution characteristics of the feature maps corresponding to those images; for details, refer to the descriptions of the first and second implementations of the above monitoring scenario.
  • Based on the data distribution characteristics of the images in the image set collected in Beijing, the server performs data distribution alignment on each such image, uses the mature convolutional neural network to perform feature extraction on the data-distribution-aligned images, and performs data distribution alignment on the feature maps generated during feature extraction, thereby obtaining the feature information of each image in the image set collected in Beijing.
  • For the specific generation methods of the data distribution characteristics corresponding to the image collection locations Shandong and Guangzhou, refer to the description of the specific generation method of the data distribution characteristics corresponding to Beijing; for the specific generation methods of the feature information of each image in the image set collected in Shandong or Guangzhou, refer to that of the feature information of each image in the image set collected in Beijing.
  • When the user equipment needs to obtain at least one image that matches an image to be matched, the server receives a matching request that carries the image to be matched and its image collection location. The server can therefore use the data distribution characteristics corresponding to that image collection location to align the data distribution of the image to be matched and of the feature maps corresponding to it, and then obtain the feature information of the image to be matched.
  • After obtaining the feature information of the image to be processed, the server matches it against the feature information of each image in the image set to obtain a matching result, and then sends the matching result to the client device.
  • the object type of the photographed object in the image is used as the classification standard.
  • The object type refers to the species type of the photographed object; for example, people, birds, cats, and dogs belong to different object types.
  • After the server obtains the image set composed of the images collected by cameras 1 to 4, it may divide the image set into at least two different image subsets according to the object type of the photographed object in each image.
  • The server generates the data distribution characteristics corresponding to each image subset, and uses them to perform data distribution alignment on the images in each subset and on the feature maps corresponding to those images, so as to generate the feature information of each image in the image set.
  • After the server receives the matching request and obtains the image to be processed from it, the server determines the object type of the photographed object in the image to be processed. Taking the case where the photographed object is a dog as an example, the server acquires, from the data distribution characteristics corresponding to the respective image subsets, those corresponding to the image subset composed of images whose photographed object is a dog, and then, according to those data distribution characteristics, performs data distribution alignment on the image to be processed and on the feature maps corresponding to it, so as to obtain the feature information of the image to be processed.
  • the server matches the feature information of the image to be processed with the feature information of each image in the image set to obtain a matching result, and then sends the matching result to the client device.
  • Using the image processing method provided in the embodiments of this application in the re-identification scenario of a monitoring system improves the feature extraction performance of the convolutional neural network, so that image matching can be performed on the basis of more accurate feature information; this helps improve the accuracy of image matching, that is, the accuracy of the monitoring system's image matching process.
  • FIG. 4 is a schematic diagram of an image processing method provided by an embodiment of the application.
  • In this embodiment, the monitoring system includes 8 cameras, the training device is deployed on the server, the execution device is deployed on the client device, and the client device uses the source camera as the classification standard for the different image subsets, as an example.
  • After the server has trained the convolutional neural network to maturity, it can send the mature convolutional neural network to the client device.
  • After cameras 1 to 8 capture video, they send the captured video to the client device in real time.
  • The client device obtains and stores, from the video sent by each camera, the images corresponding to that camera; that is, the client device obtains and stores the images corresponding to cameras 1 to 8, respectively. The aforementioned images corresponding to cameras 1 to 8 constitute an image set on the client device, and the image set includes 8 image subsets: the image subset corresponding to camera 1, the image subset corresponding to camera 2, ..., and the image subset corresponding to camera 8.
  • the client device generates the data distribution characteristics corresponding to each camera through a mature convolutional neural network, and extracts the characteristic information of each image in each image subset.
  • The specific manner in which the client device generates the data distribution characteristics corresponding to each camera, and the specific manner in which it generates the feature information of each image in each image subset, are similar to the manner in which the server does so in the first implementation of the monitoring scenario; refer to the description of that implementation, and details are not repeated here.
  • When the client device needs to match a certain image to be processed among the images acquired by cameras 1 to 8, it can determine from which of cameras 1 to 8 the image to be processed originates.
  • Here, the case where the image to be processed comes from camera 3 is taken as an example.
  • The client device performs data distribution alignment on the image to be processed according to the data distribution characteristics of the images corresponding to camera 3, and uses the data distribution characteristics, in at least one feature dimension, of the feature maps of the images corresponding to camera 3 to perform data distribution alignment on the feature maps of the image to be processed, thereby obtaining the feature information of the image to be processed.
  • The client device matches the feature information of the image to be processed with that of each image in the image set to obtain the matching result, and displays the matching result to the user through a display interface. For the content of the matching result, refer to the description of the first implementation of the monitoring scenario.
  • Using the image processing method provided in the embodiments of this application in the pedestrian re-identification scenario of an unmanned supermarket improves the accuracy of the image matching process, thereby improving the safety of the unmanned supermarket under surveillance.
  • In this scenario, the client device is one configured with an image recognition function, such as a mobile phone configured with a face recognition function.
  • the following two implementations are described in detail by taking the client device as a mobile phone as an example.
  • the execution device is configured on the mobile phone, and the source camera is used as the classification standard as an example.
  • Since the mobile phone is equipped with an image recognition function, the mobile phone is configured, before leaving the factory, with the mature convolutional neural network and with the data distribution characteristics corresponding to the camera on the mobile phone. The data distribution characteristics corresponding to the camera on the mobile phone include the data distribution characteristics of the images collected by that camera and the data distribution characteristics, in at least one feature dimension, of the feature maps corresponding to those images.
  • Specifically, a technician can collect a preset number of images through the camera on the mobile phone, and use the mature convolutional neural network to perform feature extraction on each of the preset number of images, so as to generate the aforementioned data distribution characteristics.
  • After the mobile phone is sold, when the user collects an image to be processed through the camera of the mobile phone and needs the phone to recognize it, the mobile phone first performs data distribution alignment on the image to be processed according to the data distribution characteristics of the images collected by the camera on the mobile phone. It then uses the mature convolutional neural network to perform feature extraction on the data-distribution-aligned image, performs data distribution alignment on the feature map of the image to be processed in at least one feature dimension based on the data distribution characteristics, in that feature dimension, of the feature maps corresponding to the images collected by the camera, and generates, through the mature convolutional neural network, the feature information of the image to be processed according to the data-distribution-aligned feature maps. The generated feature information of the image to be processed is used for recognition, and the description information of the image to be processed is obtained.
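The on-device recognition flow just described, image-level alignment, feature extraction, feature-map-level alignment, and recognition, can be sketched as below. The alignment again assumes a mean-variance form of the data distribution characteristics, and `extract_fmaps`, `fuse`, and `classify` are hypothetical stand-ins for stages of the mature convolutional neural network.

```python
import numpy as np

def _align(x, mean, std, eps=1e-6):
    """Mean-variance alignment (illustrative form of distribution alignment)."""
    x = np.asarray(x, dtype=np.float64)
    return (x - x.mean()) / (x.std() + eps) * std + mean

def recognize(image, image_stats, fmap_stats, extract_fmaps, fuse, classify):
    """Illustrative on-device pipeline: image-level alignment, feature
    extraction, feature-map-level alignment, fusion, then recognition.
    The callables are stand-ins for the CNN's stages."""
    img = _align(image, *image_stats)                                   # image-level alignment
    fmaps = extract_fmaps(img)                                          # per-dimension feature maps
    aligned = {d: _align(m, *fmap_stats[d]) for d, m in fmaps.items()}  # feature-map-level alignment
    feature_info = fuse(aligned)                                        # feature information
    return classify(feature_info)                                       # description information
```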
  • In this implementation, the execution device is configured on the mobile phone, and the object type of the photographed object in the image is used as the classification standard, as an example.
  • This embodiment is similar to the first implementation in the scenario where the image recognition function is configured in the client device.
  • the difference is that the data distribution characteristic configured on the mobile phone in this embodiment is the data distribution characteristic corresponding to at least one object type of the photographed object.
  • the data distribution characteristics include image-level data distribution characteristics and feature map-level data distribution characteristics.
  • As an example, the object types may include terrestrial animals, amphibians, marine animals, plants, and non-living things; accordingly, before the phone leaves the factory, the technician can configure on the mobile phone the data distribution characteristics corresponding to terrestrial animals, the data distribution characteristics corresponding to amphibians, the data distribution characteristics corresponding to marine animals, the data distribution characteristics corresponding to plants, and the data distribution characteristics corresponding to non-living things.
  • When the mobile phone needs to recognize an image to be processed, the object type of the photographed object in the image to be processed is first determined. Here, the case where the object type of the photographed object is plant is taken as an example.
  • The mobile phone then acquires the image-level data distribution characteristics included in the data distribution characteristics corresponding to plants, and performs data distribution alignment on the image to be processed.
  • The mature convolutional neural network is used to perform feature extraction on the data-distribution-aligned image to be processed; during feature extraction, data distribution alignment is performed on the feature maps according to the feature-map-level data distribution characteristics included in the data distribution characteristics corresponding to plants, and feature information is generated based on the data-distribution-aligned feature maps. The generated feature information of the image to be processed is then used for recognition to obtain the description information of the image to be processed.
  • The above description takes a client device in the form of a mobile phone as an example.
  • the client device may also be a tablet, a notebook computer, a wearable device, or other terminal-side devices.
  • Using the image processing method provided in the embodiments of this application in a scenario where the client device is configured with an image recognition function improves the feature extraction performance of the convolutional neural network, thereby helping to improve the accuracy of image recognition.
  • The general capabilities of the convolutional neural network in the image processing method provided by the embodiments of this application mainly include image matching and image recognition, and the specific implementations of these general capabilities differ between the two cases. The specific implementations of the foregoing two capabilities in the application stage are described below.
  • FIG. 5 is a schematic flowchart of an image processing method provided by an embodiment of this application.
  • the image processing method provided in the embodiment of the present application may include:
  • the execution device generates a data distribution characteristic set.
  • Before performing image matching, the execution device generates a data distribution characteristic set.
  • the data distribution characteristic set includes the data distribution characteristic corresponding to each of the at least two image subsets.
  • The data distribution characteristics corresponding to each image subset may include the data distribution characteristics of the images in that subset and the data distribution characteristics of the feature maps corresponding to those images; the latter may include the data distribution characteristics of feature maps in at least one feature dimension.
  • The aforementioned feature dimensions include but are not limited to the color, texture, resolution, and brightness feature dimensions; correspondingly, the data distribution characteristics of the feature maps in at least one feature dimension include but are not limited to the data distribution characteristics of the color feature maps corresponding to the images in the image subset, of the texture feature maps, of the resolution feature maps, of the brightness feature maps, and so on.
  • A data distribution characteristic is obtained by performing data distribution statistics on the matrix corresponding to at least one image or on the matrix corresponding to at least one feature map.
  • As an example, the overall brightness of the images collected during the time period from 19:00 to 6:00 in the monitoring system is low, so the data distribution characteristic of the image subset formed from the images collected during that time period can be: the brightness is low. As another example, some cameras have low resolution, and the data distribution characteristic of the images collected by such a camera may be: the resolution is low. This is not limited here.
  • the data distribution characteristics may also include the mean value and variance of multiple images or multiple feature maps.
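The mean and variance above can be computed directly from an image subset. A minimal sketch, assuming purely for illustration that each image is given as a flat list of pixel intensities in [0, 1]:

```python
from statistics import fmean, pvariance

def subset_distribution(images):
    # Data distribution characteristic of an image subset: the mean and
    # population variance over all pixel values of all images in the subset.
    pixels = [p for img in images for p in img]
    return fmean(pixels), pvariance(pixels)

# Hypothetical example: a night-time subset is darker overall than a
# daytime subset, so its mean pixel intensity is lower.
night = [[0.1, 0.2, 0.1, 0.2], [0.15, 0.1, 0.2, 0.15]]
day = [[0.8, 0.9, 0.85, 0.8], [0.9, 0.85, 0.8, 0.9]]
mu_night, var_night = subset_distribution(night)
mu_day, var_day = subset_distribution(day)
```

In this toy comparison, the lower mean of the night subset is exactly the "brightness is low" characteristic described above.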
  • FIG. 6 is a schematic diagram of the data distribution characteristics in the image processing method provided by the embodiment of the application.
  • Figure 6 shows a schematic diagram of two data distribution characteristics using a two-dimensional coordinate system as an example.
  • the horizontal axis and the vertical axis of the two-dimensional coordinate system in Figure 6 correspond to two data distribution description dimensions of the image. It should be understood that the data distribution characteristics can also be displayed through a three-dimensional coordinate graph or other graphics.
  • the classification criterion for different image subsets can be the source image acquisition device, that is, the images in different image subsets come from different image acquisition devices; the classification criterion can be the image acquisition time period, that is, the images in different image subsets are collected in different time periods; the classification criterion can also be the image acquisition location, that is, the images in different image subsets are collected at different locations.
  • the classification criteria for different image subsets can also be the object types of the objects in the images, that is, the object types of the objects in the images in different image subsets are different.
  • the aforementioned image acquisition device includes, but is not limited to, cameras, radars, or other types of image acquisition devices; the aforementioned time period refers to different time periods within a day; the aforementioned image acquisition location can be divided by province, city, or county, etc.; the granularity of the object type of the aforementioned photographed object can be divided into kingdom, phylum, class, order, family, genus, or species, etc., which is not limited here.
  • the execution device stores the second image set, so that the execution device generates data distribution characteristics according to the images in the second image set.
  • the second image set includes at least two image sub-sets.
  • As an example, in the re-identification scene of the surveillance system, the images collected by cameras 1 to 4 constitute the second image set; as another example, in the pedestrian re-identification scene of the unmanned supermarket, the images collected by cameras 1 to 8 constitute the second image set, and so on; the examples are not exhaustive here.
  • the execution device in the server directly receives images sent by the image acquisition device, and all the images received from the image acquisition device form the second image set.
  • the execution device in the server directly receives the video sent by the image acquisition device and acquires images from the received video, and the images acquired from the video sent by the image acquisition device form the second image set.
  • the image acquisition device is connected to the client device, and the image or video is collected by the image acquisition device and sent to the client device.
  • the client device sends the images to the execution device in the server, and the images sent by the client device form the second image set.
  • If the execution device is configured in a device on the terminal side, for one implementation, refer to the description of the implementation of the pedestrian re-recognition scene in the unmanned supermarket above.
  • the execution device on the terminal side directly receives the video sent by the image capture device and acquires images from the received video, and the images acquired from the video sent by the image acquisition device form the second image set.
  • the execution device on the terminal side may receive the image sent by the image acquisition device, and the image sent by the image acquisition device constitutes the second image set.
  • the execution device can obtain a preset number of images from a certain image subset, and generate the data distribution characteristic corresponding to that image subset according to the preset number of images.
  • the value of the preset number can be 50, 100, 200, 300, 400, 500, 600, 700, 800, 900, or another value, etc., which is not limited here.
  • since the execution device can acquire new images in real time, that is, the images in the second image set are constantly being updated, after generating the data distribution characteristic corresponding to each image subset for the first time, the execution device may also update the data distribution characteristic corresponding to each image subset according to the newly acquired images.
  • the execution device acquires a second to-be-processed image.
  • the execution device obtains the second to-be-processed image from the second image set, and the second to-be-processed image is any image in the second image set.
  • the execution device acquires a fourth data distribution characteristic corresponding to the second image to be processed, where the fourth data distribution characteristic is the data distribution characteristic of the image in the third image set.
  • after acquiring the second to-be-processed image, the execution device acquires the third image set to which the second to-be-processed image belongs, and can then acquire the fourth data distribution characteristic corresponding to the images in the third image set, that is, the fourth data distribution characteristic corresponding to the second to-be-processed image.
  • the third image set is any one of the at least two image subsets included in the second image set.
  • the fourth data distribution characteristic is used to indicate the data distribution characteristic of the image in the third image set in the data distribution characteristic set.
  • As an example, the fourth data distribution characteristic may be the data distribution characteristic of the images collected by camera 2 in the re-identification scene of the surveillance system; as another example, it may be the data distribution characteristic of the images collected by camera 5 in the pedestrian re-identification scene of the unmanned supermarket.
  • the execution device performs data distribution alignment on the second to-be-processed image according to the fourth data distribution characteristic.
  • the process of aligning the data distribution of the second to-be-processed image refers to the process of pulling the data distribution of the second to-be-processed image to the sensitive value area of the nonlinear function, and the method is to weaken the data distribution characteristics, of the images in the third image set, carried by the second to-be-processed image.
  • If the fourth data distribution characteristic includes the mean value corresponding to the images in the third image set and the variance corresponding to the images in the third image set, step 504 includes: the execution device normalizes the second to-be-processed image according to the mean value corresponding to the images in the third image set and the variance corresponding to the images in the third image set. Specifically, the execution device subtracts the mean value corresponding to the images in the third image set from the data distribution of the second to-be-processed image, and divides the result by the variance corresponding to the images in the third image set, to obtain the data-distribution-aligned second to-be-processed image.
  • Take the source camera as the classification standard as an example. If the images in the third image set are collected by camera c, the mean value corresponding to the images in the third image set can be generated as μ(c) = (1/M) Σ_{i=1}^{M} x_i, where μ(c) represents the average value of M images among the images collected by the c-th camera, x_i denotes the i-th of those M images, c represents the c-th camera, and the value of M can be 50, 100, 200, 300, 500, or another value.
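The per-camera mean and the subsequent normalization can be sketched as follows. This is a hedged illustration: the flat-list image layout is an assumption, and the division by the variance (rather than by the standard deviation, as in standard batch normalization) follows the wording of the text above.

```python
def camera_mean(images_from_camera, M):
    # mu(c): the average over the first M images collected by camera c.
    # Each image here is a flat list of pixel values (illustrative layout).
    sample = images_from_camera[:M]
    pixels = [p for img in sample for p in img]
    return sum(pixels) / len(pixels)

def align_to_subset(image, mean, variance):
    # Data distribution alignment as described in step 504: subtract the
    # subset mean from the image, then divide by the subset variance.
    return [(p - mean) / variance for p in image]
```

For example, `align_to_subset([1.0, 3.0], mean=2.0, variance=0.5)` centers the image around zero and rescales it, so images from differently-styled cameras end up with comparable distributions.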
  • step 504 includes: the execution device adjusts the color space of the second image to be processed according to the fourth data distribution characteristic, so as to realize the alignment of the data distribution of the second image to be processed.
  • If the fourth data distribution characteristic indicates that the brightness of the images in the third image set is too high, the second to-be-processed image can be converted to the hue, saturation, and brightness (hue saturation value, HSV) channels, and then the brightness of the second to-be-processed image is lowered to achieve the data distribution alignment of the second to-be-processed image.
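The HSV-based adjustment described above can be sketched with Python's standard colorsys module; the attenuation factor is a hypothetical parameter, not specified by the text.

```python
import colorsys

def lower_brightness(rgb_pixels, factor=0.8):
    # Convert each RGB pixel to HSV, scale down the V (brightness)
    # channel, and convert back, lowering the overall image brightness.
    out = []
    for r, g, b in rgb_pixels:
        h, s, v = colorsys.rgb_to_hsv(r, g, b)
        out.append(colorsys.hsv_to_rgb(h, s, v * factor))
    return out
```

Applying this to an over-bright subset pulls its brightness distribution toward that of the other subsets, which is the alignment effect the step aims for.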
  • FIG. 7 is a schematic diagram of data distribution alignment in the image processing method provided by the embodiment of the application.
  • Figure 7 shows the data distribution characteristics of the image through a two-dimensional graph as an example.
  • the upper picture in Fig. 7 shows the data distribution characteristics without data distribution alignment, and the lower picture in Fig. 7 shows the data distribution characteristics after data distribution alignment has been performed; after the data distribution alignment is performed, the data distribution of the image is pulled to the sensitive value area of the nonlinear function. It should be understood that the example in FIG. 7 is only to facilitate understanding of the solution, and is not used to limit the solution.
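The "sensitive value area of the nonlinear function" can be illustrated numerically: for a sigmoid activation, inputs near zero, where aligned data is pulled, yield a much larger gradient than saturated inputs far from zero. A small sketch:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x):
    # Derivative of the sigmoid; it is largest near x = 0, i.e. in the
    # "sensitive value area" of this nonlinearity.
    s = sigmoid(x)
    return s * (1.0 - s)
```

Here `sigmoid_grad(0.0)` equals 0.25, while `sigmoid_grad(5.0)` is already tiny, which is why pulling the data distribution toward zero makes the network easier to train.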
  • Not only is data distribution alignment performed on the feature map during the feature extraction process, but data distribution alignment is also performed on the image to be processed before feature extraction is performed, that is, the images processed by the neural network also have a similar data distribution.
  • the similarity between different images across scenes is further improved, that is, the difficulty of image processing of the neural network is further reduced, thereby further improving the feature extraction performance of the neural network across scenes.
  • the execution device acquires a third data distribution characteristic, where the third data distribution characteristic is a data distribution characteristic of a feature map corresponding to an image in the third image set.
  • the execution device before performing feature extraction on the second to-be-processed image, the execution device also obtains the third data distribution characteristic.
  • the third data distribution characteristic is the data distribution characteristic of the feature map corresponding to the image in the third image set to which the second to-be-processed image belongs, and the third data distribution characteristic includes the data distribution characteristic of one or more characteristic dimensions.
  • the one or more feature dimensions include, but are not limited to, color feature dimensions, texture feature dimensions, resolution feature dimensions, brightness feature dimensions, and so on. Take the example of the first implementation in the re-identification scene of the surveillance system.
  • If the third image set consists of the images collected by camera 3 in the re-identification scene of the surveillance system, the third data distribution characteristic includes the data distribution characteristics of the color features of the feature maps corresponding to the images collected by camera 3, the data distribution characteristics of the texture features of the feature maps corresponding to the images collected by camera 3, and the data distribution characteristics of the resolution features of the feature maps corresponding to the images collected by camera 3.
  • the specific expression form of the data distribution characteristic at the feature map level is similar to that at the image level; you can refer to the example in FIG. 7, and details are not repeated here.
  • the execution device performs feature extraction on the second image to be processed, and performs data distribution alignment on the second feature map during the feature extraction process according to the third data distribution characteristic, to obtain feature information of the second image to be processed.
  • After acquiring the third data distribution characteristic, the execution device performs feature extraction on the second to-be-processed image through a mature convolutional neural network to obtain the feature map of the second to-be-processed image in at least one feature dimension, uses the data distribution characteristics of one or more feature dimensions in the third data distribution characteristic to perform data distribution alignment on the second feature map in each feature dimension of the second to-be-processed image, and then generates the feature information of the second to-be-processed image according to the second feature maps, in each feature dimension, on which data distribution alignment has been performed.
  • the second to-be-processed image is any one of the at least one image included in the third image set; the second feature map is generated during the feature extraction process of the second to-be-processed image. With reference to the re-identification scene in the monitoring system as an example, the second feature map is the feature map of the first image in the color feature dimension, the feature map of the first image in the texture feature dimension, or the feature map of the first image in the resolution feature dimension.
  • step 506 please refer to the descriptions of various implementations in the re-identification scene in the above-mentioned monitoring system and the re-identification scene in the unmanned supermarket, which will not be repeated here.
  • Step 506 may include: the execution device performs feature extraction on the first image to be processed, and performs standardization processing on at least one feature map included in the first feature map during the feature extraction process according to at least one mean value and at least one variance.
  • the execution device can obtain the feature map of the target feature dimension in the process of feature extraction through a mature convolutional neural network; the execution device obtains the target mean and target variance corresponding to the target feature dimension from the third data distribution characteristic, subtracts the target mean from the feature map of the first image to be processed in the target feature dimension, and then divides the result by the target variance to obtain the normalized feature map of the target feature dimension.
  • the target feature dimension is any one of the at least one feature dimension.
  • For a specific implementation of step 506, reference may be made to the description of the data distribution alignment of the feature map in the foregoing scenario embodiments, which is not repeated here.
  • a specific implementation method for data distribution alignment of the feature map of the image to be processed is provided, which is simple to operate and easy to implement.
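The per-dimension standardization just described can be sketched as follows; the feature-dimension names are illustrative, and the division by the variance follows the wording of the text.

```python
def standardize_feature_map(feature_map, target_mean, target_var):
    # Subtract the target mean from every value of the feature map of the
    # target feature dimension, then divide by the target variance.
    return [[(v - target_mean) / target_var for v in row]
            for row in feature_map]

def standardize_all(feature_maps, stats):
    # feature_maps: feature-dimension name -> 2-D feature map
    # stats: feature-dimension name -> (mean, variance) taken from the
    # third data distribution characteristic
    return {dim: standardize_feature_map(fm, *stats[dim])
            for dim, fm in feature_maps.items()}
```

Each feature dimension is normalized with its own statistics, so a "color" map and a "texture" map are aligned independently, matching the one-mean-and-variance-per-dimension structure described above.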
  • FIG. 8 is a schematic diagram of a convolutional neural network in an image processing method provided by an embodiment of this application.
  • FIG. 8 takes the case where the classification standard of different image subsets is the camera as an example.
  • the convolutional neural network mentioned in the embodiment of this application includes an input layer, at least one convolutional layer, at least one camera-based batch normalization (CBN) layer, at least one activation function layer, at least one hidden layer, and one output layer.
  • The difference from the current convolutional neural network is that the convolutional neural network in this embodiment replaces the batch normalization (batch normalization, BN) layer in the current convolutional neural network with the CBN layer.
  • the at least one convolutional layer may include a convolutional layer for extracting the texture features of an image, a convolutional layer for extracting the color features of an image, a convolutional layer for extracting the brightness features of an image, a convolutional layer for extracting the resolution features of an image, or a convolutional layer for extracting other types of feature dimensions.
  • The at least one CBN includes a CBN used to perform data distribution alignment on the feature map of the image in the texture feature dimension, a CBN used to perform data distribution alignment on the feature map of the image in the color feature dimension, a CBN used to perform data distribution alignment on the feature map of the image in the brightness feature dimension, a CBN used to perform data distribution alignment on the feature map of the image in the resolution feature dimension, or a CBN used to perform data distribution alignment on the feature map of the image in other types of feature dimensions.
  • Step 506 may include: the execution device inputs the second to-be-processed image to the input layer, and the first convolutional layer performs a feature extraction operation to obtain a feature map of the second to-be-processed image in the first feature dimension; the first camera-based batch normalization layer performs data distribution alignment on the feature map of the second to-be-processed image in the first feature dimension according to the data distribution characteristic of the first feature dimension included in the third data distribution characteristic; and the first activation function layer activates the first feature map on which the data distribution alignment operation has been performed.
  • The first convolutional layer is any one of the at least one convolutional layer included in the convolutional neural network, and the first camera-based batch normalization layer is any one of the at least one camera-based batch normalization layer included in the convolutional neural network.
  • the execution device repeatedly executes the foregoing operations to perform data distribution alignment on the feature maps of each feature dimension and then activate them, so as to obtain the feature information of the second to-be-processed image.
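The conv → CBN → activation ordering described above can be sketched in pure Python, with camera-wise statistics kept in a dictionary; this structure is illustrative, not the patent's exact implementation.

```python
class CameraBatchNorm:
    # A camera-based batch normalization (CBN) layer keeps separate
    # statistics per camera and normalizes each value with the statistics
    # of the camera the image came from.
    def __init__(self, per_camera_stats):
        self.stats = per_camera_stats  # camera_id -> (mean, variance)

    def __call__(self, values, camera_id):
        mean, variance = self.stats[camera_id]
        return [(v - mean) / variance for v in values]

def relu(values):
    return [max(0.0, v) for v in values]

def forward(conv, cbn, x, camera_id):
    # One block of the network: convolution, then camera-based
    # normalization, then activation, as in the layer order of FIG. 8.
    return relu(cbn(conv(x), camera_id))
```

Because normalization is keyed by `camera_id`, images from differently-styled cameras are mapped to a shared distribution before the activation function is applied.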
  • When the function of the convolutional neural network is image matching, the task of the at least one hidden layer is image matching, and the output layer outputs the image matching result.
  • the convolution layer can include many convolution operators.
  • the convolution operator is also called a kernel; its role in image processing is equivalent to a filter that extracts specific information from the input image matrix. In essence, the convolution operator can be a weight matrix, and this weight matrix is usually predefined. In the process of convolution on the image, the weight matrix is usually moved along the horizontal direction on the input image one pixel after another (or two pixels after two pixels, depending on the value of the stride) to complete the work of extracting specific features from the image.
  • the size of the weight matrix should be related to the size of the image. It should be noted that the depth dimension of the weight matrix is the same as the depth dimension of the input image, and during the convolution operation the weight matrix extends to the entire depth of the input image. Therefore, convolution with a single weight matrix produces a convolution output of a single depth dimension, but in most cases a single weight matrix is not used; instead, multiple weight matrices with the same dimension are applied, and the output of each weight matrix is stacked to form the depth dimension of the convolutional image. Different weight matrices can be used to extract different features in the image: for example, one weight matrix is used to extract edge information of the image, another weight matrix is used to extract a specific color of the image, and yet another weight matrix is used to blur unwanted noise in the image. The dimensions of the multiple weight matrices are the same, so the dimensions of the feature maps extracted by these weight matrices are also the same, and the extracted feature maps of the same dimension are then merged to form the output of the convolution operation.
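The sliding of a weight matrix across the input, as described above, can be sketched for a single-channel image; this is a toy illustration that ignores depth and padding.

```python
def conv2d(image, kernel, stride=1):
    # Slide the weight matrix (kernel) across the input image with the
    # given stride; each position yields one weighted sum.
    kh, kw = len(kernel), len(kernel[0])
    h, w = len(image), len(image[0])
    out = []
    for i in range(0, h - kh + 1, stride):
        row = []
        for j in range(0, w - kw + 1, stride):
            row.append(sum(image[i + a][j + b] * kernel[a][b]
                           for a in range(kh) for b in range(kw)))
        out.append(row)
    return out
```

For instance, the 1x2 kernel `[[1, -1]]` acts as a horizontal difference filter, a simple case of the edge-extracting weight matrix mentioned in the text.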
  • The weight values in these weight matrices need to be obtained through a lot of training in practical applications.
  • Each weight matrix formed by the weight values obtained through training can extract information from the input image, thereby helping the convolutional neural network to make correct predictions.
  • When a convolutional neural network has multiple convolutional layers, the initial convolutional layers often extract more general features, which can also be called low-level features; as the depth of the convolutional neural network deepens, the features extracted by the subsequent convolutional layers become more and more complex, such as features with high-level semantics, and features with higher semantics are more suitable for the problem to be solved.
  • After the processing of the convolutional layers, the convolutional neural network is still not sufficient to output the required output information, because, as mentioned earlier, the convolutional layers only extract features and reduce the parameters brought by the input image. In order to generate the final output information (the required class information or other related information), the convolutional neural network needs the neural network layer to generate one output or a group of outputs of the required number of classes. Therefore, the neural network layer can include multiple hidden layers and an output layer.
  • the parameters contained in the hidden layers can be pre-trained according to the relevant training data of the specific task type.
  • the task type can include image recognition, image classification, image super-resolution reconstruction, etc.
  • the task type of the multilayer hidden layer in this embodiment is image matching.
  • After the multiple hidden layers in the neural network layer, that is, as the final layer of the entire convolutional neural network, is the output layer. The output layer has a loss function similar to categorical cross-entropy, which is specifically used to calculate the prediction error.
  • Once the forward propagation of the entire convolutional neural network (as shown in Figure 8, the propagation from the input layer to the output layer is forward propagation) is completed, back propagation (as shown in Figure 3, the propagation from the output layer to the input layer is back propagation) will start to update the weight values and deviations of each layer mentioned above, so as to reduce the loss of the convolutional neural network and the error between the result output by the convolutional neural network through the output layer and the ideal result.
  • the convolutional neural network shown in FIG. 8 is only used as an example of a convolutional neural network.
  • in specific applications, the convolutional neural network may also exist in the form of other network models; for example, the convolutional neural network may also include a pooling layer, and so on.
  • steps 502 to 504 are optional steps. If steps 502 to 504 are performed, what the execution device processes in step 506 is the second to-be-processed image that has undergone data distribution alignment. This embodiment of the application does not limit the order of execution between steps 505 to 506 and steps 502 to 504: steps 502 and 505 may be executed simultaneously, then step 504 is executed, and then step 506 is executed; or steps 502 to 504 are executed first, and then steps 505 and 506 are executed. If steps 502 to 504 are not executed, what the execution device processes in step 506 is the original second to-be-processed image acquired by the execution device.
  • the execution device repeatedly executes steps 502 to 506 to generate feature information of each image in the second image set.
  • the execution device acquires the first to-be-processed image.
  • when acquiring the first to-be-processed image, the execution device may also acquire one or more of the following information: the source image acquisition device of the first to-be-processed image, the image acquisition time of the first to-be-processed image, the image acquisition location of the first to-be-processed image, the object type of the photographed object in the first to-be-processed image, or other information of the first to-be-processed image, etc.
  • the client device can receive a matching request input by the user, and then send a matching request to the execution device.
  • the execution device can receive a matching request sent by the client device; the matching request carries the first to-be-processed image, and may also carry one or more of the following information: the source image acquisition device of the first to-be-processed image, the image acquisition time of the first to-be-processed image, the image acquisition location of the first to-be-processed image, or other information of the first to-be-processed image, etc.
  • a client with an image matching function may be configured in the client device, so that the user inputs a matching request through the aforementioned client.
  • the aforementioned client can provide an acquisition interface for the first to-be-processed image and the related information of the first to-be-processed image; the client device can acquire the first to-be-processed image and the related information of the first to-be-processed image from a mobile storage device or a storage device in the client device, or obtain them from other devices through the communication network.
  • the execution device may receive a matching request input by the user, and the matching request includes the first to-be-processed image and related information of the first to-be-processed image.
  • the execution device may directly obtain the first image to be processed and related information of the first image to be processed from the image acquisition device.
  • the execution device may obtain the first to-be-processed image and related information about the first to-be-processed image from a mobile storage device, or from another device through a communication network.
  • the execution device acquires a second data distribution characteristic corresponding to the first image to be processed, where the second data distribution characteristic is the data distribution characteristic of the image in the first image set.
  • the execution device may determine the first image set to which the first image to be processed belongs.
  • the first image set is the image set, among the at least two image subsets included in the second image set, to which the first to-be-processed image belongs; the data distribution law of the first to-be-processed image is the same as the data distribution law of the images in the first image set.
  • the first image set and the third image set may be the same image set or different image sets.
  • In one case, the first to-be-processed image and the images in the first image set originate from the same target image acquisition device, that is, the classification criterion for the different image subsets in the second image set is the source image acquisition device.
  • step 508 includes: the execution device obtains the identification information of the target image acquisition device that collects the first image to be processed according to the matching request, and determines the identification information of the target image acquisition device from the at least two image subsets included in the second image set. The first image collection corresponding to the information.
  • the first image set includes images collected by the target image acquisition device. The identification information of the target image acquisition device is used to uniquely identify the target image acquisition device, and can be specifically expressed as a numeric number, a character number, or other types of identification information; as an example, the identification information of the target image acquisition device may be expressed as "000001", "BJ00001", or other identification information. More specifically, the execution device may store a one-to-one mapping relationship between the identification information of the image acquisition device and the image subset, so that after acquiring the identification information of the target image acquisition device, the execution device can acquire, according to the pre-configured mapping relationship, the first image set corresponding to the identification information of the target image acquisition device.
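The pre-configured one-to-one mapping from device identification information to image subset can be sketched as a simple dictionary lookup; the field name `device_id` and the sample IDs are illustrative, not from the patent.

```python
def find_first_image_set(matching_request, subsets_by_device):
    # Look up the first image set via the identification information of
    # the target image acquisition device carried in the matching request.
    device_id = matching_request["device_id"]
    return subsets_by_device[device_id]

subsets_by_device = {
    "000001": ["img_a", "img_b"],   # images collected by device 000001
    "BJ00001": ["img_c"],           # images collected by device BJ00001
}
```

Given a matching request carrying `"BJ00001"`, the lookup returns the subset of images collected by that device, i.e. the first image set for the to-be-processed image.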
  • The data distribution of the feature maps of the images acquired by the same image acquisition device will carry the unique style of that image acquisition device. With the source image acquisition device as the classification criterion, the execution device performs data distribution alignment on the feature map of the first to-be-processed image according to the data distribution characteristics of the feature maps of the images in the first image set to which the first to-be-processed image belongs, so as to weaken the unique style of the image acquisition device carried in the feature map of the first to-be-processed image, that is, to improve the similarity between the feature maps of images from different image acquisition devices, so as to reduce the difficulty of feature extraction of the neural network.
  • step 508 includes: according to the matching request, the execution device acquires the image acquisition time of the first to-be-processed image, and determines, from the at least two image subsets included in the second image set, the first image set corresponding to the image acquisition time of the first to-be-processed image, wherein the first image set includes images captured within the target time period, and the image capture moment of the first to-be-processed image is within the target time period.
  • The data distribution of the feature maps of the images collected in the same time period will carry the unique style of that time period. With the time period as the classification criterion, data distribution alignment is performed on the feature map of the first to-be-processed image according to the data distribution characteristics of the feature maps of the images in the first image set to which the first to-be-processed image belongs, so as to weaken the unique style of a certain time period carried in the feature map of the first to-be-processed image, that is, to improve the similarity between the feature maps of images from different time periods, so as to reduce the difficulty of feature extraction of the neural network.
  • the first image to be processed and the images in the first image collection originate from the same image collection location, that is, the classification criteria for different image sub-collections in the first image collection are the image collection locations.
• step 508 includes: the execution device obtains the target image collection location of the first image to be processed according to the matching request, and determines, from the at least two image subsets included in the second image set, the first image set corresponding to the target image collection location, where the first image set includes images collected at the target image collection location.
• step 508 includes: the execution device obtains the target object type of the photographed object in the first image to be processed according to the matching request, and determines, from the at least two image subsets included in the second image set, the first image set corresponding to the target object type, where the object type of the photographed object in the images included in the first image set is the same as the object type of the photographed object in the first image to be processed.
  • the execution device performs data distribution alignment on the first image to be processed according to the second data distribution characteristic.
  • the execution device acquires a first data distribution characteristic corresponding to the first image to be processed, where the first data distribution characteristic includes the data distribution characteristic of the feature map corresponding to the image in the first image set.
  • the execution device performs feature extraction on the first image to be processed, and performs data distribution alignment on the first feature map during the feature extraction process according to the first data distribution characteristics, to obtain feature information of the first image to be processed.
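The alignment step can be illustrated with a minimal NumPy sketch: the feature map of the image to be processed is standardized with the mean and standard deviation of the feature maps of its image set, taken here as one possible form of the first data distribution characteristic; the set-specific shift and scale simulated below stand in for a device- or scene-specific style:

```python
import numpy as np

def align_feature_map(feature_map, set_mean, set_std, eps=1e-5):
    # Remove the set-level offset and scale carried by the feature map.
    return (feature_map - set_mean) / (set_std + eps)

rng = np.random.default_rng(0)
# Feature maps of the first image set: a set-specific "style" is simulated
# as a shift (+2.0) and scale (x3.0) on otherwise standard features.
set_maps = rng.standard_normal((100, 8, 8)) * 3.0 + 2.0
set_mean, set_std = set_maps.mean(), set_maps.std()

# Feature map of the first image to be processed, carrying the same style.
query_map = rng.standard_normal((8, 8)) * 3.0 + 2.0
aligned = align_feature_map(query_map, set_mean, set_std)
```

After alignment the feature map's values are roughly zero-mean and unit-variance, regardless of the original style of the set.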
• the specific implementation manner in which the execution device performs steps 509 to 511 is similar to that in which it performs steps 504 to 506, and will not be repeated here.
  • the execution device matches the first image to be processed with images in the second image set according to the feature information of the first image to be processed.
• steps 502 to 506 are optional steps. If steps 502 to 506 are performed, the execution device may obtain the feature information of the first image to be processed and then, through the convolutional neural network, match it against the feature information of each image in the second image set to obtain the matching result.
• the matching result includes at least one image, and the photographed subject of each image in the at least one matched image is the same as the subject in the image to be processed; the matching result may also include the image collection location and image collection time of each image in the at least one matched image.
• the data distribution alignment operation is not performed according to the data distribution characteristics of the feature maps of all images in the second image set; instead, the second image set is divided according to the data distribution law of the images, so that the data distribution of the feature map of the image to be processed is drawn toward the sensitive area of the neural network to improve feature extraction performance. When the accuracy of the feature information of the image to be processed and of the feature information of each image in the second image set both improve, the accuracy of the image matching process improves accordingly.
• the execution device may also perform feature extraction on each image in the second image set without performing data distribution alignment, so as to obtain the feature information of each image in the second image set, and then match the feature information of the first image to be processed against the feature information of each image in the second image set to obtain a matching result.
  • FIG. 9 is a schematic diagram of the data distribution of the feature map in the image processing method provided by the embodiment of the application.
• the source camera is taken as the classification standard, and data distribution alignment is performed by standardization, as an example.
• the execution device standardizes the texture feature data corresponding to an image captured by camera 1, the texture feature data corresponding to an image captured by camera 2, and the texture feature data corresponding to an image captured by camera 3, obtaining the data of the three standardized feature maps respectively; the data of the three standardized feature maps is then calibrated, that is, the three standardized feature maps are further processed into a common data distribution.
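The per-camera standardization followed by calibration shown in FIG. 9 can be sketched as follows; the simulated feature values and the shared affine parameters `gamma`/`beta` are illustrative assumptions standing in for learned calibration parameters:

```python
import numpy as np

def standardize(x, eps=1e-5):
    # Remove the camera's own offset and scale from its texture features.
    return (x - x.mean()) / (x.std() + eps)

rng = np.random.default_rng(1)
# Simulated texture feature data: each camera has its own style (offset + scale).
cam1 = rng.standard_normal(256) * 0.5 + 1.0
cam2 = rng.standard_normal(256) * 2.0 - 3.0
cam3 = rng.standard_normal(256) * 1.5 + 0.2

gamma, beta = 1.2, 0.1  # shared calibration parameters (assumed learned values)
aligned = [gamma * standardize(c) + beta for c in (cam1, cam2, cam3)]
```

After this step the three feature maps share one data distribution (mean `beta`, spread `gamma`) instead of three camera-specific ones.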
  • the execution device outputs the matching result.
• after the execution device generates the matching result, it outputs the matching result. If the execution device is a server, it sends the matching result to the client device, and the client device shows the matching result to the user; if the execution device is a terminal device, it can show the matching result to the user through the display interface.
  • FIG. 10 is a schematic flowchart of an image processing method provided by an embodiment of this application.
  • the image processing method provided in the embodiment of the present application may include:
  • the execution device acquires a first image to be processed.
• the execution device may directly capture the first image to be processed through an image acquisition device configured on the execution device, or may select an image from the gallery of the execution device as the first image to be processed.
  • some execution devices are equipped with a license plate recognition function, and when the execution device recognizes the license plate, the first image to be processed can be directly collected by a camera integrated on the execution device.
  • the execution device may also obtain the object type of the photographed object in the first to-be-processed image.
• some execution devices in the form of mobile phones are equipped with a plant species recognition function, and the user first needs to select the category of the subject in the image to be recognized.
• the aforementioned categories of subjects include but are not limited to plants, cats, dogs, and so on.
  • the execution device acquires a second data distribution characteristic corresponding to the first image to be processed, where the second data distribution characteristic is the data distribution characteristic of the image in the first image set.
  • the execution device may be configured with a second data distribution characteristic before leaving the factory, and the second data distribution characteristic is the data distribution characteristic of the images in the first image set.
• the first image to be processed and the images in the first image set originate from the same image acquisition device. Therefore, those skilled in the art may configure the execution device with the second data distribution characteristic and the first data distribution characteristic before the execution device leaves the factory, where the first data distribution characteristic is the data distribution characteristic of the feature maps corresponding to the images in the first image set.
  • the shooting object in the first image to be processed and the shooting object in the images included in the first image set are of the same object type.
• those skilled in the art can, before the device leaves the factory, obtain the data distribution characteristics of the images of at least two object categories, and the data distribution characteristics of the feature maps corresponding to the images of each object category in at least one feature dimension, and configure them on the execution device.
• as an example: the data distribution characteristic, in the texture feature dimension, of the feature maps corresponding to images of plants; the data distribution characteristic, in the color feature dimension, of the feature maps corresponding to images of plants; and so on.
• step 1002 may include: after acquiring the target category of the object in the first image to be processed, the execution device selects the second data distribution characteristic corresponding to the target category from the data distribution characteristics of the images of the at least two object categories, where the objects in the images of the first image set belong to the target category.
  • the first image to be processed and the images included in the first image set are collected at the same image collection location.
• those skilled in the art can, before the device leaves the factory, obtain the data distribution characteristics of the images of at least two image collection locations, and the data distribution characteristics of the feature maps corresponding to the images of each image collection location in at least one feature dimension, and configure them on the execution device. As an example: the data distribution characteristic, in the texture feature dimension, of the feature maps corresponding to images collected in Beijing; the data distribution characteristic, in the color feature dimension, of those feature maps; and so on.
• step 1002 may include: after acquiring the target image collection location of the first image to be processed, the execution device selects the second data distribution characteristic corresponding to the target image collection location from the data distribution characteristics of the images of the at least two image collection locations, where the images in the first image set are collected at the target image collection location.
  • the execution device performs data distribution alignment on the first image to be processed according to the second data distribution characteristic.
  • the execution device acquires a first data distribution characteristic corresponding to the first image to be processed, where the first data distribution characteristic is the data distribution characteristic of the feature map corresponding to the image in the first image set.
  • the execution device performs feature extraction on the first image to be processed, and performs data distribution alignment on the first feature map during the feature extraction process according to the first data distribution characteristics, to obtain feature information of the first image to be processed.
  • the first feature map is generated during the feature extraction process of the first image to be processed.
• for the specific implementation manner in which the execution device performs steps 1003 to 1005, refer to that of steps 504 to 506, which will not be repeated here.
  • the execution device recognizes the first image to be processed according to the feature information of the first image to be processed, and obtains description information of the photographed object in the first image to be processed.
  • the execution device recognizes the first image to be processed through the convolutional neural network according to the feature information of the first image to be processed, and obtains the description information of the shooting object in the first image to be processed.
  • the description information of the shooting object may include one or more of the following: the content of the shooting object, the type of the shooting object, and the attributes of the shooting object.
• as an example, if the subject is a vehicle, the description information can be the license plate number of the subject; if the subject is a plant, the description information can be a plant species; if the subject is a person, the description information can be attributes such as the person's gender and age.
  • the examples here are only for the convenience of understanding this solution, and are not used to limit this solution.
  • the execution device outputs description information.
• in this embodiment, the first data distribution characteristic corresponding to the first image to be processed is acquired, feature extraction is performed on the first image to be processed, and the feature maps generated during feature extraction are aligned in data distribution according to the first data distribution characteristic. Since the neural network processes feature maps on which data distribution alignment has already been performed, images processed by the neural network are guaranteed to have similar data distributions, which improves feature extraction for different images across scenes. Because the first data distribution characteristic is the data distribution characteristic of the feature maps corresponding to the images in the first image set, and the images in the first image set follow the same data distribution law as the first image to be processed, the data distribution of the feature map of the first image to be processed can be drawn closer to the sensitive data area of the neural network, further reducing the difficulty of image processing for the neural network and further improving its cross-scene feature extraction performance.
  • FIG. 11 is a schematic flowchart of an image processing method provided by an embodiment of this application.
  • the image processing method provided in the embodiment of the present application may include:
  • the training device acquires a training image collection.
• a training image set may be configured on the training device; the training image set includes at least two training image subsets, and the classification criteria of the different training image subsets are the same as those in the embodiment corresponding to FIG. 5, and will not be repeated here.
  • the training device is also configured with identification information corresponding to the images in the training image collection one-to-one, and the identification information is used to uniquely identify a photographed object, and specifically may be a number code, a character code, or other identification information.
• before the iterative training of the convolutional neural network, the training device initializes the convolutional neural network.
  • the training device obtains at least two training images from the training image collection.
  • the at least two training images include a first training image and a second training image, and the first training image and the second training image include the same subject.
  • the first training image and the second training image may belong to the same training image subset, or they may belong to different image subsets.
• the at least two training images further include a third training image, and the third training image and the first training image contain different shooting objects.
  • the at least two training images may also include more training images, and the specific number of training images may be determined in combination with the type of the loss function.
  • the training device obtains a data distribution characteristic corresponding to the first training image, and the data distribution characteristic corresponding to the first training image is the data distribution characteristic of the image in the training image subset to which the first training image belongs.
• the training device determines the training image subset to which the first training image belongs, and then obtains the data distribution characteristics corresponding to the first training image. Specifically, the training device may pre-generate the data distribution characteristics of each training image subset according to the training image set, so that the training device obtains the data distribution characteristics corresponding to the first training image from the data distribution characteristics of all the training image subsets. The training device may also generate the data distribution characteristics corresponding to the first training image after determining the training image subset to which the first training image belongs. For the specific method of generating the image-level data distribution characteristics, refer to the description in the embodiment corresponding to FIG. 5, which will not be repeated here.
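Pre-generating the per-subset data distribution characteristics can be sketched as follows, assuming (for illustration only) that the characteristic is the per-subset mean and standard deviation of pixel values and that subsets are keyed by source camera:

```python
import numpy as np

def subset_statistics(images_by_subset):
    """images_by_subset: dict mapping subset id -> array of that subset's images.

    Returns one data distribution characteristic (mean/std) per subset.
    """
    return {
        subset_id: {"mean": float(imgs.mean()), "std": float(imgs.std())}
        for subset_id, imgs in images_by_subset.items()
    }

rng = np.random.default_rng(2)
# Two training image subsets with different pixel-value distributions.
training_set = {
    "camera_1": rng.uniform(0.3, 0.7, size=(50, 16, 16)),
    "camera_2": rng.uniform(0.0, 0.4, size=(50, 16, 16)),
}
stats = subset_statistics(training_set)
```

During training, the characteristic looked up for a given image is the one of the subset it belongs to, not a global statistic over the whole training set.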
  • the training device performs data distribution alignment on the first training image according to the data distribution characteristics corresponding to the first training image.
• the specific implementation manner in which the training device performs step 1104 may refer to that in which the execution device performs step 504, and will not be repeated here.
• the training device obtains the data distribution characteristic corresponding to the feature map of the first training image, and the data distribution characteristic corresponding to the feature map of the first training image is the data distribution characteristic of the feature maps corresponding to the images in the training image subset to which the first training image belongs.
• the training device determines the training image subset to which the first training image belongs, and obtains the data distribution characteristics of the feature maps corresponding to the images in that subset. Specifically, after determining the training image subset to which the first training image belongs, the training device uses a convolutional neural network to generate the data distribution characteristics of the feature maps corresponding to the images in that subset. For the content of the data distribution characteristics and the specific generation method, refer to the description in the embodiment corresponding to FIG. 5, which will not be repeated here.
• the training device performs feature extraction on the first training image through the convolutional neural network, and performs data distribution alignment on the third feature map during the feature extraction process according to the data distribution characteristics corresponding to the feature map of the first training image, to obtain the feature information of the first training image.
• the specific implementation manner in which the training device performs step 1106 may refer to that in which the execution device performs step 506, and will not be repeated here.
• steps 1103 and 1104 are optional steps. If steps 1103 and 1104 are performed, the training device in step 1106 performs feature extraction on the first training image on which data distribution alignment has been performed; if steps 1103 and 1104 are not performed, the training device in step 1106 performs feature extraction on the first training image on which data distribution alignment has not been performed.
  • the training device acquires a data distribution characteristic corresponding to the second training image, and the data distribution characteristic corresponding to the second training image is the data distribution characteristic of the image in the training image subset to which the second training image belongs.
  • the training device performs data distribution alignment on the second training image according to the data distribution characteristics corresponding to the second training image.
• the training device obtains the data distribution characteristic corresponding to the feature map of the second training image, and the data distribution characteristic corresponding to the feature map of the second training image is the data distribution characteristic of the feature maps corresponding to the images in the training image subset to which the second training image belongs.
• the training device performs feature extraction on the second training image through the convolutional neural network, and performs data distribution alignment on the fourth feature map during the feature extraction process according to the data distribution characteristics corresponding to the feature map of the second training image, to obtain the feature information of the second training image.
• the specific implementation manners in which the training device performs steps 1107 to 1110 may refer to those of steps 1103 to 1106, and will not be repeated here.
• steps 1107 and 1108 are optional steps. If steps 1107 and 1108 are performed, the training device in step 1110 performs feature extraction on the second training image on which data distribution alignment has been performed; if steps 1107 and 1108 are not performed, the training device in step 1110 performs feature extraction on the second training image on which data distribution alignment has not been performed.
  • the training device acquires a data distribution characteristic corresponding to the third training image, and the data distribution characteristic corresponding to the third training image is the data distribution characteristic of the image in the training image subset to which the third training image belongs.
  • the training device performs data distribution alignment on the third training image according to the data distribution characteristics corresponding to the third training image.
• the training device acquires the data distribution characteristic corresponding to the feature map of the third training image, and the data distribution characteristic corresponding to the feature map of the third training image is the data distribution characteristic of the feature maps corresponding to the images in the training image subset to which the third training image belongs.
• the training device performs feature extraction on the third training image through the convolutional neural network, and performs data distribution alignment on the sixth feature map during the feature extraction process according to the data distribution characteristics corresponding to the feature map of the third training image, to obtain the feature information of the third training image.
• for the specific implementation manner in which the training device performs steps 1111 to 1114, and for the description of whether these steps are optional, refer to the implementation manner of steps 1103 to 1106, which will not be repeated here.
  • Steps 1102 to 1110 may be executed sequentially, or steps 1111 to 1114 may be executed first, and then steps 1103 to 1110 may be executed. Steps 1103 to 1114 can also be executed alternately.
  • the training device trains the convolutional neural network through the loss function until the convergence condition is met.
• the loss function includes, but is not limited to, a two-tuple (pairwise) loss function, a triplet loss function, a quadruplet loss function, or other loss functions.
• the convergence condition may be that the loss function converges, that the number of iterations reaches a preset number, or another convergence condition.
• if a two-tuple loss function is used, the training device calculates the function value of the two-tuple loss function according to the feature information of the first training image and the feature information of the second training image, and back-propagates based on that function value to adjust the parameter values of the convolutional neural network, completing one training operation. The training goal is to increase the similarity between the feature information of the first training image and the feature information of the second training image.
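One common concrete form of such a two-tuple loss is sketched below; the margin value and the squared-distance formulation are illustrative assumptions, since the embodiment does not fix a specific formula:

```python
import numpy as np

def pair_loss(feat_a, feat_b, same_subject, margin=1.0):
    """Pairwise (two-tuple) loss over two feature vectors."""
    d = np.linalg.norm(feat_a - feat_b)
    if same_subject:
        return d ** 2                      # pull features of the same subject together
    return max(0.0, margin - d) ** 2       # push different subjects beyond the margin

f1 = np.array([0.2, 0.8, 0.1])
f2 = np.array([0.2, 0.8, 0.1])             # identical features, same subject -> zero loss
f3 = np.array([0.5, 0.6, 0.2])             # a different subject still inside the margin
```

Minimizing this loss over many pairs increases the similarity of positive pairs while separating negative pairs.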
  • the training device repeatedly executes steps 1102 to 1110 and step 1115 until the convergence condition is met, and a convolutional neural network that has performed iterative training operations is obtained.
• if a triplet loss function is used, the training device calculates the function value of the triplet loss function based on the feature information of the first training image, the feature information of the second training image, and the feature information of the third training image, and back-propagates based on that function value to adjust the parameter values of the convolutional neural network, completing one training operation. The training goal is to increase the similarity between the feature information of the first training image and that of the second training image, while reducing the similarity between the feature information of the first training image and that of the third training image.
  • the training device repeats steps 1102 to 1115 until the convergence condition is met, and a convolutional neural network that has performed iterative training operations is obtained.
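A standard triplet loss consistent with the training goal above can be sketched as follows, with the first training image as anchor, the second (same subject) as positive, and the third (different subject) as negative; the margin value is an illustrative assumption:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.3):
    # Loss is zero once the anchor-positive distance is smaller than the
    # anchor-negative distance by at least the margin.
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return max(0.0, d_pos - d_neg + margin)

anchor   = np.array([1.0, 0.0])
positive = np.array([0.9, 0.1])   # same subject, close in feature space
negative = np.array([0.0, 1.0])   # different subject, far in feature space
```

For this well-separated triplet the loss is already zero; swapping the roles of positive and negative yields a positive loss that the training would push down.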
  • the training device outputs a convolutional neural network that has performed an iterative training operation.
• this embodiment provides a specific training-side implementation for the case where the general capability is image re-identification, yielding a convolutional neural network that can still maintain good feature extraction capability across scenes; in addition, when incremental learning is adopted in training, the method provided by this embodiment of the application can remove the data distribution characteristics of a particular training image subset carried in the feature maps, avoiding overfitting of the convolutional neural network to a small training data set and solving the catastrophic forgetting problem of the incremental learning process.
  • FIG. 12 is a schematic flowchart of an image processing method provided by an embodiment of this application.
  • the image processing method provided in the embodiment of the present application may include:
  • the training device acquires a training image collection.
• the training device may be configured with a training image set and the real description information corresponding to the images in the training image set; the training image set includes at least two training image subsets, and for the content of the description information, refer to the description in the embodiment corresponding to FIG. 10 above.
  • the training device initializes the convolutional neural network.
  • the training device obtains a third training image from the training image collection, where the third training image is an image in the training image collection.
  • the training device obtains the data distribution characteristic corresponding to the third training image, and the data distribution characteristic corresponding to the third training image is the data distribution characteristic of the image in the training image subset to which the third training image belongs.
  • the training device performs data distribution alignment on the third training image according to the data distribution characteristics corresponding to the third training image.
• the training device acquires the data distribution characteristic corresponding to the feature map of the third training image, and the data distribution characteristic corresponding to the feature map of the third training image is the data distribution characteristic of the feature maps corresponding to the images in the training image subset to which the third training image belongs.
• the training device performs feature extraction on the third training image through the convolutional neural network, and performs data distribution alignment on the fifth feature map during the feature extraction process according to the data distribution characteristics corresponding to the feature map of the third training image, to obtain the feature information of the third training image.
• the specific implementation manner in which the training device performs steps 1203 to 1206 may refer to that in which it performs steps 1103 to 1106, and will not be repeated here.
  • the training device performs image recognition according to the feature information of the third training image, and obtains description information of the shooting object in the third training image.
  • the training device performs image recognition based on the feature information of the third training image through a convolutional neural network, and obtains the description information of the shooting object in the third training image.
  • the training device trains the convolutional neural network through the loss function according to the description information until the convergence condition is met.
• the training device generates the description information of the subject in the third training image (that is, the predicted description information), compares it with the description information of the subject in the third training image stored on the training device (that is, the real description information) to calculate the value of the loss function, and back-propagates according to the value of the loss function to adjust the parameter values of the convolutional neural network, thereby completing one training of the convolutional neural network.
  • the loss function in this embodiment may adopt a cross-entropy loss function or other loss functions used to train a convolutional neural network whose general capability is image recognition.
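The cross-entropy loss mentioned above can be sketched as follows; the class names and logit values are invented for illustration:

```python
import numpy as np

def cross_entropy(logits, true_class):
    """Negative log-probability of the true class under a softmax over logits."""
    logits = logits - logits.max()             # shift for numerical stability
    probs = np.exp(logits) / np.exp(logits).sum()
    return -np.log(probs[true_class])

classes = ["plant", "cat", "dog"]
logits = np.array([3.0, 0.5, 0.2])             # network strongly predicts "plant"
loss_correct = cross_entropy(logits, classes.index("plant"))
loss_wrong   = cross_entropy(logits, classes.index("dog"))
```

Back-propagating this loss rewards predictions that concentrate probability on the stored real description and penalizes confident wrong ones.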
  • the training device repeatedly executes steps 1202 to 1208 until the convergence condition is met, and a convolutional neural network that has performed iterative training operations is obtained.
  • the training device outputs a convolutional neural network that has performed an iterative training operation.
• this embodiment provides a specific training-side implementation for the case where the general capability is image recognition, yielding a convolutional neural network that can still maintain good feature extraction capability across scenes, which improves the completeness of this solution and expands its application scenarios; in addition, when incremental learning is adopted in training, the method provided by this embodiment of the application can remove the data distribution characteristics of a particular training image subset carried in the feature maps, avoiding overfitting of the convolutional neural network to a small training data set and solving the catastrophic forgetting problem of the incremental learning process.
  • the embodiment of the present application also provides a convolutional neural network, which includes an input layer, at least one convolutional layer, at least one normalization layer, at least one activation function layer, and at least one neural network layer.
  • the input layer is used to receive the image to be processed
  • the convolution layer is used to perform a convolution operation based on the received image to be processed to output the feature map of the image to be processed;
• the standardization layer is used to standardize the feature maps output by the convolutional layer according to target data distribution characteristics, where the target data distribution characteristics include the data distribution characteristics of the feature maps corresponding to the images in the target image set, and the image to be processed and the images in the target image set follow the same data distribution law;
  • the activation function layer is used to activate the standardized feature map output by the standardization layer
  • the neural network layer is used to match the feature information of the image to be processed output by the activation function layer with the feature information of each image in the image set, and output the matching result.
• the specific working mode of the above-mentioned convolutional neural network can be referred to the description in the embodiment corresponding to FIG. 5, which will not be repeated here.
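As an illustrative sketch only (not the patent's implementation), the layer sequence described above can be mocked in a few lines of NumPy; the kernel, subset statistics, and gallery features below are hypothetical placeholders:

```python
import numpy as np

def forward_match(image, kernel, subset_mean, subset_var, gallery, eps=1e-5):
    """Toy forward pass: convolution -> standardization against the target
    image set's statistics -> ReLU activation -> matching by cosine similarity."""
    kh, kw = kernel.shape
    h = image.shape[0] - kh + 1
    w = image.shape[1] - kw + 1
    # convolution layer (single-channel "valid" cross-correlation)
    fmap = np.array([[np.sum(image[i:i + kh, j:j + kw] * kernel)
                      for j in range(w)] for i in range(h)])
    # standardization layer: align with the target data distribution characteristics
    fmap = (fmap - subset_mean) / np.sqrt(subset_var + eps)
    # activation function layer (ReLU)
    fmap = np.maximum(fmap, 0.0)
    # neural network layer: match the feature against each gallery feature
    feat = fmap.ravel()
    sims = [np.dot(feat, g) / (np.linalg.norm(feat) * np.linalg.norm(g) + eps)
            for g in gallery]
    return int(np.argmax(sims))
```

The returned index identifies the gallery image whose feature information is most similar to that of the image to be processed.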
• the embodiment of the present application also provides another convolutional neural network, which includes an input layer, at least one convolutional layer, at least one standardization layer, at least one activation function layer, and at least one neural network layer.
  • the input layer is used to receive the image to be processed
  • the convolution layer is used to perform a convolution operation based on the received image to be processed to output the feature map of the image to be processed;
• the standardization layer is used to standardize the feature maps output by the convolutional layer according to the target data distribution characteristics, where the target data distribution characteristics include the data distribution characteristics of the feature maps corresponding to the images in the target image set, and the image to be processed has the same data distribution law as the target image set;
  • the activation function layer is used to activate the standardized feature map output by the standardization layer
  • the neural network layer is used to perform image recognition according to the feature information of the image to be processed output by the activation function layer, and output description information of the shooting object in the image to be processed.
• the specific working mode of the above-mentioned convolutional neural network can be referred to the description in the embodiment corresponding to FIG. 10, which will not be repeated here.
  • Duke to Market refers to training on the public data set Duke and application on the public data set Market, that is, training data and application data are different.
  • Rank-1, rank-5, and rank-10 are three accuracy indicators respectively, and mean average precision (mAP) is an indicator of detection accuracy.
• the Pedestrian Transfer Generative Adversarial Network (PTGAN) and the Hetero-Homogeneous Learning (HHL) network are two neural networks used for comparison.
• the general capability is image re-recognition, which can also be called image matching. Since the convolutional neural network used in the embodiments of this application only replaces the standardization module in an existing convolutional neural network, the exemplar memory convolutional neural network (ECN) is used as the basic network here.
• the standardization layer of ECN is replaced with the standardization layer in the embodiment of this application for testing. It can be clearly seen from the foregoing Table 2 that the accuracy and precision of the cross-scene image re-recognition task are greatly improved in the embodiment of this application.
  • Market to Duke refers to the use of the public data set Market and the public data set Duke for incremental learning.
  • resnet50 refers to a typical convolutional neural network
  • Ours+resnet50 refers to replacing the batch normalization layer in resnet50 with a camera-based batch normalization layer.
• 92.5% refers to the ratio between the rank-1 accuracy obtained by incrementally training resnet50 with the public data set Market and then the public data set Duke, and the rank-1 accuracy obtained by training resnet50 with the public data set Market throughout. It can be seen from Table 3 that the image processing method provided by the embodiment of the present application slows down the performance degradation of the incremental learning process.
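The retention figure is a simple ratio; with hypothetical rank-1 scores (not the measurements actually reported in Table 3), the computation looks like:

```python
# Hypothetical rank-1 accuracies chosen only to reproduce the 92.5% ratio;
# the patent's actual Table 3 values may differ.
rank1_incremental = 0.740  # resnet50 trained incrementally on Market then Duke
rank1_market_only = 0.800  # resnet50 trained on Market throughout
retention = rank1_incremental / rank1_market_only
print(f"{retention:.1%}")  # prints "92.5%"
```

A retention ratio closer to 100% means less performance was lost to catastrophic forgetting during incremental learning.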
  • FIG. 13 is a schematic structural diagram of an image processing apparatus according to an embodiment of the application.
  • the image processing apparatus 1300 includes:
  • the obtaining module 1301 is used to obtain the first image to be processed
• the acquiring module 1301 is further configured to acquire a first data distribution characteristic corresponding to the first image to be processed, where the first data distribution characteristic includes the data distribution characteristic of the feature maps corresponding to the images in the first image set, and the first image to be processed has the same data distribution law as the first image set;
• the feature extraction module 1302 is configured to perform feature extraction on the first image to be processed and, according to the first data distribution characteristic, perform data distribution alignment on the first feature map during the feature extraction process, where the first feature map is generated during feature extraction of the first image to be processed.
• the acquiring module 1301 acquires the first data distribution characteristic corresponding to the first image to be processed, and the feature extraction module 1302 performs feature extraction on the first image to be processed and, according to the first data distribution characteristic, aligns the data distribution of the feature maps generated during feature extraction. Since the neural network processes feature maps on which data distribution alignment has been performed, it is ensured that the images processed by the neural network have similar data distributions.
• the first data distribution characteristic is the data distribution characteristic of the feature maps corresponding to the images in the first image set, and the images in the first image set have the same data distribution law as the first image to be processed. Using the first data distribution characteristic for data distribution alignment can draw the data distribution of the feature map of the first image to be processed closer, in a large span, to the sensitive data area of the neural network, further reducing the difficulty of image processing of the neural network and further improving the feature extraction performance of the neural network in cross-scene scenarios.
  • FIG. 14 is a schematic structural diagram of an image processing apparatus provided by an embodiment of the application.
• the acquisition module 1301 is also used to acquire a second data distribution characteristic corresponding to the first image to be processed, where the second data distribution characteristic is the data distribution characteristic of the images in the first image set;
  • the device 1300 further includes: a data distribution alignment module 1303, configured to perform data distribution alignment on the first image to be processed according to the second data distribution characteristic;
  • the feature extraction module 1302 is specifically configured to perform feature extraction on the first image to be processed for which data distribution alignment has been performed.
• the data distribution alignment module 1303 also performs data distribution alignment on the image to be processed itself, so that the images processed by the neural network also have similar data distributions. This further improves the similarity between different images across scenes, further reduces the difficulty of image processing of the neural network, and thereby further improves the feature extraction performance of the neural network across scenes.
• the first data distribution characteristic includes a mean value and a variance, where the mean value and the variance are obtained by performing data distribution statistics on the feature maps corresponding to the images in the first image set;
  • the feature extraction module 1302 is specifically configured to perform feature extraction on the first image to be processed, and perform standardization processing on the feature maps included in the first feature map in the process of feature extraction according to the mean value and variance.
  • a specific implementation method for data distribution alignment of the feature map of the image to be processed is provided, which is simple to operate and easy to implement.
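A minimal sketch of this standardization step, assuming the subset statistics have already been gathered (the function name and NumPy layout are illustrative, not the patent's code):

```python
import numpy as np

def align_data_distribution(feature_map, subset_mean, subset_var, eps=1e-5):
    """Standardize a feature map with the mean/variance computed over the
    feature maps of the image subset sharing the same data distribution law."""
    return (feature_map - subset_mean) / np.sqrt(subset_var + eps)
```

Standardizing a feature map with its own subset's statistics yields roughly zero mean and unit variance, which is what draws the data distribution toward the neural network's sensitive area.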
• the first image to be processed and the images in the first image set originate from the same target image acquisition device, or the image acquisition time of the first image to be processed and the image acquisition times of the images in the first image set are all within the same target time period, or the first image to be processed and the images in the first image set originate from the same image collection location, or the subject in the first image to be processed and the subjects in the images included in the first image set are of the same object type.
• the acquisition module 1301 is also used to acquire the identification information of the target image acquisition device that acquires the first image to be processed, and to acquire, from the at least two image subsets included in the second image set, the first image set corresponding to the identification information of the target image acquisition device, where the first image set is one of the at least two image subsets included in the second image set and includes the images acquired by the target image acquisition device.
• the data distribution of the feature maps of images captured by the same image acquisition device will carry the unique style of that device, and the acquisition module 1301 uses the source image acquisition device as the classification standard. Data distribution alignment is performed on the feature map of the first image to be processed to weaken the unique style of the image acquisition device carried in that feature map, that is, to improve the similarity between the feature maps of images from different image acquisition devices, so as to reduce the difficulty of feature extraction of the neural network.
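One plausible way to organize this device-based grouping, with the mapping from device identifiers to feature maps as a hypothetical data layout (not specified by the patent):

```python
import numpy as np

def subset_statistics(second_image_set, device_id):
    """Return the mean/variance over the feature maps of the image subset
    captured by the given device (dict-of-lists layout is an assumption)."""
    stacked = np.stack(second_image_set[device_id])
    return float(stacked.mean()), float(stacked.var())
```

The statistics returned for a device's subset can then be fed to the standardization layer so that each image is aligned against its own device's distribution rather than against the whole second image set.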
• the acquiring module 1301 is also used to acquire the image acquisition moment of the first image to be processed, and to acquire, from the at least two image subsets included in the second image set, the first image set whose images were acquired within the target time period, where the image acquisition moment of the first image to be processed is within the target time period.
• the data distribution of the feature maps of images collected within the same time period will carry the unique style of that time period, and the acquisition module 1301 uses the time period as the classification standard. Data distribution alignment is performed on the feature map of the first image to be processed to weaken the unique style of a certain time period carried in that feature map, that is, to improve the similarity between the feature maps of images from different time periods, so as to reduce the difficulty of feature extraction of the neural network.
• the feature extraction module 1302 is specifically used to perform feature extraction on the first image to be processed and, according to the first data distribution characteristic, perform data distribution alignment on the first feature map during the feature extraction process to obtain feature information of the first image to be processed;
• the device 1300 further includes: a matching module 1304, configured to match the first image to be processed with images in the second image set according to the feature information of the first image to be processed to obtain a matching result, wherein the first image set is one of the at least two image subsets included in the second image set, the matching result includes at least one target image, and the target image and the first image to be processed include the same subject; or,
  • the device 1300 further includes: an identification module 1305, configured to identify the first image to be processed according to the feature information of the first image to be processed, and obtain description information of the shooting object in the first image to be processed.
• the image processing method provided in the embodiment of this application is applied to image matching, which improves the feature extraction performance of the convolutional neural network, so that image matching operations can be performed based on more accurate feature information, which is beneficial to improving the accuracy of image matching, that is, improving the accuracy of the image matching process of the monitoring system; applying the image processing method provided in the embodiment of this application to image recognition improves the feature extraction performance of the convolutional neural network, thereby helping to improve the accuracy of image recognition.
• the acquisition module 1301 is also used to acquire a second image to be processed and a third data distribution characteristic, where the second image to be processed is any image in the second image subset, the third data distribution characteristic is the data distribution characteristic of the feature maps corresponding to the images in the third image set, and the second image to be processed has the same data distribution law as the images in the third image set;
  • the feature extraction module 1302 is also used to perform feature extraction on the second image to be processed, and according to the third data distribution characteristics, perform data distribution alignment on the second feature map during the feature extraction process to obtain the features of the second image to be processed Information, where the second image to be processed is any one of at least one image included in the third image set, and the second feature map is generated during feature extraction of the second image to be processed;
  • the matching module 1304 is specifically configured to match the feature information of the first image to be processed with the feature information of each image in the second image set to obtain a matching result.
• the feature extraction module 1302 does not perform the data distribution alignment operation according to the data distribution characteristics of the feature maps of all the images in the second image set; instead, according to the data distribution law of the images, the second image set is divided into at least two image subsets, and the data distribution alignment operation is performed based on the data distribution characteristics of the feature maps of the images in each subset. This avoids mutual interference of the data distribution characteristics between different image subsets, which is beneficial to drawing the data distribution of the feature map of the image to be processed to the sensitive area of the neural network to improve feature extraction performance; with the improved accuracy of the feature information of the image to be processed and of each image in the second image set, the accuracy of the image matching process is improved.
  • FIG. 15 is a schematic structural diagram of an image processing device provided by an embodiment of the application.
  • the image processing device 1500 includes:
  • the obtaining module 1501 is configured to obtain at least two training images from a training image set, the at least two training images include a first training image and a second training image, and the first training image and the second training image include the same shooting object;
• the obtaining module 1501 is also used to obtain the data distribution characteristic corresponding to the feature map of the first training image, where the data distribution characteristic corresponding to the feature map of the first training image is the data distribution characteristic of the feature maps corresponding to the images in the training image subset to which the first training image belongs, and the first training image has the same data distribution law as that training image subset;
• the feature extraction module 1502 is used to perform feature extraction on the first training image through a convolutional neural network and, according to the data distribution characteristic corresponding to the feature map of the first training image, perform data distribution alignment on the third feature map during the feature extraction process to obtain the feature information of the first training image, where the third feature map is obtained during the feature extraction process of the first training image;
• the obtaining module 1501 is also used to obtain the data distribution characteristic corresponding to the feature map of the second training image, where the data distribution characteristic corresponding to the feature map of the second training image is the data distribution characteristic of the feature maps corresponding to the images in the training image subset to which the second training image belongs, and the second training image has the same data distribution law as that training image subset;
• the feature extraction module 1502 is also used to perform feature extraction on the second training image through the convolutional neural network and, according to the data distribution characteristic corresponding to the feature map of the second training image, perform data distribution alignment on the fourth feature map to obtain the feature information of the second training image, where the fourth feature map is obtained during the feature extraction process of the second training image;
  • the training module 1503 is used to train the convolutional neural network through the loss function according to the feature information of the first training image and the feature information of the second training image, until the convergence condition is met, and output the convolutional neural network that has performed the iterative training operation , Where the loss function is used to indicate the similarity between the feature information of the first training image and the feature information of the second training image.
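The loss described above only needs to measure how similar the two feature vectors are; a toy stand-in (not the patent's actual loss function) is one minus cosine similarity, which is zero when the two training images of the same subject produce identical feature information:

```python
import numpy as np

def pair_similarity_loss(feat_a, feat_b):
    """1 - cosine similarity between the feature information of two
    training images that contain the same shooting object."""
    cos = np.dot(feat_a, feat_b) / (np.linalg.norm(feat_a) * np.linalg.norm(feat_b))
    return 1.0 - cos
```

Minimizing such a loss pushes the network to extract matching features for the same subject even when the two images come from different subsets, which is the cross-scene behavior the training procedure targets.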
• a specific implementation method on the training side when the general capability is image re-recognition is provided, and a convolutional neural network that can still maintain good feature extraction capability in the cross-scene process is provided, which improves the completeness of this solution; in addition, when the training process adopts incremental learning, removing the data distribution characteristics of a certain training image subset carried in the feature map avoids overfitting the convolutional neural network to a certain small training data set and solves the catastrophic forgetting problem of the incremental learning process.
  • FIG. 16 is a schematic structural diagram of an image processing device provided by an embodiment of the application.
  • the image processing device 1600 includes:
• the obtaining module 1601 is configured to obtain a third training image from the training image set, where the third training image is an image in the training image set;
  • the obtaining module 1601 is also used to obtain the data distribution characteristic corresponding to the feature map of the third training image, and the data distribution characteristic corresponding to the feature map of the third training image corresponds to the image in the training image subset to which the third training image belongs The data distribution characteristics of the characteristic map;
• the feature extraction module 1602 is used to perform feature extraction on the third training image through the convolutional neural network and, according to the data distribution characteristic corresponding to the feature map of the third training image, perform data distribution alignment on the third feature map during the feature extraction process to obtain the feature information of the third training image, where the third feature map is obtained during the feature extraction process of the third training image;
  • the recognition module 1603 is configured to perform image recognition according to the feature information of the third training image to obtain description information of the shooting object in the third training image;
  • the training module 1604 is used to train the convolutional neural network through the loss function according to the description information.
• a specific implementation method on the training side when the general capability is image recognition is provided, and a convolutional neural network that can still maintain good feature extraction capability in the cross-scene process is provided, which improves the completeness of this solution and also expands its application scenarios; in addition, when the training process adopts incremental learning, the method provided by the embodiment of this application can remove the data distribution characteristics of a certain training image subset carried in the feature map, thereby avoiding over-fitting of the convolutional neural network to a small training data set and solving the catastrophic forgetting problem of the incremental learning process.
  • FIG. 17 is a schematic structural diagram of an execution device provided by an embodiment of this application.
  • the image processing apparatus 1300 described in the embodiment corresponding to FIG. 13 or FIG. 14 may be deployed on the execution device 1700 to implement the function of the execution device in the embodiment corresponding to FIG. 3 or FIG. 10.
• the execution device 1700 includes: a receiver 1701, a transmitter 1702, a processor 1703, and a memory 1704 (the number of processors 1703 in the execution device 1700 may be one or more; one processor is taken as an example in FIG. 17), where the processor 1703 may include an application processor 17031 and a communication processor 17032.
  • the receiver 1701, the transmitter 1702, the processor 1703, and the memory 1704 may be connected by a bus or other methods.
  • the memory 1704 may include a read-only memory and a random access memory, and provides instructions and data to the processor 1703. A part of the memory 1704 may also include a non-volatile random access memory (NVRAM).
• the memory 1704 stores operating instructions for the processor, executable modules or data structures, or a subset of them, or an extended set of them.
  • the operating instructions may include various operating instructions for implementing various operations.
  • the processor 1703 controls the operation of the execution device.
• the various components of the execution device are coupled together through a bus system, where the bus system may include a power bus, a control bus, and a status signal bus in addition to a data bus. For clarity, the various buses are referred to as the bus system in the figure.
  • the method disclosed in the foregoing embodiment of the present application may be applied to the processor 1703 or implemented by the processor 1703.
  • the processor 1703 may be an integrated circuit chip with signal processing capabilities. In the implementation process, the steps of the foregoing method can be completed by an integrated logic circuit of hardware in the processor 1703 or instructions in the form of software.
• the aforementioned processor 1703 may be a general-purpose processor, a digital signal processor (DSP), a microprocessor or a microcontroller, and may further include an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.
  • the processor 1703 can implement or execute the methods, steps, and logical block diagrams disclosed in the embodiments of the present application.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the steps of the method disclosed in the embodiments of the present application may be directly embodied as being executed and completed by a hardware decoding processor, or executed and completed by a combination of hardware and software modules in the decoding processor.
• the software module can be located in a storage medium mature in the field, such as random access memory, flash memory, read-only memory, programmable read-only memory, electrically erasable programmable memory, or registers.
  • the storage medium is located in the memory 1704, and the processor 1703 reads the information in the memory 1704, and completes the steps of the foregoing method in combination with its hardware.
  • the receiver 1701 can be used to receive input digital or character information, and generate signal input related to the relevant settings and function control of the execution device.
• the transmitter 1702 can be used to output digital or character information through the first interface; the transmitter 1702 can also be used to send instructions to the disk group through the first interface to modify the data in the disk group; the transmitter 1702 can also include a display device such as a display screen.
  • the application processor 17031 is configured to execute the image processing method executed by the execution device in the embodiment corresponding to FIG. 3 to FIG. 10. Specifically, the application processor 17031 is configured to execute the following steps:
• the first data distribution characteristic includes the data distribution characteristic of the feature maps corresponding to the images in the first image set, and the first image to be processed has the same data distribution law as the first image set;
• the application processor 17031 is also used to execute the other steps executed by the execution device in the embodiments corresponding to FIG. 3 to FIG. 10. The specific manner in which the application processor 17031 executes each of the above steps is based on the same concept as the method embodiments corresponding to FIG. 3 to FIG. 10 in this application, and the technical effects brought by it are the same as those of the method embodiments corresponding to FIG. 3 to FIG. 10, which will not be repeated here.
  • FIG. 18 is a schematic structural diagram of a training device provided in an embodiment of the present application.
• the training device 1800 may be deployed with the image processing device 1500 described in the embodiment corresponding to FIG. 15 to implement the function of the training device in the embodiment corresponding to FIG. 11; or, the training device 1800 may be deployed with the image processing device 1600 described in the embodiment corresponding to FIG. 16 to implement the function of the training device in the embodiment corresponding to FIG. 12.
  • the training device 1800 is implemented by one or more servers.
• the training device 1800 may have relatively large differences due to different configurations or performance, and may include one or more central processing units (CPU) 1822 (for example, one or more processors), memory 1832, and one or more storage media 1830 (for example, one or more storage devices) that store application programs 1842 or data 1844.
  • the memory 1832 and the storage medium 1830 may be short-term storage or persistent storage.
  • the program stored in the storage medium 1830 may include one or more modules (not shown in the figure), and each module may include a series of instruction operations on the training device.
  • the central processing unit 1822 may be configured to communicate with the storage medium 1830, and execute a series of instruction operations in the storage medium 1830 on the training device 1800.
• the training device 1800 may also include one or more power supplies 1826, one or more wired or wireless network interfaces 1850, one or more input and output interfaces 1858, and/or one or more operating systems 1841, such as Windows ServerTM, Mac OS XTM, UnixTM, LinuxTM, FreeBSDTM, etc.
  • the central processing unit 1822 is configured to execute the image processing method executed by the training device in the embodiment corresponding to FIG. 11. Specifically, the central processing unit 1822 is configured to execute the following steps:
  • the at least two training images include a first training image and a second training image, and the first training image and the second training image include the same shooting object;
• the data distribution characteristic corresponding to the feature map of the first training image is the data distribution characteristic of the feature maps corresponding to the images in the training image subset to which the first training image belongs, and the first training image has the same data distribution law as that training image subset;
• the data distribution characteristic corresponding to the feature map of the second training image is the data distribution characteristic of the feature maps corresponding to the images in the training image subset to which the second training image belongs, and the second training image has the same data distribution law as that training image subset;
• the convolutional neural network is trained through the loss function until the convergence condition is met, and the convolutional neural network on which the iterative training operation has been performed is output, where the loss function is used to indicate the similarity between the feature information of the first training image and the feature information of the second training image.
• the central processing unit 1822 is also used to execute the other steps performed by the training device in the embodiment corresponding to FIG. 11. The specific manner in which the central processing unit 1822 executes each of the above steps is based on the same concept as the method embodiment corresponding to FIG. 11 in this application, and the technical effects brought by it are the same as those of the method embodiment corresponding to FIG. 11.
  • the central processing unit 1822 is configured to execute the image processing method executed by the training device in the embodiment corresponding to FIG. 12. Specifically, the central processing unit 1822 is configured to execute the following steps:
• the data distribution characteristic corresponding to the feature map of the third training image is the data distribution characteristic of the feature maps corresponding to the images in the training image subset to which the third training image belongs;
  • the convolutional neural network is trained through the loss function.
• the central processing unit 1822 is also used to execute the other steps performed by the training device in the embodiment corresponding to FIG. 12. The specific manner in which the central processing unit 1822 executes each of the above steps is based on the same concept as the method embodiment corresponding to FIG. 12 in this application, and the technical effects brought by it are the same as those of the method embodiment corresponding to FIG. 12 in the present application.
  • an embodiment of the present application also provides a computer program product which, when run on a computer, causes the computer to execute the steps performed by the execution device in the methods described in the embodiments shown in FIGS. 3 to 10, or causes the computer to execute the steps performed by the training device in the method described in the embodiment shown in FIG. 11, or causes the computer to execute the steps performed by the training device in the method described in the embodiment shown in FIG. 12.
  • the embodiments of the present application also provide a computer-readable storage medium storing a program for signal processing which, when run on a computer, causes the computer to execute the steps performed by the execution device in the methods described in the embodiments shown in FIGS. 3 to 10, or causes the computer to execute the steps performed by the training device in the method described in the embodiment shown in FIG. 11, or causes the computer to execute the steps performed by the training device in the method described in the embodiment shown in FIG. 12.
  • the execution device, training device, terminal device, or communication device provided by the embodiments of the present application may specifically be a chip.
  • the chip includes a processing unit and a communication unit.
  • the processing unit may be, for example, a processor, and the communication unit may be, for example, an input/output interface, a pin, or a circuit.
  • the processing unit can execute the computer-executable instructions stored in the storage unit, so that the chip in the execution device executes the image processing method described in the embodiments shown in FIGS. 3 to 10, or so that the chip in the training device executes the image processing method described in the embodiment shown in FIG. 11, or so that the chip in the training device executes the image processing method described in the embodiment shown in FIG. 12.
  • the storage unit is a storage unit in the chip, such as a register, a cache, etc.
  • the storage unit may also be a storage unit located outside the chip in the wireless access device, such as a read-only memory (ROM) or another type of static storage device that can store static information and instructions, or a random access memory (RAM).
  • FIG. 19 is a schematic structural diagram of a chip provided by an embodiment of the application.
  • the chip may be implemented as a neural network processor NPU 190, which is mounted as a coprocessor on the host CPU (Host CPU); the Host CPU assigns tasks to it.
  • the core part of the NPU is the arithmetic circuit 1903.
  • the arithmetic circuit 1903 is controlled by the controller 1904 to extract matrix data from the memory and perform multiplication operations.
  • the arithmetic circuit 1903 includes multiple processing units (Process Engine, PE). In some implementations, the arithmetic circuit 1903 is a two-dimensional systolic array. The arithmetic circuit 1903 may also be a one-dimensional systolic array or other electronic circuit capable of performing mathematical operations such as multiplication and addition. In some implementations, the arithmetic circuit 1903 is a general-purpose matrix processor.
  • the arithmetic circuit fetches the corresponding data of matrix B from the weight memory 1902 and caches it on each PE in the arithmetic circuit.
  • the arithmetic circuit takes the matrix A data and matrix B from the input memory 1901 to perform matrix operations, and the partial or final result of the obtained matrix is stored in an accumulator 1908.
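The data flow in the two bullets above — matrix B cached on the PEs, matrix A streamed from the input memory, and partial then final results collected in the accumulator — can be illustrated with a minimal Python sketch (the step-by-step accumulation order and matrix shapes are illustrative assumptions, not the actual circuit behavior):

```python
def matmul_accumulate(A, B):
    """Multiply A (m x k) by B (k x n) the way a systolic array does:
    each step adds the outer product of one column of A and one row of
    B into the accumulator, so partial results exist at every step and
    the final result appears in the accumulator after k steps."""
    m, k = len(A), len(A[0])
    n = len(B[0])
    acc = [[0.0] * n for _ in range(m)]  # plays the role of accumulator 1908
    for step in range(k):                # one "beat" of the array
        for i in range(m):
            for j in range(n):
                acc[i][j] += A[i][step] * B[step][j]
    return acc
```

After the loop, `acc` holds the full product; inspecting it mid-loop would show the partial sums that the text says are stored in the accumulator.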
  • the unified memory 1906 is used to store input data and output data.
  • the weight data is transferred to the weight memory 1902 directly through a direct memory access controller (DMAC) 1905.
  • the input data is also transferred to the unified memory 1906 through the DMAC.
  • the BIU is the Bus Interface Unit, that is, the bus interface unit 1910, which is used for the interaction between the AXI bus and the DMAC and the instruction fetch buffer (IFB) 1909.
  • the bus interface unit 1910 (Bus Interface Unit, BIU for short) is used for the instruction fetch memory 1909 to obtain instructions from the external memory, and is also used for the storage unit access controller 1905 to obtain the original data of the input matrix A or the weight matrix B from the external memory.
  • the DMAC is mainly used to transfer the input data in the external memory DDR to the unified memory 1906, or to transfer the weight data to the weight memory 1902, or to transfer the input data to the input memory 1901.
  • the vector calculation unit 1907 includes multiple arithmetic processing units and, when necessary, further processes the output of the arithmetic circuit, for example with vector multiplication, vector addition, exponential operations, logarithmic operations, and size comparison. It is mainly used for non-convolutional/fully connected layer computation in neural networks, such as pixel-level summation and data distribution alignment of feature maps.
  • the vector calculation unit 1907 can store the processed output vector to the unified memory 1906.
  • the vector calculation unit 1907 may apply a linear function and/or a non-linear function to the output of the arithmetic circuit 1903, for example performing linear interpolation on the feature map extracted by a convolutional layer, or applying a non-linear function to a vector of accumulated values to generate activation values.
  • the vector calculation unit 1907 generates normalized values, pixel-level summed values, or both.
  • the processed output vector can be used as an activation input to the arithmetic circuit 1903, for example for use in subsequent layers in a neural network.
  • the instruction fetch buffer 1909 connected to the controller 1904 is used to store instructions used by the controller 1904;
  • the unified memory 1906, the input memory 1901, the weight memory 1902, and the fetch memory 1909 are all On-Chip memories.
  • the external memory is private to the NPU hardware architecture.
  • each layer in the convolutional neural network shown in the foregoing embodiments may be executed by the arithmetic circuit 1903 or the vector calculation unit 1907.
  • the processor mentioned in any of the foregoing may be a general-purpose central processing unit, a microprocessor, an ASIC, or one or more integrated circuits used to control the execution of the programs of the method in the first aspect.
  • the device embodiments described above are only illustrative; the units described as separate components may or may not be physically separated, and the components displayed as units may or may not be physical units.
  • the physical unit can be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the objectives of the solutions of the embodiments.
  • the connection relationship between the modules indicates that they have a communication connection between them, which may be specifically implemented as one or more communication buses or signal lines.
  • this application can be implemented by means of software plus the necessary general-purpose hardware, or by dedicated hardware, including dedicated integrated circuits, dedicated CPUs, dedicated memory, dedicated components, and so on.
  • in general, all functions completed by computer programs can also be implemented with corresponding hardware, and the specific hardware structures used to achieve the same function can be diverse, such as analog circuits, digital circuits, or special-purpose circuits. However, a software program implementation is the better implementation in most cases.
  • the technical solution of this application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a readable storage medium, such as a floppy disk, USB flash drive, removable hard disk, ROM, RAM, magnetic disk, or optical disc, and includes several instructions to make a computer device (which can be a personal computer, training device, network device, etc.) execute the methods described in the various embodiments of this application.
  • the computer program product includes one or more computer instructions.
  • the computer may be a general-purpose computer, a special-purpose computer, a computer network, or other programmable devices.
  • the computer instructions may be stored in a computer-readable storage medium, or transmitted from one computer-readable storage medium to another computer-readable storage medium.
  • the computer instructions may be transmitted from a website, computer, training device, or data center to another website, computer, training device, or data center in a wired (such as coaxial cable, optical fiber, or digital subscriber line (DSL)) or wireless (such as infrared, radio, or microwave) manner.
  • the computer-readable storage medium may be any available medium that a computer can store, or a data storage device, such as a training device or a data center, integrating one or more available media.
  • the usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, and a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium (for example, a solid state disk (SSD)).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Mathematical Physics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computational Linguistics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

An image processing method and related device, applicable to the field of image processing in artificial intelligence. The method may include: obtaining a first to-be-processed image and a first data distribution characteristic, where the first data distribution characteristic includes the data distribution characteristic of feature maps corresponding to images in a first image set, and the first to-be-processed image follows the same data distribution law as the first image set; and performing feature extraction on the first to-be-processed image and, according to the first data distribution characteristic, performing data distribution alignment on a first feature map during feature extraction, where the first feature map is generated during feature extraction of the first to-be-processed image. Feature maps that have undergone data distribution alignment share similar data distributions; using the first data distribution characteristic for alignment pulls the data distribution of the first feature map, across a large span, toward the sensitive data region of the neural network, improving feature extraction performance on cross-scene images.

Description

Image processing method and related device
This application claims priority to Chinese Patent Application No. 202010085440.7, filed with the Chinese Patent Office on January 23, 2020 and entitled "Image processing method and related device", which is incorporated herein by reference in its entirety.
Technical Field
This application relates to the field of artificial intelligence, and in particular, to an image processing method and related device.
Background
Artificial intelligence (AI) is a theory, method, technology, and application system that uses digital computers, or machines controlled by digital computers, to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain optimal results. In other words, artificial intelligence is a branch of computer science that attempts to understand the essence of intelligence and to produce new intelligent machines that can react in a manner similar to human intelligence. Artificial intelligence studies the design principles and implementation methods of various intelligent machines so that machines can perceive, reason, and make decisions. Image processing with artificial intelligence is a common application of AI.
At present, the widespread data-domain gap problem makes the generalization capability of image feature extraction very low: a trained neural network can only be deployed on application data from the same scene as its training data; otherwise its performance is very poor or it cannot be used at all.
Therefore, a solution that improves the cross-scene feature extraction performance of neural networks urgently needs to be put forward.
Summary
The embodiments of this application provide an image processing method and related device. A first data distribution characteristic is used to perform data distribution alignment on a feature map of a to-be-processed image; the first data distribution characteristic is obtained by statistically analyzing the data distribution of feature maps of images in an image set that follows the same data distribution law as the to-be-processed image. This ensures that the images processed by the neural network share similar data distributions, allows the data distribution of the feature map of the first to-be-processed image to be pulled, across a large span, toward the sensitive data region of the neural network, reduces the image processing difficulty of the neural network, and further improves its cross-scene feature extraction performance.
To solve the above technical problem, the embodiments of this application provide the following technical solutions:
In a first aspect, an embodiment of this application provides an image processing method, applicable to the field of image processing in artificial intelligence. An execution device obtains a first to-be-processed image and a first data distribution characteristic corresponding to the first to-be-processed image. The first to-be-processed image follows the same data distribution law as a first image set. The first data distribution characteristic includes the data distribution characteristic of feature maps corresponding to images in the first image set, including the data distribution characteristic in at least one feature dimension; the at least one feature dimension may include a color feature, a texture feature, a luminance feature, and a resolution feature. Further, the first data distribution characteristic is obtained by statistically analyzing the data distribution of the feature maps corresponding to images in the first image set, and is derived from the feature maps of some or all images in the first image set. The execution device then performs feature extraction on the first to-be-processed image and, according to the first data distribution characteristic, performs data distribution alignment on a first feature map during feature extraction. The first feature map is generated during feature extraction of the first to-be-processed image and includes feature maps in the at least one feature dimension; performing data distribution alignment on the first feature map is a process of pulling its data distribution toward the sensitive-value region of a non-linear function by weakening the first data distribution characteristic carried in the data distribution of the first feature map. In this implementation, because the neural network processes feature maps that have undergone data distribution alignment, the feature maps it processes share similar data distributions, which increases the similarity between feature maps of different cross-scene images, reduces the image processing difficulty of the neural network, and improves its cross-scene feature extraction performance. In addition, the first data distribution characteristic is that of the feature maps of images in the first image set, and the images in the first image set follow the same data distribution law as the first to-be-processed image; using the first data distribution characteristic for alignment therefore pulls the data distribution of the feature map of the first to-be-processed image, across a large span, toward the sensitive data region of the neural network, further reducing the image processing difficulty of the neural network and further improving its cross-scene feature extraction performance.
In a possible implementation of the first aspect, before the execution device performs feature extraction on the first to-be-processed image, the method may further include: the execution device obtains a second data distribution characteristic corresponding to the first to-be-processed image, where the second data distribution characteristic is the data distribution characteristic of images in the first image set, obtained by statistically analyzing the data distribution of some or all images in the first image set. The execution device then performs data distribution alignment on the first to-be-processed image according to the second data distribution characteristic; this is a process of pulling the data distribution of the first to-be-processed image toward the sensitive-value region of a non-linear function by weakening the second data distribution characteristic carried in its data distribution. Specifically, the execution device may normalize the first to-be-processed image according to the second data distribution characteristic to achieve the alignment, and then perform feature extraction on the aligned first to-be-processed image. In this implementation, data distribution alignment is performed not only on the feature maps during feature extraction but also on the to-be-processed image before feature extraction, so that the images processed by the neural network also share similar data distributions; this further increases the similarity between different cross-scene images, further reduces the image processing difficulty of the neural network, and further improves its cross-scene feature extraction performance.
In a possible implementation of the first aspect, because a convolutional neural network generates feature maps in at least one feature dimension when extracting features from an image, statistically analyzing the feature maps in each feature dimension corresponding to images in the first image set yields one mean and one variance per dimension; the first data distribution characteristic generated from the feature maps corresponding to images in the first image set therefore includes at least one mean and at least one variance, and the number of means and variances equals the number of feature dimensions. The execution device performs feature extraction on the first to-be-processed image and, according to the at least one mean and the at least one variance, standardizes the at least one feature map included in the first feature map during feature extraction. Specifically, the first feature map includes a feature map in a target feature dimension, where the target feature dimension is any one of the at least one feature dimension. The execution device obtains the target mean and target variance corresponding to the target feature dimension from the first data distribution characteristic, subtracts the target mean from the feature map of the first to-be-processed image in the target feature dimension, and then divides the result by the target variance to obtain the standardized feature map in the target feature dimension. This implementation provides a concrete way to perform data distribution alignment on the feature maps of a to-be-processed image that is simple to operate and easy to implement.
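As a minimal sketch of the standardization step just described — subtract the target mean for one feature dimension, then divide by the target variance — the following may help (the division by the variance follows the text literally, whereas practical implementations usually divide by the standard deviation plus a small epsilon; all names are illustrative):

```python
def standardize_feature_map(feature_map, target_mean, target_variance):
    """Align one feature map of the to-be-processed image to the data
    distribution of its image set: subtract the target mean of this
    feature dimension, then divide by the target variance."""
    return [[(v - target_mean) / target_variance for v in row]
            for row in feature_map]

# One feature dimension of the first feature map (e.g. the color feature).
fmap = [[2.0, 4.0], [6.0, 8.0]]
aligned = standardize_feature_map(fmap, target_mean=5.0, target_variance=2.0)
```

After this step the aligned feature map is centered near zero, which is the kind of sensitive-value region of non-linear activation functions that the text refers to.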
In a possible implementation of the first aspect, the first to-be-processed image and the images in the first image set come from the same target image capture device; or the capture times of the first to-be-processed image and of the images in the first image set all fall within the same target time period; or the first to-be-processed image and the images in the first image set come from the same capture location; or the photographed object in the first to-be-processed image and the photographed objects in the images of the first image set are of the same object type. Further, the image capture device includes but is not limited to a camera, a radar, or another type of capture device; the time period may be one of several different periods within a day; the capture location may be divided at the granularity of a province, city, or county; and the object type may be divided at the granularity of a kingdom, phylum, class, order, family, genus, or species, none of which is limited here. This implementation provides multiple ways to obtain a first image set that follows the same data distribution law as the first to-be-processed image, which extends the application scenarios of this solution and improves its implementation flexibility.
In a possible implementation of the first aspect, before the execution device obtains the first data distribution characteristic corresponding to the first to-be-processed image, the method further includes: obtaining identification information of the target image capture device that captured the first to-be-processed image, and obtaining, from at least two image subsets included in a second image set, the first image set corresponding to that identification information. The first image set is one of the at least two image subsets and includes images captured by the target image capture device; that is, the first to-be-processed image and the images in the first image set come from the same target image capture device. In this implementation, because different image capture devices differ in hardware configuration or parameter settings, the data distribution of the feature maps of images captured by one device carries that device's particular style. Using the source capture device as the classification criterion, and aligning the feature map of the first to-be-processed image according to the data distribution characteristic of the feature maps of images in the first image set to which it belongs, weakens the device-specific style carried in the feature map of the first to-be-processed image; that is, it increases the similarity between feature maps of images from different capture devices and reduces the feature extraction difficulty of the neural network.
In a possible implementation of the first aspect, before the first data distribution characteristic corresponding to the first to-be-processed image is obtained, the method further includes: obtaining the capture time of the first to-be-processed image, and obtaining, from at least two image subsets included in the second image set, the first image set corresponding to that capture time. The first image set is one of the at least two image subsets and includes images captured within a target time period, and the capture time of the first to-be-processed image falls within the target time period; that is, the capture times of the first to-be-processed image and of the images in the first image set all fall within the same target time period. In this implementation, because lighting conditions differ across time periods, the data distribution of the feature maps of images captured within one period carries that period's particular style. Using the time period as the classification criterion, and aligning the feature map of the first to-be-processed image according to the data distribution characteristic of the feature maps of images in the first image set to which it belongs, weakens the period-specific style carried in the feature map of the first to-be-processed image; that is, it increases the similarity between feature maps of images from different time periods and reduces the feature extraction difficulty of the neural network.
In a possible implementation of the first aspect, the execution device performs feature extraction on the first to-be-processed image and, according to the first data distribution characteristic, aligns the first feature map during feature extraction to obtain the feature information of the first to-be-processed image. The execution device then matches the first to-be-processed image against the images in the second image set according to this feature information to obtain a matching result. The first image set is one of at least two image subsets included in the second image set; the matching result includes at least one target image, each containing the same photographed object as the first to-be-processed image, and may also include the capture location and capture time of each matched image. This implementation improves the feature extraction performance of the convolutional neural network, so that the image matching operation can be performed on more accurate feature information, which helps improve the accuracy of image matching, that is, the accuracy of the image matching process of a surveillance system.
In a possible implementation of the first aspect, the execution device performs feature extraction on the first to-be-processed image and, according to the first data distribution characteristic, aligns the first feature map during feature extraction to obtain the feature information of the first to-be-processed image. The execution device then recognizes the first to-be-processed image according to this feature information to obtain description information of the photographed object in the first to-be-processed image. This implementation improves the feature extraction performance of the convolutional neural network, which helps improve the accuracy of image recognition.
In a possible implementation of the first aspect, before the execution device matches the first to-be-processed image against the images in the second image set according to its feature information, the method further includes: the execution device obtains a second to-be-processed image and a third data distribution characteristic, where the second to-be-processed image is any image in a second image subset, the third data distribution characteristic is the data distribution characteristic of feature maps corresponding to images in a third image set, and the second to-be-processed image follows the same data distribution law as the images in the third image set. The execution device performs feature extraction on the second to-be-processed image and, according to the third data distribution characteristic, aligns a second feature map generated during that feature extraction to obtain the feature information of the second to-be-processed image. The execution device repeats the foregoing steps until the feature information of every image in the second image set is obtained, and then matches the feature information of the first to-be-processed image against the feature information of each image in the second image set to obtain the matching result.
In this implementation, in an image re-identification scenario, the data distribution alignment operation is not performed according to the data distribution characteristic of the feature maps of all images in the second image set. Instead, the second image set is divided into at least two image subsets according to the data distribution laws of the images, and the alignment is performed based on the data distribution characteristic of the feature maps within each subset. This avoids mutual interference between the data distribution characteristics of different subsets and helps pull, across a large span, the data distribution of the feature map of a to-be-processed image into the sensitive region of the neural network, improving feature extraction performance. With the precision of the feature information of both the to-be-processed image and every image in the second image set improved, the accuracy of the image matching process is improved.
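The final matching step — comparing the feature information of the to-be-processed image against the stored feature information of every image in the second image set — can be sketched as a ranking by cosine similarity (the similarity measure and the gallery format are illustrative assumptions; the embodiments do not mandate a specific one):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def match(query_feature, gallery, top_k=1):
    """Rank gallery entries (image_id, feature) by similarity to the
    query feature and return the top_k image ids as the matching result."""
    ranked = sorted(gallery,
                    key=lambda item: cosine_similarity(query_feature, item[1]),
                    reverse=True)
    return [image_id for image_id, _ in ranked[:top_k]]
```

Because all features have undergone data distribution alignment, features of the same photographed object captured in different scenes end up closer, which is what makes this simple ranking effective.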
A second aspect of the embodiments of this application provides an image processing method. An execution device obtains a first to-be-processed image and a first data distribution characteristic corresponding to it, where the first data distribution characteristic includes the data distribution characteristic of feature maps corresponding to images in a first image set, and the first to-be-processed image follows the same data distribution law as the first image set. The execution device inputs the first to-be-processed image and the first data distribution characteristic into a feature extraction network, so that the feature extraction network, while extracting features from the first to-be-processed image, performs data distribution alignment on a first feature map according to the first data distribution characteristic, where the first feature map is generated by the feature extraction network during feature extraction of the first to-be-processed image.
In a possible implementation of the second aspect, before the execution device inputs the first to-be-processed image and the first data distribution characteristic into the feature extraction network, the method further includes: the execution device obtains a second data distribution characteristic corresponding to the first to-be-processed image, where the second data distribution characteristic is the data distribution characteristic of the images in the first image set, and performs data distribution alignment on the first to-be-processed image according to the second data distribution characteristic. Inputting the first to-be-processed image into the feature extraction network then means inputting the first to-be-processed image on which data distribution alignment has been performed.
In a possible implementation of the second aspect, the first data distribution characteristic includes a mean and a variance, obtained by statistically analyzing the data distribution of the feature maps corresponding to images in the first image set. The execution device inputs the first to-be-processed image and the first data distribution characteristic into the feature extraction network, so that the feature extraction network standardizes the first feature map according to the mean and the variance during feature extraction.
In a possible implementation of the second aspect, the first to-be-processed image and the images in the first image set come from the same target image capture device; or their capture times all fall within the same target time period; or they come from the same capture location; or the photographed object in the first to-be-processed image and the photographed objects in the images of the first image set are of the same object type.
In a possible implementation of the second aspect, before the execution device obtains the first data distribution characteristic, the method further includes: the execution device obtains the target image capture device that captured the first to-be-processed image, and obtains, from at least two image subsets included in a second image set, the first image set corresponding to the target image capture device, where the first image set is one of the at least two image subsets and includes images captured by the target image capture device.
In a possible implementation of the second aspect, before the execution device obtains the first data distribution characteristic, the method further includes: the execution device obtains the capture time of the first to-be-processed image, and obtains, from at least two image subsets included in the second image set, the first image set corresponding to that capture time, where the first image set is one of the at least two image subsets and includes images captured within a target time period, within which the capture time of the first to-be-processed image falls.
In a possible implementation of the second aspect, the execution device inputs the first to-be-processed image and the first data distribution characteristic into the feature extraction network, so that the network aligns the first feature map according to the first data distribution characteristic during feature extraction and outputs the feature information of the first to-be-processed image. After obtaining this feature information, the method further includes: the execution device inputs the feature information into an image matching network, so that the image matching network matches the first to-be-processed image against the images in the second image set and outputs a matching result, where the feature extraction network and the image matching network belong to the same convolutional neural network, the first image set is one of at least two image subsets included in the second image set, and the matching result includes at least one target image containing the same photographed object as the first to-be-processed image. Alternatively, the execution device inputs the feature information into an image recognition network, so that the image recognition network recognizes the first to-be-processed image and outputs description information of the photographed object in the first to-be-processed image, where the feature extraction network and the image recognition network belong to the same convolutional neural network.
In a possible implementation of the second aspect, before the execution device inputs the first to-be-processed image and the first data distribution characteristic into the feature extraction network, the method further includes: the execution device obtains a second to-be-processed image and a third data distribution characteristic, where the second to-be-processed image is any image in a second image subset, the third data distribution characteristic is the data distribution characteristic of feature maps corresponding to images in a third image set, and the second to-be-processed image follows the same data distribution law as the images in the third image set. The execution device inputs the second to-be-processed image and the third data distribution characteristic into the feature extraction network, so that the network, while extracting features from the second to-be-processed image, aligns a second feature map according to the third data distribution characteristic and obtains the feature information of the second to-be-processed image, where the second feature map is generated by the feature extraction network during feature extraction of the second to-be-processed image. The execution device repeats the foregoing steps until the feature information of every image in the second image set is obtained. The execution device then inputs the feature information of the first to-be-processed image and of each image in the second image set into the image matching network, so that the image matching network matches the first to-be-processed image against the images in the second image set and outputs the matching result.
For the specific implementation steps performed by the execution device in each possible implementation, refer to the descriptions of the first aspect and its various possible implementations; details are not repeated here.
In a third aspect, an embodiment of this application provides an image processing method, applicable to the field of image processing in artificial intelligence. A training device obtains at least two training images from a training image set, the at least two training images including a first training image and a second training image that contain the same photographed object. The training device obtains the data distribution characteristic corresponding to the feature map of the first training image, which is the data distribution characteristic of the feature maps corresponding to images in the training image subset to which the first training image belongs; the first training image follows the same data distribution law as the images in that subset. Through a convolutional neural network, the training device performs feature extraction on the first training image and, according to that data distribution characteristic, aligns a third feature map generated during feature extraction to obtain the feature information of the first training image. The training device likewise obtains the data distribution characteristic corresponding to the feature map of the second training image, which is the data distribution characteristic of the feature maps corresponding to images in the training image subset to which the second training image belongs; the second training image follows the same data distribution law as the images in that subset. Through the convolutional neural network, the training device performs feature extraction on the second training image and, according to that data distribution characteristic, aligns a fourth feature map generated during feature extraction to obtain the feature information of the second training image. According to the feature information of the first and second training images, the training device trains the convolutional neural network through a loss function until a convergence condition is met, and outputs the convolutional neural network on which the iterative training operations have been performed. The loss function indicates the similarity between the feature information of the first training image and the feature information of the second training image, and may be one or more of a pairwise loss function, a triplet loss function, a quadruplet loss function, or another loss function; the convergence condition may be that the loss function converges or that the number of iterations reaches a preset number. This implementation provides a concrete training-side implementation for the case where the general capability is image re-identification, and yields a convolutional neural network that maintains good feature extraction capability across scenes, improving the completeness of this solution; training only the feature extraction skill improves the efficiency of the training phase. In addition, when incremental learning is used during training, because the method provided in the embodiments of this application can remove the data distribution characteristic of a particular training image subset carried in the feature maps, it avoids overfitting the convolutional neural network to a small training data set and solves the catastrophic forgetting problem of incremental learning.
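Of the loss functions listed above, the triplet loss can be sketched as follows; it scores an anchor/positive pair (same photographed object) against an anchor/negative pair, which is one way of indicating the similarity between the feature information of two training images (the squared-distance formulation and the margin value are illustrative assumptions):

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge-style triplet loss: zero when the positive (same
    photographed object) is already closer to the anchor than the
    negative by at least `margin`, and positive otherwise, pushing
    the network to produce similar features for the same object."""
    return max(0.0, squared_distance(anchor, positive)
               - squared_distance(anchor, negative) + margin)
```

During training, the gradient of this loss with respect to the network parameters is back-propagated until the convergence condition mentioned in the text is met.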
The training device may also be configured to perform the steps performed by the execution device in the possible implementations of the first aspect; for the specific implementation steps of each possible implementation, refer to the descriptions of the first aspect and its various possible implementations, which are not repeated here.
In a fourth aspect, an embodiment of this application provides an image processing method, applicable to the field of image processing in artificial intelligence. A training device obtains a third training image from a training image set, where the third training image is one image in the set and the set also stores the real description information of each image. The training device obtains the data distribution characteristic corresponding to the feature map of the third training image, which is the data distribution characteristic of the feature maps corresponding to images in the training image subset to which the third training image belongs. Through a convolutional neural network, the training device performs feature extraction on the third training image and, according to that data distribution characteristic, aligns a third feature map generated during feature extraction to obtain the feature information of the third training image. The training device performs image recognition according to this feature information to obtain predicted description information of the photographed object in the third training image, computes the value of a loss function from the predicted description information and the real description information of the photographed object in the third training image, and back-propagates according to that value to adjust the parameter values of the convolutional neural network, thereby completing one training pass. The training device repeats the foregoing operations to iteratively train the convolutional neural network until a convergence condition is met, and outputs the convolutional neural network on which the iterative training operations have been performed. This implementation provides a concrete training-side implementation for the case where the general capability is image recognition, and yields a convolutional neural network that maintains good feature extraction capability across scenes, improving the completeness of this solution; training only the feature extraction skill improves the efficiency of the training phase. In addition, when incremental learning is used during training, because the method provided in the embodiments of this application can remove the data distribution characteristic of a particular training image subset carried in the feature maps, it avoids overfitting the convolutional neural network to a small training data set and solves the catastrophic forgetting problem of incremental learning.
The training device may also be configured to perform the steps performed by the execution device in the possible implementations of the first aspect; for the specific implementation steps of each possible implementation, refer to the descriptions of the first aspect and its various possible implementations, which are not repeated here.
In a fifth aspect, an embodiment of this application provides an image processing apparatus, applicable to the field of image processing in artificial intelligence. The apparatus includes: an obtaining module, configured to obtain a first to-be-processed image, and further configured to obtain a first data distribution characteristic corresponding to the first to-be-processed image, where the first data distribution characteristic includes the data distribution characteristic of feature maps corresponding to images in a first image set and the first to-be-processed image follows the same data distribution law as the first image set; and a feature extraction module, configured to perform feature extraction on the first to-be-processed image and, according to the first data distribution characteristic, perform data distribution alignment on a first feature map during feature extraction, where the first feature map is generated during feature extraction of the first to-be-processed image.
For the specific implementation steps by which the component modules of the apparatus provided in the fifth aspect perform the fifth aspect and its various possible implementations, refer to the descriptions of the first aspect and its various possible implementations; details are not repeated here.
In a sixth aspect, an embodiment of this application provides an image processing apparatus, applicable to the field of image processing in artificial intelligence. The apparatus includes: an obtaining module, configured to obtain at least two training images from a training image set, the at least two training images including a first training image and a second training image that contain the same photographed object; the obtaining module is further configured to obtain the data distribution characteristic corresponding to the feature map of the first training image, which is the data distribution characteristic of the feature maps corresponding to images in the training image subset to which the first training image belongs, the first training image following the same data distribution law as the images in that subset; a feature extraction module, configured to perform feature extraction on the first training image through a convolutional neural network and, according to that data distribution characteristic, align a third feature map generated during feature extraction to obtain the feature information of the first training image; the obtaining module is further configured to obtain the data distribution characteristic corresponding to the feature map of the second training image, which is the data distribution characteristic of the feature maps corresponding to images in the training image subset to which the second training image belongs, the second training image following the same data distribution law as the images in that subset; the feature extraction module is further configured to perform feature extraction on the second training image through the convolutional neural network and, according to that data distribution characteristic, align a fourth feature map generated during feature extraction to obtain the feature information of the second training image; and a training module, configured to train the convolutional neural network through a loss function according to the feature information of the first training image and of the second training image until a convergence condition is met, and to output the convolutional neural network on which the iterative training operations have been performed, where the loss function indicates the similarity between the feature information of the first training image and the feature information of the second training image.
The component modules of the apparatus provided in the sixth aspect may also be configured to perform the steps performed by the execution device in the possible implementations of the third aspect; for the specific implementation steps of the sixth aspect and its various possible implementations, refer to the descriptions of the third aspect and its various possible implementations, which are not repeated here.
In a seventh aspect, an embodiment of this application provides an image processing apparatus, applicable to the field of image processing in artificial intelligence. The apparatus includes: an obtaining module, configured to obtain a fourth training image from a training image set, the fourth training image being one image in the set; the obtaining module is further configured to obtain the data distribution characteristic corresponding to the feature map of the fourth training image, which is the data distribution characteristic of the feature maps corresponding to images in the training image subset to which the fourth training image belongs; a feature extraction module, configured to perform feature extraction on the fourth training image through a convolutional neural network and, according to that data distribution characteristic, align a fourth feature map generated during feature extraction to obtain the feature information of the fourth training image; a recognition module, configured to perform image recognition according to the feature information of the fourth training image to obtain description information of the photographed object in the fourth training image; and a training module, configured to train the convolutional neural network through a loss function according to the description information.
The component modules of the apparatus provided in the seventh aspect may also be configured to perform the steps performed by the execution device in the possible implementations of the fourth aspect; for the specific implementation steps of the seventh aspect and its various possible implementations, refer to the descriptions of the fourth aspect and its various possible implementations, which are not repeated here.
In an eighth aspect, an embodiment of this application provides an execution device, including a processor coupled to a memory; the memory is configured to store a program; and the processor is configured to execute the program in the memory, so that the execution device performs the steps performed by the execution device in the possible implementations of the first aspect or the second aspect.
In a ninth aspect, an embodiment of this application provides a training device, including a processor coupled to a memory; the memory is configured to store a program; and the processor is configured to execute the program in the memory, so that the training device performs the steps performed by the execution device in the possible implementations of the third aspect, or so that the training device performs the steps performed by the execution device in the possible implementations of the fourth aspect.
In a tenth aspect, an embodiment of this application provides a computer-readable storage medium storing a computer program which, when run on a computer, causes the computer to perform the image processing method described in the first, second, third, or fourth aspect.
In an eleventh aspect, an embodiment of this application provides a computer program which, when run on a computer, causes the computer to perform the image processing method described in the first, second, third, or fourth aspect.
In a twelfth aspect, this application provides a chip system, including a processor configured to support the execution device or training device in implementing the functions involved in the above aspects, for example, sending or processing the data and/or information involved in the above methods. In a possible design, the chip system further includes a memory configured to store the program instructions and data necessary for the execution device or training device. The chip system may consist of a chip, or may include a chip and other discrete components.
Brief Description of Drawings
FIG. 1 is a schematic structural diagram of an artificial intelligence main framework according to an embodiment of this application;
FIG. 2 is a system architecture diagram of an image processing system according to an embodiment of this application;
FIG. 3 is a schematic diagram of a scenario of the image processing method according to an embodiment of this application;
FIG. 4 is a schematic diagram of another scenario of the image processing method according to an embodiment of this application;
FIG. 5 is a schematic flowchart of the image processing method according to an embodiment of this application;
FIG. 6 is a schematic diagram of a data distribution characteristic in the image processing method according to an embodiment of this application;
FIG. 7 is a schematic diagram of data distribution alignment in the image processing method according to an embodiment of this application;
FIG. 8 is a schematic diagram of a convolutional neural network in the image processing method according to an embodiment of this application;
FIG. 9 is a schematic diagram of the data distribution of feature maps in the image processing method according to an embodiment of this application;
FIG. 10 is another schematic flowchart of the image processing method according to an embodiment of this application;
FIG. 11 is yet another schematic flowchart of the image processing method according to an embodiment of this application;
FIG. 12 is still another schematic flowchart of the image processing method according to an embodiment of this application;
FIG. 13 is a schematic structural diagram of an image processing apparatus according to an embodiment of this application;
FIG. 14 is another schematic structural diagram of the image processing apparatus according to an embodiment of this application;
FIG. 15 is yet another schematic structural diagram of the image processing apparatus according to an embodiment of this application;
FIG. 16 is still another schematic structural diagram of the image processing apparatus according to an embodiment of this application;
FIG. 17 is a schematic structural diagram of an execution device according to an embodiment of this application;
FIG. 18 is a schematic structural diagram of a training device according to an embodiment of this application;
FIG. 19 is a schematic structural diagram of a chip according to an embodiment of this application.
Description of Embodiments
The embodiments of this application provide an image processing method and related device. A first data distribution characteristic is used to perform data distribution alignment on a feature map of a to-be-processed image; the first data distribution characteristic is obtained by statistically analyzing the data distribution of feature maps of images in an image set that follows the same data distribution law as the to-be-processed image. This ensures that the images processed by the neural network share similar data distributions, allows the data distribution of the feature map of the first to-be-processed image to be pulled, across a large span, toward the sensitive data region of the neural network, reduces the image processing difficulty of the neural network, and further improves its cross-scene feature extraction performance.
The embodiments of this application are described below with reference to the accompanying drawings. A person of ordinary skill in the art will appreciate that, as technology develops and new scenarios emerge, the technical solutions provided in the embodiments of this application are equally applicable to similar technical problems.
The terms "first", "second", and the like in the specification, claims, and accompanying drawings of this application are used to distinguish similar objects and are not necessarily used to describe a specific order or sequence. It should be understood that the terms so used are interchangeable where appropriate; this is merely the way objects with the same attributes are distinguished in describing the embodiments of this application. In addition, the terms "include" and "have" and any variants thereof are intended to cover non-exclusive inclusion, so that a process, method, system, product, or device that includes a series of units is not necessarily limited to those units, but may include other units that are not clearly listed or that are inherent to such a process, method, product, or device.
The overall workflow of an artificial intelligence system is described first. Referring to FIG. 1, FIG. 1 shows a schematic structural diagram of an artificial intelligence main framework. The framework is explained below along two dimensions: the "intelligent information chain" (horizontal axis) and the "IT value chain" (vertical axis). The "intelligent information chain" reflects the sequence of processes from data acquisition to data processing, for example the general process of intelligent information perception, intelligent information representation and formation, intelligent reasoning, intelligent decision-making, and intelligent execution and output; in this process the data undergo a refinement of "data — information — knowledge — wisdom". The "IT value chain" reflects the value that artificial intelligence brings to the information technology industry, from the underlying infrastructure of intelligence and information (providing and processing technology implementations) to the industrial ecology of the system.
(1) Infrastructure
The infrastructure provides computing-power support for the artificial intelligence system, enables communication with the external world, and provides support through a base platform. The system communicates with the outside through sensors; computing power is provided by intelligent chips (hardware acceleration chips such as CPUs, NPUs, GPUs, ASICs, and FPGAs); the base platform includes platform assurance and support such as distributed computing frameworks and networks, and may include cloud storage and computing, interconnection networks, and the like. For example, the sensors communicate with the outside to obtain data, and these data are provided to the intelligent chips in the distributed computing system provided by the base platform for computation.
(2) Data
The data in the layer above the infrastructure represent the data sources of the artificial intelligence field. The data involve graphics, images, speech, and text, as well as Internet-of-Things data from conventional devices, including business data from existing systems and sensed data such as force, displacement, liquid level, temperature, and humidity.
(3) Data processing
Data processing usually includes data training, machine learning, deep learning, searching, reasoning, decision-making, and other approaches.
Among them, machine learning and deep learning can perform symbolic and formalized intelligent information modeling, extraction, preprocessing, training, and the like on data.
Reasoning refers to the process of simulating human intelligent reasoning in a computer or intelligent system, using formalized information to carry out machine thinking and problem solving according to a reasoning control strategy; its typical functions are searching and matching.
Decision-making refers to the process of making decisions after intelligent information has been reasoned about, and usually provides functions such as classification, ranking, and prediction.
(4) General capabilities
After the data have undergone the data processing mentioned above, some general capabilities can further be formed based on the results of the data processing, for example an algorithm or a general system, such as translation, text analysis, computer vision processing, speech recognition, and image recognition.
(5) Intelligent products and industry applications
Intelligent products and industry applications refer to the products and applications of artificial intelligence systems in various fields; they encapsulate the overall artificial intelligence solution, productize intelligent information decision-making, and achieve practical application. Their application fields mainly include intelligent terminals, intelligent manufacturing, intelligent transportation, smart home, intelligent healthcare, intelligent security, autonomous driving, unmanned supermarkets, and so on.
This application can be applied to the field of image processing within artificial intelligence. Before describing the image processing method provided in the embodiments of this application in detail, the system architecture adopted by the embodiments is introduced. Referring first to FIG. 2, FIG. 2 is a system architecture diagram of the image processing system provided by an embodiment of this application. In FIG. 2, the image processing system 200 includes an execution device 210, a training device 220, a database 230, a client device 240, and a data storage system 250, and the execution device 210 includes a computation module 211.
The database 230 stores a training image set. The training device 220 generates a target model/rule 201 for processing images and trains the target model/rule 201 with the training image set in the database to obtain a mature target model/rule 201. In the embodiments of this application, the target model/rule 201 being a convolutional neural network is used as an example for description.
The convolutional neural network obtained by the training device 220 can be applied in different systems or devices, such as mobile phones, tablets, laptops, VR devices, surveillance systems, and radar data processing systems. The execution device 210 can call data, code, and the like in the data storage system 250, and can also store data, instructions, and the like into the data storage system 250. The data storage system 250 can be placed inside the execution device 210, or the data storage system 250 can be external memory relative to the execution device 210.
The computation module 211 can perform convolution operations on a to-be-processed image obtained through the client device 240 by means of the convolutional neural network. After extracting the feature map of the to-be-processed image, it performs data distribution alignment on the feature map according to a pre-obtained data distribution characteristic, and generates the feature information of the to-be-processed image from the feature map on which data distribution alignment has been performed. The pre-obtained data distribution characteristic is obtained by statistically analyzing the data distribution of the feature maps corresponding to images in an image set, and the to-be-processed image follows the same data distribution law as the images in that image set.
In some embodiments of this application, referring to FIG. 2, the execution device 210 and the client device 240 can be independent devices. The execution device 210 is configured with an I/O interface 212 for data interaction with the client device 240; a "user" can input a to-be-processed image to the I/O interface 212 through the client device 240, and the execution device 210 returns the processing result to the client device 240 through the I/O interface 212 to provide it to the user. As an example, the client device 240 may be a surveillance-video processing device in a surveillance system, that is, a terminal-side device of the surveillance system; the execution device 210 receives the to-be-processed image from the client device 240 and performs data processing on it. The execution device 210 may specifically be a local device or a remote device.
It is worth noting that FIG. 2 is only a schematic architecture diagram of the image processing system provided by an embodiment of the present invention, and the positional relationships between the devices, components, and modules shown in the figure do not constitute any limitation. For example, in other embodiments of this application, the execution device 210 can be configured in the client device 240; as an example, when the client device is a desktop computer, the execution device 210 can be a module for image processing in the host CPU of the desktop computer, or the execution device 210 can be a graphics processing unit (GPU) or a neural network processor (NPU) in the desktop computer, where the GPU or NPU is mounted on the host processor as a coprocessor and the host processor assigns tasks. As another example, in other embodiments of this application, the execution device 210 can be configured in the training device 220; in that case the data storage system 250 and the database 230 can be integrated in the same storage device, and after generating the mature convolutional neural network, the training device 220 stores it in the data storage system 250 so that the computation module 211 can call the mature convolutional neural network directly.
Since the image processing method in the embodiments of this application can be used in fields such as intelligent security, unmanned supermarkets, and intelligent terminals (the actual situations are not limited to these typical application fields), and the image processing method is divided into a training phase and an application phase, based on the system architecture described in FIG. 2 above, the application phase of the image processing method provided in the embodiments of this application is introduced below as applied to multiple application scenarios.
First, taking the re-identification scenario of a surveillance system in the field of intelligent security as an example, four implementations of the application phase of the image processing method provided in the embodiments of this application are introduced.
First implementation. Referring to FIG. 3, FIG. 3 is a schematic diagram of the image processing method provided by an embodiment of this application. In FIG. 3, as an example, the surveillance system includes 4 cameras, the execution device is deployed on a server, and the server uses the source camera as the classification criterion for the different image subsets.
After capturing video, camera 1, camera 2, camera 3, and camera 4 obtain images from the video and send the obtained images to the server. Correspondingly, the server receives and stores the images sent by cameras 1 to 4, and these images constitute the image set on the server. The server may also store, for each image in the image set, its source camera, the capture location corresponding to the source camera, and the image acquisition time. Using the source camera as the classification criterion for image subsets, the server can divide the image set into four image subsets: the subset obtained through camera 1, the subset obtained through camera 2, the subset obtained through camera 3, and the subset obtained through camera 4.
When the number of images received from a camera reaches a preset number, the server can generate in advance, through the mature convolutional neural network, the data distribution characteristic corresponding to that camera. Since the server integrates both the training device and the execution device, after the training device on the server has trained the mature convolutional neural network, the execution device on the server can obtain it directly from the storage system. The data distribution characteristic corresponding to a camera includes the data distribution characteristic of the images captured by the camera and the data distribution characteristic of the feature maps corresponding to those images; in this embodiment, a preset number of 500 is used as an example. Further, the data distribution characteristic of the feature maps corresponding to the images captured by camera 1 may include the data distribution characteristics of one or more feature dimensions, whose number matches the number of feature-map dimensions the convolutional neural network extracts from one image; in this embodiment, as an example, the feature maps extracted from one image include feature maps in three dimensions: the color feature, the texture feature, and the resolution feature.
Specifically, for the data distribution characteristic corresponding to camera 1, after the number of images captured by camera 1 reaches 500, the server can directly run statistics on those 500 images to obtain the data distribution characteristic of the images captured by camera 1. The server can also use the mature convolutional neural network to extract features from those 500 images, obtaining 1500 feature maps corresponding to the 500 images captured by camera 1: 500 feature maps in the color feature dimension, 500 in the texture feature dimension, and 500 in the resolution feature dimension. The server can then run statistics on the 500 color-dimension feature maps to generate the data distribution characteristic, in the color feature, of the feature maps corresponding to the images captured by camera 1; on the 500 texture-dimension feature maps to generate the texture-feature data distribution characteristic; and on the 500 resolution-dimension feature maps to generate the resolution-feature data distribution characteristic. To further understand this solution, Table 1 below shows the correspondence between feature maps and data distribution characteristics in the three feature dimensions.
Table 1
Figure PCTCN2020118076-appb-000001
Referring to Table 1 above, Table 1 shows the correspondence between feature maps and the data distribution characteristics of the feature maps in each of the three feature dimensions: the color feature dimension, the texture feature dimension, and the resolution feature dimension. It should be understood that this example is only for ease of understanding and is not intended to limit this solution.
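To make Table 1 concrete, the per-dimension statistics it describes — one mean and one variance computed over all feature maps of one camera in one feature dimension — can be sketched as follows (tiny 1×2 feature maps stand in for camera 1's 500 real maps; the function and variable names are illustrative):

```python
def distribution_statistics(feature_maps):
    """Pool every value of every feature map in one feature dimension
    (e.g. the 500 color-feature maps of camera 1) and return the mean
    and (population) variance of the pooled values."""
    values = [v for fmap in feature_maps for row in fmap for v in row]
    mean = sum(values) / len(values)
    variance = sum((v - mean) ** 2 for v in values) / len(values)
    return mean, variance

# Two toy "color feature" maps standing in for camera 1's 500 maps.
color_maps = [[[1.0, 3.0]], [[5.0, 7.0]]]
mean, variance = distribution_statistics(color_maps)
```

Running the same routine on the texture-dimension and resolution-dimension maps yields the other two rows of statistics that Table 1 pairs with their feature maps.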
After generating the data distribution characteristic corresponding to camera 1, the server performs feature extraction on the images in the image subset obtained through camera 1 to obtain the feature information of each image in that subset. Specifically, for a first image, which is any image in the subset obtained through camera 1, the server uses the data distribution characteristic of the images captured by camera 1 to perform data distribution alignment on the first image. While extracting features from the aligned first image through the mature convolutional neural network, after obtaining the first image's feature map in the color feature dimension, the network uses the color-dimension data distribution characteristic of the feature maps corresponding to the images captured by camera 1 to align the first image's color-dimension feature map; after obtaining its texture-dimension feature map, the network uses the texture-dimension data distribution characteristic to align it; and after obtaining its resolution-dimension feature map, the network uses the resolution-dimension data distribution characteristic to align it. The mature convolutional neural network then generates the feature information of the first image based on the aligned color-dimension, texture-dimension, and resolution-dimension feature maps. The server performs the foregoing operations on every image in the subset obtained through camera 1 to obtain the feature information of each image in that subset.
For the specific generation of the data distribution characteristics corresponding to cameras 2, 3, and 4, refer to the generation of the data distribution characteristic corresponding to camera 1; for the specific generation of the feature information of each image in the subsets obtained through cameras 2, 3, and 4, refer to the generation of the feature information of each image in the subset obtained through camera 1. Details are not repeated here.
When the user equipment needs to perform re-identification on a to-be-processed image, it can send a matching request to the server in order to receive at least one image, sent by the server, that matches the to-be-processed image. The matched images contain the same photographed object as the to-be-processed image, and the matching request carries the to-be-processed image and its source camera; in this embodiment, a to-be-processed image originating from camera 1 is used as an example.
After receiving the matching request, the server learns from it that at least one image matching the to-be-processed image needs to be obtained from the image set and that the to-be-processed image was captured by camera 1. The server therefore obtains the data distribution characteristic corresponding to camera 1 and aligns the to-be-processed image according to the data distribution characteristic of the images captured by camera 1. The server then extracts features from the aligned to-be-processed image through the mature convolutional neural network. During feature extraction, after obtaining the to-be-processed image's feature map in the color feature dimension, it aligns that feature map using the color-feature data distribution characteristic of the feature maps corresponding to the images captured by camera 1; after obtaining its texture-dimension feature map, it aligns it using the texture-feature data distribution characteristic; and after obtaining its resolution-dimension feature map, it aligns it using the resolution-feature data distribution characteristic. It then obtains the feature information of the to-be-processed image from the aligned color-, texture-, and resolution-dimension feature maps corresponding to the to-be-processed image.
After obtaining the feature information of the to-be-processed image, the server matches it against the feature information of every image in the image set so as to obtain, from the image set, at least one matching image, where the photographed object in each matched image is the same as the photographed object in the to-be-processed image, thereby producing the matching result. The matching result includes the at least one matched image and may also include the capture location and capture time of each matched image.
After obtaining the matching result, the server sends it to the client device, and the client device presents the matching result to the user.
It should be noted that the architecture in FIG. 3 is only an example. In other implementations, a client device may be connected to one or more cameras and forward the images captured by the cameras to the server, and different client devices may be connected to the same or different numbers of cameras. In addition, the examples above of the number of cameras in the surveillance system, the preset number, and the three feature dimensions are only for ease of understanding; in actual scenarios, a surveillance system may include more or fewer cameras, the preset number may be larger or smaller, and the data distribution characteristic of the feature maps corresponding to a camera's images may include other dimension types, none of which is limited here.
Second implementation. This embodiment is described with reference to FIG. 3, taking as an example a surveillance system with 4 cameras in which the server uses the image capture time as the classification criterion for the different image subsets.
After capturing video, cameras 1 to 4 obtain images from the video and send them to the server. Correspondingly, the server receives the images sent by cameras 1 to 4, which constitute the image set on the server; the server may also store each image's source camera, the capture location corresponding to the source camera, and the image acquisition time. The server uses the image capture time as the classification criterion for image subsets. In this embodiment, as an example, the whole image set is divided into two subsets: the period from 7:00 to 18:00 is determined as the first time period, and the images captured within it form one image subset; the period from 19:00 to 6:00 is determined as the second time period, and the images captured within it form the other image subset.
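The two-subset split just described can be sketched as follows (the 07:00–18:00 and 19:00–06:00 boundaries follow the text; the (image_id, capture_hour) record format is an illustrative assumption):

```python
def time_subset(capture_hour):
    """Return which time-period subset an image belongs to:
    'first' for captures from 07:00 to 18:00 inclusive,
    'second' for the remaining night-time hours."""
    return "first" if 7 <= capture_hour <= 18 else "second"

def split_by_time(images):
    """Partition (image_id, capture_hour) records into the two
    time-period subsets used as the classification criterion."""
    subsets = {"first": [], "second": []}
    for image_id, hour in images:
        subsets[time_subset(hour)].append(image_id)
    return subsets
```

Each resulting subset is then the population over which the time-period data distribution characteristic described below is computed.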
When the number of images obtained within the first time period reaches the preset number, the server can generate in advance the data distribution characteristic corresponding to the first time period, that is, the period from 7:00 to 18:00. This characteristic includes the data distribution characteristic of the images captured within the first time period and the data distribution characteristic of the feature maps corresponding to those images. Further, the latter may include the data distribution characteristics of one or more feature dimensions, whose dimension types may be the same as or different from those in the first implementation of the re-identification scenario of the surveillance system; in this embodiment, as an example, the feature maps the convolutional neural network extracts from one image include the luminance feature, the texture feature, and the color feature. For the specific implementation, refer to the description of generating the data distribution characteristic corresponding to camera 1 in the first implementation of the re-identification scenario; details are not repeated here.
The server uses the data distribution characteristic corresponding to the first time period to align a second image in the subset of images in the image set captured within the first time period, and extracts features from the aligned second image through the mature neural network. During feature extraction of the second image, after obtaining the second image's luminance-dimension feature map, the mature convolutional neural network aligns it using the luminance-dimension data distribution characteristic of the feature maps corresponding to the images captured within the first time period; after obtaining its texture-dimension feature map, the network aligns it using the texture-dimension data distribution characteristic; and after obtaining its color-dimension feature map, the network aligns it using the color-dimension data distribution characteristic. The mature convolutional neural network then generates the feature information of the second image based on the aligned luminance-dimension, texture-dimension, and color-dimension feature maps. The server performs the foregoing operations on every image in the subset captured within the first time period to obtain the feature information of each image in that subset.
For the specific generation of the data distribution characteristic corresponding to the second time period, refer to that of the first time period; for the specific generation of the feature information of each image in the subset captured within the second time period, refer to that of the first time period. Details are not repeated here.
When the user equipment needs to perform re-identification on a to-be-processed image, it sends a matching request to the server in order to receive at least one image, sent by the server, that matches the to-be-processed image. The matched images contain the same photographed object as the to-be-processed image, and the matching request carries the to-be-processed image and its capture time; in this embodiment, a to-be-processed image captured within the first time period is used as an example.
After receiving the matching request, the server learns from it that at least one image matching the to-be-processed image needs to be obtained from the image set and that the to-be-processed image was captured within the first time period. The server therefore obtains the data distribution characteristic corresponding to the first time period and aligns the to-be-processed image according to the data distribution characteristic of the images captured within that period. The server then extracts features from the aligned to-be-processed image through the mature convolutional neural network. During feature extraction, after obtaining the to-be-processed image's luminance-dimension feature map, it aligns it using the luminance-feature data distribution characteristic of the feature maps corresponding to the images captured within the first time period; after obtaining its texture-dimension feature map, it aligns it using the texture-feature data distribution characteristic; and after obtaining its color-dimension feature map, it aligns it using the color-feature data distribution characteristic. It then obtains the feature information of the to-be-processed image from the aligned luminance-, texture-, and color-dimension feature maps corresponding to the to-be-processed image.
After obtaining the feature information of the to-be-processed image, the server matches it against the feature information of every image in the image set to obtain the matching result, and then sends the matching result to the client device, which presents it to the user. For the specific implementation of the foregoing steps and the specific content of the matching result, refer to the description in the first implementation of the surveillance scenario; details are not repeated here.
It should be noted that the examples above of the number of cameras in the surveillance system, the preset number, the three feature dimensions, and the time periods are only for ease of understanding and are not intended to limit this solution.
第三种,本实施例结合图3进行举例,本实施例与监控场景的第一种实现方式以及第二种实现方式的区别在于,监控场景的第一种实现方式中是将来源摄像机作为分类标准,监控场景的第二种实现方式中是将图像采集时间作为分类标准,本实现方式中是将图像采集地点作为分类标准。
服务器在接收到摄像机1至摄像机4发送的图像之后,前述摄像机1至摄像机4发送的图像构成服务器中的图像集合,以图像采集地点作为图像子集合的分类标准。本实施例中以摄像机1位于北京、摄像机2和摄像机3位于山东、摄像机4位于广州为例,将由摄像机1至摄像机4采集的图像构成的图像集合分为三个图像子集合。
服务器生成与图像采集地点北京对应的数据分布特性,与图像采集地点北京对应的数据分布特性中包括所述图像集合中在北京采集的图像的数据分布特性和与所述在北京采集的图像对应的特征图的数据分布特性,具体实现方式可以参阅上述监控场景中的第一种实现方式和第二种实现方式中的描述。
服务器基于所述图像集合中在北京采集的图像的数据分布特性,对所述图像集合中在北京采集的每个图像进行数据分布对齐。通过成熟的卷积神经网络对进行过数据分布对齐的图像进行特征提取,在特征提取过程中,根据与所述在北京采集的图像对应的特征图的数据分布特性,对在特征提取过程中生成的特征图进行数据分布对齐,以得到所述图像集合中在北京采集的每个图像的特征信息,具体实现方式可以参阅上述监控场景中的第一种实现方式和第二种实现方式中的描述。
对于与图像采集地点山东对应的数据分布特性的具体生成方式,以及,与图像采集地点广州对应的数据分布特性的具体生成方式,均可参阅与图像采集地点北京对应的数据分布特性的具体生成方式中的描述。对于所述图像集合中在山东采集的每个图像的特征信息的具体生成方式,以及,对于所述图像集合中在广州采集的每个图像的特征信息的具体生成方式,均可参阅所述图像集合中在北京采集的每个图像的特征信息的具体生成方式。
用户设备需要获取与待匹配图像相匹配的至少一个图像时,会向服务器发送匹配请求,对应的,服务器接收到匹配请求,匹配请求中携带有待匹配图像以及待匹配图像的图像采集地点。从而服务器可以利用与待匹配图像的图像采集地点对应的数据分布特性,对待匹配图像以及与待匹配图像对应的特征图进行数据分布对齐,进而得到待匹配图像的特征信息,具体实现方式可以参阅监控系统的重识别场景中第一种和第二种实现方式中的描述。
服务器在得到待匹配图像的特征信息之后,会将待匹配图像的特征信息与所述图像集合中的每个图像的特征信息进行匹配,以得到匹配结果,进而将匹配结果发送给客户设备。前述步骤的具体实现方式以及匹配结果的具体内容均可以参考监控场景的第一种实现方式中的描述,此处不做赘述。
需要说明的是,本实施例中对于图像采集地点的举例均仅为方便理解本方案,不用于限定本方案。
第四种,本实施例与上述三种实现方式的区别在于,本实现方式中是将图像中拍摄对象的对象类型作为分类标准。其中,所述对象类型指的是对象的物种类型,作为示例,例如人、鸟、猫以及狗分别属于不同的对象类型。
服务器在获取到由摄像机1至摄像机4采集的图像构成的图像集合之后,可以根据图像中拍摄对象的对象类型将所述图像集合分为至少两个不同的图像子集合。服务器生成与每个图像子集合对应的数据分布特性,利用与每个图像子集合对应的数据分布特性,对图像子集合中的图像以及与图像对应的特征图进行数据分布对齐,从而生成图像集合中每个图像的特征信息。
服务器在接收到匹配请求之后,从匹配请求中获取到待处理图像之后,确定待处理图像中拍摄对象的对象类型,此处以待处理图像中拍摄对象的类型为狗为例,则服务器可以从与每个图像子集合对应的数据分布特性中获取与由拍摄对象为狗的图像构成的图像子集合对应的数据分布特性,进而根据与由拍摄对象为狗的图像构成的图像子集合对应的数据分布特性,对待处理图像以及与待处理图像对应的特征图进行数据分布对齐,得到待处理图像的特征信息。
服务器将待处理图像的特征信息与所述图像集合中的每个图像的特征信息进行匹配,以得到匹配结果,进而将匹配结果发送给客户设备。
需要说明的是,本实施例中对于拍摄对象的对象类型的举例,仅为方便理解本方案,不用于限定本方案。对于本实施例中上述步骤的具体实现方式,可以参考监控系统的重识别场景中第一种实现方式至第三种实现方式的描述,此处不做赘述。
本申请实施例中,在监控系统的重识别场景中使用本申请实施例提供的图像处理方法,提高了卷积神经网络的特征提取性能,从而可以根据更准确的特征信息进行图像匹配操作,有利于提高图像匹配的准确率,也即提高监控系统的图像匹配过程的准确率。
接下来以无人超市中的行人重识别场景为例,介绍本申请实施例提供的图像处理方法的应用阶段的一种实现方式。请参阅图4,图4为本申请实施例提供的图像处理方法的一种示意图。图4中以监控系统中包括8个摄像机,训练设备部署于服务器上,执行设备部署于客户设备上,且客户设备以来源摄像机作为不同图像子集合的分类标准为例。
服务器在训练得到成熟的卷积神经网络之后,可以将成熟的卷积神经网络发送给客户设备。摄像机1至摄像机8在采集到视频之后,将采集到的视频实时发送给客户设备,客户设备从每个摄像机发送的视频中获取并存储与每个摄像机对应的图像,也即客户设备基于摄像机1至摄像机8采集到的视频,分别获取并存储与摄像机1至摄像机8对应的图像。前述与摄像机1至摄像机8对应的图像构成客户设备上的图像集合,所述图像集合包括8个图像子集合,分别为与摄像机1对应的图像子集合、与摄像机2对应的图像子集合、与摄像机3对应的图像子集合、……、与摄像机7对应的图像子集合以及与摄像机8对应的图像子集合。
客户设备通过成熟的卷积神经网络生成与每个摄像机对应的数据分布特性,以及,提取每个图像子集合中每个图像的特征信息。客户设备生成与每个摄像机对应的数据分布特性的具体实现方式,以及,客户设备生成每个图像子集合中每个图像的特征信息的具体实现方式,与监控场景的第一种实现方式中服务器生成与摄像机对应的数据分布特性的具体实现方式,以及服务器生成每个图像子集合中每个图像的特征信息的具体实现方式类似,可以参阅与监控场景的第一种实现方式的描述,此处不做赘述。
在客户设备想要对通过摄像机1至摄像机8获取的图像中的某一个待处理图像进行匹配时,可以确定待处理图像来源于摄像机1至摄像机8中的哪一个摄像机,本实施例中以待处理图像来源于摄像机3为例。则客户设备根据与摄像机3对应的图像的数据分布特性,对待处理图像进行数据分布对齐。通过成熟的卷积神经网络对待处理图像进行特征提取,在对待处理图像进行特征提取的过程中,利用与摄像机3对应的图像的特征图在至少一个特征维度上的数据分布特性,分别对待处理图像的至少一个特征维度上的特征图进行数据分布对齐,并根据执行过数据分布对齐的特征图生成待处理图像的特征信息,前述步骤的具体实现方式可以参阅监控场景的第一种实现方式的描述。
客户设备将待处理图像的特征信息与图像集合中每个图像的特征信息进行匹配,得到匹配结果,并通过展示界面向用户展示匹配结果,匹配结果的内容可以参阅监控场景的第一种实现方式的描述。
本申请实施例中,在无人超市的行人重识别场景中采用本申请实施例提供的图像处理方法,提高了图像匹配过程的准确率,从而提高了无人超市在无人监管下的安全性。
再接下来以客户设备中配置有图像识别功能的场景为例,介绍本申请实施例提供的图像处理方法的应用阶段的两种实现方式。其中,所述客户设备是配置有图像识别功能的客户设备,作为示例,例如配置有人脸识别功能的手机,以下以客户设备为手机形态为例对前述两种实现方式进行详细介绍。
第一种,本实施例中以执行设备配置于手机上,且以来源摄像机作为分类标准为例。
由于手机配置有图像识别功能,则手机在出厂之前被配置有成熟的卷积神经网络,以及与手机上的摄像机对应的数据分布特性。其中,与手机上的摄像机对应的数据分布特性包括手机上的摄像机采集到的图像的数据分布特性,以及,与手机上的摄像机采集到的图像对应的特征图在至少一个特征维度上的数据分布特性。具体的,技术人员在手机出厂之前可以通过手机上的摄像机采集到预设个数的图像,并利用成熟的卷积神经网络对所述预设个数的图像包括的每个图像进行特征提取,得到每个图像在至少一个特征维度上的特征图,进而生成所述预设个数的图像在至少一个特征维度上的特征图的数据分布特性,前述步骤的具体实现方式可以参考监控系统的重识别场景中第一种实现方式中生成与摄像机1对应的数据分布特性的描述,此处不做赘述。
手机在出售之后,用户通过手机的摄像机采集到待处理图像,并需要对通过摄像机采集的待处理图像进行识别时,手机会先根据手机上的摄像机采集到的图像的数据分布特性,对待处理图像进行数据分布对齐,进而利用成熟的卷积神经网络对进行过数据分布对齐的待处理图像进行特征提取,并根据与手机上的摄像机采集到的图像对应的特征图在至少一个特征维度上的数据分布特性,对待处理图像在至少一个特征维度的特征图进行数据分布对齐,并通过成熟的卷积神经网络根据进行过数据分布对齐的至少一个特征维度的特征图,生成待处理图像的特征信息,利用生成的待处理图像的特征信息进行识别,得到待处理图像的描述信息。
第二种,本实施例中以执行设备配置于手机上,且以图像中拍摄对象的对象类型作为分类标准为例。本实施例与客户设备中配置有图像识别功能场景中第一种实现方式类似,区别在于本实施例中手机上配置的数据分布特性是与拍摄对象的至少一种对象类型对应的数据分布特性,所述数据分布特性包括图像级别的数据分布特性和特征图级别的数据分布特性。作为示例,例如对象类型可以包括陆地动物、两栖动物、海洋动物、植物和非生物,技术人员在手机出厂之前可以在手机上配置有与陆地动物对应的数据分布特性、与两栖动物对应的数据分布特性、与海洋动物对应的数据分布特性、与植物对应的数据分布特性、以及与非生物对应的数据分布特性。
手机在出售之后,用户通过手机的摄像机采集到待处理图像,并需要对待处理图像进行识别时,会先确定待处理图像中拍摄对象的对象类别,本实施例中以拍摄对象的对象类别为植物为例。则手机会获取与植物对应的数据分布特性包括的图像级别的数据分布特性,对待处理图像进行数据分布对齐。进而通过成熟的卷积神经网络对进行过数据分布对齐的待处理图像进行特征提取,在特征提取的过程中,根据与植物对应的数据分布特性包括的特征图级别的数据分布特性,对特征提取过程中的特征图进行数据分布对齐,并根据进行过数据分布对齐的特征图生成特征信息,进而利用生成的待处理图像的特征信息进行识别,得到待处理图像的描述信息。
需要说明的是,上述两种实现方式中仅以客户设备为手机形态为例进行说明,实际情况中客户设备还可以为平板、笔记本电脑、可穿戴装置或其他终端侧的设备等等。
本申请实施例中,在客户设备中配置有图像识别功能的场景中采用本申请实施例提供的图像处理方法,提高了卷积神经网络的特征提取性能,从而有利于提高图像识别的准确率。
通过上述对三种典型的应用场景的各种实现方式的描述可知,本申请实施例提供的图像处理方法中的卷积神经网络的通用能力主要包括图像匹配和图像识别这两种,而在卷积神经网络的通用能力为图像匹配和图像识别这两种情况下的具体实现方式有所不同,以下分别对前述两种能力在应用阶段的具体实现方式进行描述。
一、图像匹配
本申请的一些实施例中,请参阅图5,图5为本申请实施例提供的图像处理方法的一种流程示意图。具体的,本申请实施例提供的图像处理方法可以包括:
501、执行设备生成数据分布特性集合。
本申请的一些实施例中,在进行图像匹配之前,执行设备会生成数据分布特性集合。其中,数据分布特性集合中包括与至少两个图像子集合中每个图像子集合对应的数据分布特性。参阅各个应用场景实施例中的描述,与每个图像子集合对应的数据分布特性可以包括与图像子集合中图像对应的特征图的数据分布特性和图像子集合中图像的数据分布特性,与图像子集合中图像对应的特征图的数据分布特性中可以包括至少一个特征维度的特征图的数据分布特性。进一步地,前述一个或多个特征维度包括但不限于颜色特征维度、纹理特征维度、分辨率特征维度、亮度特征维度等等,对应的,前述至少一个特征维度的特征图的数据分布特性包括但不限于与图像子集合中图像对应的颜色特征图的数据分布特性、与图像子集合中图像对应的纹理特征图的数据分布特性、与图像子集合中图像对应的分辨率特征图的数据分布特性、与图像子集合中图像对应的亮度特征图的数据分布特性等等。
由于图像子集合中的图像或者与图像子集合中图像对应的特征图均可以通过矩阵的形式存储在执行设备中,数据分布特性指的是对与至少一个图像对应的矩阵或者对与至少一个特征图对应的矩阵进行数据分布统计,得到的数据分布特性。作为示例,例如监控系统中在19点至6点这个时间段采集获取到的图像整体亮度偏低,则与19点至6点这个时间段采集的图像构成的图像子集合的数据分布特性可以为亮度偏低;作为另一示例,例如有的摄像机的分辨率较低,则通过该摄像机采集的图像的数据分布特性可以为分辨率偏低 等,此次不做限定。进一步地,数据分布特性中也可以包括多个图像或多个特征图的均值和方差等。为更直观的理解本方案,请参阅图6,图6为本申请实施例提供的图像处理方法中数据分布特性的一种示意图。图6中以通过二维坐标系展示数据分布特性为例示出了两个数据分布特性的示意图,图6中二维坐标系的横轴和纵轴分别对应对图像的数据分布进行描述的两个维度。应理解,数据分布特性还可以通过三维坐标图或其他图形展示。
不同的图像子集合的分类标准可以为图像采集装置的来源,也即不同的图像子集合中的图像来源于不同的图像采集装置;不同的图像子集合的分类标准可以为图像采集时间段,也即不同的图像子集合中的图像是在不同的时间段内采集的;不同的图像子集合的分类标准还可以为图像采集地点,也即不同的图像子集合中的图像是在不同的地点采集的;不同的图像子集合的分类标准还可以为图像中拍摄对象的对象类型,也即不同的图像子集合中的图像中拍摄对象的对象类型不同。进一步地,前述图像采集装置包括但不限于摄像机、雷达或其他类型的图像采集装置;前述时间段指的可以为一天内不同的时间段;前述图像采集地点的划分粒度可以为省、市或县等;前述拍摄对象的对象类型的划分粒度可以为界、门、纲、目、科、属或种等,此处均不做限定。
具体的,执行设备上存储有第二图像集合,从而执行设备根据第二图像集合中的图像生成数据分布特性。其中,第二图像集合中包括至少两个图像子集合。作为示例,例如监控场景的重识别场景中,摄像机1至摄像机4采集的图像构成第二图像集合;作为另一示例,例如无人超市的行人重识别场景中,摄像机1至摄像机8采集的图像构成第二图像集合等,此次不做穷举。
更具体的,针对执行设备获取第二图像集合的过程。若执行设备配置于服务器侧,在一种实现方式中,可以参见上述监控场景的重识别场景中第一种实现方式至第四种实现方式的描述,服务器中的执行设备直接接收图像采集装置发送的图像,从图像采集装置接收到的所有图像组成第二图像集合。在另一种实现方式中,服务器中的执行设备直接接收图像采集装置发送的视频,从图像采集装置接收到的视频中获取图像,从图像采集装置发送的视频中获取的图像组成第二图像集合。在另一种实现方式中,图像采集装置与客户设备连接,图像采集装置采集到图像或视频之后发送给客户设备,由客户设备向服务器中的执行设备发送图像,所述客户设备发送的图像组成第二图像集合。若执行设备配置于终端侧的设备中,在一种实现方式中,可以参见上述无人超市的行人重识别场景的实现方式的描述,终端侧的执行设备直接接收图像采集装置发送的视频,执行设备从接收到的视频中获取图像,所述从图像采集装置发送的视频中获取到的图像组成第二图像集合。在另一种实现方式中,终端侧的执行设备可以接收图像采集装置发送的图像,所述图像采集装置发送的图像组成第二图像集合。
针对执行设备根据第二图像集合中的图像生成与每个图像子集合对应的数据分布特性的过程。参阅上述监控系统的重识别场景以及无人超市的重识别场景中各种实现方式中的描述,执行设备可以在获取到某个图像子集合中的图像达到预设个数时,根据预设个数的图像生成与该图像子集合对应的数据分布特性。其中,预设个数的取值可以为50、100、200、300、400、500、600、700、800、900或其他数值等,此次不做限定。具体实现方式可以参阅上述监控系统的重识别场景以及无人超市的重识别场景中各种实现方式中的描 述,此次不做赘述。可选地,由于执行设备可以实时获取新的图像,也即第二图像集合中的图像是在不断更新的,则执行设备在初次生成与每个图像子集合对应的数据分布特性之后,还可以根据新获取到的图像,更新每个图像子集合对应的数据分布特性。
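为便于理解"在图像数量达到预设个数后生成数据分布特性,并随新获取的图像不断更新"的过程,下面给出一段示意性Python代码,以增量方式维护某一图像子集合的均值与方差(此处采用Welford式的增量统计,图像简化为一维像素列表,类名与变量名均为说明性假设,并非本申请限定的实现):

```python
# 示意代码:按增量方式更新某一图像子集合的均值与方差。
# "图像"简化为一维像素列表,命名均为说明性假设。

class RunningDistribution:
    """维护一个图像子集合的数据分布特性(均值与方差)。"""

    def __init__(self):
        self.count = 0      # 已统计的像素总数
        self.mean = 0.0     # 当前均值
        self.m2 = 0.0       # 偏差平方和,用于计算方差

    def update(self, image):
        """用新获取的一幅图像(像素列表)更新统计量。"""
        for x in image:
            self.count += 1
            delta = x - self.mean
            self.mean += delta / self.count
            self.m2 += delta * (x - self.mean)

    @property
    def variance(self):
        return self.m2 / self.count if self.count > 0 else 0.0


dist = RunningDistribution()
dist.update([0.2, 0.4, 0.6])   # 图像子集合中的第一幅图像
dist.update([0.8, 1.0])        # 新获取的图像到达后继续更新统计量
print(round(dist.mean, 3), round(dist.variance, 3))
```

这样,执行设备初次生成数据分布特性之后,每获取一幅新图像只需做常数次运算即可完成更新,无需重新遍历整个图像子集合。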
502、执行设备获取第二待处理图像。
本申请的一些实施例中,执行设备从第二图像集合中获取第二待处理图像,第二待处理图像为第二图像集合中的任一个图像。
503、执行设备获取与第二待处理图像对应的第四数据分布特性,第四数据分布特性为第三图像集合中图像的数据分布特性。
本申请的一些实施例中,执行设备在获取到第二待处理图像之后,获取第二待处理图像归属的第三图像集合,进而可以获取与第三图像集合中图像对应的第四数据分布特性,也即获取到与第二待处理图像对应的第四数据分布特性。其中,第三图像集合为第二图像集合包括的至少两个图像子集合中的任一个图像子集合。第四数据分布特性为所述数据分布特性集合中用于指示第三图像集合中图像的数据分布特性,作为示例,例如监控系统的重识别场景中通过摄像机2采集的图像的数据分布特性,再例如无人超市的行人重识别场景中通过摄像机5采集的图像的数据分布特性等。
504、执行设备根据第四数据分布特性,对第二待处理图像进行数据分布对齐。
本申请的一些实施例中,执行设备会根据第四数据分布特性,对第二待处理图像进行数据分布对齐。其中,对第二待处理图像进行数据分布对齐的过程指的是将第二待处理图像的数据分布向非线性函数的敏感值区域拉拢的过程,方法是减弱第二待处理图像的数据分布中携带的第三图像集合中图像的数据分布特性。
在一种实现方式中,第四数据分布特性包括与第三图像集合中图像对应的均值和与第三图像集合中图像对应的方差,步骤504包括:执行设备根据与第三图像集合中图像对应的均值和与第三图像集合中图像对应的方差,对第二待处理图像进行归一化处理。具体的,执行设备将第二待处理图像的数据分布,与第三图像集合中图像对应的均值相减,并与第三图像集合中图像对应的方差相除,得到进行过数据分布对齐后的第二待处理图像。
为进一步理解本方案,此处以来源摄像机为分类标准为例,第三图像集合中图像为通过摄像机c采集的,则生成与第三图像集合中图像对应的均值的公式如下:

$$\mu^{(c)}=\frac{1}{M}\sum_{i=1}^{M}x_i^{(c)}$$

其中,$\mu^{(c)}$表示第c个摄像机采集的图像中M个图像的均值,c表示第c个摄像机,$x_i^{(c)}$表示M个图像中的一个图像,M的取值可以为50、100、200、300、500或其他数值等。生成与第三图像集合中图像对应的方差的公式如下:

$$\left(\sigma^{(c)}\right)^{2}=\frac{1}{M}\sum_{i=1}^{M}\left(x_i^{(c)}-\mu^{(c)}\right)^{2}$$

其中,$\left(\sigma^{(c)}\right)^{2}$表示第c个摄像机采集的图像中M个图像的方差。需要说明的是,此处公式的公开仅为方便理解本方案,不用于限定本方案。
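上述按摄像机统计均值与方差,并据此对待处理图像做归一化处理的过程,可以用如下示意性Python代码表达(图像简化为一维像素列表;代码中按常见做法除以标准差即方差的平方根,并加入一个小常数eps避免除零,这些均为说明性假设,非本申请限定的实现):

```python
# 示意代码:统计摄像机c采集的M个图像的均值与方差,
# 并据此对一幅待处理图像做图像级数据分布对齐(归一化)。

def camera_statistics(images):
    """images: 摄像机c采集的M个图像,每个图像为像素列表。返回(均值, 方差)。"""
    pixels = [x for img in images for x in img]
    m = len(pixels)
    mean = sum(pixels) / m
    var = sum((x - mean) ** 2 for x in pixels) / m
    return mean, var

def align_image(image, mean, var, eps=1e-5):
    """将图像的数据分布与子集合统计量对齐:减均值、除以标准差。"""
    std = (var + eps) ** 0.5
    return [(x - mean) / std for x in image]

imgs = [[0.1, 0.3], [0.5, 0.7]]          # 摄像机c采集的图像子集合
mean, var = camera_statistics(imgs)
aligned = align_image([0.4, 0.4], mean, var)
print(round(mean, 3), round(var, 3))
```

经过该处理后,不同摄像机采集的图像被拉到零均值、单位方差附近的相似数据区域。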
在另一种实现方式中,步骤504包括:执行设备根据第四数据分布特性,通过调整第二待处理图像的色彩空间,以实现对第二待处理图像的数据分布对齐。作为示例,例如第四数据分布特性指示第三图像集合中图像的亮度偏高,则可以将第二待处理图像转换到色调、饱和度和亮度(hue saturation value,HSV)通道,然后将第二待处理图像的亮度调低,以实现对第二待处理图像的数据分布对齐。应理解,此处举例仅为方便理解本方案,不用于限定本方案。
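上段所述"转换到HSV通道后将亮度调低"的色彩空间调整,可以用Python标准库colorsys给出一个最小示意(像素取值范围与亮度缩放系数均为说明性假设,非本申请限定的实现):

```python
# 示意代码:将RGB像素转换到HSV通道,调低亮度分量V后再转回RGB,
# 以实现对图像数据分布的调整。
import colorsys

def lower_brightness(pixel, scale=0.8):
    """pixel: (r, g, b),取值范围[0, 1]。按scale比例调低亮度。"""
    h, s, v = colorsys.rgb_to_hsv(*pixel)
    return colorsys.hsv_to_rgb(h, s, v * scale)

bright = (0.9, 0.6, 0.3)
darker = lower_brightness(bright)
print(tuple(round(c, 3) for c in darker))
```

由于只缩放了V分量,像素的色调与饱和度保持不变,整幅图像按比例变暗。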
为更为直观的展示本方案,请参阅图7,图7为本申请实施例提供的图像处理方法中进行数据分布对齐的一种示意图。图7中以通过二维图展示图像的数据分布特性为例,图7中上面的图为未经过数据分布对齐的数据分布特性,图7中下面的图为执行过数据分布对齐后的数据分布特性,在进行过数据分布对齐后将图像的数据分布拉到了非线性函数的敏感值区域。应理解,图7中的示例仅为方便理解本方案,不用于限定本方案。
本申请实施例中,不仅在特征提取过程中对特征图进行数据分布对齐,在进行特征提取之前,还会对待处理图像进行数据分布对齐,也即神经网络处理的图像也有着相似的数据分布,进一步提高了跨场景的不同图像之间的相似度,也即进一步降低了神经网络的图像处理难度,从而进一步提升神经网络在跨场景的特征提取性能。
505、执行设备获取第三数据分布特性,第三数据分布特性为与第三图像集合中图像对应的特征图的数据分布特性。
本申请的一些实施例中,在对第二待处理图像进行特征提取之前,执行设备还会获取第三数据分布特性。其中,第三数据分布特性为与第二待处理图像归属的第三图像集合中图像对应的特征图的数据分布特性,第三数据分布特性中包括一个或多个特征维度的数据分布特性,前述一个或多个特征维度包括但不限于颜色特征维度、纹理特征维度、分辨率特征维度、亮度特征维度等等。结合上述监控系统的重识别场景中第一种实现方式进行举例,例如第三图像集合为监控系统的重识别场景中通过摄像机3采集的图像,则第三数据分布特性包括与摄像机3采集的图像对应的特征图在颜色特征的数据分布特性、与摄像机3采集的图像对应的特征图在纹理特征的数据分布特性和与摄像机3采集的图像对应的特征图在分辨率特征的数据分布特性。特征图级别的数据分布特性的具体表现形式与图像级别的数据分布特性的具体表现形式类似,可以参考图7中的示例,此次不再赘述。
506、执行设备对第二待处理图像进行特征提取,并根据第三数据分布特性,在进行特征提取过程中对第二特征图进行数据分布对齐,得到第二待处理图像的特征信息。
本申请的一些实施例中,执行设备在获取到第三数据分布特性之后,通过成熟的卷积神经网络对第二待处理图像进行特征提取,得到第二待处理图像在至少一个特征维度上的特征图,利用第三数据分布特性中包括的一个或多个特征维度的数据分布特性,对第二待处理图像的每个特征维度上的第二特征图进行数据分布对齐,进而根据执行过数据分布对齐的每个特征维度上的第二特征图,生成第二待处理图像的特征信息。其中,第二待处理图像为第三图像集合包括的至少一个图像中任一个图像;第二特征图为对第二待处理图像进行特征提取过程中生成的,结合监控系统中重识别场景的第一种实现方式进行举例,例如第二特征图为第二待处理图像在颜色特征维度上的特征图、第二待处理图像在纹理特征维度的特征图或第二待处理图像在分辨率特征维度的特征图等。步骤506的具体实现方式可以参阅上述监控系统中重识别场景以及无人超市的重识别场景中各种实现方式中的描述,此处不做赘述。
具体的,由于卷积神经网络对一个图像进行特征提取过程中会生成至少一个特征维度的特征图,对与第三图像集合中图像对应的每个特征维度的特征图进行数据分布统计,都会得到一个均值和一个方差,则根据与第三图像集合中图像对应的特征图生成的第三数据分布特性中包括至少一个均值和至少一个方差,均值和方差的数量和特征维度的维度数量相同。步骤506可以包括:执行设备对第二待处理图像进行特征提取,并根据至少一个均值和至少一个方差,在进行特征提取过程中对第二特征图包括的至少一个特征图进行标准化处理。更具体的,执行设备在通过成熟的卷积神经网络进行特征提取的过程中可以得到目标特征维度的特征图,执行设备从第三数据分布特性中获取与目标特征维度对应的目标均值和目标方差,并将第二待处理图像在目标特征维度的特征图与目标均值相减,再与目标方差相除,得到进行过标准化处理后的目标特征维度的特征图。其中,目标特征维度为所述至少一个特征维度中的任一个特征维度。进一步地,步骤506的具体实现方式可以参见上述各个场景实施例中对特征图进行数据分布对齐部分的描述,此次不做赘述。本申请实施例中,提供了对待处理图像的特征图进行数据分布对齐的具体实现方式,操作简单,易实现。
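上述"按目标特征维度取出目标均值与目标方差,对该维度的特征图做标准化"的操作,可以用如下示意性Python代码表达(数据分布特性以字典形式组织,特征图以二维列表表示,除以标准差并加入eps为常见做法,均为说明性假设):

```python
# 示意代码:按特征维度对特征图做标准化——
# 目标维度的特征图减去目标均值、除以目标方差的平方根。

def standardize_feature_map(feature_map, mean, var, eps=1e-5):
    """feature_map: 某一特征维度的特征图(二维列表)。"""
    std = (var + eps) ** 0.5
    return [[(x - mean) / std for x in row] for row in feature_map]

# 第三数据分布特性:每个特征维度对应一个均值和一个方差(维度名为假设)
distribution = {"亮度": (0.5, 0.04), "纹理": (0.2, 0.01)}

fmap = [[0.5, 0.7], [0.3, 0.5]]           # 待处理图像在亮度维度的特征图
mean, var = distribution["亮度"]
aligned = standardize_feature_map(fmap, mean, var)
print([[round(x, 2) for x in row] for row in aligned])
```

均值与方差的数量和特征维度的数量相同,每个维度各自标准化,互不干扰。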
为了进一步理解本方案,请参阅图8,图8为本申请实施例提供的图像处理方法中卷积神经网络的一种示意图,图8中以不同图像子集合的分类标准为摄像机为例。请参阅图8,本申请实施例中提及的卷积神经网络包括一个输入层、至少一个卷积层、至少一个基于摄像机的批量标准化层(camera-based batch normalization,CBN)、至少一个激活函数层、至少一个隐含层和一个输出层,与目前存在的卷积神经网络的区别在于本实施例中的卷积神经网络将目前存在的卷积神经网络中的批量标准化层(batch normalization,BN)替换为基于摄像机的标准化层(也即CBN)。进一步地,至少一个卷积层可以包括用于提取图像的纹理特征的卷积层、用于提取图像的颜色特征的卷积层、用于提取图像的亮度特征的卷积层、用于提取图像的分辨率特征的卷积层或用于提取其他类型特征维度的卷积层。对应的,至少一个CBN中包括用于对图像在纹理特征维度的特征图进行数据分布对齐的CBN、用于对图像在颜色特征维度的特征图进行数据分布对齐的CBN、用于对图像在亮度特征维度的特征图进行数据分布对齐的CBN、用于对图像在分辨率特征维度的特征图进行数据分布对齐的CBN或用于对图像在其他类型特征维度的特征图进行数据分布对齐的CBN。
结合图8中的示例,步骤506可以包括:执行设备将第二待处理图像输入到输入层,由第一卷积层执行特征提取操作,得到第二待处理图像在第一特征维度的特征图,第一基于摄像机的标准化层根据第三数据分布特性包括的第一特征维度的数据分布特性,对第二待处理图像在第一特征维度的特征图进行数据分布对齐,由第一激活函数层对进行过数据分布对齐操作的第一特征图进行激活。其中,第一卷积层为卷积神经网络包括的至少一个卷积层中的任一个卷积层,第一基于摄像机的标准化层为卷积神经网络包括的至少一个基于摄像机的标准化层中的任一个基于摄像机的标准化层。执行设备重复执行前述操作,以对每个特征维度的特征图进行数据分布对齐后激活,进而得到第二待处理图像的特征信息。在卷积神经网络的功能为图像匹配的情况下,至少一个隐含层的任务为图像匹配,由输出层输出图像匹配结果。
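基于摄像机的标准化层(CBN)与普通批量标准化层的核心区别,在于按来源摄像机分别维护统计量,标准化时选用与该摄像机对应的均值与方差。下面给出一个极简的示意实现(省略了标准化层中可学习的缩放与平移参数,类名、方法名均为说明性假设,并非本申请CBN层的完整实现):

```python
# 示意代码:按来源摄像机分别维护统计量并标准化的极简CBN。

class CameraBasedNorm:
    def __init__(self, eps=1e-5):
        self.stats = {}          # 摄像机标识 -> (均值, 方差)
        self.eps = eps

    def fit(self, camera_id, values):
        """用某一摄像机的一批特征值统计该摄像机的均值与方差。"""
        m = sum(values) / len(values)
        v = sum((x - m) ** 2 for x in values) / len(values)
        self.stats[camera_id] = (m, v)

    def __call__(self, camera_id, values):
        """按来源摄像机自己的统计量对特征值做标准化。"""
        m, v = self.stats[camera_id]
        std = (v + self.eps) ** 0.5
        return [(x - m) / std for x in values]


cbn = CameraBasedNorm()
cbn.fit("camera_1", [1.0, 3.0])      # 摄像机1的特征整体偏大
cbn.fit("camera_2", [0.0, 0.2])      # 摄像机2的特征整体偏小
out1 = cbn("camera_1", [2.0])
out2 = cbn("camera_2", [0.1])
print(round(out1[0], 3), round(out2[0], 3))
```

可以看到,两个摄像机各自的典型特征值经各自统计量标准化后落在相似的数据区域,从而减弱了特征图中携带的摄像机特有风格。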
进一步地,卷积层可以包括很多个卷积算子,卷积算子也称为核,其在图像处理中的作用相当于一个从输入图像矩阵中提取特定信息的过滤器,卷积算子本质上可以是一个权重矩阵,这个权重矩阵通常被预先定义,在对图像进行卷积操作的过程中,权重矩阵通常在输入图像上沿着水平方向一个像素接着一个像素(或两个像素接着两个像素……这取决于步长stride的取值)的进行处理,从而完成从图像中提取特定特征的工作。该权重矩阵的大小应该与图像的大小相关,需要注意的是,权重矩阵的纵深维度(depth dimension)和输入图像的纵深维度是相同的,在进行卷积运算的过程中,权重矩阵会延伸到输入图像的整个深度。因此,和一个单一的权重矩阵进行卷积会产生一个单一纵深维度的卷积化输出,但是大多数情况下不使用单一权重矩阵,而是应用维度相同的多个权重矩阵。每个权重矩阵的输出被堆叠起来形成卷积图像的纵深维度。不同的权重矩阵可以用来提取图像中不同的特征,例如一个权重矩阵用来提取图像边缘信息,另一个权重矩阵用来提取图像的特定颜色,又一个权重矩阵用来对图像中不需要的噪点进行模糊化……该多个权重矩阵维度相同,经过该多个维度相同的权重矩阵提取后的特征图维度也相同,再将提取到的多个维度相同的特征图合并形成卷积运算的输出。
这些权重矩阵中的权重值在实际应用中需要经过大量的训练得到,通过训练得到的权重值形成的各个权重矩阵可以从输入图像中提取信息,从而帮助卷积神经网络进行正确的预测。
当卷积神经网络有多个卷积层的时候,初始的卷积层往往提取较多的一般特征,该一般特征也可以称之为低级别的特征;随着卷积神经网络深度的加深,越往后的卷积层提取到的特征越来越复杂,比如高级别的语义之类的特征,语义越高的特征越适用于待解决的问题。
在经过卷积层的处理后,卷积神经网络还不足以输出所需要的输出信息。因为如前所述,卷积层只会提取特征,并减少输入图像带来的参数。然而为了生成最终的输出信息(所需要的类信息或别的相关信息),卷积神经网络需要利用神经网络层来生成一个或者一组所需要的类的数量的输出。因此,在神经网络层中可以包括多层隐含层以及输出层,该多层隐含层中所包含的参数可以根据具体的任务类型的相关训练数据进行预先训练得到,例如该任务类型可以包括图像识别,图像分类,图像超分辨率重建等等,本实施例中多层隐含层的任务类型为图像匹配。
在神经网络层中的多层隐含层之后,也就是整个卷积神经网络的最后层为输出层,该输出层具有类似分类交叉熵的损失函数,具体用于计算预测误差,一旦整个卷积神经网络的前向传播(如图8由输入层至输出层的传播为前向传播)完成,反向传播(如图8由输出层至输入层的传播为反向传播)就会开始更新前面提到的各层的权重值以及偏差,以减少卷积神经网络的损失及卷积神经网络通过输出层输出的结果和理想结果之间的误差。
应理解,如图8所示的卷积神经网络仅作为一种卷积神经网络的示例,在具体的应用中,卷积神经网络还可以以其他网络模型的形式存在,例如,卷积神经网络中还可以包括池化层等。
需要说明的是,步骤502至504为可选步骤,若执行步骤502至504,则步骤506中执行设备处理的为进行过数据分布对齐的第二待处理图像,且本申请实施例不限定步骤505和506与步骤502至504之间的执行顺序,可以为同时执行步骤502和505,再执行步骤504,再执行步骤506;也可以先执行步骤502至504,再执行步骤505和506。若不执行步骤502至504,则步骤506中执行设备处理的为执行设备获取的原始第二待处理图像。
执行设备重复执行步骤502至506,以生成第二图像集合中每个图像的特征信息。
507、执行设备获取第一待处理图像。
本申请的一些实施例中,执行设备获取第一待处理图像,还可以获取以下信息中的一项或多项:第一待处理图像的来源图像采集装置、第一待处理图像的图像采集时间、第一待处理图像的图像采集地点、第一待处理图像中拍摄对象的对象类型或第一待处理图像的其他信息等。
具体的,若执行设备为服务器,则客户设备可以接收用户输入的匹配请求,进而向执行设备发送匹配请求,对应的,执行设备可以接收客户设备发送的匹配请求,匹配请求中携带有第一待处理图像,还可以携带有以下信息中的一项或多项:第一待处理图像的来源图像采集装置、第一待处理图像的图像采集时间、第一待处理图像的图像采集地点或第一待处理图像的其他信息等。具体的,客户设备中可以配置有具有图像匹配功能的客户端,从而用户通过前述客户端输入匹配请求。更具体的,前述客户端中可以配置有第一待处理图像以及第一待处理图像的相关信息的获取接口,客户设备可以从移动存储设备、客户设备中的存储装置中获取第一待处理图像以及第一待处理图像的相关信息;客户设备还可以通过通信网络从其他设备处获取第一待处理图像以及第一待处理图像的相关信息等。
若执行设备为终端侧的设备,则执行设备可以接收用户输入的匹配请求,匹配请求中包括第一待处理图像以及第一待处理图像的相关信息。在一种实现方式中,参阅无人超市的行人重识别场景中,执行设备可以从图像采集装置处直接获取第一待处理图像以及第一待处理图像的相关信息。在另一种实现方式中,执行设备可以从移动存储设备,或者,通过通信网络从其他设备处获取第一待处理图像以及第一待处理图像的相关信息。
508、执行设备获取与第一待处理图像对应的第二数据分布特性,第二数据分布特性为第一图像集合中图像的数据分布特性。
本申请的一些实施例中,执行设备在获取到第一待处理图像之后,可以确定第一待处理图像所归属的第一图像集合。其中,第一图像集合为第二图像集合包括的至少两个图像子集合中第一待处理图像所归属的图像子集合,第一待处理图像的数据分布规律与第一图像集合中图像的数据分布规律相同。第一图像集合与第三图像集合可以为相同的图像集合,也可以为不同的图像集合。
具体的,在一种实现方式中,第一待处理图像和第一图像集合中的图像来源于同一目标图像采集装置,也即第一图像集合中不同图像子集合的分类标准为来源图像采集装置。则步骤508包括:执行设备根据匹配请求,获取采集第一待处理图像的目标图像采集装置的标识信息,并从第二图像集合包括的至少两个图像子集合中确定与目标图像采集装置的 标识信息对应的第一图像集合。其中,第一图像集合包括通过目标图像采集装置采集到的图像;目标图像采集装置的标识信息用于唯一标识目标图像采集装置,具体可以表现为数字编号、字符编号或其他类型的标识信息等,作为示例,例如目标图像采集装置的标识信息可以表现为“000001”、“BJ00001”或其他标识信息等。更具体的,执行设备上可以存储有图像采集装置的标识信息和图像子集合之间的一一映射关系,从而执行设备在获取到目标图像采集装置的标识信息之后,可以根据预先配置的映射关系,获取到与目标图像采集装置的标识信息对应的第一图像集合。本申请实施例中,不同的图像采集装置由于硬件配置或参数设置的不同,从而同一图像采集装置采集的图像的特征图的数据分布中会带有该图像采集装置的特有风格,以来源图像采集装置作为分类标准,根据第一待处理图像归属的第一图像集合中图像的特征图的数据分布特性,对第一待处理图像的特征图进行数据分布对齐,以减弱第一待处理图像的特征图中携带的图像采集装置的特有风格,也即提高来自于不同的图像采集装置的图像的特征图之间的相似度,以降低神经网络的特征提取难度。
在另一种实现方式中,第一待处理图像的图像采集时刻和第一图像集合中图像的图像采集时刻均位于同一目标时间段内,也即第一图像集合中不同图像子集合的分类标准为图像采集时间段。则步骤508包括:执行设备根据匹配请求,获取采集第一待处理图像的图像采集时刻,并从第二图像集合包括的至少两个图像子集合中确定与第一待处理图像的图像采集时刻对应的第一图像集合,其中,第一图像集合包括在目标时间段内采集的图像,第一待处理图像的图像采集时刻位于目标时间段内。
本申请实施例中,不同时间段由于光线信息的不同,从而同一时间段内采集的图像的特征图的数据分布中会带有该时间段的特有风格,以时间段作为分类标准,根据第一待处理图像归属的第一图像集合中图像的特征图的数据分布特性,对第一待处理图像的特征图进行数据分布对齐,以减弱第一待处理图像的特征图中携带的某一时间段的特有风格,也即提高来自于不同的时间段的图像的特征图之间的相似度,以降低神经网络的特征提取难度。
在另一种实现方式中,第一待处理图像和第一图像集合中的图像来源于同一图像采集地点,也即第一图像集合中不同图像子集合的分类标准为图像采集地点。则步骤508包括:执行设备根据匹配请求,获取第一待处理图像的目标图像采集地点,并从第二图像集合包括的至少两个图像子集合中确定与目标图像采集地点对应的第一图像集合,其中,第一图像集合包括在目标图像采集地点采集的图像。
在另一种实现方式中,第一待处理图像中的拍摄对象和第一图像集合包括的图像中的拍摄对象为同一对象类型,也即第一图像集合中不同图像子集合的分类标准为图像中拍摄对象的对象类型。则步骤508包括:执行设备根据匹配请求,获取第一待处理图像中拍摄对象的目标对象类型,并从第二图像集合包括的至少两个图像子集合中确定与目标对象类型对应的第一图像集合,其中,第一图像集合中包括的图像中拍摄对象的对象类型与第一待处理图像中拍摄对象的对象类型相同。
本申请实施例中,提供了获取与第一待处理图像数据分布规律相同的第一图像集合的多种实现方式,扩展了本方案的应用场景,提高了本方案的实现灵活性。
509、执行设备根据第二数据分布特性,对第一待处理图像进行数据分布对齐。
510、执行设备获取与第一待处理图像对应的第一数据分布特性,第一数据分布特性包括与第一图像集合中图像对应的特征图的数据分布特性。
511、执行设备对第一待处理图像进行特征提取,并根据第一数据分布特性,在进行特征提取过程中对第一特征图进行数据分布对齐,得到第一待处理图像的特征信息。
本申请实施例中,执行设备执行步骤509至511的具体实现方式与执行设备执行步骤504至506的具体实现方式类似,此次不做赘述。
512、执行设备根据第一待处理图像的特征信息,将第一待处理图像与第二图像集合中的图像进行匹配。
本申请的一些实施例中,步骤502至506为可选步骤,若执行步骤502至506,则执行设备在获取到第一待处理图像的特征信息之后,可以通过卷积神经网络与第二图像集合中每个图像的特征信息进行匹配,从而得到匹配结果。匹配结果中包括至少一个图像,所述匹配到的至少一个图像中每个图像的拍摄对象与待处理图像中的拍摄对象相同;匹配结果中还可以包括所述匹配到的至少一个图像中每个图像的图像采集地点和图像采集时间。本申请实施例中,在图像重识别场景中,不是根据第二图像集合中的所有图像的特征图的数据分布特性执行数据分布对齐操作,而是根据图像的数据分布规律,将第二图像集合分为至少两个图像子集合,基于图像子集合中图像的特征图的数据分布特性执行数据分布对齐操作,避免了不同图像子集合之间的数据分布特性的互相干扰,有利于大跨度的将待处理图像的特征图的数据分布拉拢到神经网络的敏感区域,提高特征提取性能;待处理图像的特征信息和第二图像集合中每个图像的特征信息的精准度都提高的情况下,提高了图像匹配过程的准确率。
若不执行步骤502至506,则执行设备可以通过不执行数据分布对齐的方式,对第二图像集合中的每个图像进行特征提取,从而得到第二图像集合中每个图像的特征信息。进而将第一待处理图像的特征信息与第二图像集合中每个图像的特征信息进行匹配,得到匹配结果。
为进一步理解本方案,请参阅图9,图9为本申请实施例提供的图像处理方法中特征图的数据分布的一种示意图。图9中以分类标准为来源摄像机,且采用标准化的方式进行数据分布对齐为例。执行设备对与摄像机1采集的一个图像对应的纹理特征的数据进行标准化(standardize),对与摄像机2采集的一个图像对应的纹理特征的数据进行标准化,以及对与摄像机3采集的一个图像对应的纹理特征的数据进行标准化,分别得到进行过标准化处理后的三个特征图的数据,将进行过标准化处理后的三个特征图的数据进行校准,也即将进行过标准化处理后的三个特征图的数据对齐到同一坐标系中,可以看出执行标准化处理之前的三个特征图的数据分布相差较大,进行过标准化处理后的三个特征图的数据分布在相似的数据区域中,从而卷积神经网络处理的数据的数据分布相似,降低了卷积神经网络进行特征提取的难度,提高了卷积神经网络特征提取的性能。
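步骤512中将待处理图像的特征信息与第二图像集合中各图像的特征信息逐一匹配的过程,可以用如下示意性Python代码表达(此处以余弦相似度作为匹配度量并按相似度降序排列,度量的选择、图像标识与特征向量均为说明性假设,非本申请限定的实现):

```python
# 示意代码:以余弦相似度将查询特征与图像集合中各图像的特征匹配,
# 返回按相似度从高到低排列的图像标识。
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def match(query_feature, gallery):
    """gallery: {图像标识: 特征向量}。返回按相似度降序的图像标识列表。"""
    scored = [(cosine(query_feature, f), img_id) for img_id, f in gallery.items()]
    scored.sort(reverse=True)
    return [img_id for _, img_id in scored]


gallery = {"img_a": [1.0, 0.0], "img_b": [0.7, 0.7], "img_c": [0.0, 1.0]}
print(match([0.9, 0.1], gallery))
```

特征信息越准确,排序靠前的图像与待处理图像包含相同拍摄对象的概率越高。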
513、执行设备输出匹配结果。
本申请的一些实施例中,执行设备在生成匹配结果之后,会输出匹配结果。若执行设备为服务器,则执行设备会将匹配结果发送给客户设备,由客户设备向用户展示匹配结果;若执行设备为终端设备侧的设备,则执行设备可以通过展示界面向用户展示匹配结果。
二、图像识别
本申请的一些实施例中,请参阅图10,图10为本申请实施例提供的图像处理方法的一种流程示意图。具体的,本申请实施例提供的图像处理方法可以包括:
1001、执行设备获取第一待处理图像。
本申请的一些实施例中,执行设备可以通过执行设备上配置的图像采集装置直接拍摄获取第一待处理图像,也可以从执行设备的图库中选取一张图像作为第一待处理图像。作为示例,例如有些执行设备中配置有车牌识别功能,则执行设备在对车牌进行识别时可以通过与集成于执行设备上的摄像机直接采集获取第一待处理图像。可选地,执行设备还可以获取第一待处理图像中拍摄对象的对象类型。作为示例,例如有些手机形态的执行设备中配置有植物种类识别功能,会需要用户先选择待识别图像中拍摄对象的类别,前述拍摄对象的大类包括但不限于植物、猫、狗或其他类别等。
1002、执行设备获取与第一待处理图像对应的第二数据分布特性,第二数据分布特性为第一图像集合中图像的数据分布特性。
本申请的一些实施例中,执行设备在出厂之前可以配置有第二数据分布特性,第二数据分布特性为第一图像集合中图像的数据分布特性。
具体的,在一种情况下,参阅客户设备中配置有图像识别功能场景中第一种实现方式,第一待处理图像和第一图像集合中的图像来源于同一图像采集装置。则本领域技术人员可以在执行设备出厂之前,在执行设备上配置第二数据分布特性和第一数据分布特性,第一数据分布特性为与第一图像集合中图像对应的特征图的数据分布特性。具体实现方式参阅图像识别功能场景中第一种实现方式。
在另一种情况下,参阅客户设备中配置有图像识别功能场景中第二种实现方式,第一待处理图像中的拍摄对象和第一图像集合包括的图像中的拍摄对象为同一对象类型。则本领域技术人员可以在执行设备出厂之前,获取至少两种对象类别的图像的数据分布特性,以及,与每种对象类别的图像对应的特征图在至少一个特征维度上的数据分布特性,并将其布置于执行设备上。作为示例,例如与植物的图像对应的特征图在纹理特征维度上的数据分布特性,与植物的图像对应的特征图在颜色特征维度上的数据分布特性等等。则步骤1002可以包括:执行设备在获取第一待处理图像中拍摄对象的目标类别之后,从至少两种对象类别的图像的数据分布特性中选取与目标类别对应的第二数据分布特性,第一图像集合中的图像为目标类别。
在另一种情况下,第一待处理图像和第一图像集合包括的图像为在同一图像采集地点采集的。则本领域技术人员可以在执行设备出厂之前,获取至少两个图像采集地点的图像的数据分布特性,以及,与每个图像采集地点的图像对应的特征图在至少一个特征维度上的数据分布特性,并将其布置于执行设备上。作为示例,例如与在北京采集的图像对应的特征图在纹理特征维度上的数据分布特性,与在北京采集的图像对应的特征图在颜色特征维度上的数据分布特性等等。则步骤1002可以包括:执行设备在获取第一待处理图像的目标图像采集地点之后,从至少两个图像采集地点的图像的数据分布特性中选取与目标图像采集地点对应的第二数据分布特性,第一图像集合中的图像为从目标图像采集地点采集的。
1003、执行设备根据第二数据分布特性,对第一待处理图像进行数据分布对齐。
1004、执行设备获取与第一待处理图像对应的第一数据分布特性,第一数据分布特性中为与第一图像集合中图像对应的特征图的数据分布特性。
1005、执行设备对第一待处理图像进行特征提取,并根据第一数据分布特性,在进行特征提取过程中对第一特征图进行数据分布对齐,得到第一待处理图像的特征信息。其中,第一特征图为对第一待处理图像进行特征提取过程中生成的。
本申请实施例中,执行设备执行步骤1003至1005的具体实现方式可以参考执行设备执行步骤504至506的具体实现方式,此次不做赘述。
1006、执行设备根据第一待处理图像的特征信息,对第一待处理图像进行识别,得到第一待处理图像中拍摄对象的描述信息。
本申请实施例中,执行设备根据第一待处理图像的特征信息,通过卷积神经网络对第一待处理图像进行识别,得到第一待处理图像中拍摄对象的描述信息。其中,拍摄对象的描述信息可以包括以下一项或多项:拍摄对象的内容、拍摄对象的品种、拍摄对象的属性。作为示例,例如拍摄对象为车牌,则描述信息可以为拍摄对象的车牌号;作为示例,例如拍摄对象为植物,则描述信息可以为植物品种;作为示例,例如拍摄对象是人,则描述信息可以为人的性别、年龄等描述信息等,此处举例仅为方便理解本方案,不用于限定本方案。
1007、执行设备输出描述信息。
本申请实施例中,在获取第一待处理图像之后,获取与第一待处理图像对应的第一数据分布特性,对第一待处理图像进行特征提取,根据第一数据分布特性,在进行特征提取过程中对生成的特征图进行数据分布对齐,由于神经网络处理的是执行过数据分布对齐后的特征图,因此保证神经网络处理的图像都有着相似的数据分布,以提高跨场景的不同图像的特征图之间的相似度,从而降低神经网络的图像处理难度,以提高神经网络在跨场景的特征提取性能;此外,第一数据分布特性是与第一图像集合中图像对应的特征图的数据分布特性,而第一图像集合中的图像与第一待处理图像的数据分布规律相同,利用第一数据分布特性进行数据分布对齐,可以大跨度的将第一待处理图像的特征图的数据分布向神经网络的敏感数据区域拉近,进一步降低神经网络的图像处理难度,进一步提升神经网络在跨场景的特征提取性能。
以上均是对本申请实施例提供的图像处理方法中的应用阶段的具体实现方式进行描述,以下对本申请实施例提供的图像处理方法中的训练阶段的具体实现方式进行描述,也是分为图像匹配和图像识别这两种通用能力分别进行描述。
一、图像匹配
本申请的一些实施例中,请参阅图11,图11为本申请实施例提供的图像处理方法的一种流程示意图。具体的,本申请实施例提供的图像处理方法可以包括:
1101、训练设备获取训练图像集合。
本申请的一些实施例中,训练设备上可以配置有训练图像集合,训练图像集合中包括至少两个训练图像子集合,不同训练图像子集合的分类标准与图5对应实施例中的分类标准相同,此次不做赘述。训练设备上还配置有与训练图像集合中图像一一对应的标识信息,所述标识信息用于唯一标识一个拍摄对象,具体可以为数字编码、字符编码或其他标识信息等。作为示例,例如在拍摄对象为人的情况下,不同的人的标识信息不同,同一个人在不同的训练图像的标识信息相同;作为另一示例,例如在拍摄对象为狗的情况下,不同的狗的标识信息不同,同一个狗在不同的训练图像中的标识信息相同。在对卷积神经网络进行迭代训练之前,训练设备初始化卷积神经网络。
1102、训练设备从训练图像集合中获取至少两个训练图像。
本申请的一些实施例中,训练设备从训练图像集合中获取至少两个训练图像。所述至少两个训练图像包括第一训练图像和第二训练图像,第一训练图像和第二训练图像中包括的为同一个拍摄对象。第一训练图像和第二训练图像可以归属于同一个训练图像子集合,也可以归属于不同的图像子集合。
可选地,所述至少两个训练图像中还包括第三训练图像,第三训练图像与第一训练图像中为不同的拍摄对象。进一步可选地,所述至少两个训练图像中还可以包括更多的训练图像,具体训练图像的数量可以结合损失函数的类型确定。
1103、训练设备获取与第一训练图像对应的数据分布特性,与第一训练图像对应的数据分布特性为第一训练图像归属的训练图像子集合中图像的数据分布特性。
本申请的一些实施例中,训练设备确定第一训练图像归属的训练图像子集合,进而获取与第一训练图像对应的数据分布特性。具体的,训练设备可以根据训练图像集合,预先生成每个训练图像子集合的数据分布特性,从而训练设备从所有训练图像子集合的数据分布特性中获取与第一训练图像对应的数据分布特性。训练设备也可以在确定第一训练图像归属的训练图像子集合之后,生成与第一训练图像对应的数据分布特性。具体图像级的数据分布特性的生成方式可以参阅图5对应实施例中的描述,此次不做赘述。
1104、训练设备根据与第一训练图像对应的数据分布特性,对第一训练图像进行数据分布对齐。
本申请实施例中,训练设备执行步骤1104的具体实现方式均可以参考执行设备执行步骤504的具体实现方式,此次不做赘述。
1105、训练设备获取与第一训练图像的特征图对应的数据分布特性,与第一训练图像的特征图对应的数据分布特性为与第一训练图像归属的训练图像子集合中图像对应的特征图的数据分布特性。
本申请的一些实施例中,训练设备在确定第一训练图像归属的训练图像子集合,获取与第一训练图像归属的训练图像子集合中图像对应的特征图的数据分布特性。具体的,训练设备在确定第一训练图像归属的训练图像子集合之后,通过卷积神经网络生成第一训练图像归属的训练图像子集合中图像对应的特征图的数据分布特性,对于特征图级的数据分布特征的内含和具体生成方式均参阅图5对应实施例中的描述,此次不做赘述。
1106、训练设备通过卷积神经网络对第一训练图像进行特征提取,并根据与第一训练图像的特征图对应的数据分布特性,在进行特征提取过程中对第三特征图进行数据分布对齐,得到第一训练图像的特征信息。
本申请实施例中,训练设备执行步骤1106的具体实现方式均可以参考执行设备执行步骤506的具体实现方式,此次不做赘述。
需要说明的是,步骤1103和1104为可选步骤,若执行步骤1103和1104,则步骤1106中训练设备为对进行过数据分布对齐的第一训练图像进行特征提取;若不执行步骤1103和1104,则步骤1106中训练设备为对未进行过数据分布对齐的第一训练图像进行特征提取。
1107、训练设备获取与第二训练图像对应的数据分布特性,与第二训练图像对应的数据分布特性为第二训练图像归属的训练图像子集合中图像的数据分布特性。
1108、训练设备根据与第二训练图像对应的数据分布特性,对第二训练图像进行数据分布对齐。
1109、训练设备获取与第二训练图像的特征图对应的数据分布特性,与第二训练图像的特征图对应的数据分布特性为与第二训练图像归属的训练图像子集合中图像对应的特征图的数据分布特性。
1110、训练设备通过卷积神经网络对第二训练图像进行特征提取,并根据与第二训练图像的特征图对应的数据分布特性,在进行特征提取过程中对第四特征图进行数据分布对齐,得到第二训练图像的特征信息。
本申请实施例中,训练设备执行步骤1107至1110的具体实现方式均可以参考训练设备执行步骤1103至1106的具体实现方式,此次不做赘述。
需要说明的是,步骤1107和1108为可选步骤,若执行步骤1107和1108,则步骤1110中训练设备为对进行过数据分布对齐的第二训练图像进行特征提取;若不执行步骤1107和1108,则步骤1110中训练设备为对未进行过数据分布对齐的第二训练图像进行特征提取。
1111、训练设备获取与第三训练图像对应的数据分布特性,与第三训练图像对应的数据分布特性为第三训练图像归属的训练图像子集合中图像的数据分布特性。
1112、训练设备根据与第三训练图像对应的数据分布特性,对第三训练图像进行数据分布对齐。
1113、训练设备获取与第三训练图像的特征图对应的数据分布特性,与第三训练图像的特征图对应的数据分布特性为与第三训练图像归属的训练图像子集合中图像对应的特征图的数据分布特性。
1114、训练设备通过卷积神经网络对第三训练图像进行特征提取,并根据与第三训练图像的特征图对应的数据分布特性,在进行特征提取过程中对第六特征图进行数据分布对齐,得到第三训练图像的特征信息。
本申请实施例中,训练设备执行步骤1111至1114的具体实现方式以及步骤是否为可选步骤的描述均可以参考训练设备执行步骤1103至1106的具体实现方式,此次不做赘述。
需要说明的是,本申请实施例不限定步骤1103至1110与步骤1111至1114之间的执行关系,可以顺序执行步骤1102至1110,也可以先执行步骤1111至1114,再执行步骤1103至1110。步骤1103至1114之间还可以交叉执行。
1115、训练设备通过损失函数对卷积神经网络进行训练,直至满足收敛条件。
本申请的一些实施例中,损失函数包括但不限于二元组损失函数、三元组损失函数、四元组损失函数或其他损失函数等。收敛条件可以为满足损失函数的收敛条件,也可以为迭代次数达到预设次数,或其他收敛条件等。
具体的,若损失函数为二元组损失函数,则不需要执行步骤1111至1114,训练设备根据第一训练图像的特征信息和第二训练图像的特征信息,计算生成二元组损失函数的函数值,并基于二元组损失函数的函数值反向调整卷积神经网络的参数值,以完成一次训练操作,训练的目标为拉近第一训练图像的特征信息和第二训练图像的特征信息的相似度。训练设备重复执行步骤1102至1110以及步骤1115,直至满足收敛条件,得到执行过迭代训练操作的卷积神经网络。
若损失函数为三元组损失函数,则不需要执行步骤1111至1114,训练设备根据第一训练图像的特征信息、第二训练图像的特征信息和第三训练图像的特征信息,计算生成三元组损失函数的函数值,并基于三元组损失函数的函数值反向调整卷积神经网络的参数值,以完成一次训练操作,训练的目标为拉近第一训练图像的特征信息和第二训练图像的特征信息的相似度,拉远第一训练图像的特征信息和第三训练图像的特征信息之间的相似度,以及拉远第二训练图像的特征信息和第三训练图像的特征信息之间的相似度。训练设备重复执行步骤1102至1115,直至满足收敛条件,得到执行过迭代训练操作的卷积神经网络。
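上段所述三元组损失的训练目标——拉近锚点(第一训练图像)与正样本(第二训练图像)特征的距离、拉远锚点与负样本(第三训练图像)特征的距离——可以用如下示意代码表达(间隔margin的取值与特征向量均为说明性假设,非本申请限定的实现):

```python
# 示意代码:三元组损失 loss = max(0, d(anchor, positive) - d(anchor, negative) + margin)
import math

def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def triplet_loss(anchor, positive, negative, margin=0.5):
    """anchor/positive包含相同拍摄对象,negative包含不同拍摄对象。"""
    return max(0.0, euclidean(anchor, positive) - euclidean(anchor, negative) + margin)


anchor   = [1.0, 0.0]    # 第一训练图像的特征信息
positive = [0.9, 0.1]    # 第二训练图像(同一拍摄对象)的特征信息
negative = [0.0, 1.0]    # 第三训练图像(不同拍摄对象)的特征信息
print(round(triplet_loss(anchor, positive, negative), 3))
```

当正样本距离足够近、负样本距离足够远(超出margin)时损失为0,否则产生梯度以反向调整卷积神经网络的参数值。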
1116、训练设备输出执行过迭代训练操作的卷积神经网络。
本申请实施例中,提供了通用能力为图像重识别情况下,训练侧的具体实现方式,提供了一种在跨场景过程中依旧可以保持良好的特征提取能力的卷积神经网络,提高了本方案的完整性;仅对特征提取技能进行训练,提高了训练阶段的效率;此外,在训练过程采用的为增量学习的情况下,由于本申请实施例提供的方法可以去除特征图中携带的某个训练图像子集合的数据分布特性,从而避免了将卷积神经网络过拟合到某个小的训练数据集中,解决了增量学习过程的灾难遗忘问题。
二、图像识别
本申请的一些实施例中,请参阅图12,图12为本申请实施例提供的图像处理方法的一种流程示意图。具体的,本申请实施例提供的图像处理方法可以包括:
1201、训练设备获取训练图像集合。
本申请的一些实施例中,训练设备上可以配置有训练图像集合,以及,与训练图像集合中图像对应的真实描述信息;训练图像集合中包括至少两个训练图像子集合,描述信息的内容可以参阅上述图10对应实施例中的描述。在对卷积神经网络进行迭代训练之前,训练设备初始化卷积神经网络。
1202、训练设备从训练图像集合中获取第三训练图像,第三训练图像为训练图像集合中的一个图像。
1203、训练设备获取与第三训练图像对应的数据分布特性,与第三训练图像对应的数据分布特性为第三训练图像归属的训练图像子集合中图像的数据分布特性。
1204、训练设备根据与第三训练图像对应的数据分布特性,对第三训练图像进行数据分布对齐。
1205、训练设备获取与第三训练图像的特征图对应的数据分布特性,与第三训练图像的特征图对应的数据分布特性为与第三训练图像归属的训练图像子集合中图像对应的特征图的数据分布特性;
1206、训练设备通过卷积神经网络对第三训练图像进行特征提取,并根据与第三训练图像的特征图对应的数据分布特性,在进行特征提取过程中对第五特征图进行数据分布对齐,得到第三训练图像的特征信息。
本申请实施例中,训练设备执行步骤1203至1206的具体实现方式均可以参考训练设备执行步骤1103至1106的具体实现方式,此次不做赘述。
1207、训练设备根据第三训练图像的特征信息进行图像识别,得到第三训练图像中拍摄对象的描述信息。
本申请的一些实施例中,训练设备通过卷积神经网络,根据第三训练图像的特征信息进行图像识别,得到第三训练图像中拍摄对象的描述信息。
1208、训练设备根据描述信息,通过损失函数对卷积神经网络进行训练,直至满足收敛条件。
本申请的一些实施例中,训练设备根据生成的第三训练图像中拍摄对象的描述信息(也即预测的描述信息)和训练设备中存储的第三训练图像中拍摄对象的描述信息(也即真实的描述信息),计算损失函数的值,并根据损失函数的值反向传播,以调整卷积神经网络的参数值,从而完成了对卷积神经网络的一次训练。其中,本实施例中的损失函数可以采用交叉熵损失函数或其他用于训练通用能力为图像识别的卷积神经网络的损失函数。训练设备重复执行步骤1202至1208,直至满足收敛条件,得到执行过迭代训练操作的卷积神经网络。
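上段所述"根据预测的描述信息与真实的描述信息计算交叉熵损失"的步骤,可以用如下示意代码表达(类别数量与概率取值均为说明性假设,非本申请限定的实现):

```python
# 示意代码:图像识别训练中交叉熵损失的计算——
# 根据预测的类别概率分布与真实类别标签计算损失值。
import math

def cross_entropy(predicted_probs, true_index):
    """predicted_probs: 各类别的预测概率; true_index: 真实类别下标。"""
    return -math.log(predicted_probs[true_index])


probs = [0.7, 0.2, 0.1]         # 卷积神经网络对三个类别的预测概率
loss = cross_entropy(probs, 0)  # 真实类别为第0类
print(round(loss, 4))
```

真实类别上的预测概率越接近1,损失值越小;训练设备根据该损失值反向传播以调整卷积神经网络的参数值。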
1209、训练设备输出执行过迭代训练操作的卷积神经网络。
本申请实施例中,提供了通用能力为图像识别情况下,训练侧的具体实现方式,提供了一种在跨场景过程中依旧可以保持良好的特征提取能力的卷积神经网络,提高了本方案的完整性,也扩展了本方案的应用场景;此外,在训练过程采用的为增量学习的情况下,由于本申请实施例提供的方法可以去除特征图中携带的某个训练图像子集合的数据分布特性,从而避免了将卷积神经网络过拟合到某个小的训练数据集中,解决了增量学习过程的灾难遗忘问题。
本申请实施例还提供了一种卷积神经网络,该卷积神经网络包括一个输入层、至少一个卷积层、至少一个标准化层、至少一个激活函数层、至少一个神经网络层。
输入层,用于接收待处理图像;
卷积层,用于基于接收到的待处理图像,执行卷积操作,以输出待处理图像的特征图;
标准化层,用于根据目标数据分布特征,对卷积层输出的特征图进行标准化,目标数据分布特征包括与目标图像集合中图像对应的特征图的数据分布特性,待处理图像与目标图像集合的数据分布规律相同;
激活函数层,用于对标准化层输出的进行过标准化处理的特征图进行激活;
神经网络层,用于将激活函数层输出的待处理图像的特征信息与图像集合中每个图像的特征信息进行匹配,并输出匹配结果。
本申请实施例中,上述卷积神经网络的具体工作方式,可以参阅上述图5对应实施例中的描述,此次不做赘述。
本申请实施例还提供了另一种卷积神经网络,该卷积神经网络包括一个输入层、至少一个卷积层、至少一个标准化层、至少一个激活函数层、至少一个神经网络层。
输入层,用于接收待处理图像;
卷积层,用于基于接收到的待处理图像,执行卷积操作,以输出待处理图像的特征图;
标准化层,用于根据目标数据分布特征,对卷积层输出的特征图进行标准化,目标数据分布特征包括与目标图像集合中图像对应的特征图的数据分布特性,待处理图像与目标图像集合的数据分布规律相同;
激活函数层,用于对标准化层输出的进行过标准化处理的特征图进行激活;
神经网络层,用于根据激活函数层输出的待处理图像的特征信息进行图像识别,并输出待处理图像中拍摄对象的描述信息。
本申请实施例中,上述卷积神经网络的具体工作方式,可以参阅上述图10对应实施例中的描述,此次不做赘述。
为了对本申请带来的有益效果有进一步地理解,以下结合实验数据对本方案的有益效果做进一步展示。本次实验是在公开数据集的跨场景任务上进行的实验,以下通过表格的形式展示实验效果,首先展示的是在应用阶段的有益效果:
表2
Figure PCTCN2020118076-appb-000006
其中,Duke to Market指的是在公开数据集Duke进行训练,在公开数据集Market上进行应用,也即训练数据和应用数据不同。rank-1、rank-5和rank-10分别为三个准确度指标,均值平均精度(mean average precision,mAP)为检测精度的指标。行人迁移对抗生成网络(person transfer generative adversarial network,PTGAN)和异质学习网络(hetero-homogeneous learning,HHL)分别为两个神经网络,通用能力为图像重识别,也可以称为图像匹配。由于本申请实施例中所采用的卷积神经网络是将目前存在的卷积神经网络中的标准化模块进行替换,此处以使用样本记忆卷积神经网络(Exemplar memory convolution network,ECN)作为基础网络,将ECN的标准化层替换为本申请实施例中的标准化层进行试验。通过上述表2可以清楚的看出,本申请实施例相对于目前存在的神经网络在跨场景的图像重识别任务中的准确度和精度均有很大的提升。
接下来对采用增量学习的方式进行训练的情况下,采用本申请实施例提供的图像处理方法带来的有益效果,参见如下表格。
表3
Figure PCTCN2020118076-appb-000007
其中,Market to Duke指的是采用公开数据集Market和公开数据集Duke进行增量式学习。resnet50指的是一种典型的卷积神经网络,Ours+resnet50指的是将resnet50中的批量标准化层替换为基于摄像机的批量标准化层。92.5%指的是采用公开数据集Market和公开数据集Duke对resnet50进行增量式训练所得到的rank-1准确度,与一直采用公开数据集Market训练resnet50所得到的rank-1准确度之间的比值。通过上述表3可以看出,采用本申请实施例提供的图像处理方法减缓了增量学习过程的性能衰减度。
在图1至图12所对应的实施例的基础上,为了更好的实施本申请实施例的上述方案,下面还提供用于实施上述方案的相关设备。具体参阅图13,图13为本申请实施例提供的图像处理装置的一种结构示意图,图像处理装置1300包括:
获取模块1301,用于获取第一待处理图像;
获取模块1301,还用于获取与第一待处理图像对应的第一数据分布特性,其中,第一数据分布特性包括与第一图像集合中图像对应的特征图的数据分布特性,第一待处理图像与第一图像集合的数据分布规律相同;
特征提取模块1302,用于对第一待处理图像进行特征提取,并根据第一数据分布特性,在进行特征提取过程中对第一特征图进行数据分布对齐,其中,第一特征图为对第一待处理图像进行特征提取过程中生成的。
本申请实施例中,获取模块1301在获取第一待处理图像之后,获取与第一待处理图像对应的第一数据分布特性,特征提取模块1302对第一待处理图像进行特征提取,根据第一数据分布特性,在进行特征提取过程中对生成的特征图进行数据分布对齐,由于神经网络处理的是执行过数据分布对齐后的特征图,因此保证神经网络处理的图像都有着相似的数据分布,以提高跨场景的不同图像的特征图之间的相似度,从而降低神经网络的图像处理难度,以提高神经网络在跨场景的特征提取性能;此外,第一数据分布特性是与第一图像集合中图像对应的特征图的数据分布特性,而第一图像集合中的图像与第一待处理图像的数据分布规律相同,利用第一数据分布特性进行数据分布对齐,可以大跨度的将第一待处理图像的特征图的数据分布向神经网络的敏感数据区域拉近,进一步降低神经网络的图像处理难度,进一步提升神经网络在跨场景的特征提取性能。
在一种可能的设计中,请参阅图14,图14为本申请实施例提供的图像处理装置的一种结构示意图,获取模块1301,还用于获取与第一待处理图像对应的第二数据分布特性,第二数据分布特性为第一图像集合中图像的数据分布特性,
装置1300还包括:数据分布对齐模块1303,用于根据第二数据分布特性,对第一待处理图像进行数据分布对齐;
特征提取模块1302,具体用于对执行过数据分布对齐的第一待处理图像进行特征提取。
本申请实施例中,不仅在特征提取过程中对特征图进行数据分布对齐,在特征提取模块1302进行特征提取之前,数据分布对齐模块1303还会对待处理图像进行数据分布对齐,也即神经网络处理的图像也有着相似的数据分布,进一步提高了跨场景的不同图像之间的相似度,也即进一步降低了神经网络的图像处理难度,从而进一步提升神经网络在跨场景的特征提取性能。
在一种可能的设计中,第一数据分布特性包括均值和方差,均值和方差为对与第一图像集合中图像对应的特征图进行数据分布统计得到的;
特征提取模块1302,具体用于对第一待处理图像进行特征提取,并根据均值和方差,在进行特征提取过程中对第一特征图包括的特征图进行标准化处理。
本申请实施例中,提供了对待处理图像的特征图进行数据分布对齐的具体实现方式,操作简单,易实现。
在一种可能的设计中,第一待处理图像和第一图像集合中的图像来源于同一目标图像采集装置,或者,第一待处理图像的图像采集时刻和第一图像集合中图像的图像采集时刻均位于同一目标时间段内,或者,第一待处理图像和第一图像集合中的图像来源于同一图像采集地点,或者,第一待处理图像中的拍摄对象和第一图像集合包括的图像中的拍摄对象为同一对象类型。
本申请实施例中,提供了获取与第一待处理图像数据分布规律相同的第一图像集合的多种实现方式,扩展了本方案的应用场景,提高了本方案的实现灵活性。
在一种可能的设计中,获取模块1301,还用于获取采集第一待处理图像的目标图像采集装置的标识信息,并从第二图像集合包括的至少两个图像子集合中获取与目标图像采集装置的标识信息对应的第一图像集合,其中,第一图像集合为第二图像集合包括的至少两个图像子集合中的一个图像子集合,第一图像集合包括通过目标图像采集装置采集到的图像。
本申请实施例中,不同的图像采集装置由于硬件配置或参数设置的不同,从而同一图像采集装置采集的图像的特征图的数据分布中会带有该图像采集装置的特有风格,获取模块1301以来源图像采集装置作为分类标准,根据第一待处理图像归属的第一图像集合中图像的特征图的数据分布特性,对第一待处理图像的特征图进行数据分布对齐,以减弱第一待处理图像的特征图中携带的图像采集装置的特有风格,也即提高来自于不同的图像采集装置的图像的特征图之间的相似度,以降低神经网络的特征提取难度。
在一种可能的设计中,获取模块1301,还用于获取采集第一待处理图像的图像采集时刻,并从第二图像集合包括的至少两个图像子集合中获取与第一待处理图像的图像采集时刻对应的第一图像集合,其中,第一图像集合为第二图像集合包括的至少两个图像子集合中的一个图像子集合,第一图像集合包括在目标时间段内采集的图像,第一待处理图像的图像采集时刻位于目标时间段内。
本申请实施例中,不同时间段由于光线信息的不同,从而同一时间段内采集的图像的特征图的数据分布中会带有该时间段的特有风格,获取模块1301以时间段作为分类标准,根据第一待处理图像归属的第一图像集合中图像的特征图的数据分布特性,对第一待处理图像的特征图进行数据分布对齐,以减弱第一待处理图像的特征图中携带的某一时间段的特有风格,也即提高来自于不同的时间段的图像的特征图之间的相似度,以降低神经网络的特征提取难度。
在一种可能的设计中,请参阅图14,特征提取模块1302,具体用于对第一待处理图像进行特征提取,并根据第一数据分布特性,在进行特征提取过程中对第一特征图进行数据分布对齐,得到第一待处理图像的特征信息;
装置1300还包括:匹配模块1304,用于根据第一待处理图像的特征信息,将第一待处理图像与第二图像集合中的图像进行匹配,得到匹配结果,其中,第一图像集合为第二图像集合包括的至少两个图像子集合中的一个图像子集合,匹配结果包括至少一个目标图像,目标图像与第一待处理图像包括同样的拍摄对象;或者,
装置1300还包括:识别模块1305,用于根据第一待处理图像的特征信息,对第一待处理图像进行识别,得到第一待处理图像中拍摄对象的描述信息。
本申请实施例中,将本申请实施例提供的图像处理方法应用于图像匹配中,提高了卷积神经网络的特征提取性能,从而可以根据更准确的特征信息进行图像匹配操作,有利于提高图像匹配的准确率,也即提高监控系统的图像匹配过程的准确率;将本申请实施例提供的图像处理方法应用于图像识别中,提高了卷积神经网络的特征提取性能,从而有利于提高图像识别的准确率。
在一种可能的设计中,获取模块1301,还用于获取第二待处理图像和第三数据分布特性,其中,第二待处理图像为第二图像集合中的任一个图像,第三数据分布特性为与第三图像集合中图像对应的特征图的数据分布特性,第二待处理图像与第三图像集合中图像的数据分布规律相同;
特征提取模块1302,还用于对第二待处理图像进行特征提取,并根据第三数据分布特性,在进行特征提取过程中对第二特征图进行数据分布对齐,得到第二待处理图像的特征信息,其中,第二待处理图像为第三图像集合包括的至少一个图像中任一个图像,第二特征图为对第二待处理图像进行特征提取过程中生成的;
通过获取模块1301和特征提取模块1302重复执行上述步骤,直至得到第二图像集合中每个图像的特征信息;
匹配模块1304,具体用于将第一待处理图像的特征信息与第二图像集合中每个图像的特征信息进行匹配,得到匹配结果。
本申请实施例中,在图像重识别场景中,特征提取模块1302不是根据第二图像集合中的所有图像的特征图的数据分布特性执行数据分布对齐操作,而是根据图像的数据分布规律,将第二图像集合分为至少两个图像子集合,基于图像子集合中图像的特征图的数据分布特性执行数据分布对齐操作,避免了不同图像子集合之间的数据分布特性的互相干扰,有利于大跨度的将待处理图像的特征图的数据分布拉拢到神经网络的敏感区域,提高特征提取性能;待处理图像的特征信息和第二图像集合中每个图像的特征信息的精准度都提高的情况下,提高了图像匹配过程的准确率。
需要说明的是,图像处理装置1300中各模块/单元之间的信息交互、执行过程等内容,与本申请中图3至图10对应的各个方法实施例基于同一构思,具体内容可参见本申请前述所示的方法实施例中的叙述,此处不再赘述。
本申请实施例还提供一种图像处理装置,请参阅图15,图15为本申请实施例提供的图像处理装置的一种结构示意图,图像处理装置1500包括:
获取模块1501,用于从训练图像集合中获取至少两个训练图像,至少两个训练图像包括第一训练图像和第二训练图像,第一训练图像和第二训练图像中包括相同的拍摄对象;
获取模块1501,还用于获取与第一训练图像的特征图对应的数据分布特性,与第一训练图像的特征图对应的数据分布特性为与第一训练图像归属的训练图像子集合中图像对应的特征图的数据分布特性,第一训练图像与第一训练图像归属的训练图像子集合中图像的数据分布规律相同;
特征提取模块1502,用于通过卷积神经网络对第一训练图像进行特征提取,并根据与第一训练图像的特征图对应的数据分布特性,在进行特征提取过程中对第三特征图进行数据分布对齐,得到第一训练图像的特征信息,其中,第三特征图为对第一训练图像进行特征提取过程中得到的;
获取模块1501,还用于获取与第二训练图像的特征图对应的数据分布特性,与第二训练图像的特征图对应的数据分布特性为与第二训练图像归属的训练图像子集合中图像对应的特征图的数据分布特性,第二训练图像与第二训练图像归属的训练图像子集合中图像的数据分布规律相同;
特征提取模块1502,还用于通过卷积神经网络对第二训练图像进行特征提取,并根据与第二训练图像的特征图对应的数据分布特性,在进行特征提取过程中对第四特征图进行数据分布对齐,得到第二训练图像的特征信息,其中,第四特征图为对第二训练图像进行特征提取过程中得到的;
训练模块1503,用于根据第一训练图像的特征信息和第二训练图像的特征信息,通过损失函数对卷积神经网络进行训练,直至满足收敛条件,输出执行过迭代训练操作的卷积神经网络,其中,损失函数用于指示第一训练图像的特征信息和第二训练图像的特征信息之间的相似度。
本申请实施例中,提供了通用能力为图像重识别情况下,训练侧的具体实现方式,提供了一种在跨场景过程中依旧可以保持良好的特征提取能力的卷积神经网络,提高了本方案的完整性;仅对特征提取技能进行训练,提高了训练阶段的效率;此外,在训练过程采用的为增量学习的情况下,由于本申请实施例提供的方法可以去除特征图中携带的某个训练图像子集合的数据分布特性,从而避免了将卷积神经网络过拟合到某个小的训练数据集中,解决了增量学习过程的灾难遗忘问题。
需要说明的是,图像处理装置1500中各模块/单元之间的信息交互、执行过程等内容,与本申请中图11对应的各个方法实施例基于同一构思,具体内容可参见本申请前述所示的方法实施例中的叙述,此处不再赘述。
本申请实施例还提供一种图像处理装置,请参阅图16,图16为本申请实施例提供的图像处理装置的一种结构示意图,图像处理装置1600包括:
获取模块1601,用于从训练图像集合中获取第三训练图像,第三训练图像为训练图像集合中的一个图像;
获取模块1601,还用于获取与第三训练图像的特征图对应的数据分布特性,与第三训练图像的特征图对应的数据分布特性为与第三训练图像归属的训练图像子集合中图像对应的特征图的数据分布特性;
特征提取模块1602,用于通过卷积神经网络对第三训练图像进行特征提取,并根据与第三训练图像的特征图对应的数据分布特性,在进行特征提取过程中对第三特征图进行数据分布对齐,得到第三训练图像的特征信息,其中,第三特征图为对第三训练图像进行特征提取过程中得到的;
识别模块1603,用于根据第三训练图像的特征信息进行图像识别,得到第三训练图像中拍摄对象的描述信息;
训练模块1604,用于根据描述信息,通过损失函数对卷积神经网络进行训练。
本申请实施例中,提供了通用能力为图像识别情况下,训练侧的具体实现方式,提供了一种在跨场景过程中依旧可以保持良好的特征提取能力的卷积神经网络,提高了本方案的完整性,也扩展了本方案的应用场景;此外,在训练过程采用的为增量学习的情况下,由于本申请实施例提供的方法可以去除特征图中携带的某个训练图像子集合的数据分布特性,从而避免了将卷积神经网络过拟合到某个小的训练数据集中,解决了增量学习过程的灾难遗忘问题。
需要说明的是,图像处理装置1600中各模块/单元之间的信息交互、执行过程等内容,与本申请中图11对应的各个方法实施例基于同一构思,具体内容可参见本申请前述所示的方法实施例中的叙述,此处不再赘述。
接下来介绍本申请实施例提供的一种执行设备,请参阅图17,图17为本申请实施例提供的执行设备的一种结构示意图。其中,执行设备1700上可以部署有图13或图14对应实施例中所描述的图像处理装置1300,用于实现图3或图10对应实施例中执行设备的功能。具体的,执行设备1700包括:接收器1701、发射器1702、处理器1703和存储器1704(其中执行设备1700中的处理器1703的数量可以为一个或多个,图17中以一个处理器为例),其中,处理器1703可以包括应用处理器17031和通信处理器17032。在本申请的一些实施例中,接收器1701、发射器1702、处理器1703和存储器1704可通过总线或其它方式连接。
存储器1704可以包括只读存储器和随机存取存储器,并向处理器1703提供指令和数据。存储器1704的一部分还可以包括非易失性随机存取存储器(non-volatile random access memory,NVRAM)。存储器1704存储有处理器和操作指令、可执行模块或者数据结构,或者它们的子集,或者它们的扩展集,其中,操作指令可包括各种操作指令,用于实现各种操作。
处理器1703控制执行设备的操作。具体的应用中,执行设备的各个组件通过总线系统耦合在一起,其中总线系统除包括数据总线之外,还可以包括电源总线、控制总线和状态信号总线等。但是为了清楚说明起见,在图中将各种总线都称为总线系统。
上述本申请实施例揭示的方法可以应用于处理器1703中,或者由处理器1703实现。处理器1703可以是一种集成电路芯片,具有信号的处理能力。在实现过程中,上述方法的各步骤可以通过处理器1703中的硬件的集成逻辑电路或者软件形式的指令完成。上述的处理器1703可以是通用处理器、数字信号处理器(digital signal processing,DSP)、微处理器或微控制器,还可进一步包括专用集成电路(application specific integrated circuit, ASIC)、现场可编程门阵列(field-programmable gate array,FPGA)或者其他可编程逻辑器件、分立门或者晶体管逻辑器件、分立硬件组件。该处理器1703可以实现或者执行本申请实施例中的公开的各方法、步骤及逻辑框图。通用处理器可以是微处理器或者该处理器也可以是任何常规的处理器等。结合本申请实施例所公开的方法的步骤可以直接体现为硬件译码处理器执行完成,或者用译码处理器中的硬件及软件模块组合执行完成。软件模块可以位于随机存储器,闪存、只读存储器,可编程只读存储器或者电可擦写可编程存储器、寄存器等本领域成熟的存储介质中。该存储介质位于存储器1704,处理器1703读取存储器1704中的信息,结合其硬件完成上述方法的步骤。
接收器1701可用于接收输入的数字或字符信息,以及产生与执行设备的相关设置以及功能控制有关的信号输入。发射器1702可用于通过第一接口输出数字或字符信息;发射器1702还可用于通过第一接口向磁盘组发送指令,以修改磁盘组中的数据;发射器1702还可以包括显示屏等显示设备。
本申请实施例中,在一种情况下,应用处理器17031,用于执行图3至图10对应实施例中的执行设备执行的图像处理方法。具体的,应用处理器17031,用于执行如下步骤:
获取第一待处理图像;
获取与第一待处理图像对应的第一数据分布特性,其中,第一数据分布特性包括与第一图像集合中图像对应的特征图的数据分布特性,第一待处理图像与第一图像集合的数据分布规律相同;
对第一待处理图像进行特征提取,并根据第一数据分布特性,在进行特征提取过程中对第一特征图进行数据分布对齐,其中,第一特征图为对第一待处理图像进行特征提取过程中生成的。
需要说明的是,应用处理器17031还用于执行图3至图10对应实施例中的执行设备执行的其他步骤,应用处理器17031执行上述各个步骤的具体方式,与本申请中图3至图10对应的各个方法实施例基于同一构思,其带来的技术效果与本申请中图3至图10对应的各个方法实施例相同,具体内容可参见本申请前述所示的方法实施例中的叙述,此处不再赘述。
本申请实施例还提供了一种训练设备,请参阅图18,图18是本申请实施例提供的训练设备一种结构示意图,训练设备1800上可以部署有图15对应实施例中所描述的图像处理装置1500,用于实现图11对应实施例中训练设备的功能;或者,训练设备1800上可以部署有图16对应实施例中所描述的图像处理装置1600,用于实现图12对应实施例中训练设备的功能。具体的,训练设备1800由一个或多个服务器实现,训练设备1800可因配置或性能不同而产生比较大的差异,可以包括一个或一个以上中央处理器(central processing units,CPU)1822(例如,一个或一个以上处理器)和存储器1832,一个或一个以上存储应用程序1842或数据1844的存储介质1830(例如一个或一个以上海量存储设备)。其中,存储器1832和存储介质1830可以是短暂存储或持久存储。存储在存储介质1830的程序可以包括一个或一个以上模块(图示没标出),每个模块可以包括对训练设备中的一系列指令操作。更进一步地,中央处理器1822可以设置为与存储介质1830通信,在训练设备1800上执行存储介质1830中的一系列指令操作。
训练设备1800还可以包括一个或一个以上电源1826,一个或一个以上有线或无线网络接口1850,一个或一个以上输入输出接口1858,和/或,一个或一个以上操作系统1841,例如Windows Server™,Mac OS X™,Unix™,Linux™,FreeBSD™等等。
本申请实施例中,在一种情况下,中央处理器1822,用于执行图11对应实施例中的训练设备执行的图像处理方法。具体的,中央处理器1822,用于执行如下步骤:
从训练图像集合中获取至少两个训练图像,至少两个训练图像包括第一训练图像和第二训练图像,第一训练图像和第二训练图像中包括相同的拍摄对象;
获取与第一训练图像的特征图对应的数据分布特性,与第一训练图像的特征图对应的数据分布特性为与第一训练图像归属的训练图像子集合中图像对应的特征图的数据分布特性,第一训练图像与第一训练图像归属的训练图像子集合中图像的数据分布规律相同;
通过卷积神经网络对第一训练图像进行特征提取,并根据与第一训练图像的特征图对应的数据分布特性,在进行特征提取过程中对第三特征图进行数据分布对齐,得到第一训练图像的特征信息,其中,第三特征图为对第一训练图像进行特征提取过程中得到的;
获取与第二训练图像的特征图对应的数据分布特性,与第二训练图像的特征图对应的数据分布特性为与第二训练图像归属的训练图像子集合中图像对应的特征图的数据分布特性,第二训练图像与第二训练图像归属的训练图像子集合中图像的数据分布规律相同;
通过卷积神经网络对第二训练图像进行特征提取,并根据与第二训练图像的特征图对应的数据分布特性,在进行特征提取过程中对第四特征图进行数据分布对齐,得到第二训练图像的特征信息,其中,第四特征图为对第二训练图像进行特征提取过程中得到的;
根据第一训练图像的特征信息和第二训练图像的特征信息,通过损失函数对卷积神经网络进行训练,直至满足收敛条件,输出执行过迭代训练操作的卷积神经网络,其中,损失函数用于指示第一训练图像的特征信息和第二训练图像的特征信息之间的相似度。
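上述训练流程的关键在于:包括相同拍摄对象的两张训练图像分别按各自归属的训练图像子集合的统计量做数据分布对齐后,以两份特征信息之间的相似度构造损失。下面用一个极简的Python草图示意该损失的计算方式(线性"网络"与余弦相似度损失均为说明性假设,实施例并未限定具体的网络结构与损失形式):

```python
import numpy as np

def align(feat, stats, eps=1e-5):
    """按训练图像归属的训练图像子集合的统计量做数据分布对齐(标准化)。"""
    mean, var = stats
    return (feat - mean) / np.sqrt(var + eps)

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def pair_loss(img1, stats1, img2, stats2, weight):
    """同一拍摄对象的两张训练图像各自对齐后,以特征相似度构造损失。"""
    f1 = align(weight @ img1, stats1)  # 第三特征图 -> 第一训练图像的特征信息
    f2 = align(weight @ img2, stats2)  # 第四特征图 -> 第二训练图像的特征信息
    return 1.0 - cosine_similarity(f1, f2)  # 相似度越高,损失越小
```

训练时即沿该损失对卷积神经网络的参数(此处以 weight 示意)迭代更新,直至满足收敛条件。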
需要说明的是,中央处理器1822还用于执行图11对应实施例中的训练设备执行的其他步骤,中央处理器1822执行上述各个步骤的具体方式,与本申请中图11对应的各个方法实施例基于同一构思,其带来的技术效果与本申请中图11对应的各个方法实施例相同,具体内容可参见本申请前述所示的方法实施例中的叙述,此处不再赘述。
本申请实施例中,在一种情况下,中央处理器1822,用于执行图12对应实施例中的训练设备执行的图像处理方法。具体的,中央处理器1822,用于执行如下步骤:
从训练图像集合中获取第三训练图像,第三训练图像为训练图像集合中的一个图像;
获取与第三训练图像的特征图对应的数据分布特性,与第三训练图像的特征图对应的数据分布特性为与第三训练图像归属的训练图像子集合中图像对应的特征图的数据分布特性;
通过卷积神经网络对第三训练图像进行特征提取,并根据与第三训练图像的特征图对应的数据分布特性,在进行特征提取过程中对第三特征图进行数据分布对齐,得到第三训练图像的特征信息,其中,第三特征图为对第三训练图像进行特征提取过程中得到的;
根据第三训练图像的特征信息进行图像识别,得到第三训练图像中拍摄对象的描述信息;
根据描述信息,通过损失函数对卷积神经网络进行训练。
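与前一种训练方法不同,此处以图像识别结果(拍摄对象的描述信息)构造损失。下面给出一个以 softmax 交叉熵为例的示意性草图(线性分类器与交叉熵损失均为本文假设,实施例未限定具体的损失函数形式):

```python
import numpy as np

def softmax(z):
    z = z - z.max()  # 数值稳定
    e = np.exp(z)
    return e / e.sum()

def recognition_loss(feature, classifier_w, label):
    """根据特征信息做图像识别,得到各类别得分(描述信息),
    并以交叉熵作为训练损失。"""
    logits = classifier_w @ feature
    probs = softmax(logits)
    return -float(np.log(probs[label] + 1e-12))
```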
需要说明的是,中央处理器1822还用于执行图12对应实施例中的训练设备执行的其他步骤,中央处理器1822执行上述各个步骤的具体方式,与本申请中图12对应的各个方法实施例基于同一构思,其带来的技术效果与本申请中图12对应的各个方法实施例相同,具体内容可参见本申请前述所示的方法实施例中的叙述,此处不再赘述。
本申请实施例中还提供一种计算机程序产品,当其在计算机上运行时,使得计算机执行如前述图3至图10所示实施例描述的方法中执行设备所执行的步骤,或者,使得计算机执行如前述图11所示实施例描述的方法中训练设备所执行的步骤,或者,使得计算机执行如前述图12所示实施例描述的方法中训练设备所执行的步骤。
本申请实施例中还提供一种计算机可读存储介质,该计算机可读存储介质中存储有用于进行信号处理的程序,当其在计算机上运行时,使得计算机执行如前述图3至图10所示实施例描述的方法中执行设备所执行的步骤,或者,使得计算机执行如前述图11所示实施例描述的方法中训练设备所执行的步骤,或者,使得计算机执行如前述图12所示实施例描述的方法中训练设备所执行的步骤。
本申请实施例提供的执行设备、训练设备、终端设备或通信设备具体可以为芯片,芯片包括:处理单元和通信单元,所述处理单元例如可以是处理器,所述通信单元例如可以是输入/输出接口、管脚或电路等。该处理单元可执行存储单元存储的计算机执行指令,以使执行设备内的芯片执行上述图3至图10所示实施例描述的图像处理方法,或者,以使训练设备内的芯片执行上述图11所示实施例描述的图像处理方法,或者,以使训练设备内的芯片执行上述图12所示实施例描述的图像处理方法。可选地,所述存储单元为所述芯片内的存储单元,如寄存器、缓存等,所述存储单元还可以是所述无线接入设备端内的位于所述芯片外部的存储单元,如只读存储器(read-only memory,ROM)或可存储静态信息和指令的其他类型的静态存储设备,随机存取存储器(random access memory,RAM)等。
具体的,请参阅图19,图19为本申请实施例提供的芯片的一种结构示意图,所述芯片可以表现为神经网络处理器NPU 190,NPU 190作为协处理器挂载到主CPU(Host CPU)上,由Host CPU分配任务。NPU的核心部分为运算电路1903,通过控制器1904控制运算电路1903提取存储器中的矩阵数据并进行乘法运算。
在一些实现中,运算电路1903内部包括多个处理单元(Process Engine,PE)。在一些实现中,运算电路1903是二维脉动阵列。运算电路1903还可以是一维脉动阵列或者能够执行例如乘法和加法这样的数学运算的其它电子线路。在一些实现中,运算电路1903是通用的矩阵处理器。
举例来说,假设有输入矩阵A,权重矩阵B,输出矩阵C。运算电路从权重存储器1902中取矩阵B相应的数据,并缓存在运算电路中每一个PE上。运算电路从输入存储器1901中取矩阵A数据与矩阵B进行矩阵运算,得到的矩阵的部分结果或最终结果,保存在累加器(accumulator)1908中。
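上述"部分结果累加"的过程,可用如下按K维分块的矩阵乘示意性代码来理解(仅为原理性草图,并非NPU运算电路的实际实现):

```python
import numpy as np

def tiled_matmul(A, B, tile=2):
    """按K维分块计算 C = A @ B:每个分块的部分结果累加到累加器C中,
    对应运算电路将矩阵的部分结果或最终结果保存在accumulator的过程。"""
    M, K = A.shape
    K2, N = B.shape
    assert K == K2, "矩阵维度需匹配"
    C = np.zeros((M, N))  # 累加器
    for k in range(0, K, tile):
        C += A[:, k:k + tile] @ B[k:k + tile, :]  # 部分结果累加
    return C
```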
统一存储器1906用于存放输入数据以及输出数据。权重数据直接通过存储单元访问控制器(Direct Memory Access Controller,DMAC)1905被搬运到权重存储器1902中。输入数据也通过DMAC被搬运到统一存储器1906中。
BIU为Bus Interface Unit,即总线接口单元1910,用于AXI总线与DMAC和取指存储器(Instruction Fetch Buffer,IFB)1909的交互。
总线接口单元1910(Bus Interface Unit,简称BIU),用于取指存储器1909从外部存储器获取指令,还用于存储单元访问控制器1905从外部存储器获取输入矩阵A或者权重矩阵B的原数据。
DMAC主要用于将外部存储器DDR中的输入数据搬运到统一存储器1906,或将权重数据搬运到权重存储器1902中,或将输入数据搬运到输入存储器1901中。
向量计算单元1907包括多个运算处理单元,在需要的情况下,对运算电路的输出做进一步处理,如向量乘、向量加、指数运算、对数运算、大小比较等,主要用于神经网络中非卷积/全连接层的网络计算,如像素级求和、对特征图进行数据分布对齐等。
在一些实现中,向量计算单元1907能将经处理的输出的向量存储到统一存储器1906。例如,向量计算单元1907可以将线性函数和/或非线性函数应用到运算电路1903的输出,例如对卷积层提取的特征图进行线性插值;再例如对累加值的向量进行处理,用以生成激活值。在一些实现中,向量计算单元1907生成归一化的值、像素级求和的值,或二者均有。在一些实现中,处理过的输出的向量能够用作运算电路1903的激活输入,例如用于神经网络中后续层的计算。
控制器1904连接的取指存储器(instruction fetch buffer)1909,用于存储控制器1904使用的指令;
统一存储器1906,输入存储器1901,权重存储器1902以及取指存储器1909均为On-Chip存储器。外部存储器私有于该NPU硬件架构。
其中,上述各个实施例中所示的卷积神经网络中各层的运算可以由运算电路1903或向量计算单元1907执行。
其中,上述任一处提到的处理器,可以是一个通用中央处理器,微处理器,ASIC,或一个或多个用于控制上述第一方面方法的程序执行的集成电路。
另外需说明的是,以上所描述的装置实施例仅仅是示意性的,其中所述作为分离部件说明的单元可以是或者也可以不是物理上分开的,作为单元显示的部件可以是或者也可以不是物理单元,即可以位于一个地方,或者也可以分布到多个网络单元上。可以根据实际的需要选择其中的部分或者全部模块来实现本实施例方案的目的。另外,本申请提供的装置实施例附图中,模块之间的连接关系表示它们之间具有通信连接,具体可以实现为一条或多条通信总线或信号线。
通过以上的实施方式的描述,所属领域的技术人员可以清楚地了解到本申请可借助软件加必需的通用硬件的方式来实现,当然也可以通过专用硬件包括专用集成电路、专用CPU、专用存储器、专用元器件等来实现。一般情况下,凡由计算机程序完成的功能都可以很容易地用相应的硬件来实现,而且,用来实现同一功能的具体硬件结构也可以是多种多样的,例如模拟电路、数字电路或专用电路等。但是,对本申请而言更多情况下软件程序实现是更佳的实施方式。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在可读取的存储介质中,如计算机的软盘、U盘、移动硬盘、ROM、RAM、磁碟或者光盘等,包括若干指令用以使得一台计算机设备(可以是个人计算机,训练设备,或者网络设备等)执行本申请各个实施例所述的方法。
在上述实施例中,可以全部或部分地通过软件、硬件、固件或者其任意组合来实现。当使用软件实现时,可以全部或部分地以计算机程序产品的形式实现。
所述计算机程序产品包括一个或多个计算机指令。在计算机上加载和执行所述计算机程序指令时,全部或部分地产生按照本申请实施例所述的流程或功能。所述计算机可以是通用计算机、专用计算机、计算机网络、或者其他可编程装置。所述计算机指令可以存储在计算机可读存储介质中,或者从一个计算机可读存储介质向另一计算机可读存储介质传输,例如,所述计算机指令可以从一个网站站点、计算机、训练设备或数据中心通过有线(例如同轴电缆、光纤、数字用户线(DSL))或无线(例如红外、无线、微波等)方式向另一个网站站点、计算机、训练设备或数据中心进行传输。所述计算机可读存储介质可以是计算机能够存储的任何可用介质或者是包含一个或多个可用介质集成的训练设备、数据中心等数据存储设备。所述可用介质可以是磁性介质(例如软盘、硬盘、磁带)、光介质(例如DVD)、或者半导体介质(例如固态硬盘(Solid State Disk,SSD))等。

Claims (24)

  1. 一种图像处理方法,其特征在于,所述方法包括:
    获取第一待处理图像;
    获取与所述第一待处理图像对应的第一数据分布特性,其中,所述第一数据分布特性包括与第一图像集合中图像对应的特征图的数据分布特性,所述第一待处理图像与所述第一图像集合的数据分布规律相同;
    对所述第一待处理图像进行特征提取,并根据所述第一数据分布特性,在进行特征提取过程中对第一特征图进行数据分布对齐,其中,所述第一特征图为对所述第一待处理图像进行特征提取过程中生成的。
  2. 根据权利要求1所述的方法,其特征在于,所述对所述第一待处理图像进行特征提取之前,所述方法还包括:
    获取与所述第一待处理图像对应的第二数据分布特性,所述第二数据分布特性为所述第一图像集合中图像的数据分布特性;
    根据所述第二数据分布特性,对所述第一待处理图像进行数据分布对齐;
    所述对所述第一待处理图像进行特征提取,包括:
    对执行过数据分布对齐的所述第一待处理图像进行特征提取。
  3. 根据权利要求1或2所述的方法,其特征在于,所述第一数据分布特性包括均值和方差,所述均值和所述方差为对与所述第一图像集合中图像对应的特征图进行数据分布统计得到的;
    所述对所述第一待处理图像进行特征提取,并根据所述第一数据分布特性,在进行特征提取过程中对第一特征图进行数据分布对齐,包括:
    对所述第一待处理图像进行特征提取,并根据所述均值和所述方差,在进行特征提取过程中对所述第一特征图包括的特征图进行标准化处理。
  4. 根据权利要求1或2所述的方法,其特征在于,
    所述第一待处理图像和所述第一图像集合中的图像来源于同一目标图像采集装置,或者,所述第一待处理图像的图像采集时刻和所述第一图像集合中图像的图像采集时刻均位于同一目标时间段内,或者,所述第一待处理图像和所述第一图像集合中的图像来源于同一图像采集地点,或者,所述第一待处理图像中的拍摄对象和所述第一图像集合包括的图像中的拍摄对象为同一对象类型。
  5. 根据权利要求1或2所述的方法,其特征在于,所述获取与所述第一待处理图像对应的第一数据分布特性之前,所述方法还包括:
    获取采集所述第一待处理图像的目标图像采集装置的标识信息,并从第二图像集合包括的至少两个图像子集合中获取与所述目标图像采集装置的标识信息对应的所述第一图像集合,其中,所述第一图像集合为所述第二图像集合包括的至少两个图像子集合中的一个图像子集合,所述第一图像集合包括通过所述目标图像采集装置采集到的图像。
  6. 根据权利要求1或2所述的方法,其特征在于,所述获取与所述第一待处理图像对应的第一数据分布特性之前,所述方法还包括:
    获取采集所述第一待处理图像的图像采集时刻,并从第二图像集合包括的至少两个图像子集合中获取与所述第一待处理图像的图像采集时刻对应的所述第一图像集合,其中,所述第一图像集合为所述第二图像集合包括的至少两个图像子集合中的一个图像子集合,所述第一图像集合包括在目标时间段内采集的图像,所述第一待处理图像的图像采集时刻位于所述目标时间段内。
  7. 根据权利要求1或2所述的方法,其特征在于,所述对所述第一待处理图像进行特征提取,并根据所述第一数据分布特性,在进行特征提取过程中对第一特征图进行数据分布对齐,包括:
    对所述第一待处理图像进行特征提取,并根据所述第一数据分布特性,在进行特征提取过程中对所述第一特征图进行数据分布对齐,得到所述第一待处理图像的特征信息;
    所述得到所述第一待处理图像的特征信息之后,所述方法还包括:
    根据所述第一待处理图像的特征信息,将所述第一待处理图像与第二图像集合中的图像进行匹配,得到匹配结果,其中,所述第一图像集合为所述第二图像集合包括的至少两个图像子集合中的一个图像子集合,所述匹配结果包括至少一个目标图像,所述目标图像与所述第一待处理图像包括同样的拍摄对象;或者,
    根据所述第一待处理图像的特征信息,对所述第一待处理图像进行识别,得到所述第一待处理图像中拍摄对象的描述信息。
  8. 根据权利要求7所述的方法,其特征在于,所述根据所述第一待处理图像的特征信息,将所述第一待处理图像与所述第二图像集合中的图像进行匹配之前,所述方法还包括:
    获取第二待处理图像和第三数据分布特性,其中,所述第二待处理图像为所述第二图像集合中任一个图像,所述第三数据分布特性为与第三图像集合中图像对应的特征图的数据分布特性,所述第二待处理图像与所述第三图像集合中图像的数据分布规律相同;
    对所述第二待处理图像进行特征提取,并根据所述第三数据分布特性,在进行特征提取过程中对第二特征图进行数据分布对齐,得到所述第二待处理图像的特征信息,其中,所述第二待处理图像为所述第三图像集合包括的至少一个图像中任一个图像,所述第二特征图为对所述第二待处理图像进行特征提取过程中生成的;
    重复执行上述步骤,直至得到所述第二图像集合中每个图像的特征信息;
    所述根据所述第一待处理图像的特征信息,将所述第一待处理图像与所述第二图像集合中的图像进行匹配,得到匹配结果,包括:
    将所述第一待处理图像的特征信息与所述第二图像集合中每个图像的特征信息进行匹配,得到所述匹配结果。
  9. 一种图像处理方法,其特征在于,所述方法包括:
    从训练图像集合中获取至少两个训练图像,所述至少两个训练图像包括第一训练图像和第二训练图像,所述第一训练图像和所述第二训练图像中包括相同的拍摄对象;
    获取与所述第一训练图像的特征图对应的数据分布特性,与所述第一训练图像的特征图对应的数据分布特性为与所述第一训练图像归属的训练图像子集合中图像对应的特征图的数据分布特性,所述第一训练图像与所述第一训练图像归属的训练图像子集合中图像的数据分布规律相同;
    通过卷积神经网络对所述第一训练图像进行特征提取,并根据与所述第一训练图像的特征图对应的数据分布特性,在进行特征提取过程中对第三特征图进行数据分布对齐,得到第一训练图像的特征信息,其中,所述第三特征图为对所述第一训练图像进行特征提取过程中得到的;
    获取与所述第二训练图像的特征图对应的数据分布特性,与所述第二训练图像的特征图对应的数据分布特性为与所述第二训练图像归属的训练图像子集合中图像对应的特征图的数据分布特性,所述第二训练图像与所述第二训练图像归属的训练图像子集合中图像的数据分布规律相同;
    通过所述卷积神经网络对所述第二训练图像进行特征提取,并根据与所述第二训练图像的特征图对应的数据分布特性,在进行特征提取过程中对第四特征图进行数据分布对齐,得到所述第二训练图像的特征信息,其中,所述第四特征图为对所述第二训练图像进行特征提取过程中得到的;
    根据所述第一训练图像的特征信息和所述第二训练图像的特征信息,通过损失函数对所述卷积神经网络进行训练,直至满足收敛条件,输出执行过迭代训练操作的卷积神经网络,其中,所述损失函数用于指示所述第一训练图像的特征信息和所述第二训练图像的特征信息之间的相似度。
  10. 一种图像处理方法,其特征在于,所述方法包括:
    从训练图像集合中获取第三训练图像,第三训练图像为训练图像集合中的一个图像;
    获取与所述第三训练图像的特征图对应的数据分布特性,与所述第三训练图像的特征图对应的数据分布特性为与所述第三训练图像归属的训练图像子集合中图像对应的特征图的数据分布特性;
    通过卷积神经网络对所述第三训练图像进行特征提取,并根据与所述第三训练图像的特征图对应的数据分布特性,在进行特征提取过程中对第三特征图进行数据分布对齐,得到第三训练图像的特征信息,其中,所述第三特征图为对所述第三训练图像进行特征提取过程中得到的;
    根据所述第三训练图像的特征信息进行图像识别,得到所述第三训练图像中拍摄对象的描述信息;
    根据所述描述信息,通过损失函数对所述卷积神经网络进行训练,直至满足收敛条件,输出执行过迭代训练操作的卷积神经网络。
  11. 一种图像处理装置,其特征在于,所述装置包括:
    获取模块,用于获取第一待处理图像;
    所述获取模块,还用于获取与所述第一待处理图像对应的第一数据分布特性,其中,所述第一数据分布特性包括与第一图像集合中图像对应的特征图的数据分布特性,所述第一待处理图像与所述第一图像集合的数据分布规律相同;
    特征提取模块,用于对所述第一待处理图像进行特征提取,并根据所述第一数据分布特性,在进行特征提取过程中对第一特征图进行数据分布对齐,其中,所述第一特征图为对所述第一待处理图像进行特征提取过程中生成的。
  12. 根据权利要求11所述的装置,其特征在于,
    所述获取模块,还用于获取与所述第一待处理图像对应的第二数据分布特性,所述第二数据分布特性为所述第一图像集合中图像的数据分布特性,
    所述装置还包括:数据分布对齐模块,用于根据所述第二数据分布特性,对所述第一待处理图像进行数据分布对齐;
    所述特征提取模块,具体用于对执行过数据分布对齐的所述第一待处理图像进行特征提取。
  13. 根据权利要求11或12所述的装置,其特征在于,所述第一数据分布特性包括均值和方差,所述均值和所述方差为对与所述第一图像集合中图像对应的特征图进行数据分布统计得到的;
    所述特征提取模块,具体用于对所述第一待处理图像进行特征提取,并根据所述均值和所述方差,在进行特征提取过程中对所述第一特征图包括的特征图进行标准化处理。
  14. 根据权利要求11或12所述的装置,其特征在于,
    所述第一待处理图像和所述第一图像集合中的图像来源于同一目标图像采集装置,或者,所述第一待处理图像的图像采集时刻和所述第一图像集合中图像的图像采集时刻均位于同一目标时间段内,或者,所述第一待处理图像和所述第一图像集合中的图像来源于同一图像采集地点,或者,所述第一待处理图像中的拍摄对象和所述第一图像集合包括的图像中的拍摄对象为同一对象类型。
  15. 根据权利要求11或12所述的装置,其特征在于,
    所述获取模块,还用于获取采集所述第一待处理图像的目标图像采集装置的标识信息,并从第二图像集合包括的至少两个图像子集合中获取与所述目标图像采集装置的标识信息对应的所述第一图像集合,其中,所述第一图像集合为所述第二图像集合包括的至少两个图像子集合中的一个图像子集合,所述第一图像集合包括通过所述目标图像采集装置采集到的图像。
  16. 根据权利要求11或12所述的装置,其特征在于,
    所述获取模块,还用于获取采集所述第一待处理图像的图像采集时刻,并从第二图像集合包括的至少两个图像子集合中获取与所述第一待处理图像的图像采集时刻对应的所述第一图像集合,其中,所述第一图像集合为所述第二图像集合包括的至少两个图像子集合中的一个图像子集合,所述第一图像集合包括在目标时间段内采集的图像,所述第一待处理图像的图像采集时刻位于所述目标时间段内。
  17. 根据权利要求11或12所述的装置,其特征在于,
    所述特征提取模块,具体用于对所述第一待处理图像进行特征提取,并根据所述第一数据分布特性,在进行特征提取过程中对所述第一特征图进行数据分布对齐,得到所述第一待处理图像的特征信息;
    所述装置还包括:匹配模块,用于根据所述第一待处理图像的特征信息,将所述第一待处理图像与第二图像集合中的图像进行匹配,得到匹配结果,其中,所述第一图像集合为所述第二图像集合包括的至少两个图像子集合中的一个图像子集合,所述匹配结果包括至少一个目标图像,所述目标图像与所述第一待处理图像包括同样的拍摄对象;或者,
    所述装置还包括:识别模块,用于根据所述第一待处理图像的特征信息,对所述第一待处理图像进行识别,得到所述第一待处理图像中拍摄对象的描述信息。
  18. 根据权利要求17所述的装置,其特征在于,
    所述获取模块,还用于获取第二待处理图像和第三数据分布特性,其中,所述第二待处理图像为所述第二图像集合中任一个图像,所述第三数据分布特性为与第三图像集合中图像对应的特征图的数据分布特性,所述第二待处理图像与所述第三图像集合中图像的数据分布规律相同;
    所述特征提取模块,还用于对所述第二待处理图像进行特征提取,并根据所述第三数据分布特性,在进行特征提取过程中对第二特征图进行数据分布对齐,得到所述第二待处理图像的特征信息,其中,所述第二待处理图像为所述第三图像集合包括的至少一个图像中任一个图像,所述第二特征图为对所述第二待处理图像进行特征提取过程中生成的;
    通过所述获取模块和所述特征提取模块重复执行上述步骤,直至得到所述第二图像集合中每个图像的特征信息;
    所述匹配模块,具体用于将所述第一待处理图像的特征信息与所述第二图像集合中每个图像的特征信息进行匹配,得到所述匹配结果。
  19. 一种图像处理装置,其特征在于,所述装置包括:
    获取模块,用于从训练图像集合中获取至少两个训练图像,所述至少两个训练图像包括第一训练图像和第二训练图像,所述第一训练图像和所述第二训练图像中包括相同的拍摄对象;
    所述获取模块,还用于获取与所述第一训练图像的特征图对应的数据分布特性,与所述第一训练图像的特征图对应的数据分布特性为与所述第一训练图像归属的训练图像子集合中图像对应的特征图的数据分布特性,所述第一训练图像与所述第一训练图像归属的训练图像子集合中图像的数据分布规律相同;
    特征提取模块,用于通过卷积神经网络对所述第一训练图像进行特征提取,并根据与所述第一训练图像的特征图对应的数据分布特性,在进行特征提取过程中对第三特征图进行数据分布对齐,得到第一训练图像的特征信息,其中,所述第三特征图为对所述第一训练图像进行特征提取过程中得到的;
    所述获取模块,还用于获取与所述第二训练图像的特征图对应的数据分布特性,与所述第二训练图像的特征图对应的数据分布特性为与所述第二训练图像归属的训练图像子集合中图像对应的特征图的数据分布特性,所述第二训练图像与所述第二训练图像归属的训练图像子集合中图像的数据分布规律相同;
    所述特征提取模块,还用于通过所述卷积神经网络对所述第二训练图像进行特征提取,并根据与所述第二训练图像的特征图对应的数据分布特性,在进行特征提取过程中对第四特征图进行数据分布对齐,得到所述第二训练图像的特征信息,其中,所述第四特征图为对所述第二训练图像进行特征提取过程中得到的;
    训练模块,用于根据所述第一训练图像的特征信息和所述第二训练图像的特征信息,通过损失函数对所述卷积神经网络进行训练,直至满足收敛条件,输出执行过迭代训练操作的卷积神经网络,其中,所述损失函数用于指示所述第一训练图像的特征信息和所述第二训练图像的特征信息之间的相似度。
  20. 一种图像处理装置,其特征在于,所述装置包括:
    获取模块,用于从训练图像集合中获取第三训练图像,第三训练图像为训练图像集合中的一个图像;
    所述获取模块,还用于获取与所述第三训练图像的特征图对应的数据分布特性,与所述第三训练图像的特征图对应的数据分布特性为与所述第三训练图像归属的训练图像子集合中图像对应的特征图的数据分布特性;
    特征提取模块,用于通过卷积神经网络对所述第三训练图像进行特征提取,并根据与所述第三训练图像的特征图对应的数据分布特性,在进行特征提取过程中对第三特征图进行数据分布对齐,得到第三训练图像的特征信息,其中,所述第三特征图为对所述第三训练图像进行特征提取过程中得到的;
    识别模块,用于根据所述第三训练图像的特征信息进行图像识别,得到所述第三训练图像中拍摄对象的描述信息;
    训练模块,用于根据所述描述信息,通过损失函数对所述卷积神经网络进行训练,直至满足收敛条件,输出执行过迭代训练操作的卷积神经网络。
  21. 一种执行设备,其特征在于,包括处理器,所述处理器与存储器耦合;
    所述存储器,用于存储程序;
    所述处理器,用于执行所述存储器中的程序,使得所述执行设备执行如权利要求1至8中任一项所述的方法。
  22. 一种训练设备,其特征在于,包括处理器,所述处理器与存储器耦合;
    所述存储器,用于存储程序;
    所述处理器,用于执行所述存储器中的程序,使得所述训练设备执行如权利要求9所述的方法,或者,使得所述训练设备执行如权利要求10所述的方法。
  23. 一种计算机可读存储介质,其特征在于,包括程序,当其在计算机上运行时,使得计算机执行如权利要求1至8中任一项所述的方法,或者,使得计算机执行如权利要求9所述的方法,或者,使得计算机执行如权利要求10所述的方法。
  24. 一种电路系统,其特征在于,所述电路系统包括处理电路,所述处理电路配置为执行如权利要求1至8中任一项所述的方法,或者,所述处理电路配置为执行如权利要求9所述的方法,或者,所述处理电路配置为执行如权利要求10所述的方法。
PCT/CN2020/118076 2020-01-23 2020-09-27 一种图像处理方法以及相关设备 WO2021147366A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010085440.7A CN113159081A (zh) 2020-01-23 2020-01-23 一种图像处理方法以及相关设备
CN202010085440.7 2020-01-23

Publications (1)

Publication Number Publication Date
WO2021147366A1 true WO2021147366A1 (zh) 2021-07-29

Family

ID=76882101

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/118076 WO2021147366A1 (zh) 2020-01-23 2020-09-27 一种图像处理方法以及相关设备

Country Status (2)

Country Link
CN (1) CN113159081A (zh)
WO (1) WO2021147366A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2023109551A1 (zh) * 2021-12-15 2023-06-22 腾讯科技(深圳)有限公司 一种活体检测方法、装置和计算机设备

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110363122A (zh) * 2019-07-03 2019-10-22 昆明理工大学 一种基于多层特征对齐的跨域目标检测方法
CN110443273A (zh) * 2019-06-25 2019-11-12 武汉大学 一种用于自然图像跨类识别的对抗零样本学习方法
US20190354807A1 (en) * 2018-05-16 2019-11-21 Nec Laboratories America, Inc. Domain adaptation for structured output via disentangled representations
CN110717526A (zh) * 2019-09-23 2020-01-21 华南理工大学 一种基于图卷积网络的无监督迁移学习方法

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7382897B2 (en) * 2004-04-27 2008-06-03 Microsoft Corporation Multi-image feature matching using multi-scale oriented patches
CN105631413A (zh) * 2015-12-23 2016-06-01 中通服公众信息产业股份有限公司 一种基于深度学习的跨场景行人搜索方法
CN106339435B (zh) * 2016-08-19 2020-11-03 中国银行股份有限公司 一种数据分发方法、装置及系统
CN108304754A (zh) * 2017-03-02 2018-07-20 腾讯科技(深圳)有限公司 车型的识别方法和装置
US10380413B2 (en) * 2017-07-13 2019-08-13 Robert Bosch Gmbh System and method for pose-invariant face alignment
CN109426858B (zh) * 2017-08-29 2021-04-06 京东方科技集团股份有限公司 神经网络、训练方法、图像处理方法及图像处理装置
CN108875732B (zh) * 2018-01-11 2022-07-12 北京旷视科技有限公司 模型训练与实例分割方法、装置和系统及存储介质



Also Published As

Publication number Publication date
CN113159081A (zh) 2021-07-23

Similar Documents

Publication Publication Date Title
WO2020253416A1 (zh) 物体检测方法、装置和计算机存储介质
WO2021164752A1 (zh) 一种神经网络通道参数的搜索方法及相关设备
WO2021238281A1 (zh) 一种神经网络的训练方法、图像分类系统及相关设备
WO2021155792A1 (zh) 一种处理装置、方法及存储介质
WO2021238366A1 (zh) 一种神经网络构建方法以及装置
CN112990211B (zh) 一种神经网络的训练方法、图像处理方法以及装置
CN110222718B (zh) 图像处理的方法及装置
WO2021218471A1 (zh) 一种用于图像处理的神经网络以及相关设备
CN112446398A (zh) 图像分类方法以及装置
WO2022001805A1 (zh) 一种神经网络蒸馏方法及装置
WO2022179587A1 (zh) 一种特征提取的方法以及装置
WO2021164750A1 (zh) 一种卷积层量化方法及其装置
WO2022001372A1 (zh) 训练神经网络的方法、图像处理方法及装置
WO2021175278A1 (zh) 一种模型更新方法以及相关装置
WO2022111617A1 (zh) 一种模型训练方法及装置
WO2021227787A1 (zh) 训练神经网络预测器的方法、图像处理方法及装置
WO2021047587A1 (zh) 手势识别方法、电子设备、计算机可读存储介质和芯片
WO2022111387A1 (zh) 一种数据处理方法及相关装置
CN113095475A (zh) 一种神经网络的训练方法、图像处理方法以及相关设备
CN113011562A (zh) 一种模型训练方法及装置
WO2021249114A1 (zh) 目标跟踪方法和目标跟踪装置
WO2021190433A1 (zh) 更新物体识别模型的方法和装置
CN111832592A (zh) Rgbd显著性检测方法以及相关装置
CN113033321A (zh) 目标行人属性识别模型的训练方法及行人属性识别方法
US20230401838A1 (en) Image processing method and related apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20915448

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20915448

Country of ref document: EP

Kind code of ref document: A1