WO2023279799A1 - Object recognition method, device, and electronic system - Google Patents

Object recognition method, device, and electronic system

Info

Publication number
WO2023279799A1
WO2023279799A1 (PCT/CN2022/086920)
Authority
WO
WIPO (PCT)
Prior art keywords
image
target object
neural network
visible
initial
Prior art date
Application number
PCT/CN2022/086920
Other languages
English (en)
French (fr)
Inventor
Zhang Sipeng (张思朋)
Original Assignee
Beijing Kuangshi Technology Co., Ltd. (北京旷视科技有限公司)
Beijing Maigewei Technology Co., Ltd. (北京迈格威科技有限公司)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Kuangshi Technology Co., Ltd. and Beijing Maigewei Technology Co., Ltd.
Publication of WO2023279799A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/08: Learning methods

Definitions

  • The present disclosure relates to the technical field of image processing, and in particular to an object recognition method, device, and electronic system.
  • In the related art, an image may be divided into multiple small local images, identification features may be extracted from each local image one by one, and the extracted features may then be combined to represent the entire image. Because a large number of identification features are extracted, calculating the distance between samples is highly complex, and this approach usually relies on a model that can accurately predict local visibility, making the model itself complex and unsuitable for large-scale deployment.
  • Alternatively, when comparing a full-body image with a half-body image, the full-body image can be cropped according to the visibility of the half-body image, and a deep recognition model can then be used to compare the two. This requires repeated cropping and re-extraction of features, resulting in high computational complexity, and is likewise difficult to deploy at scale.
  • In view of this, the present disclosure provides an object recognition method, device, and electronic system, to at least reduce the complexity of object recognition in an image and facilitate large-scale deployment.
  • An object recognition method provided in the present disclosure may include: acquiring a first image containing a target object; if the visible part of the target object in the first image does not include all parts of the target object, performing deformation and filling processing on the first image so that the relative position, in the first image, of the visible part of the target object contained in the first image meets a specified standard, where the specified standard includes: the relative position of the visible part in the first image when the first image contains all parts of the target object; and extracting the object features of the target object from the processed first image and identifying the target object based on the object features.
  • In some embodiments, the step of performing the processing so that the specified standard is met may include: inputting the first image into a pre-trained first neural network model, identifying the visible part of the target object in the first image through the first neural network model, and determining, based on the visible part, the visible area proportion and the filling boundary mark of the first image, where the filling boundary mark is used to indicate the position of the invisible part of the target object in the first image; and, if the visible area proportion is less than 1, determining that the visible part of the target object in the first image does not include all parts of the target object, and performing deformation and filling processing on the first image based on the visible area proportion and the filling boundary mark, so that the relative position, in the first image, of the visible part of the target object contained in the first image meets the specified standard.
  • In some embodiments, the step of performing deformation and filling processing on the first image so that the relative position of the visible part of the target object in the first image satisfies the specified standard may include: adjusting the size of the first image based on the visible area proportion, so that the relative position, in the first image, of the visible part of the target object contained in the first image meets the specified standard; and, based on the filling boundary mark, filling the area corresponding to the invisible part of the size-adjusted first image to restore the size of the first image to the size before the adjustment.
  • In some embodiments, the step of performing deformation and filling processing on the first image so that the relative position of the visible part of the target object in the first image satisfies the specified standard may alternatively include: filling the area corresponding to the invisible part of the first image based on the visible area proportion and the filling boundary mark; and adjusting the size of the filled first image so that it is restored to the size before the filling process, with the relative position, in the adjusted first image, of the visible part of the target object meeting the specified standard.
  • In some embodiments, the step of extracting the object features of the target object from the processed first image and identifying the target object based on the object features may include: extracting the object features of the target object from the processed first image through a second neural network model, where the object features include the features of the visible part of the target object; and calculating the feature distance between the object features of the target object and the object features of a specified object in a preset reference image, and determining whether the target object and the specified object are the same object.
  • In some embodiments, the pre-trained first neural network model is determined in the following manner: acquiring a first sample image including all parts of a first object; cropping, from the first sample image, a specified area that includes at least a part of the first object to obtain a second sample image, together with the cropping ratio and the reference filling boundary mark of the second sample image; inputting the second sample image into an initial first neural network model to output, through the initial first neural network model, the initial visible area proportion and the initial filling boundary mark of the second sample image; determining a first loss value based on the initial visible area proportion, the initial filling boundary mark, the cropping ratio, and the reference filling boundary mark, and updating the weight parameters of the initial first neural network model based on the first loss value; and continuing to execute the step of acquiring a first sample image including all parts of the first object until the initial first neural network model converges, to obtain the first neural network model.
  • In some embodiments, the pre-trained first neural network model is determined in the following manner: acquiring a third sample image containing a second object, together with the all-parts detection frame and the visible-part detection frame corresponding to the second object; inputting the third sample image into an initial first neural network model to output, through the initial first neural network model, a first detection frame containing all parts of the second object and a second detection frame containing its visible parts, and determining, based on the first detection frame and the second detection frame, the initial visible area proportion and the initial filling boundary mark of the second object; determining a second loss value based on the initial visible area proportion, the initial filling boundary mark, the all-parts detection frame, and the visible-part detection frame, and updating the weight parameters of the initial first neural network model based on the second loss value; and continuing to execute the step of acquiring a third sample image containing the second object until the initial first neural network model converges, to obtain the first neural network model.
  • In some embodiments, the second neural network model is determined in the following manner: acquiring a fourth sample image including all parts of a third object, together with the target features of the third object; cropping, from the fourth sample image, a specified area that includes at least a part of the third object to obtain a fifth sample image; filling the fifth sample image to obtain a sixth sample image, where the relative position of the specified part of the third object in the sixth sample image matches the relative position of the specified part of the third object in the fourth sample image; inputting the sixth sample image into an initial second neural network model to output, through the initial second neural network model, the initial features of the third object in the sixth sample image; determining a third loss value based on the initial features and the target features, and updating the weight parameters of the initial second neural network model based on the third loss value; and continuing to execute the step of acquiring a fourth sample image including all parts of the third object until the initial second neural network model converges, to obtain the second neural network model.
  • An object recognition device provided in the present disclosure may include: an acquisition module, configured to acquire a first image containing a target object; a processing module, configured to, if the visible part of the target object in the first image does not include all parts of the target object, deform and fill the first image so that the relative position, in the first image, of the visible part of the target object contained in the first image meets a specified standard, where the specified standard includes: the relative position of the visible part in the first image when the first image contains all parts of the target object; and an identification module, configured to extract the object features of the target object from the processed first image and identify the target object based on the object features.
  • An electronic system provided by the present disclosure may include a processing device and a storage device; a computer program is stored in the storage device, and when run by the processing device, the computer program executes any one of the above object recognition methods.
  • The present disclosure further provides a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processing device, the steps of any one of the above object recognition methods are executed.
  • With the object recognition method, device, and electronic system provided by the present disclosure, a first image containing a target object is first acquired; if the visible part of the target object in the first image does not include all parts of the target object, deformation and filling processing is performed on the first image so that the relative position, in the first image, of the visible part of the target object contained in the first image meets a specified standard, where the specified standard includes: the relative position of the visible part in the first image when the first image contains all parts of the target object; finally, the object features of the target object are extracted from the processed first image, and the target object is identified based on the object features.
  • In this manner, when the parts of the target object in the image are incomplete, the image is deformed and filled so that the relative position of each part of the target object in the image matches the relative position it would have if the image contained the complete target object; the target object can then be identified by directly extracting its object features from the processed image, without local segmentation and per-part identification, which reduces the computational complexity of object recognition and is conducive to large-scale deployment.
  • FIG. 1 is a schematic structural diagram of an electronic system provided by an embodiment of the present disclosure
  • FIG. 2 is a flow chart of an object recognition method provided by an embodiment of the present disclosure
  • FIG. 3 is a flow chart of another object recognition method provided by an embodiment of the present disclosure.
  • FIG. 4 is a flowchart of another object recognition method provided by an embodiment of the present disclosure.
  • FIG. 5 is a schematic diagram of an image preprocessing process provided by an embodiment of the present disclosure.
  • FIG. 6 is a schematic structural diagram of an object recognition device provided by an embodiment of the present disclosure.
  • Artificial Intelligence is an emerging science and technology that studies and develops theories, methods, technologies and application systems for simulating and extending human intelligence.
  • the subject of artificial intelligence is a comprehensive subject that involves many technologies such as chips, big data, cloud computing, Internet of Things, distributed storage, deep learning, machine learning, and neural networks.
  • Computer vision, specifically, is about allowing machines to recognize the world.
  • Computer vision technology usually includes face recognition, liveness detection, fingerprint recognition and anti-counterfeiting verification, biometric recognition, face detection, pedestrian detection, target detection, and the like.
  • Pedestrian re-identification is a technology that uses computer vision to judge whether a specific pedestrian is present in an image or video sequence; when re-identifying pedestrians, the similarity of different pedestrian images needs to be compared, and half-body pedestrian images are matched against normal pedestrian images, where the normal pedestrian images may include full-body images of pedestrians and the like.
  • an image may be divided into multiple local small images, and identification features of each local small image may be extracted one by one, and combined into whole-body identification features to represent the entire image.
  • This approach usually relies on a model that can accurately predict local visibility, such as a pose estimation model or a human body parsing model. Because of the high accuracy required, such a model needs greater depth and complexity than a normal model, which makes it costly; moreover, when the combined whole-body identification features are used to calculate distances against the features of normal pedestrian images to confirm similarity, the large number of extracted identification features makes the distance calculation between samples highly complex, so the approach cannot be deployed at scale. In the comparison of full-body images and bust images, the related art can crop the full-body image according to the visibility of the bust image and then use a deep recognition model to compare the two; this method requires repeated cropping and re-extraction of features for different bust and full-body images, as well as multiple cropping and splicing of the feature maps generated inside the model, resulting in high computational complexity and making large-scale deployment more difficult.
  • In view of this, embodiments of the present disclosure provide an object recognition method, device, and electronic system. The technology can be applied to recognizing objects in images and can be implemented with corresponding software and hardware. Embodiments of the present disclosure are described in detail below.
  • First, an example electronic system 100 for implementing the object recognition method, device, and electronic system of the embodiments of the present disclosure is described with reference to FIG. 1.
  • The electronic system 100 may include one or more processing devices 102, one or more storage devices 104, an input device 106, an output device 108, and one or more image acquisition devices 110, which are interconnected via a bus system 112 and/or other forms of connection mechanisms (not shown). It should be noted that the components and structure of the electronic system 100 shown in FIG. 1 are only exemplary rather than limiting, and the electronic system may also have other components and structures as required.
  • The processing device 102 may be a gateway, an intelligent terminal, or a device including a central processing unit (CPU) or other forms of processing units with data processing capabilities and/or instruction execution capabilities; it can process data from other components in the electronic system 100 and control other components in the electronic system 100 to perform desired functions.
  • the storage device 104 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory.
  • the volatile memory may include, for example, random access memory (RAM) and/or cache memory (cache).
  • the non-volatile memory may include, for example, a read-only memory (ROM), a hard disk, a flash memory, and the like.
  • One or more computer program instructions can be stored on the computer-readable storage medium, and the processing device 102 can execute the program instructions to realize the client functions (implemented by the processing device) in the embodiments of the present disclosure described below and/or other desired functionality.
  • Various application programs and various data, such as data used and/or generated by the application programs, may also be stored in the computer-readable storage medium.
  • the input device 106 may be a device used by a user to input instructions, and may include one or more of a keyboard, a mouse, a microphone, and a touch screen.
  • the output device 108 may output various information (eg, images or sounds) to the outside (eg, a user), and may include one or more of a display, a speaker, and the like.
  • the image capture device 110 can capture preview video frames or image data, and store the captured preview video frames or image data in the storage device 104 for use by other components.
  • The devices in the example electronic system for realizing the object recognition method, device, and electronic system may be integrated or distributed; for example, the processing device 102, storage device 104, input device 106, and output device 108 may be integrated into one body, with the image acquisition device 110 set at a designated position where the target image can be captured.
  • the electronic system can be realized as an intelligent terminal such as a camera, a smart phone, a tablet computer, a computer, and a vehicle-mounted terminal.
  • This embodiment provides an object recognition method, which can be executed by the processing device in the above-mentioned electronic system; the processing device can be any device or chip with data processing capability.
  • the processing device can independently process the received information, or can be connected with a server to jointly analyze and process the information, and upload the processing results to the cloud.
  • the method may include the following steps:
  • Step S202 acquiring a first image including a target object.
  • the above-mentioned target object may be a person, an animal, or any other item; the above-mentioned first image may be a photo, picture, or video image containing the target object.
  • Taking the target object being a pedestrian as an example, the first image may contain all body parts of the pedestrian, or may contain only some of them. If the first image contains all body parts of the pedestrian, the first image is a full-body image of the pedestrian; if the first image contains only part of the pedestrian's body parts, for example only the head and upper body, the first image is a half-body image of the pedestrian.
  • When a target object needs to be identified, it is usually necessary to first obtain a first image containing the target object, such as a photo, picture, or video image containing part or all of the body parts of the target object.
  • Step S204 if the visible part of the target object in the first image does not include all parts of the target object, performing deformation and filling processing on the first image so that the relative position, in the first image, of the visible part of the target object contained in the first image satisfies a specified standard; the specified standard includes: the relative position of the visible part in the first image when the first image contains all parts of the target object.
  • The above visible part can be understood as the part of the target object displayed in the first image. For example, if the target object is a pedestrian and the first image is a bust image of the pedestrian, the visible part of the pedestrian in the first image may include only the head and upper body, etc. The above all parts can be understood as all parts of the target object; taking the target object being a pedestrian as an example, all parts of the pedestrian can be understood as all body parts, including the head, upper body, lower body, and so on.
  • The above deformation processing of the first image can be understood as adjusting the display size of the first image; for example, the first image can be reduced to decrease its size. The above filling processing can be understood as filling preset values along any one or more of the left, right, upper, and lower sides of the image border, which changes the size of the image. The relative position of the above visible part in the first image may include: the visible part being in the upper half, the lower half, the left half, or the right half of the first image, and so on.
  • If the visible part of the target object in the first image does not include all parts of the target object, then, considering that the relative position of the corresponding visible part in the first image will usually differ from the case where the first image contains all parts of the target object, deformation processing and filling processing can be performed on the first image, so that the relative position, in the processed first image, of the visible part of the target object matches the relative position the corresponding visible part would have in a first image containing all parts of the target object. For example, if the first image is a half-body image containing a pedestrian's head and upper body, the half-body image can be compressed and filled so that the relative positions of the pedestrian's head and upper body in the processed half-body image are the same as, or aligned with, their relative positions in a first image that contains all body parts of the pedestrian; the corresponding sizes of the head and the upper body can usually also be the same.
  • Step S206 extracting object features of the target object from the processed first image, and identifying the target object based on the object features.
  • the above object characteristics can be understood as the relevant characteristics of the target object.
  • Taking the target object being a pedestrian as an example, the object features of the pedestrian can include the pedestrian's gender, age, clothing color, or appearance characteristics, etc.
  • After the processing of the first image is completed, the corresponding object features of the target object can be extracted from the processed first image, and the target object can then be identified according to the extracted object features.
  • The object recognition method provided by the embodiment of the present disclosure first acquires a first image containing a target object; if the visible part of the target object in the first image does not include all parts of the target object, the first image is deformed and filled so that the relative position, in the first image, of the visible part of the target object contained in the first image meets a specified standard, where the specified standard includes: the relative position of the visible part in the first image when the first image contains all parts of the target object; finally, the object features of the target object are extracted from the processed first image, and the target object is identified based on the object features.
  • In this manner, when the parts of the target object in the image are incomplete, the image is deformed and filled so that the relative position of each part of the target object in the image matches the relative position it would have if the image contained the complete target object; the target object can then be identified by directly extracting its object features from the processed image, without local segmentation and per-part identification, which reduces the computational complexity of object recognition and is conducive to large-scale deployment.
  • Embodiments of the present disclosure also provide another object recognition method, which is implemented on the basis of the methods in the above embodiments; this method focuses on describing the specific implementation process of deforming and filling the first image, when the visible part of the target object in the first image does not include all parts of the target object, so that the relative position of the visible part in the first image meets the specified standard.
  • the method may include the following steps:
  • Step S302 acquiring a first image including a target object.
  • Step S304 inputting the first image into the pre-trained first neural network model, identifying the visible part of the target object in the first image through the first neural network model, and determining, based on the visible part, the visible area proportion and the filling boundary mark of the first image; wherein the filling boundary mark is used to indicate the position of the invisible part of the target object in the first image.
  • The above first neural network model can also be called a visibility prediction model, and it can be realized with various convolutional neural networks, such as a residual network or a VGG network. The first neural network model can be a convolutional neural network of any size, for example resnet34_05x, etc.; usually the first neural network model is a lightweight convolutional neural network model, which can, while reducing the consumption of computing resources, ensure the accuracy of the neural network model and improve its efficiency to a certain extent.
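  • As an illustration only, not taken from the disclosure, a minimal PyTorch-style sketch of such a lightweight visibility prediction model might look as follows; the resnet18 backbone stands in for resnet34_05x, and treating the filling boundary mark as a four-way side classification is an assumption:

```python
import torch
import torch.nn as nn
import torchvision.models as models

class VisibilityPredictor(nn.Module):
    """Lightweight model predicting (visible-area ratio, fill-boundary side)."""
    def __init__(self):
        super().__init__()
        backbone = models.resnet18(weights=None)  # stand-in for resnet34_05x
        feat_dim = backbone.fc.in_features
        backbone.fc = nn.Identity()               # keep pooled features only
        self.backbone = backbone
        self.ratio_head = nn.Linear(feat_dim, 1)  # visible-area ratio in (0, 1]
        self.side_head = nn.Linear(feat_dim, 4)   # fill side: top/bottom/left/right

    def forward(self, x):
        f = self.backbone(x)
        ratio = torch.sigmoid(self.ratio_head(f)).squeeze(-1)
        side_logits = self.side_head(f)
        return ratio, side_logits

model = VisibilityPredictor()
dummy = torch.randn(1, 3, 256, 128)  # H x W = 256 x 128, as in the examples below
ratio, side_logits = model(dummy)
```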
  • The above visible area proportion can be understood as the proportion that the image area corresponding to the visible part of the target object occupies in the first image, relative to the case where the first image contains all parts of the target object. When determining it, the visible part of the target object in the first image is usually identified first; for example, taking the first image as a bust image of a pedestrian, the recognition processing can determine that the visible parts of the pedestrian include the head and upper body. The recognition processing usually also includes locating the visible parts, for example determining the positions of the head and upper body of the target object in the first image. If, in a full-body image of the pedestrian, the area corresponding to the head and upper body accounts for 70% of the whole image, then the visible area proportion of an image containing only the pedestrian's head and upper body is 70%. The above filling boundary mark can indicate the position of the invisible part of the target object; depending on which parts of the target object are visible in the first image, the position of the invisible part indicated by the filling boundary mark also differs. For example, if the visible parts of the pedestrian in the first image are the head and upper body, the invisible part of the pedestrian is the lower body, and the corresponding filling boundary mark indicates that the position of the invisible part of the pedestrian is below the boundary of the first image.
  • In some embodiments, the first neural network model may be trained in, but is not limited to, the following two ways. The first training method is introduced below; it can be implemented through the following steps 1 to 4.
  • Step 1 Acquire a first sample image including all parts of a first object.
  • the above-mentioned first object may be a person, an animal, or any other item; the above-mentioned first sample image may be a photo, picture, or video image containing the first object.
  • In this embodiment, the first object being a pedestrian is taken as an example. The first sample image including all parts of the pedestrian is obtained first; that is, the first sample image is a full-body image of the pedestrian.
  • Step 2 Crop a designated area including at least a part of the first object in the first sample image to obtain a second sample image, as well as a cropping ratio and a reference filling boundary mark of the second sample image.
  • The above at least a part may be any part of the pedestrian in the first sample image, for example the lower body. After cropping the first sample image, the second sample image and the corresponding cropping ratio and reference filling boundary mark are obtained; for example, after cropping the first sample image, a second sample image including the pedestrian's head and upper body is obtained, the corresponding cropping ratio is 30%, and the reference filling boundary mark indicates below the border of the second sample image.
  • Step 3 Input the second sample image into the initial first neural network model, so as to output the initial visible area ratio and the initial filling boundary mark of the second sample image through the initial first neural network model.
  • In actual implementation, when the second sample image is obtained, it is usually adjusted to a preset size, and the resized second sample image is then input into the initial first neural network model, which outputs the initial visible area proportion and the initial filling boundary mark of the second sample image.
  • Step 4 Determine the first loss value based on the initial visible area proportion, the initial filling boundary mark, the cropping ratio, and the reference filling boundary mark, and update the weight parameters of the initial first neural network model based on the first loss value; continue to execute the step of acquiring a first sample image including all parts of the first object until the initial first neural network model converges, to obtain the first neural network model.
  • In actual implementation, the training process of the first neural network model can be supervised by the cropping ratio and the reference filling boundary mark obtained in the random cropping process: the first loss value is determined based on the initial visible area proportion, the initial filling boundary mark, the cropping ratio, and the reference filling boundary mark, and the weight parameters of the initial first neural network model are updated based on the first loss value; the step of acquiring a first sample image including all parts of the pedestrian is executed repeatedly until the initial first neural network model converges, to obtain the first neural network model.
  • In this way, the first neural network model is trained in a self-learning manner. For the full-body pedestrian images in the training set, the image area corresponding to the lower body of the pedestrian can be randomly cropped out of each image, the cropped image is adjusted to a uniform size, and the cropping ratio and the reference filling boundary mark are recorded at the same time as the GT (Ground Truth) supervision signals. After training, the visible area proportion and the filling boundary mark of the first image can be predicted by the trained first neural network model.
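  • A minimal sketch of how such self-supervised training samples might be generated is given below; the crop range of 0.5 to 1.0 and the bottom-crop convention are illustrative assumptions, not values given by the disclosure:

```python
import numpy as np
import cv2

def make_training_sample(full_body, out_hw=(256, 128), rng=np.random):
    """Randomly crop away the lower body of a full-body image and record
    the ground-truth visible-area ratio and fill-boundary mark."""
    h, w = full_body.shape[:2]
    keep_ratio = rng.uniform(0.5, 1.0)       # fraction of height kept (GT ratio)
    kept = full_body[: int(h * keep_ratio)]  # crop from the bottom
    sample = cv2.resize(kept, (out_hw[1], out_hw[0]))  # uniform size for the model
    boundary = "bottom"                      # GT fill-boundary mark
    return sample, keep_ratio, boundary
```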
  • the second training method of the first neural network model is introduced below, which can be implemented specifically through the following steps 5 to 7.
  • Step 5 Obtain the third sample image including the second object, and all part detection frames and visible part detection frames corresponding to the second object.
  • the above-mentioned second object may be a person, an animal, or any other item; the above-mentioned third sample image may be a photo, picture, or video image containing the second object.
  • In this embodiment, the second object being a pedestrian is taken as an example; the third sample image may be a panoramic image annotated with the all-parts detection frame corresponding to the pedestrian and with the visible-part detection frame containing only the pedestrian's visible parts.
  • Step 6 Input the third sample image into the initial first neural network model, so as to output the first detection frame containing all parts and the second detection frame containing visible parts corresponding to the second object through the initial first neural network model , based on the first detection frame and the second detection frame, determine the initial visible area ratio and the initial filling boundary mark of the second object.
  • In actual implementation, the third sample image is input into the initial first neural network model, which outputs the first detection frame containing all parts of the pedestrian and the second detection frame containing the visible parts; the first detection frame can also be called the initial full-body frame, and the second detection frame the initial visible frame. Based on the ratio and relative position between the second detection frame and the first detection frame, the initial visible area proportion and the initial filling boundary mark of the pedestrian in the third sample image are determined.
  • Step 7 Determine the second loss value based on the initial visible area proportion, the initial filling boundary mark, the all-parts detection frame, and the visible-part detection frame, and update the weight parameters of the initial first neural network model based on the second loss value; continue to execute the step of acquiring the third sample image containing the second object until the initial first neural network model converges, to obtain the first neural network model.
  • In actual implementation, the training process of the first neural network model can be supervised by the detection frames of all parts of the pedestrian and the corresponding visible-part detection frames or pedestrian segmentation results: the second loss value is determined from these together with the initial visible area proportion and the initial filling boundary mark, and the weight parameters of the initial first neural network model are updated based on the second loss value; the step of acquiring a third sample image containing a pedestrian is executed repeatedly until the initial first neural network model converges, to obtain the first neural network model.
  • the proportion of the visible area of the first image and the filling boundary mark can be predicted by the trained first neural network model.
  • With the above second training method, the first neural network model can be integrated into a pedestrian detection model, where the pedestrian detection model can adopt a model structure from the related art; the pedestrian detection model predicts the pedestrian's visible frame while predicting the pedestrian's full-body frame, and the visible area proportion and filling boundary mark of the third sample image are then calculated from the ratio of the visible frame to the full-body frame.
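  • As a hedged sketch, the visible area proportion and the filling boundary mark could be derived from the two frames as follows; the (x1, y1, x2, y2) box format and the vertical-occlusion assumption are illustrative:

```python
def visibility_from_boxes(full_box, visible_box):
    """Boxes as (x1, y1, x2, y2). Assumes occlusion along the vertical axis,
    as in the half-body pedestrian examples in the text."""
    full_h = full_box[3] - full_box[1]
    vis_h = visible_box[3] - visible_box[1]
    ratio = min(1.0, vis_h / full_h)
    # If the visible frame ends above the full-body frame, the missing part
    # (and hence the fill boundary) is at the bottom, and vice versa.
    boundary = "bottom" if visible_box[3] < full_box[3] else "top"
    return ratio, boundary

ratio, boundary = visibility_from_boxes((0, 0, 128, 256), (0, 0, 128, 179))
# ratio is about 0.7, boundary is "bottom"
```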
  • Step S306 if the visible area proportion is less than 1, determining that the visible part of the target object in the first image does not include all parts of the target object, and performing deformation and filling processing on the first image based on the visible area proportion and the filling boundary mark, so that the relative position, in the first image, of the visible part of the target object contained in the first image satisfies the specified standard.
  • The visible area proportion of the first image determined by the first neural network model may be less than 1 or equal to 1. If it is equal to 1, it can be understood that the visible part of the target object in the first image contains all parts of the target object, so no filling is needed, or the area of the filled region is 0; if it is less than 1, it can be understood that the visible parts of the target object in the first image do not include all parts of the target object. For example, taking the target object being a pedestrian, a visible area proportion equal to 1 means that the first image is a full-body image of the pedestrian; a visible area proportion less than 1 means that the first image is a half-body image of the pedestrian, which may contain only the pedestrian's head, or only the head and upper body.
  • In some embodiments, in step S306, the deformation and filling processing of the first image may be realized through the following steps 8 and 9:
  • Step 8 Adjust the size of the first image based on the proportion of the visible area, so that the relative position of the visible part of the target object contained in the first image in the first image meets a specified standard.
  • In actual implementation, the size of the first image can be adjusted according to the visible area proportion. For example, if the target object is a pedestrian, the visible parts of the pedestrian in the first image include the head and upper body, the visible area proportion is 0.7, and the size of the first image is 256*128 pixels, then the height of the first image can be scaled by the visible area proportion to about 179*128, so that the relative position of the visible part in the first image satisfies the above specified standard; the specified standard includes: the relative positions of the visible parts, namely the head and upper body, in the first image when the first image contains all parts of the pedestrian.
  • Step 9 Fill the area corresponding to the invisible part of the adjusted first image based on the filling boundary mark, so as to restore the size of the first image to the size before the size adjustment.
  • The aforementioned invisible parts can be understood as the parts of the target object other than the visible parts in the first image; taking the case where the visible parts of the pedestrian in the first image are the head and upper body as an example, the invisible parts are all the body parts of the pedestrian except the head and upper body. In actual implementation, since the filling boundary mark can indicate the position of the invisible part of the target object in the first image, the area corresponding to the invisible part of the size-adjusted first image can be filled based on the filling boundary mark. Continuing the example where the visible parts of the pedestrian are the head and upper body, the visible area proportion is 0.7, and the size of the first image is 256*128 pixels: the invisible parts of the pedestrian include the lower body, and the filling boundary mark of the first image indicates that the invisible part is below the boundary, so the region below the resized image (about 77*128) is filled, restoring the first image to its 256*128 size before the adjustment.
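  • The resize-then-fill order of steps 8 and 9 can be sketched as follows; this is a minimal illustration assuming bottom-side filling and a fill value of 0, with OpenCV used for resizing:

```python
import numpy as np
import cv2

def resize_then_fill(img, visible_ratio, fill_value=0):
    """Steps 8 and 9: shrink the visible crop by the visible-area ratio,
    then pad the invisible region below to restore the original size."""
    h, w = img.shape[:2]                   # e.g. 256 x 128
    new_h = int(round(h * visible_ratio))  # ratio 0.7 -> 179 rows of visible content
    shrunk = cv2.resize(img, (w, new_h))
    out = np.full((h, w) + img.shape[2:], fill_value, dtype=img.dtype)
    out[:new_h] = shrunk                   # visible part lands at its full-body position
    return out
```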
  • In some embodiments, in step S306, the deformation and filling processing of the first image can also be realized through the following steps 10 and 11:
  • Step 10 Based on the proportion of the visible area and the filling boundary mark, perform filling processing on the area corresponding to the invisible part of the first image.
  • In actual implementation, the area corresponding to the invisible part of the first image can first be filled, based on the visible area proportion and the filling boundary mark determined in the above steps. Take as an example that the visible parts of the pedestrian in the first image include the head and upper body, the visible area proportion is 0.7, and the size of the first image is 256*128 pixels: the invisible parts of the pedestrian include the lower body, and the position indicated by the filling boundary mark is below the boundary of the first image. Since the visible area proportion is 0.7 and the size in the width direction remains unchanged, the area corresponding to the invisible part is about 109.7*128, and it is filled below the boundary of the first image; with the width unchanged, the size of the filled first image is therefore about 365.7*128.
  • Step 11 Adjust the size of the filled first image so that it is restored to the size before the filling process, with the relative position, in the adjusted first image, of the visible part of the target object satisfying the specified standard.
  • In actual implementation, the size of the filled first image can be adjusted based on the above visible area proportion. Since the visible area proportion is 0.7, the filled image of about 365.7*128 is scaled back to the size before filling, which is still 256*128, and the relative position, in the adjusted first image, of the visible part of the pedestrian satisfies the specified standard; the specified standard includes: the relative positions of the visible parts, namely the head and upper body, in the first image when the first image contains all parts of the pedestrian.
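  • The fill-then-resize order of steps 10 and 11 can be sketched the same way; up to interpolation rounding it yields the same result as the resize-then-fill sketch above, and the fill value of 0 is again an illustrative choice:

```python
import numpy as np
import cv2

def fill_then_resize(img, visible_ratio, fill_value=0):
    """Steps 10 and 11: pad the invisible region below (height grows to
    roughly h / ratio, e.g. 256 / 0.7 is about 365.7), then scale back
    to the original size."""
    h, w = img.shape[:2]
    pad_h = int(round(h / visible_ratio)) - h  # about 110 rows for ratio 0.7
    pad = np.full((pad_h, w) + img.shape[2:], fill_value, dtype=img.dtype)
    filled = np.concatenate([img, pad], axis=0)
    return cv2.resize(filled, (w, h))          # restore 256 x 128
```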
  • Step S308 extracting object features of the target object from the processed first image, and identifying the target object based on the object features.
  • The above object recognition method first acquires a first image containing a target object; the first image is input into the pre-trained first neural network model, which identifies the visible part of the target object in the first image and determines, based on the visible part, the visible area proportion and the filling boundary mark of the first image; if the visible area proportion is less than 1, it is determined that the visible part of the target object in the first image does not include all parts of the target object, and, based on the visible area proportion and the filling boundary mark, the first image is deformed and filled so that the relative position, in the first image, of the visible part of the target object meets the specified standard; finally, the object features of the target object are extracted from the processed first image, and the target object is identified based on the object features.
  • In this manner, when the parts of the target object in the image are incomplete, the image is deformed and filled so that the relative position of each part of the target object in the image matches the relative position it would have if the image contained the complete target object; the target object can then be identified by directly extracting its object features from the processed image, without local segmentation and per-part identification, which reduces the computational complexity of object recognition and is conducive to large-scale deployment.
  • The embodiment of the present disclosure also provides another object recognition method, which is implemented on the basis of the method in the above embodiment; this method focuses on describing the specific implementation process of extracting the object features of the target object from the processed first image and identifying the target object based on the object features.
  • the method may include the following steps:
  • Step S402 acquiring a first image including a target object.
  • Step S404 if the visible part of the target object in the first image does not include all parts of the target object, performing deformation and filling processing on the first image so that the relative position, in the first image, of the visible part of the target object contained in the first image satisfies a specified standard; the specified standard includes: the relative position of the visible part in the first image when the first image contains all parts of the target object.
  • Step S406 using the second neural network model to extract object features of the target object from the processed first image; wherein, the object features include features of visible parts of the target object.
  • The above second neural network model can also be called a pedestrian re-identification model, and it can be realized with various convolutional neural networks, such as a residual network or a VGG network. The training method of the second neural network model is introduced below; it can be realized through the following steps 15 to 19.
  • Step 15 acquiring a fourth sample image including all parts of a third object and target features of the third object.
  • the above-mentioned third object may be a person, an animal, or any other item; the above-mentioned fourth sample image may be a photo, picture, or video image containing the third object.
  • In this embodiment, the third object being a pedestrian is taken as an example for description; refer to the schematic diagram of an image preprocessing process shown in FIG. 5. During training, the original training image (original) including all parts of the pedestrian is first obtained; that is, the original is a whole-body image of the pedestrian, corresponding to the fourth sample image above.
  • the above-mentioned target feature may be the gender feature, age feature, clothes color feature or appearance feature of the pedestrian.
  • Step 16 cropping a designated area including at least a part of the third object in the fourth sample image to obtain a fifth sample image.
  • The above at least a part may be any part of the pedestrian in the fourth sample image, for example the lower body. The lower body region of the original can be randomly cropped to obtain a partial image (partial), which is the pedestrian's half-body image, corresponding to the fifth sample image above.
  • Step 17 fill the fifth sample image to obtain a sixth sample image; wherein the relative position of the designated part of the third object in the sixth sample image matches the relative position of the designated part of the third object in the fourth sample image.
  • In actual implementation, after the fifth sample image, namely the above partial image, is obtained, the partial image can be filled with a value v to obtain the padded image (pad), where the filling value v can be selected as 0, the lower boundary value (replica), 128, or (103.939, 116.779, 123.68), etc.; the size of the image is usually restored to the size before cropping.
  • With this filling, the relative position of the visible part of the pedestrian in the filled partial image is the same as the relative position of the corresponding visible part in the original; that is, the visible parts in the filled partial image are aligned with the corresponding visible parts, such as the head and shoulders, in the full-body original, and the sizes of corresponding visible parts can usually also be the same.
  • After alignment, the distribution of the model's input data is more uniform, which can reduce the noise level of the input.
  • For example, a bust image usually includes the pedestrian's head and shoulders, and the network can use this spatial pattern to learn discrimination ability at the corresponding positions.
  • The partial image after padding corresponds to the sixth sample image above.
  • In addition, the filled partial image is usually deformed to adjust it to a specified size; the specified size is constrained by computing power and is not fixed. For example, the size of an image containing a human body can be 256x128 or 384x192.
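  • Putting the original, partial, and pad stages of FIG. 5 together, a hedged sketch of this "pad augmentation" preprocessing for a single training image might look as follows; the crop range and the choice of fill value 128 are assumptions, while the 256x128 output size is taken from the text:

```python
import numpy as np
import cv2

def pad_augment(original, rng=np.random, fill_value=128, out_hw=(256, 128)):
    """original: full-body training image. Returns the padded sample (pad)
    whose visible parts stay aligned with their full-body positions."""
    h, w = original.shape[:2]
    keep = rng.uniform(0.5, 1.0)         # randomly crop away the lower body
    partial = original[: int(h * keep)]  # the half-body image (partial)
    pad_rows = h - partial.shape[0]
    pad = np.full((pad_rows, w) + original.shape[2:], fill_value,
                  dtype=original.dtype)
    padded = np.concatenate([partial, pad], axis=0)    # size restored to h x w
    return cv2.resize(padded, (out_hw[1], out_hw[0]))  # deform to 256 x 128
```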
  • Step 18 input the sixth sample image into the initial second neural network model, so as to output the initial features of the third object in the sixth sample image through the initial second neural network model.
  • After the above deformation and filling processing of the original training image is completed, it can be input into the initial second neural network model, and the initial features of the processed image are output through the initial second neural network model.
  • During training, supervision information may or may not be required; if supervision information is used, it may be the target features of the pedestrians in the image.
  • Step 19 determine the third loss value based on the initial features and the target features, and update the weight parameters of the initial second neural network model based on the third loss value; continue to execute the step of acquiring a fourth sample image including all parts of the third object until the initial second neural network model converges, to obtain the second neural network model.
  • In actual implementation, the third loss value can be determined based on the initial features and the target features, and it can be used to indicate the gap between them; the weight parameters of the initial second neural network model can be updated based on the third loss value, and the step of acquiring a fourth sample image including all parts of the third object, together with the target features of the third object, is executed repeatedly until the initial second neural network model converges.
  • When training the second neural network model, multiple original training images are required, and they can be obtained from a preset data set; each original training image undergoes the above deformation and filling processing. That is, the second neural network model is obtained with the "pad augmentation" pre-training method.
  • In actual use, the processed first image can be input into the trained second neural network model, which outputs the object features of the target object; the extracted object features usually include the features of the visible parts of the target object.
  • Step S408 calculating the characteristic distance between the object feature of the target object and the object feature of the specified object in the preset reference image, and determining whether the target object and the specified object are the same object.
  • the above-mentioned specified object can be understood as an object that is expected to be recognized when performing object recognition;
  • the above-mentioned preset reference image can be a pre-acquired image containing the specified object, and usually the object features of the specified object are obtained in advance;
  • In actual implementation, the object features of the target object can be extracted from the processed first image, the feature distance between the object features of the target object and the object features of the specified object can be calculated, and whether the target object and the specified object are the same object can be judged according to the feature distance: for example, when the feature distance is less than or equal to a preset threshold, the target object and the specified object are judged to be the same object; when the feature distance is greater than the preset threshold, they are judged not to be the same object.
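  • A minimal sketch of this comparison step follows; the Euclidean metric and the threshold value are illustrative, as the disclosure does not fix either:

```python
import numpy as np

def same_object(target_feat, reference_feat, threshold=1.0):
    """Compare the target's feature vector against the specified object's
    pre-extracted feature; a smaller distance means more similar."""
    dist = np.linalg.norm(np.asarray(target_feat) - np.asarray(reference_feat))
    return dist <= threshold, dist
```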
  • If the visible parts of the specified object in the reference image include all parts of the specified object, the object features of the specified object can be extracted based on related technologies; if the visible parts of the specified object in the reference image do not include all parts of the specified object, the object features of the specified object can be determined through the following steps 20 and 21:
  • Step 20 if the visible part of the specified object in the reference image does not include all parts of the specified object, perform deformation and filling processing on the reference image so that the relative position, in the reference image, of the visible part of the specified object contained in the reference image satisfies a preset standard; the preset standard includes: the relative positions of the visible parts in the reference image when the reference image contains all parts of the specified object.
  • The visible part of the specified object in the above reference image may or may not contain all parts of the specified object. If it does not, then, considering that the relative positions of the corresponding visible parts in the reference image usually differ from the case where the reference image contains all parts of the specified object, the reference image can be deformed and filled so that the relative position, in the processed reference image, of the visible part of the specified object matches the relative position the corresponding visible part would have if the reference image contained all parts of the specified object. For example, if the reference image is a half-body image containing a pedestrian's head and upper body, it can be compressed and filled so that the relative positions of the pedestrian's head and upper body in the processed reference image are the same as, or aligned with, their relative positions in a reference image containing all body parts of the pedestrian; the corresponding sizes of the head and upper body can usually also be the same.
  • Step 21 extracting object features of the specified object from the processed reference image.
  • the corresponding object features of the specified object can be extracted from the processed reference image.
  • With the above method, the first image containing the target object is acquired; if the visible part of the target object in the first image does not include all parts of the target object, deformation and filling processing are performed on the first image so that the relative position, in the first image, of the visible part of the target object meets the specified standard; the object features of the target object are extracted from the processed first image through the second neural network model, the feature distance between the object features of the target object and the object features of the specified object in the preset reference image is calculated, and whether the target object and the specified object are the same object is determined.
  • In this manner, when the parts of the target object in the image are incomplete, the image is deformed and filled so that the relative position of each part of the target object in the image matches the relative position it would have if the image contained the complete target object; the target object can then be identified by directly extracting its object features from the processed image, without local segmentation and per-part identification, which reduces the computational complexity of object recognition and is conducive to large-scale deployment.
  • The following further illustrates the object recognition method, taking a pedestrian image as an example of the first image.
  • In actual use, a visibility prediction model (corresponding to the first neural network model above) can be trained in a self-learning manner to perceive the visible parts of the pedestrian in a pedestrian image; then, when training and testing the pedestrian re-identification model (corresponding to the second neural network model above), the pedestrian image is preprocessed: based on the visible parts predicted by the visibility prediction model, the corresponding invisible parts are filled with a uniform value, ensuring that the aspect ratio of the pedestrian image remains unchanged and that the visible parts are aligned with the corresponding visible parts of a pedestrian image that contains all of the pedestrian's parts.
  • It should be noted that, since the image size required by the visibility prediction model may be smaller than that required by the pedestrian re-identification model, in order to ensure that the information content of the image is not reduced, the training of the visibility prediction model and of the pedestrian re-identification model is usually carried out in parallel using a parallel structure, and generally not serially using a serial structure.
  • With a pedestrian re-identification model trained by this method, half-body images and full-body images can be mapped into the same feature subspace, and the similarity comparison between half-body and full-body images is then carried out in that feature subspace; in use, features need to be extracted from each image only once, and this single global feature is used for matching whether the image is a half-body or a full-body image, so the distance between samples is still computed once per pair, which keeps the complexity of both the model and the distance computation low.
  • In related technologies, the features of half-body images tend to form a relatively independent feature subspace, which makes half-body images very close to one another while a half-body image and its corresponding full-body image differ greatly.
  • This solution adjusts the overall distribution of half-body images to be consistent with that of full-body images; each pedestrian in the images can have an identity ID corresponding to that pedestrian, each ID can have multiple images, and the half-body images of each ID can return to the feature subspace belonging to their own ID, which improves the recall of half-body images with the same ID and at the same time reduces the misidentification of pedestrians in half-body images across different IDs.
  • In addition, this method can adopt a self-learning training scheme and does not require additional annotation information, such as annotations of human body parts, human poses, or visible parts of the human body, thereby further simplifying the object recognition process.
  • Corresponding to the above method embodiments, and referring to the schematic structural diagram of an object recognition device, the device may include: an acquisition module 60 configured to acquire a first image containing a target object; a processing module 61 configured to, if the visible parts of the target object in the first image do not include all parts of the target object, perform deformation and filling processing on the first image so that the relative positions, in the first image, of the visible parts of the target object contained in the first image satisfy a specified standard, the specified standard including: the relative positions of the visible parts in the first image when the first image contains all parts of the target object; and an identification module 62 configured to extract object features of the target object from the processed first image and identify the target object based on the object features.
  • The object recognition device provided by the embodiments of the present disclosure first acquires a first image containing a target object; if the visible parts of the target object in the first image do not include all parts of the target object, it performs deformation and filling processing on the first image so that the relative positions, in the first image, of the visible parts of the target object contained in the first image satisfy a specified standard, the specified standard including: the relative positions of the visible parts in the first image when the first image contains all parts of the target object; finally, it extracts object features of the target object from the processed first image and identifies the target object based on the object features.
  • In this device, when the parts of the target object in an image are incomplete, the image is deformed and filled so that the relative position of each part of the target object in the image matches its relative position when the image contains the complete target object; the target object can be identified by extracting its object features directly from the processed image, without local segmentation and recognition of each part, which reduces the computational complexity of object recognition and facilitates large-scale deployment.
  • Optionally, the processing module 61 may also be configured to: input the first image into a pre-trained first neural network model, identify the visible parts of the target object in the first image through the first neural network model, and determine a visible area ratio and a filling boundary mark of the first image based on the visible parts, where the filling boundary mark is used to indicate the positions of the invisible parts of the target object in the first image; and, if the visible area ratio is less than 1, determine that the visible parts of the target object in the first image do not include all parts of the target object, and perform deformation and filling processing on the first image based on the visible area ratio and the filling boundary mark, so that the relative positions, in the first image, of the visible parts of the target object contained in the first image satisfy the specified standard.
  • Optionally, the processing module 61 may also be configured to: resize the first image based on the visible area ratio so that the relative positions, in the first image, of the visible parts of the target object contained in the first image satisfy the specified standard; and fill, based on the filling boundary mark, the region of the resized first image corresponding to the invisible parts, so as to restore the size of the first image to its size before resizing.
  • Optionally, the processing module 61 may also be configured to: fill, based on the visible area ratio and the filling boundary mark, the region of the first image corresponding to the invisible parts; and resize the filled first image so as to restore its size to the size before filling, such that the relative positions, in the adjusted first image, of the visible parts of the target object contained in the adjusted first image satisfy the specified standard.
  • Optionally, the identification module 62 may also be configured to: extract object features of the target object from the processed first image through a second neural network model, where the object features include features of the visible parts of the target object; and calculate the feature distance between the object features of the target object and the object features of a specified object in a preset reference image to determine whether the target object and the specified object are the same object.
  • Optionally, the device may further include a first determination module through which the pre-trained first neural network model is determined, the first determination module being configured to: acquire a first sample image containing all parts of a first object; crop, from the first sample image, a specified region containing at least some parts of the first object to obtain a second sample image, together with the cropping ratio and a reference filling boundary mark of the second sample image; input the second sample image into an initial first neural network model so as to output, through the initial first neural network model, an initial visible area ratio and an initial filling boundary mark of the second sample image; determine a first loss value based on the initial visible area ratio, the initial filling boundary mark, the cropping ratio, and the reference filling boundary mark, and update the weight parameters of the initial first neural network model based on the first loss value; and continue to perform the step of acquiring a first sample image containing all parts of the first object until the initial first neural network model converges, to obtain the first neural network model.
  • Optionally, the device may further include a second determination module through which the pre-trained first neural network model is determined, the second determination module being configured to: acquire a third sample image containing a second object, together with the all-parts detection box and the visible-parts detection box corresponding to the second object; input the third sample image into an initial first neural network model so as to output, through the initial first neural network model, a first detection box containing all parts and a second detection box containing the visible parts corresponding to the second object, and determine an initial visible area ratio and an initial filling boundary mark of the second object based on the first detection box and the second detection box; determine a second loss value based on the initial visible area ratio, the initial filling boundary mark, the all-parts detection box, and the visible-parts detection box, and update the weight parameters of the initial first neural network model based on the second loss value; and continue to perform the step of acquiring a third sample image containing the second object until the initial first neural network model converges, to obtain the first neural network model.
  • Optionally, the device may further include a third determination module through which the second neural network model is determined, the third determination module being configured to: acquire a fourth sample image containing all parts of a third object, together with target features of the third object; crop, from the fourth sample image, a specified region containing at least some parts of the third object to obtain a fifth sample image; fill the fifth sample image to obtain a sixth sample image, where the relative positions of the specified parts of the third object in the sixth sample image match the relative positions of the specified parts of the third object in the fourth sample image; input the sixth sample image into an initial second neural network model so as to output, through the initial second neural network model, initial features of the third object in the sixth sample image; determine a third loss value based on the initial features and the target features, and update the weight parameters of the initial second neural network model based on the third loss value; and continue to perform the step of acquiring a fourth sample image containing all parts of the third object until the initial second neural network model converges, to obtain the second neural network model.
  • The object recognition device provided by the embodiments of the present disclosure has the same implementation principle and technical effects as the foregoing object recognition method embodiments; for points not mentioned in this device embodiment, reference may be made to the corresponding content in the foregoing object recognition method embodiments.
  • An embodiment of the present disclosure also provides an electronic system, which may include: an image acquisition device, a processing device, and a storage device; the image acquisition device is used to acquire preview video frames or image data; a computer program is stored on the storage device, and the computer program executes the steps of the object recognition method described above when run by the processing device.
  • An embodiment of the present disclosure also provides a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processing device, the steps of the above object recognition method are executed.
  • The computer program product of the object recognition method, device, and electronic system provided by the embodiments of the present disclosure includes a computer-readable storage medium storing program code; the instructions included in the program code can be used to execute the methods described in the preceding method embodiments, and for their specific implementation, reference may be made to the method embodiments, which are not repeated here.
  • In the description of the embodiments of the present disclosure, unless otherwise explicitly specified and limited, the terms "installation", "connected", and "connection" should be interpreted in a broad sense: for example, a connection may be a fixed connection, a detachable connection, or an integral connection; it may be a mechanical connection or an electrical connection; it may be a direct connection or an indirect connection through an intermediary, or internal communication between two components. For those of ordinary skill in the art, the specific meanings of the above terms in the present disclosure can be understood according to the specific circumstances.
  • If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium; on this understanding, the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present disclosure.
  • The aforementioned storage media include various media that can store program code, such as USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, and optical discs.
  • The present disclosure provides an object recognition method, device, and electronic system in which a first image containing a target object is acquired; if the visible parts of the target object in the first image do not include all parts of the target object, deformation and filling processing are performed on the first image so that the relative positions, in the first image, of the visible parts of the target object contained in the first image satisfy a specified standard; object features of the target object are then extracted from the processed first image, and the target object is identified.
  • In this manner, when the parts of the target object in an image are incomplete, the image is deformed and filled so that the relative position of each part of the target object in the image matches its relative position when the image contains the complete target object; the target object can be identified by extracting its object features directly from the processed image, without local segmentation and recognition of each part, which reduces the computational complexity of object recognition and facilitates large-scale deployment.
  • Furthermore, it can be understood that the object recognition method, device, and electronic system of the present disclosure are reproducible and can be applied in a variety of applications; for example, they can be applied in the technical field of image processing.

Abstract

The present disclosure provides an object recognition method, device, and electronic system. A first image containing a target object is acquired; if the visible parts of the target object in the first image do not include all parts of the target object, deformation and filling processing are performed on the first image so that the relative positions, in the first image, of the visible parts of the target object contained in the first image satisfy a specified standard; object features of the target object are then extracted from the processed first image, and the target object is identified. In this manner, when the parts of the target object in an image are incomplete, the image is deformed and filled so that the relative position of each part of the target object in the image matches its relative position when the image contains the complete target object; the target object can be identified by extracting its object features directly from the processed image, without local segmentation and recognition of each part, which reduces the computational complexity of object recognition and facilitates large-scale deployment.

Description

Object recognition method, device, and electronic system
Cross-reference to related applications
The present disclosure claims priority to the Chinese patent application No. 202110756923X, entitled "Object recognition method, device, and electronic system", filed with the China National Intellectual Property Administration on July 5, 2021, the entire contents of which are incorporated herein by reference.
Technical field
The present disclosure relates to the technical field of image processing, and in particular to an object recognition method, device, and electronic system.
Background
Pedestrian re-identification requires comparing the similarity of different pedestrian images. If a pedestrian in an image is occluded and a large area of the pedestrian's body region is missing, the appearance information of the image deviates considerably, making it difficult to match the image against a normal pedestrian image. In the related art, an image can be divided into multiple small local patches, the identification features of each local patch extracted one by one, and the extracted features of each local patch then combined to represent the whole image. Because many identification features are extracted, the complexity of computing distances between samples is high; moreover, this approach usually depends on a model that can accurately predict local visibility, which makes the model highly complex and prevents large-scale deployment. In the related art, when comparing a full-body image with a half-body image, the full-body image can also be cropped according to the visible extent of the half-body image, and a deep recognition model then used to compare the two; this approach requires repeated cropping and re-extraction of features for different half-body and full-body images, which also leads to high computational complexity and is likewise difficult to deploy at scale.
Summary
The present disclosure provides an object recognition method, device, and electronic system, so as to at least reduce the complexity of recognizing objects in images and facilitate large-scale deployment.
An object recognition method provided by the present disclosure may include: acquiring a first image containing a target object; if the visible parts of the target object in the first image do not include all parts of the target object, performing deformation and filling processing on the first image so that the relative positions, in the first image, of the visible parts of the target object contained in the first image satisfy a specified standard, the specified standard including: the relative positions of the visible parts in the first image when the first image contains all parts of the target object; and extracting object features of the target object from the processed first image and identifying the target object based on the object features.
Optionally, the step of, if the visible parts of the target object in the first image do not include all parts of the target object, performing deformation and filling processing on the first image so that the relative positions, in the first image, of the visible parts of the target object contained in the first image satisfy a specified standard may include: inputting the first image into a pre-trained first neural network model, identifying the visible parts of the target object in the first image through the first neural network model, and determining a visible area ratio and a filling boundary mark of the first image based on the visible parts, where the filling boundary mark is used to indicate the positions of the invisible parts of the target object in the first image; and, if the visible area ratio is less than 1, determining that the visible parts of the target object in the first image do not include all parts of the target object, and performing deformation and filling processing on the first image based on the visible area ratio and the filling boundary mark, so that the relative positions, in the first image, of the visible parts of the target object contained in the first image satisfy the specified standard.
Optionally, the step of performing deformation and filling processing on the first image based on the visible area ratio and the filling boundary mark, so that the relative positions, in the first image, of the visible parts of the target object contained in the first image satisfy the specified standard, may include: resizing the first image based on the visible area ratio so that the relative positions, in the first image, of the visible parts of the target object contained in the first image satisfy the specified standard; and filling, based on the filling boundary mark, the region of the resized first image corresponding to the invisible parts, so as to restore the size of the first image to its size before resizing.
Optionally, the step of performing deformation and filling processing on the first image based on the visible area ratio and the filling boundary mark, so that the relative positions, in the first image, of the visible parts of the target object contained in the first image satisfy the specified standard, may include: filling, based on the visible area ratio and the filling boundary mark, the region of the first image corresponding to the invisible parts; and resizing the filled first image so as to restore its size to the size before filling, such that the relative positions, in the adjusted first image, of the visible parts of the target object contained in the adjusted first image satisfy the specified standard.
Optionally, the step of extracting object features of the target object from the processed first image and identifying the target object based on the object features may include: extracting the object features of the target object from the processed first image through a second neural network model, where the object features include features of the visible parts of the target object; and calculating the feature distance between the object features of the target object and the object features of a specified object in a preset reference image to determine whether the target object and the specified object are the same object.
Optionally, the pre-trained first neural network model is determined in the following manner: acquiring a first sample image containing all parts of a first object; cropping, from the first sample image, a specified region containing at least some parts of the first object to obtain a second sample image, together with the cropping ratio and a reference filling boundary mark of the second sample image; inputting the second sample image into an initial first neural network model so as to output, through the initial first neural network model, an initial visible area ratio and an initial filling boundary mark of the second sample image; determining a first loss value based on the initial visible area ratio, the initial filling boundary mark, the cropping ratio, and the reference filling boundary mark, and updating the weight parameters of the initial first neural network model based on the first loss value; and continuing to perform the step of acquiring a first sample image containing all parts of the first object until the initial first neural network model converges, to obtain the first neural network model.
Optionally, the pre-trained first neural network model is determined in the following manner: acquiring a third sample image containing a second object, together with the all-parts detection box and the visible-parts detection box corresponding to the second object; inputting the third sample image into an initial first neural network model so as to output, through the initial first neural network model, a first detection box containing all parts and a second detection box containing the visible parts corresponding to the second object, and determining an initial visible area ratio and an initial filling boundary mark of the second object based on the first detection box and the second detection box; determining a second loss value based on the initial visible area ratio, the initial filling boundary mark, the all-parts detection box, and the visible-parts detection box, and updating the weight parameters of the initial first neural network model based on the second loss value; and continuing to perform the step of acquiring a third sample image containing the second object until the initial first neural network model converges, to obtain the first neural network model.
Optionally, the second neural network model is determined in the following manner: acquiring a fourth sample image containing all parts of a third object, together with target features of the third object; cropping, from the fourth sample image, a specified region containing at least some parts of the third object to obtain a fifth sample image; filling the fifth sample image to obtain a sixth sample image, where the relative positions of the specified parts of the third object in the sixth sample image match the relative positions of the specified parts of the third object in the fourth sample image; inputting the sixth sample image into an initial second neural network model so as to output, through the initial second neural network model, initial features of the third object in the sixth sample image; determining a third loss value based on the initial features and the target features, and updating the weight parameters of the initial second neural network model based on the third loss value; and continuing to perform the step of acquiring a fourth sample image containing all parts of the third object until the initial second neural network model converges, to obtain the second neural network model.
An object recognition device provided by the present disclosure may include: an acquisition module configured to acquire a first image containing a target object; a processing module configured to, if the visible parts of the target object in the first image do not include all parts of the target object, perform deformation and filling processing on the first image so that the relative positions, in the first image, of the visible parts of the target object contained in the first image satisfy a specified standard, the specified standard including: the relative positions of the visible parts in the first image when the first image contains all parts of the target object; and an identification module configured to extract object features of the target object from the processed first image and identify the target object based on the object features.
An electronic system provided by the present disclosure may include: a processing device and a storage device; a computer program is stored on the storage device, and the computer program, when run by the processing device, executes any one of the above object recognition methods.
A computer-readable storage medium provided by the present disclosure stores a computer program, and the computer program, when run by a processing device, executes the steps of any one of the above object recognition methods.
In the object recognition method, device, and electronic system provided by the present disclosure, a first image containing a target object is first acquired; if the visible parts of the target object in the first image do not include all parts of the target object, deformation and filling processing are performed on the first image so that the relative positions, in the first image, of the visible parts of the target object contained in the first image satisfy a specified standard, the specified standard including: the relative positions of the visible parts in the first image when the first image contains all parts of the target object; finally, object features of the target object are extracted from the processed first image, and the target object is identified based on the object features. In this manner, when the parts of the target object in an image are incomplete, the image is deformed and filled so that the relative position of each part of the target object in the image matches its relative position when the image contains the complete target object; the target object can be identified by extracting its object features directly from the processed image, without local segmentation and recognition of each part, which reduces the computational complexity of object recognition and facilitates large-scale deployment.
Brief description of the drawings
In order to explain the specific embodiments of the present disclosure or the technical solutions in the related art more clearly, the drawings needed for describing the specific embodiments or the related art are briefly introduced below. Obviously, the drawings described below illustrate some embodiments of the present disclosure, and those of ordinary skill in the art can obtain other drawings from them without creative effort.
Fig. 1 is a schematic structural diagram of an electronic system provided by an embodiment of the present disclosure;
Fig. 2 is a flowchart of an object recognition method provided by an embodiment of the present disclosure;
Fig. 3 is a flowchart of another object recognition method provided by an embodiment of the present disclosure;
Fig. 4 is a flowchart of yet another object recognition method provided by an embodiment of the present disclosure;
Fig. 5 is a schematic diagram of an image preprocessing procedure provided by an embodiment of the present disclosure;
Fig. 6 is a schematic structural diagram of an object recognition device provided by an embodiment of the present disclosure.
Detailed description of embodiments
The technical solutions of the present disclosure will be described clearly and completely below with reference to the embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present disclosure without creative effort fall within the protection scope of the present disclosure.
In recent years, research on artificial-intelligence-based computer vision, deep learning, machine learning, image processing, image recognition, and related technologies has made important progress. Artificial intelligence (AI) is an emerging science and technology that studies and develops theories, methods, techniques, and application systems for simulating and extending human intelligence. AI is a comprehensive discipline involving many kinds of technology, such as chips, big data, cloud computing, the Internet of Things, distributed storage, deep learning, machine learning, and neural networks. As an important branch of AI, computer vision specifically enables machines to perceive the world; computer vision technologies usually include face recognition, liveness detection, fingerprint recognition and anti-counterfeiting verification, biometric recognition, face detection, pedestrian detection, object detection, pedestrian recognition, image processing, image recognition, image semantic understanding, image retrieval, text recognition, video processing, video content recognition, behavior recognition, three-dimensional reconstruction, virtual reality, augmented reality, simultaneous localization and mapping (SLAM), computational photography, and robot navigation and positioning. With the research and progress of AI technology, it has been applied in many fields, such as security, city management, traffic management, building management, park management, face-based access, face-based attendance, logistics management, warehouse management, robotics, intelligent marketing, computational photography, mobile-phone imaging, cloud services, smart homes, wearable devices, unmanned driving, autonomous driving, intelligent healthcare, face-based payment, face unlocking, fingerprint unlocking, identity verification, smart screens, smart televisions, cameras, the mobile Internet, live streaming, beautification, makeup, medical aesthetics, and intelligent temperature measurement.
Pedestrian re-identification is a technology that uses computer vision to determine whether a specific pedestrian is present in an image or a video sequence. Pedestrian re-identification requires comparing the similarity of different pedestrian images; if the appearance information of an image deviates considerably, it becomes difficult to match the image against a normal pedestrian image, where a normal pedestrian image may include a full-body image of the pedestrian. In the related art, an image can be divided into multiple small local patches, the identification features of each local patch extracted one by one, and the features combined into a full-body identification feature to represent the whole image. This approach usually depends on a model that can accurately predict local visibility, such as a pose estimation model or a human parsing model; since high accuracy is demanded of such a model, achieving it requires greater model depth and complexity than a normal model, making the model highly complex. Moreover, when the combined full-body identification feature is used for distance computation against the features of normal pedestrian images to confirm similarity, the large number of extracted identification features makes the complexity of computing distances between samples high as well, so the approach cannot be deployed at scale. In the related art, when comparing a full-body image with a half-body image, the full-body image can be cropped according to the visible extent of the half-body image and a deep recognition model then used to compare the two; this approach requires repeated cropping and re-extraction of features for different half-body and full-body images, with the intermediate feature maps produced by the model cropped and stitched multiple times, which leads to even higher computational complexity and makes large-scale deployment even more difficult. On this basis, embodiments of the present disclosure provide an object recognition method, device, and electronic system. The technology can be applied in applications that recognize objects in images and can be implemented with corresponding software and hardware. The embodiments of the present disclosure are described in detail below.
The electronic system provided according to the embodiments of the present disclosure will be described in detail below with reference to the drawings.
First, an example electronic system 100 for implementing the object recognition method, device, and electronic system of the embodiments of the present disclosure is described with reference to Fig. 1.
As shown in the schematic structural diagram of an electronic system in Fig. 1, the electronic system 100 may include one or more processing devices 102, one or more storage devices 104, an input device 106, an output device 108, and one or more image acquisition devices 110, these components being interconnected through a bus system 112 and/or other forms of connection mechanisms (not shown). It should be noted that the components and structure of the electronic system 100 shown in Fig. 1 are only exemplary, not limiting; the electronic system may have other components and structures as needed.
The processing device 102 may be a gateway, an intelligent terminal, or a device containing a central processing unit (CPU) or another form of processing unit with data processing capability and/or instruction execution capability. It can process data from other components in the electronic system 100 and can also control other components in the electronic system 100 to perform desired functions.
The storage device 104 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache. The non-volatile memory may include, for example, read-only memory (ROM), hard disks, and flash memory. One or more computer program instructions may be stored on the computer-readable storage medium, and the processing device 102 may run the program instructions to implement the client functions (implemented by the processing device) in the embodiments of the present disclosure described below and/or other desired functions. Various applications and various data, such as data used and/or produced by the applications, may also be stored in the computer-readable storage medium.
The input device 106 may be a device used by a user to input instructions and may include one or more of a keyboard, a mouse, a microphone, a touch screen, and the like.
The output device 108 may output various information (for example, images or sounds) to the outside (for example, a user) and may include one or more of a display, a speaker, and the like.
The image acquisition device 110 may acquire preview video frames or image data and store the acquired preview video frames or image data in the storage device 104 for use by other components.
Exemplarily, the devices in the example electronic system for implementing the object recognition method, device, and electronic system according to the embodiments of the present disclosure may be arranged in an integrated or distributed manner, for example with the processing device 102, the storage device 104, the input device 106, and the output device 108 integrated into one body and the image acquisition device 110 placed at a designated position where target images can be captured. When the devices in the above electronic system are integrated, the electronic system may be implemented as an intelligent terminal such as a camera, a smartphone, a tablet computer, a computer, or an in-vehicle terminal.
The object recognition method provided according to some embodiments of the present disclosure will be described in detail below with reference to the drawings.
This embodiment provides an object recognition method that can be executed by the processing device in the above electronic system; the processing device may be any device or chip with data processing capability. The processing device may process the received information independently, or it may be connected to a server to jointly analyze and process the information and upload the processing results to the cloud. As shown in Fig. 2, the method may include the following steps:
Step S202, acquiring a first image containing a target object.
The target object may be a person, an animal, or any other item; the first image may be a photo, picture, or video image containing the target object. For convenience of description, taking the target object being a pedestrian as an example, the first image may contain all of the pedestrian's body parts or only some of them. If the first image contains all of the pedestrian's body parts, it is a full-body image of the pedestrian; if it contains only some of them, for example only the pedestrian's head and upper body, it is a half-body image of the pedestrian. In actual implementation, when the target object needs to be recognized, it is usually necessary to first acquire a first image containing the target object, such as a photo, picture, or video image containing some or all of the target object's body parts.
Step S204, if the visible parts of the target object in the first image do not include all parts of the target object, performing deformation and filling processing on the first image so that the relative positions, in the first image, of the visible parts of the target object contained in the first image satisfy a specified standard; the specified standard includes: the relative positions of the visible parts in the first image when the first image contains all parts of the target object.
The visible parts can be understood as the parts of the target object displayed in the first image; for example, if the target object is a pedestrian and the first image is a half-body image of the pedestrian, the visible parts of the pedestrian in the first image may include only the head and upper body. "All parts" can be understood as all the parts of the target object; still taking a pedestrian as an example, all parts of the pedestrian can be understood as all body parts, including the head, upper body, and lower body. The deformation processing of the first image can be understood as adjusting the display size of the first image, for example shrinking the first image to reduce its size; the filling processing can be understood as filling preset values on any one or more of the left, right, upper, and lower sides of the image boundary, which changes the image size. The relative positions of the visible parts in the first image may include: the visible parts being in the upper half, lower half, left half, or right half of the first image, and so on. In actual implementation, if the visible parts of the target object in the first image do not include all parts of the target object, considering that the relative positions of the corresponding visible parts in the first image usually differ when the first image contains all parts of the target object, the first image can be deformed and filled so that the relative positions, in the first image, of the visible parts of the target object contained in the processed first image match the relative positions of the corresponding visible parts when the first image contains all parts of the target object. For example, if the first image is a half-body image containing a pedestrian's head and upper body, the half-body image can be compressed and filled so that the relative positions of the pedestrian's head and upper body in the processed half-body image are the same as, or aligned with, their relative positions when the first image contains all of the pedestrian's body parts, and the corresponding sizes of the head and upper body can usually also be the same.
Step S206, extracting object features of the target object from the processed first image and identifying the target object based on the object features.
The object features can be understood as relevant features of the target object; for example, if the target object is a pedestrian, the pedestrian's object features may include the pedestrian's gender features, age features, clothing color features, or appearance features. In actual implementation, after the acquired first image has been deformed and filled, the corresponding object features of the target object can be extracted from the processed first image, and the target object can then be recognized according to the extracted object features.
In the object recognition method provided by the embodiments of the present disclosure, a first image containing a target object is first acquired; if the visible parts of the target object in the first image do not include all parts of the target object, deformation and filling processing are performed on the first image so that the relative positions, in the first image, of the visible parts of the target object contained in the first image satisfy a specified standard, the specified standard including: the relative positions of the visible parts in the first image when the first image contains all parts of the target object; finally, object features of the target object are extracted from the processed first image, and the target object is identified based on the object features. In this manner, when the parts of the target object in an image are incomplete, the image is deformed and filled so that the relative position of each part of the target object in the image matches its relative position when the image contains the complete target object; the target object can be identified by extracting its object features directly from the processed image, without local segmentation and recognition of each part, which reduces the computational complexity of object recognition and facilitates large-scale deployment.
The object recognition method provided according to other embodiments of the present disclosure will be described in detail below with reference to the drawings.
An embodiment of the present disclosure further provides another object recognition method, implemented on the basis of the method of the above embodiment. This method focuses on the specific implementation of, if the visible parts of the target object in the first image do not include all parts of the target object, performing deformation and filling processing on the first image so that the relative positions, in the first image, of the visible parts of the target object satisfy the specified standard. As shown in Fig. 3, the method may include the following steps:
Step S302, acquiring a first image containing a target object.
Step S304, inputting the first image into a pre-trained first neural network model, identifying the visible parts of the target object in the first image through the first neural network model, and determining a visible area ratio and a filling boundary mark of the first image based on the visible parts; the filling boundary mark is used to indicate the positions of the invisible parts of the target object in the first image.
The first neural network model may also be called a visibility prediction model. The model can be implemented with various convolutional neural networks, such as residual networks or VGG networks, and may be a convolutional neural network model of any size, for example resnet34_05x. Usually the first neural network model is a lightweight convolutional neural network model, which can guarantee the accuracy of the neural network model while consuming fewer computing resources, improving the model's efficiency to a certain extent.
The visible area ratio can be understood as the proportion of the first image that the image region corresponding to the visible parts of the target object would occupy if the first image contained all parts of the target object. In actual implementation, after the first image containing the target object is acquired, recognition processing is usually first performed on the visible parts of the target object in the first image. For example, taking the first image being a half-body image of a pedestrian, recognition processing on the pedestrian in the first image can determine that the pedestrian's visible parts include the head and upper body; the recognition usually also includes locating the visible parts, through which the positions of the target object's head and upper body in the first image can be determined. If the first image were a full-body image of the pedestrian in which the regions corresponding to the head and upper body occupy 70% of the full-body image, then the visible area ratio of a half-body image containing only the pedestrian's head and upper body would be 70%. The filling boundary mark can indicate the positions of the invisible parts of the target object; depending on which parts of the target object are visible in the first image, the positions of the invisible parts indicated by the filling boundary mark differ. For example, still taking a half-body image of a pedestrian whose visible parts include the head and upper body, the pedestrian's invisible part is the lower body, and the position of the invisible part indicated by the corresponding filling boundary mark may be below the boundary of the first image.
In actual implementation, the first neural network model can be trained in, but not limited to, the following two ways. The first training method is introduced first; it can be implemented through Steps 1 to 4 below.
Step 1, acquiring a first sample image containing all parts of a first object.
The first object may be a person, an animal, or any other item; the first sample image may be a photo, picture, or video image containing the first object. For convenience of description, the first object is described as a pedestrian. In this method, to train the first neural network model, a first sample image containing all parts of the pedestrian is first acquired, that is, the first sample image is a full-body image of the pedestrian.
Step 2, cropping, from the first sample image, a specified region containing at least some parts of the first object to obtain a second sample image, together with the cropping ratio and a reference filling boundary mark of the second sample image.
The at least some parts may be any parts of the pedestrian in the first sample image, for example the pedestrian's lower body. In actual implementation, after the first sample image is acquired, it usually needs to be randomly cropped to obtain the cropped second sample image together with the corresponding cropping ratio and reference filling boundary mark. For example, after cropping the first sample image, a second sample image containing the pedestrian's head and upper body is obtained, the corresponding cropping ratio is 30%, and the reference filling boundary mark is below the boundary of the second sample image.
Step 3, inputting the second sample image into an initial first neural network model so as to output, through the initial first neural network model, an initial visible area ratio and an initial filling boundary mark of the second sample image.
In actual implementation, after the second sample image is obtained, it is usually adjusted to a preset size, and the resized second sample image is then input into the initial first neural network model so that the initial first neural network model outputs the initial visible area ratio and initial filling boundary mark of the second sample image.
Step 4, determining a first loss value based on the initial visible area ratio, the initial filling boundary mark, the cropping ratio, and the reference filling boundary mark, and updating the weight parameters of the initial first neural network model based on the first loss value; continuing to perform the step of acquiring a first sample image containing all parts of the first object until the initial first neural network model converges, to obtain the first neural network model.
In this method, the training of the first neural network model can be supervised with the cropping ratio and reference filling boundary mark obtained during random cropping: the first loss value is determined based on the initial visible area ratio, the initial filling boundary mark, the cropping ratio, and the reference filling boundary mark, and the weight parameters of the initial first neural network model are updated based on the first loss value; the step of acquiring a first sample image containing all parts of the pedestrian continues to be performed until the initial first neural network model converges, yielding the first neural network model.
In the above first training method, the first neural network model is trained in a self-learning manner. Before input into the initial first neural network model, the image region corresponding to the pedestrian's lower body can be randomly cropped from each image and the cropped image adjusted to a uniform size, while the cropping ratio and the reference filling boundary mark are recorded; the cropping ratio may be denoted r, and the cropping ratio r and the reference filling boundary mark serve as the GT (Ground Truth, representing the classification accuracy of a supervised-learning training set, used to confirm or refute a hypothesis) of the initial first neural network model. The trained first neural network model can then predict the visible area ratio and filling boundary mark of a first image.
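As an illustration of this self-learning labeling scheme, the following minimal sketch in Python (assuming NumPy and OpenCV are available; the function name, crop range, and output size are assumptions for illustration, not part of the disclosure) generates one training sample by randomly cropping away the lower body and keeping the crop ratio r and the boundary side as the GT:

    import random
    import numpy as np
    import cv2

    def make_visibility_sample(full_body_img: np.ndarray,
                               min_keep: float = 0.5,
                               out_wh=(128, 256)):
        # Randomly crop away the lower body of a full-body sample image,
        # record the kept fraction r (the visible area ratio GT) and the
        # side on which filling would later be applied (the reference
        # filling boundary mark), then resize the crop to a uniform size.
        h, w = full_body_img.shape[:2]
        r = random.uniform(min_keep, 1.0)            # cropping ratio r, used as GT
        crop = full_body_img[: max(1, int(h * r)), :]
        crop = cv2.resize(crop, out_wh)              # uniform model input size
        boundary = "bottom"                          # the lower body was removed
        return crop, r, boundary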
The second training method of the first neural network model is introduced below; it can be implemented through Steps 5 to 7.
Step 5, acquiring a third sample image containing a second object, together with the all-parts detection box and the visible-parts detection box corresponding to the second object.
The second object may be a person, an animal, or any other item; the third sample image may be a photo, picture, or video image containing the second object. For convenience of description, the second object is described as a pedestrian. In this method, to train the first neural network model, a third sample image containing a pedestrian is first acquired, together with the all-parts detection box corresponding to the third sample image when it contains all the pedestrian's parts and the visible-parts detection box containing only the pedestrian's visible parts; the third sample image may be a panoramic image.
Step 6, inputting the third sample image into the initial first neural network model so as to output, through the initial first neural network model, a first detection box containing all parts and a second detection box containing the visible parts corresponding to the second object, and determining an initial visible area ratio and an initial filling boundary mark of the second object based on the first detection box and the second detection box.
In actual implementation, after the third sample image is acquired, it is input into the initial first neural network model, which outputs the first detection box containing all of the pedestrian's parts and the second detection box containing the visible parts; the first detection box may also be called the initial full-body box and the second detection box the initial visible box. Based on the proportion and relative position between the second detection box and the first detection box, the initial visible area ratio and initial filling boundary mark of the pedestrian in the third sample image are determined.
Step 7, determining a second loss value based on the initial visible area ratio, the initial filling boundary mark, the all-parts detection box, and the visible-parts detection box, and updating the weight parameters of the initial first neural network model based on the second loss value; continuing to perform the step of acquiring a third sample image containing the second object until the initial first neural network model converges, to obtain the first neural network model.
In this method, the training of the first neural network model can be supervised with the pedestrian's all-parts detection box and the corresponding visible-parts detection box or pedestrian segmentation results; the second loss value is determined in combination with the initial visible area ratio and initial filling boundary mark, and the weight parameters of the initial first neural network model are updated based on the second loss value; the step of acquiring a third sample image containing a pedestrian continues to be performed until the initial first neural network model converges, yielding the first neural network model. The trained first neural network model can then predict the visible area ratio and filling boundary mark of a first image.
In the above second training method, the first neural network model can be integrated into a pedestrian detection model, which may use a model structure from the related art; the pedestrian detection model can predict the pedestrian's visible box at the same time as it predicts the pedestrian's full-body box, and the visible area ratio and filling boundary mark of the third sample image are then computed from the proportion between the visible box and the full-body box.
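As a hedged sketch of how the visible area ratio and the filling boundary mark could be derived from the predicted full-body box and visible box (the box format and the one-sided-truncation assumption are illustrative choices, not prescribed by the disclosure):

    def ratio_and_boundary(full_box, visible_box):
        # Boxes are (x1, y1, x2, y2). The visible area ratio is the area of
        # the visible box relative to the full-body box; the filling boundary
        # mark is taken as the side on which the visible box falls farthest
        # short of the full-body box, assuming truncation on a single side.
        fx1, fy1, fx2, fy2 = full_box
        vx1, vy1, vx2, vy2 = visible_box
        ratio = ((vx2 - vx1) * (vy2 - vy1)) / ((fx2 - fx1) * (fy2 - fy1))
        gaps = {"bottom": fy2 - vy2, "top": vy1 - fy1,
                "right": fx2 - vx2, "left": vx1 - fx1}
        boundary = max(gaps, key=gaps.get)   # side with the largest missing extent
        return ratio, boundary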
Step S306, if the visible area ratio is less than 1, determining that the visible parts of the target object in the first image do not include all parts of the target object, and performing deformation and filling processing on the first image based on the visible area ratio and the filling boundary mark, so that the relative positions, in the first image, of the visible parts of the target object satisfy the specified standard.
In actual implementation, the visible area ratio of the first image determined by the first neural network model may be less than 1 or equal to 1. If the visible area ratio equals 1, it can be understood that the visible parts of the target object in the first image include all parts of the target object, and no filling is needed, or the area of the filled region is 0; if the visible area ratio is less than 1, it can be understood that the visible parts of the target object in the first image do not include all parts of the target object. For example, taking the target object being a pedestrian, a visible area ratio equal to 1 indicates that the first image is a full-body image of the pedestrian, whereas a visible area ratio less than 1 indicates that the first image is a half-body image, which may contain only the pedestrian's head, or only the head and upper body, and so on.
If it is determined that the visible parts of the target object in the first image do not include all parts of the target object, Step S306 can implement the deformation and filling of the first image through Steps 8 and 9 below:
Step 8, resizing the first image based on the visible area ratio so that the relative positions, in the first image, of the visible parts of the target object contained in the first image satisfy the specified standard.
In actual implementation, if the visible area ratio is less than 1, the size of the first image can be adjusted according to the visible area ratio. For example, suppose the target object is a pedestrian whose visible parts in the first image include the head and upper body, the visible area ratio is 0.7, and the size of the first image is 256×128 pixels. Since what is missing is the pedestrian's lower body, the size of the first image in the length direction can be adjusted according to the visible area ratio: the adjusted length is 256×0.7=179.2 while the width remains unchanged, so the adjusted size of the first image is 179.2×128. The relative positions, in the first image, of the pedestrian's visible parts contained in the adjusted first image satisfy the above specified standard, which includes: the relative positions of the visible parts, namely the head and upper body, in the first image when the first image contains all of the pedestrian's parts.
Step 9, filling, based on the filling boundary mark, the region of the resized first image corresponding to the invisible parts, so as to restore the size of the first image to its size before resizing.
The invisible parts can be understood as the parts of the target object in the first image other than the visible parts; for example, if the target object is a pedestrian whose visible parts in the first image include the head and upper body, the invisible parts are all of the pedestrian's body parts other than the head and upper body. In actual implementation, since the filling boundary mark can indicate the positions of the invisible parts of the target object in the first image, the region of the resized first image corresponding to the invisible parts can be filled based on the filling boundary mark. For example, still with a pedestrian whose visible parts include the head and upper body, a visible area ratio of 0.7, and a first image of 256×128 pixels, the pedestrian's invisible part includes the lower body, and the position of the invisible part indicated by the filling boundary mark of the first image is below the boundary of the first image. Since the visible area ratio is 0.7, the proportion of the region corresponding to the invisible parts is 1-0.7=0.3; the corresponding region's size in the length direction is 256×0.3=76.8, with the width unchanged, so the size of the region corresponding to the invisible parts is 76.8×128. Therefore, the region corresponding to the invisible parts can be filled below the boundary of the resized first image. After filling, the length of the filled first image is 179.2+76.8=256 with the width unchanged, so the size of the filled first image is the same as before resizing, that is, the filled first image is restored to 256×128.
Step S306 can also implement the deformation and filling of the first image through Steps 10 and 11 below:
Step 10, filling, based on the visible area ratio and the filling boundary mark, the region of the first image corresponding to the invisible parts.
In actual implementation, based on the visible area ratio and filling boundary mark determined in the above steps, the region of the first image corresponding to the invisible parts can be filled first. For example, still with a pedestrian whose visible parts include the head and upper body, a visible area ratio of 0.7, and a first image of 256×128 pixels, the pedestrian's invisible part includes the lower body, and the position of the invisible part indicated by the filling boundary mark is below the boundary of the first image. Since the visible area ratio is 0.7, the proportion of the region corresponding to the invisible parts is 1-0.7=0.3; the corresponding region's size in the length direction is 256/0.7×0.3=109.7, with the width unchanged, so the size of the region corresponding to the invisible parts is 109.7×128. The region corresponding to the invisible parts is filled below the boundary of the first image; after filling, the length of the filled first image is 256+109.7=365.7 with the width unchanged, that is, the size of the filled first image is 365.7×128.
Step 11, resizing the filled first image so as to restore its size to the size before filling, such that the relative positions, in the adjusted first image, of the visible parts of the target object contained in the adjusted first image satisfy the specified standard.
In actual implementation, the size of the filled first image can be adjusted based on the visible area ratio. Continuing the example in Step 10: since the visible area ratio is 0.7, the filled first image is 365.7×128 and the filling was performed below the boundary of the first image, so based on the visible area ratio, the length of the filled first image is adjusted to 365.7×0.7=256 while the width remains 128. Through this adjustment, the size of the filled first image becomes the same as before filling, namely 256×128, and the relative positions, in the adjusted first image, of the pedestrian's visible parts satisfy the specified standard, which includes: the relative positions of the visible parts, namely the head and upper body, in the first image when the first image contains all of the pedestrian's parts.
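To make the two processing orders concrete, the following sketch (Python with NumPy and OpenCV; the function name and the restriction to top or bottom filling are assumptions for illustration) implements the resize-then-fill order of Steps 8 and 9. For the 256×128 example above with a visible area ratio of 0.7, it resizes to 179×128 (the 179.2 of the text, rounded to integer pixels) and fills 77 rows below the boundary; the fill-then-resize order of Steps 10 and 11 produces the same aligned layout:

    import numpy as np
    import cv2

    def deform_and_fill(img: np.ndarray, ratio: float,
                        boundary: str = "bottom",
                        fill_value: int = 0) -> np.ndarray:
        # Step 8: shrink along the missing direction by the visible area ratio.
        h, w = img.shape[:2]
        new_h = int(round(h * ratio))              # e.g. round(256 * 0.7) = 179
        resized = cv2.resize(img, (w, new_h))
        # Step 9: fill the region corresponding to the invisible parts with a
        # uniform value so the output regains its size before resizing.
        pad = np.full((h - new_h, w) + img.shape[2:], fill_value,
                      dtype=img.dtype)
        if boundary == "bottom":
            return np.concatenate([resized, pad], axis=0)
        return np.concatenate([pad, resized], axis=0)   # boundary == "top"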
Step S308, extracting object features of the target object from the processed first image and identifying the target object based on the object features.
In the object recognition method provided by the embodiments of the present disclosure, a first image containing a target object is first acquired; the first image is input into a pre-trained first neural network model, the visible parts of the target object in the first image are identified through the first neural network model, and the visible area ratio and filling boundary mark of the first image are determined based on the visible parts; if the visible area ratio is less than 1, it is determined that the visible parts of the target object in the first image do not include all parts of the target object, and deformation and filling processing are performed on the first image based on the visible area ratio and the filling boundary mark so that the relative positions, in the first image, of the visible parts of the target object satisfy the specified standard; finally, object features of the target object are extracted from the processed first image, and the target object is identified based on the object features. In this manner, when the parts of the target object in an image are incomplete, the image is deformed and filled so that the relative position of each part of the target object in the image matches its relative position when the image contains the complete target object; the target object can be identified by extracting its object features directly from the processed image, without local segmentation and recognition of each part, which reduces the computational complexity of object recognition and facilitates large-scale deployment.
The object recognition method provided according to still other embodiments of the present disclosure will be described in detail below with reference to the drawings.
An embodiment of the present disclosure further provides yet another object recognition method, implemented on the basis of the methods of the above embodiments. This method focuses on the specific implementation of extracting object features of the target object from the processed first image and identifying the target object based on the object features. As shown in Fig. 4, the method may include the following steps:
Step S402, acquiring a first image containing a target object.
Step S404, if the visible parts of the target object in the first image do not include all parts of the target object, performing deformation and filling processing on the first image so that the relative positions, in the first image, of the visible parts of the target object satisfy the specified standard; the specified standard includes: the relative positions of the visible parts in the first image when the first image contains all parts of the target object.
Step S406, extracting object features of the target object from the processed first image through a second neural network model; the object features include features of the visible parts of the target object.
The second neural network model may also be called a pedestrian re-identification model; it can be implemented with various convolutional neural networks, such as residual networks or VGG networks. The training of the second neural network model is introduced below; it can be implemented through Steps 15 to 19.
Step 15, acquiring a fourth sample image containing all parts of a third object, together with target features of the third object.
The third object may be a person, an animal, or any other item; the fourth sample image may be a photo, picture, or video image containing the third object. For convenience of description, the third object is described as a pedestrian; see Fig. 5 for a schematic diagram of an image preprocessing procedure. As shown in Fig. 5, in this method, to train the second neural network model, training data "original" containing all of the pedestrian's parts is first acquired, that is, the training data "original" is a full-body image of the pedestrian, corresponding to the above fourth sample image. The target features may be the pedestrian's gender features, age features, clothing color features, or appearance features.
Step 16, cropping, from the fourth sample image, a specified region containing at least some parts of the third object to obtain a fifth sample image.
The at least some parts may be any parts of the pedestrian in the fourth sample image, for example the pedestrian's lower body. In actual implementation, after the fourth sample image, that is, the training data "original", is acquired, the lower body of the training data "original" can be randomly cropped to obtain a "partial" image, which is a half-body image of the pedestrian, corresponding to the above fifth sample image.
Step 17, filling the fifth sample image to obtain a sixth sample image; the relative positions of the specified parts of the third object in the sixth sample image match the relative positions of the specified parts of the third object in the fourth sample image.
In actual implementation, after the fifth sample image, that is, the "partial" image, is obtained, the "partial" image can be filled with a value v to obtain "pad", where the filling value v may be chosen as 0, the lower boundary value (replica), 128, (103.939, 116.779, 123.68), etc. Its size is usually restored to the size before filling. The relative positions of the pedestrian's visible parts in the filled "partial" image are the same as the relative positions of the corresponding visible parts in the training data "original"; that is, the visible parts in the filled "partial" image are aligned with the visible parts, such as the head, shoulders, and other body parts, in the full-body image of the training data "original", and the sizes of the corresponding visible parts can usually also be the same. After alignment, the model's input data distribution is more uniform, which reduces the noise level of the input; for example, for the network, a half-body image usually always includes the pedestrian's head and shoulders, so the network can exploit this spatial pattern and learn discriminative ability at the corresponding positions. The filled "partial" image corresponds to the above sixth sample image.
The filled "partial" image is usually also deformed by resizing it to a specified size; the specified size is constrained by computing power and is not fixed, and an image containing a human body generally may be 256x128 or 384x192, for example.
Step 18, inputting the sixth sample image into an initial second neural network model so as to output, through the initial second neural network model, initial features of the third object in the sixth sample image.
In actual implementation, after the above deformation and filling processing of the training data "original" is completed, it can be input into the initial second neural network model, which outputs the initial features of the processed training data "original". The training process may proceed without supervision information or with supervision information; if there is supervision information, the corresponding supervision information may be the target features of the pedestrian in the image.
Step 19, determining a third loss value based on the initial features and the target features, and updating the weight parameters of the initial second neural network model based on the third loss value; continuing to perform the step of acquiring a fourth sample image containing all parts of the third object until the initial second neural network model converges, to obtain the second neural network model.
In actual implementation, after the initial features of the pedestrian in the processed training data "original" are obtained, a third loss value can be determined based on the initial features and the target features; the third loss value can be used to indicate the gap between the initial features and the target features. The weight parameters of the initial second neural network model can be updated based on the third loss value, and the step of acquiring a fourth sample image containing all parts of the third object together with the third object's target features continues to be performed until the initial second neural network model converges. This training process requires multiple pieces of training data "original", which can be obtained from a preset dataset; each piece of training data "original" goes through the above deformation and filling process, that is, the second neural network model is obtained with a "pad augmentation" pre-training scheme.
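The following is a minimal sketch of producing one "pad augmentation" training sample (the function name, crop range, and use of OpenCV are assumptions for illustration; the fill choices follow the values v listed above, with the "replica" option omitted for brevity):

    import random
    import numpy as np
    import cv2

    # Candidate uniform fill values v; (103.939, 116.779, 123.68) is a
    # commonly used per-channel mean. Three-channel images are assumed.
    FILL_CHOICES = [0, 128, (103.939, 116.779, 123.68)]

    def pad_augmentation(original: np.ndarray, out_hw=(256, 128)):
        # Randomly crop the lower body of the full-body image "original"
        # to get "partial", fill it back to the original extent so the
        # visible parts stay aligned with the full-body layout ("pad"),
        # then resize to the model input size.
        h, w = original.shape[:2]
        r = random.uniform(0.5, 1.0)
        partial = original[: max(1, int(h * r)), :]
        pad_rows = np.empty((h - partial.shape[0], w, 3), dtype=original.dtype)
        pad_rows[:] = random.choice(FILL_CHOICES)     # uniform fill value v
        padded = np.concatenate([partial, pad_rows], axis=0)
        return cv2.resize(padded, (out_hw[1], out_hw[0]))   # e.g. 256x128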
In actual implementation, the processed first image can be input into the trained second neural network model, which outputs the object features of the target object; the extracted object features usually include features of the visible parts of the target object.
Step S408, calculating the feature distance between the object features of the target object and the object features of a specified object in a preset reference image, and determining whether the target object and the specified object are the same object.
The specified object can be understood as the object one hopes to recognize during object recognition; the preset reference image may be a pre-acquired image containing the specified object, and the specified object's object features are usually acquired in advance. In actual implementation, the object features of the target object can be extracted from the processed first image, the feature distance between the object features of the target object and those of the specified object detected, and the similarity between the target object and the specified object judged according to that feature distance, thereby confirming whether the target object and the specified object are the same object. For example, when the feature distance is less than or equal to a preset threshold, the target object and the specified object are judged to be the same object; when the feature distance is greater than the preset threshold, they are judged not to be the same object.
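A minimal sketch of this comparison follows (the cosine metric and the threshold value of 0.5 are illustrative assumptions; the embodiment only requires some feature distance compared against a preset threshold):

    import numpy as np

    def same_object(feat_target: np.ndarray, feat_specified: np.ndarray,
                    threshold: float = 0.5) -> bool:
        # L2-normalize the two global features and compare their cosine
        # distance against a preset threshold: at or below the threshold,
        # the target object and the specified object are judged the same.
        a = feat_target / np.linalg.norm(feat_target)
        b = feat_specified / np.linalg.norm(feat_specified)
        distance = 1.0 - float(a @ b)
        return distance <= threshold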
If the visible parts of the specified object in the reference image include all parts of the specified object, the specified object's object features can be extracted based on the related art; if the visible parts of the specified object in the reference image do not include all parts of the specified object, the object features of the specified object in the reference image can be determined through Steps 20 and 21 below:
Step 20, if the visible parts of the specified object in the reference image do not include all parts of the specified object, performing deformation and filling processing on the reference image so that the relative positions, in the reference image, of the visible parts of the specified object satisfy a preset standard; the preset standard includes: the relative positions of the visible parts in the reference image when the reference image contains all parts of the specified object.
The visible parts of the specified object in the reference image may or may not include all parts of the specified object. If they do not, considering that the relative positions of the corresponding visible parts in the reference image usually differ when the reference image contains all parts of the specified object, the reference image can be deformed and filled so that the relative positions, in the reference image, of the visible parts of the specified object contained in the processed reference image match the relative positions of the corresponding visible parts when the reference image contains all parts of the specified object. For example, if the reference image is a half-body image containing a pedestrian's head and upper body, the half-body image can be compressed and filled so that the relative positions of the pedestrian's head and upper body in the processed half-body image are the same as, or aligned with, their relative positions when the reference image contains all of the pedestrian's body parts, and the corresponding sizes of the head and upper body can usually also be the same.
Step 21, extracting the object features of the specified object from the processed reference image.
After the acquired reference image has been deformed and filled, the corresponding object features of the specified object can be extracted from the processed reference image.
In the object recognition method provided by the embodiments of the present disclosure, a first image containing a target object is first acquired; if the visible parts of the target object in the first image do not include all parts of the target object, deformation and filling processing are performed on the first image so that the relative positions, in the first image, of the visible parts of the target object satisfy the specified standard; object features of the target object are extracted from the processed first image through the second neural network model; and the feature distance between the object features of the target object and the object features of the specified object in the preset reference image is calculated to determine whether the target object and the specified object are the same object. In this manner, when the parts of the target object in an image are incomplete, the image is deformed and filled so that the relative position of each part of the target object in the image matches its relative position when the image contains the complete target object; the target object can be identified by extracting its object features directly from the processed image, without local segmentation and recognition of each part, which reduces the computational complexity of object recognition and facilitates large-scale deployment.
To further explain the above embodiments, the object recognition method is illustrated below taking a pedestrian image as an example of the first image. In actual use, a visibility prediction model (corresponding to the first neural network model above) can be trained in a self-learning manner to perceive the visible parts of the pedestrian in a pedestrian image; then, when training and testing the pedestrian re-identification model (corresponding to the second neural network model above), the pedestrian image is preprocessed: based on the visible parts predicted by the visibility prediction model, the corresponding invisible parts are filled with a uniform value, ensuring that the aspect ratio of the pedestrian image remains unchanged and that the visible parts are aligned with the corresponding visible parts of a pedestrian image that contains all of the pedestrian's parts. It should be noted that, since the image size required by the visibility prediction model may be smaller than that required by the pedestrian re-identification model, in order to ensure that the information content of the image is not reduced, the training of the visibility prediction model and of the pedestrian re-identification model is usually carried out in parallel using a parallel structure, and generally not serially using a serial structure.
With a pedestrian re-identification model trained by this method, half-body images and full-body images can be mapped into the same feature subspace, and the similarity comparison between half-body and full-body images is then carried out in that feature subspace. In use, features need to be extracted from each image only once; whether the image is a half-body or a full-body image, this single global feature is used for matching and for computing the similarity between images, so matching against both full-body and half-body images can be completed at the same time without extracting local features one by one. Since each image requires only one feature extraction, the model complexity is reduced, and the distance between samples is still computed once per pair, which is the lowest distance-computation complexity, reducing the complexity of computing distances between samples and facilitating large-scale deployment.
In the related art, the features of half-body images easily form a relatively independent feature subspace, which makes half-body images very close to one another while a half-body image and its corresponding full-body image differ greatly. This solution adjusts the overall distribution of half-body images to be consistent with that of full-body images; each pedestrian in the images can have an identity ID corresponding to that pedestrian, and each ID can have multiple images, so the half-body images of each ID can return to the feature subspace belonging to their own ID, which improves the recall of half-body images with the same ID and at the same time reduces the misidentification of pedestrians in half-body images across different IDs. In addition, this method can adopt a self-learning training scheme and does not require additional annotation information, such as annotations of human body parts, human poses, or visible parts of the human body, thereby further simplifying the object recognition process.
The object recognition device provided according to some embodiments of the present disclosure will be described in detail below with reference to the drawings.
Corresponding to the above method embodiments, see Fig. 6 for a schematic structural diagram of an object recognition device. The device may include: an acquisition module 60 configured to acquire a first image containing a target object; a processing module 61 configured to, if the visible parts of the target object in the first image do not include all parts of the target object, perform deformation and filling processing on the first image so that the relative positions, in the first image, of the visible parts of the target object contained in the first image satisfy a specified standard, the specified standard including: the relative positions of the visible parts in the first image when the first image contains all parts of the target object; and an identification module 62 configured to extract object features of the target object from the processed first image and identify the target object based on the object features.
The object recognition device provided by the embodiments of the present disclosure first acquires a first image containing a target object; if the visible parts of the target object in the first image do not include all parts of the target object, it performs deformation and filling processing on the first image so that the relative positions, in the first image, of the visible parts of the target object contained in the first image satisfy a specified standard, the specified standard including: the relative positions of the visible parts in the first image when the first image contains all parts of the target object; finally, it extracts object features of the target object from the processed first image and identifies the target object based on the object features. In this device, when the parts of the target object in an image are incomplete, the image is deformed and filled so that the relative position of each part of the target object in the image matches its relative position when the image contains the complete target object; the target object can be identified by extracting its object features directly from the processed image, without local segmentation and recognition of each part, which reduces the computational complexity of object recognition and facilitates large-scale deployment.
Optionally, the processing module 61 may also be configured to: input the first image into a pre-trained first neural network model, identify the visible parts of the target object in the first image through the first neural network model, and determine a visible area ratio and a filling boundary mark of the first image based on the visible parts, where the filling boundary mark is used to indicate the positions of the invisible parts of the target object in the first image; and, if the visible area ratio is less than 1, determine that the visible parts of the target object in the first image do not include all parts of the target object, and perform deformation and filling processing on the first image based on the visible area ratio and the filling boundary mark, so that the relative positions, in the first image, of the visible parts of the target object contained in the first image satisfy the specified standard.
Optionally, the processing module 61 may also be configured to: resize the first image based on the visible area ratio so that the relative positions, in the first image, of the visible parts of the target object contained in the first image satisfy the specified standard; and fill, based on the filling boundary mark, the region of the resized first image corresponding to the invisible parts, so as to restore the size of the first image to its size before resizing.
Optionally, the processing module 61 may also be configured to: fill, based on the visible area ratio and the filling boundary mark, the region of the first image corresponding to the invisible parts; and resize the filled first image so as to restore its size to the size before filling, such that the relative positions, in the adjusted first image, of the visible parts of the target object contained in the adjusted first image satisfy the specified standard.
Optionally, the identification module 62 may also be configured to: extract object features of the target object from the processed first image through a second neural network model, where the object features include features of the visible parts of the target object; and calculate the feature distance between the object features of the target object and the object features of a specified object in a preset reference image to determine whether the target object and the specified object are the same object.
Optionally, the device may further include a first determination module through which the pre-trained first neural network model is determined, the first determination module being configured to: acquire a first sample image containing all parts of a first object; crop, from the first sample image, a specified region containing at least some parts of the first object to obtain a second sample image, together with the cropping ratio and a reference filling boundary mark of the second sample image; input the second sample image into an initial first neural network model so as to output, through the initial first neural network model, an initial visible area ratio and an initial filling boundary mark of the second sample image; determine a first loss value based on the initial visible area ratio, the initial filling boundary mark, the cropping ratio, and the reference filling boundary mark, and update the weight parameters of the initial first neural network model based on the first loss value; and continue to perform the step of acquiring a first sample image containing all parts of the first object until the initial first neural network model converges, to obtain the first neural network model.
Optionally, the device may further include a second determination module through which the pre-trained first neural network model is determined, the second determination module being configured to: acquire a third sample image containing a second object, together with the all-parts detection box and the visible-parts detection box corresponding to the second object; input the third sample image into an initial first neural network model so as to output, through the initial first neural network model, a first detection box containing all parts and a second detection box containing the visible parts corresponding to the second object, and determine an initial visible area ratio and an initial filling boundary mark of the second object based on the first detection box and the second detection box; determine a second loss value based on the initial visible area ratio, the initial filling boundary mark, the all-parts detection box, and the visible-parts detection box, and update the weight parameters of the initial first neural network model based on the second loss value; and continue to perform the step of acquiring a third sample image containing the second object until the initial first neural network model converges, to obtain the first neural network model.
Optionally, the device may further include a third determination module through which the second neural network model is determined, the third determination module being configured to: acquire a fourth sample image containing all parts of a third object, together with target features of the third object; crop, from the fourth sample image, a specified region containing at least some parts of the third object to obtain a fifth sample image; fill the fifth sample image to obtain a sixth sample image, where the relative positions of the specified parts of the third object in the sixth sample image match the relative positions of the specified parts of the third object in the fourth sample image; input the sixth sample image into an initial second neural network model so as to output, through the initial second neural network model, initial features of the third object in the sixth sample image; determine a third loss value based on the initial features and the target features, and update the weight parameters of the initial second neural network model based on the third loss value; and continue to perform the step of acquiring a fourth sample image containing all parts of the third object until the initial second neural network model converges, to obtain the second neural network model.
The object recognition device provided by the embodiments of the present disclosure has the same implementation principle and technical effects as the foregoing object recognition method embodiments; for brevity, where this device embodiment does not mention something, reference may be made to the corresponding content in the foregoing object recognition method embodiments.
The electronic system provided according to some embodiments of the present disclosure will be described in detail below.
An embodiment of the present disclosure further provides an electronic system, which may include: an image acquisition device, a processing device, and a storage device; the image acquisition device is used to acquire preview video frames or image data; a computer program is stored on the storage device, and the computer program, when run by the processing device, executes the steps of the object recognition method described above.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working process of the electronic system described above may refer to the corresponding process in the foregoing method embodiments and will not be repeated here.
An embodiment of the present disclosure further provides a computer-readable storage medium on which a computer program is stored; when the computer program is run by a processing device, the steps of the above object recognition method are executed.
The computer program product of the object recognition method, device, and electronic system provided by the embodiments of the present disclosure includes a computer-readable storage medium storing program code; the instructions included in the program code can be used to execute the methods described in the preceding method embodiments, and for their specific implementation, reference may be made to the method embodiments, which will not be repeated here.
Those skilled in the art can clearly understand that, for convenience and brevity of description, the specific working processes of the system and/or device described above may refer to the corresponding processes in the foregoing method embodiments and will not be repeated here.
In addition, in the description of the embodiments of the present disclosure, unless otherwise explicitly specified and limited, the terms "installation", "connected", and "connection" should be interpreted in a broad sense: for example, a connection may be a fixed connection, a detachable connection, or an integral connection; it may be a mechanical connection or an electrical connection; it may be a direct connection or an indirect connection through an intermediary, or internal communication between two components. For those of ordinary skill in the art, the specific meanings of the above terms in the present disclosure can be understood according to the specific circumstances.
If the functions are implemented in the form of software functional units and sold or used as independent products, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present disclosure, in essence, or the part that contributes to the related art, or part of the technical solution, may be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute all or part of the steps of the methods described in the various embodiments of the present disclosure. The aforementioned storage media include various media that can store program code, such as USB flash drives, removable hard disks, read-only memory (ROM), random access memory (RAM), magnetic disks, and optical discs.
In the description of the present disclosure, it should be noted that orientation or positional relationships indicated by terms such as "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner", and "outer" are based on the orientations or positional relationships shown in the drawings, are only for convenience of describing the present disclosure and simplifying the description, and do not indicate or imply that the referred devices or elements must have a specific orientation or be constructed and operated in a specific orientation; they therefore cannot be understood as limiting the present disclosure. In addition, the terms "first", "second", and "third" are used for descriptive purposes only and cannot be understood as indicating or implying relative importance.
Finally, it should be noted that the above embodiments are only used to illustrate the technical solutions of the present disclosure, not to limit them. Although the present disclosure has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they can still modify the technical solutions recorded in the foregoing embodiments or equivalently replace some or all of the technical features therein, and these modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present disclosure.
Industrial applicability
The present disclosure provides an object recognition method, device, and electronic system in which a first image containing a target object is acquired; if the visible parts of the target object in the first image do not include all parts of the target object, deformation and filling processing are performed on the first image so that the relative positions, in the first image, of the visible parts of the target object contained in the first image satisfy a specified standard; object features of the target object are then extracted from the processed first image, and the target object is identified. In this manner, when the parts of the target object in an image are incomplete, the image is deformed and filled so that the relative position of each part of the target object in the image matches its relative position when the image contains the complete target object; the target object can be identified by extracting its object features directly from the processed image, without local segmentation and recognition of each part, which reduces the computational complexity of object recognition and facilitates large-scale deployment.
Furthermore, it can be understood that the object recognition method, device, and electronic system of the present disclosure are reproducible and can be applied in a variety of applications; for example, they can be applied in the technical field of image processing.

Claims (11)

  1. An object recognition method, characterized in that the method comprises:
    acquiring a first image containing a target object;
    if the visible parts of the target object in the first image do not include all parts of the target object, performing deformation and filling processing on the first image so that the relative positions, in the first image, of the visible parts of the target object contained in the first image satisfy a specified standard; the specified standard comprises: the relative positions of the visible parts in the first image when the first image contains all parts of the target object;
    extracting object features of the target object from the processed first image, and identifying the target object based on the object features.
  2. The method according to claim 1, characterized in that the step of, if the visible parts of the target object in the first image do not include all parts of the target object, performing deformation and filling processing on the first image so that the relative positions, in the first image, of the visible parts of the target object contained in the first image satisfy a specified standard comprises:
    inputting the first image into a pre-trained first neural network model, identifying the visible parts of the target object in the first image through the first neural network model, and determining a visible area ratio and a filling boundary mark of the first image based on the visible parts; wherein the filling boundary mark is used to indicate the positions of the invisible parts of the target object in the first image;
    if the visible area ratio is less than 1, determining that the visible parts of the target object in the first image do not include all parts of the target object, and performing deformation and filling processing on the first image based on the visible area ratio and the filling boundary mark, so that the relative positions, in the first image, of the visible parts of the target object contained in the first image satisfy the specified standard.
  3. The method according to claim 2, characterized in that the step of performing deformation and filling processing on the first image based on the visible area ratio and the filling boundary mark, so that the relative positions, in the first image, of the visible parts of the target object contained in the first image satisfy the specified standard, comprises:
    resizing the first image based on the visible area ratio so that the relative positions, in the first image, of the visible parts of the target object contained in the first image satisfy the specified standard;
    filling, based on the filling boundary mark, the region of the resized first image corresponding to the invisible parts, so as to restore the size of the first image to its size before resizing.
  4. The method according to claim 2, characterized in that the step of performing deformation and filling processing on the first image based on the visible area ratio and the filling boundary mark, so that the relative positions, in the first image, of the visible parts of the target object contained in the first image satisfy the specified standard, comprises:
    filling, based on the visible area ratio and the filling boundary mark, the region of the first image corresponding to the invisible parts;
    resizing the filled first image so as to restore the size of the filled first image to its size before filling, the relative positions, in the adjusted first image, of the visible parts of the target object contained in the adjusted first image satisfying the specified standard.
  5. The method according to any one of claims 1 to 4, characterized in that the step of extracting object features of the target object from the processed first image and identifying the target object based on the object features comprises:
    extracting the object features of the target object from the processed first image through a second neural network model; wherein the object features comprise features of the visible parts of the target object;
    calculating the feature distance between the object features of the target object and the object features of a specified object in a preset reference image, and determining whether the target object and the specified object are the same object.
  6. The method according to any one of claims 2 to 5, characterized in that the pre-trained first neural network model is determined in the following manner:
    acquiring a first sample image containing all parts of a first object;
    cropping, from the first sample image, a specified region containing at least some parts of the first object to obtain a second sample image, together with a cropping ratio and a reference filling boundary mark of the second sample image;
    inputting the second sample image into an initial first neural network model, so as to output an initial visible area ratio and an initial filling boundary mark of the second sample image through the initial first neural network model;
    determining a first loss value based on the initial visible area ratio, the initial filling boundary mark, the cropping ratio, and the reference filling boundary mark, and updating the weight parameters of the initial first neural network model based on the first loss value; continuing to perform the step of acquiring a first sample image containing all parts of the first object until the initial first neural network model converges, to obtain the first neural network model.
  7. The method according to any one of claims 2 to 5, characterized in that the pre-trained first neural network model is determined in the following manner:
    acquiring a third sample image containing a second object, together with the all-parts detection box and the visible-parts detection box corresponding to the second object;
    inputting the third sample image into an initial first neural network model, so as to output, through the initial first neural network model, a first detection box containing all parts and a second detection box containing the visible parts corresponding to the second object, and determining an initial visible area ratio and an initial filling boundary mark of the second object based on the first detection box and the second detection box;
    determining a second loss value based on the initial visible area ratio, the initial filling boundary mark, the all-parts detection box, and the visible-parts detection box, and updating the weight parameters of the initial first neural network model based on the second loss value; continuing to perform the step of acquiring a third sample image containing the second object until the initial first neural network model converges, to obtain the first neural network model.
  8. The method according to any one of claims 5 to 7, characterized in that the second neural network model is determined in the following manner:
    acquiring a fourth sample image containing all parts of a third object, together with target features of the third object;
    cropping, from the fourth sample image, a specified region containing at least some parts of the third object to obtain a fifth sample image;
    filling the fifth sample image to obtain a sixth sample image; wherein the relative positions of the specified parts of the third object in the sixth sample image match the relative positions of the specified parts of the third object in the fourth sample image;
    inputting the sixth sample image into an initial second neural network model, so as to output, through the initial second neural network model, initial features of the third object in the sixth sample image;
    determining a third loss value based on the initial features and the target features, and updating the weight parameters of the initial second neural network model based on the third loss value; continuing to perform the step of acquiring a fourth sample image containing all parts of the third object until the initial second neural network model converges, to obtain the second neural network model.
  9. An object recognition device, characterized in that the object recognition device comprises:
    an acquisition module configured to acquire a first image containing a target object;
    a processing module configured to, if the visible parts of the target object in the first image do not include all parts of the target object, perform deformation and filling processing on the first image so that the relative positions, in the first image, of the visible parts of the target object contained in the first image satisfy a specified standard; the specified standard comprises: the relative positions of the visible parts in the first image when the first image contains all parts of the target object;
    an identification module configured to extract object features of the target object from the processed first image and identify the target object based on the object features.
  10. An electronic system, characterized in that the electronic system comprises: an image acquisition device, a processing device, and a storage device;
    the image acquisition device is used to acquire preview video frames or image data;
    a computer program is stored on the storage device, and the computer program, when run by the processing device, executes the object recognition method according to any one of claims 1 to 8.
  11. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when run by a processing device, executes the steps of the object recognition method according to any one of claims 1 to 8.
PCT/CN2022/086920 2021-07-05 2022-04-14 Object recognition method, device, and electronic system WO2023279799A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110756923.XA CN113673308A (zh) 2021-07-05 2021-07-05 Object recognition method, device, and electronic system
CN202110756923X 2021-07-05

Publications (1)

Publication Number Publication Date
WO2023279799A1

Family

ID=78538588

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/086920 WO2023279799A1 (zh) 2021-07-05 2022-04-14 Object recognition method, device, and electronic system

Country Status (2)

Country Link
CN (1) CN113673308A (zh)
WO (1) WO2023279799A1 (zh)


Also Published As

Publication number Publication date
CN113673308A (zh) 2021-11-19

