WO2019212501A1 - Trained recognition models

Trained recognition models

Info

Publication number
WO2019212501A1
Authority
WO
WIPO (PCT)
Prior art keywords
model
computing device
unique
digital images
unique object
Prior art date
Application number
PCT/US2018/030262
Other languages
French (fr)
Inventor
Arjun Angur PATEL
Original Assignee
Hewlett-Packard Development Company, L.P.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hewlett-Packard Development Company, L.P. filed Critical Hewlett-Packard Development Company, L.P.
Priority to PCT/US2018/030262 priority Critical patent/WO2019212501A1/en
Publication of WO2019212501A1 publication Critical patent/WO2019212501A1/en

Classifications

    • G: PHYSICS
    • G06: COMPUTING OR CALCULATING; COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00: Arrangements for image or video recognition or understanding
    • G06V 10/20: Image preprocessing
    • G06V 10/25: Determination of region of interest [ROI] or a volume of interest [VOI]
    • G06V 10/70: Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V 10/74: Image or video pattern matching; Proximity measures in feature spaces
    • G06V 10/75: Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V 10/77: Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V 10/774: Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06V 10/7753: Incorporation of unlabelled data, e.g. multiple instance learning [MIL]
    • G06V 10/82: Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks

Definitions

  • Computer vision (CV) may include acquiring, analyzing, and understanding digital images at a computing device.
  • CV may involve transforming visual images captured of the real world into descriptions, such as numerical and/or symbolic information, that can be processed and elicit an appropriate operation.
  • computing systems may extract information from digital images and apply models to detect operation triggers.
  • CV may utilize object identification to identify objects in the digital image.
  • Figure 1 illustrates a diagram of an example of a system for training recognition models according to the present disclosure.
  • Figure 2 illustrates a diagram of an example of a processing resource and a non-transitory machine-readable medium for training recognition models according to the present disclosure.
  • Figure 3 illustrates a diagram of an example of a method for training recognition models according to the present disclosure.
  • Deep learning techniques may include machine learning methods that learn data representations, as opposed to task-specific algorithms.
  • the machine learning may be supervised.
  • Deep learning architectures may include neural networks, such as artificial neural networks with multiple layers between an input and an output.
  • An artificial neural network may include a computing system that progressively improves performance on tasks by considering examples. For example, in object recognition in digital images, a neural network may learn to identify images that contain a person by analyzing example training images that have been manually labeled by a human as "a person" or "not a person."
  • To develop a computational understanding of what a person looks like, hundreds, thousands, and/or many more example training images may be analyzed by the neural network. Accordingly, hundreds, thousands, or many more example training images may be manually tagged by a human. As such, training a neural network may be a time-consuming and expensive endeavor relying on substantial human intervention.
  • Being able to pick out individual objects from a plurality of similar objects may involve further training or refining of the neural network. As such, tracking individual objects involves the above-described time-consuming and expensive manual intervention to develop specific models that may be utilized to identify the specific object in a digital image. As a result, on-demand models to identify specific objects are not able to be rapidly deployed, due to the time delays and costs associated with producing them.
  • In contrast, examples of the present disclosure may train CV systems and/or neural networks to detect unique objects within seconds, avoiding the lengthy, expensive, manual-intervention process.
  • the examples of the present disclosure may utilize generic models for deep learning to identify a generic object in a digital image, and then use images of that identified object as example training images to seed a more extensive specific model for recognizing that particular object, such as a specific person.
  • the examples of the present disclosure may thereby facilitate rapid specific model development that may be utilized to perform tracking of a unique object.
  • an example of the present disclosure may include a system including a processing resource and a computing device.
  • the computing device may include instructions executable by the processing resource to: gather digital images of a unique object in a physical environment; utilize a generic model for identifying a type of the object to localize a portion of the unique object in the digital images; and train, from the portion of the unique object localized in the digital images, a new specific model for recognizing the unique object.
  • FIG. 1 illustrates a diagram of an example of a system 100 for training recognition models according to the present disclosure.
  • the system 100 may include a computing device 105.
  • the computing device 105 may include a processing resource.
  • the computing device 105 may include a memory resource.
  • the memory resource may include a non-transitory machine-readable medium.
  • the instructions may include instructions executable by the processing resources to cause the computing device 105 to execute operations, such as operations associated with training recognition models.
  • An example of a computing device 105 may include a mobile computing device, a smartphone, a robot, a digital image capturing device, a surveillance device, an internet-of-things device, etc.
  • the computing device 105 may capture digital images 102 and/or be associated with a separate device that may capture digital images 102.
  • the digital images may include digital video.
  • the system 100 may be utilized to capture digital images 102.
  • the digital images 102 may include digital video and/or digital still images of objects 103-1...103-N in a real-world physical environment.
  • the digital images 102 may include digital images 102 of real-world physical objects 103-1...103-N.
  • the objects 103-1...103-N may be people, places, things etc. that are present and/or visible to the digital image capturing device in the physical environment.
  • In examples where the source of the digital images 102 is a digital video of the physical environment, the computing device may extract digital images 102 as stills from the source video. For example, if the source of the digital images 102 is a ten-second-long digital video, the computing device 105 may extract ten still digital images 102, with one digital image 102 extracted from each one-second period of the ten-second-long video.
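  • As an illustration of that still-extraction step, the sketch below uses OpenCV; the file name and the one-frame-per-second rate are assumptions for illustration, not requirements of the disclosure.

```python
import cv2

# Pull one still per second of video, mirroring the ten-second example above.
video = cv2.VideoCapture("environment_clip.mp4")   # hypothetical source video
fps = video.get(cv2.CAP_PROP_FPS) or 30.0          # fall back if FPS is unreported

stills = []
index = 0
while True:
    ok, frame = video.read()
    if not ok:
        break
    if index % int(fps) == 0:                      # first frame of each second
        stills.append(frame)
        cv2.imwrite(f"frame_{len(stills) - 1:03d}.jpg", frame)
    index += 1
video.release()
print(f"extracted {len(stills)} still images")
```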
  • the digital images 102 may be minified.
  • Minification of the digital images may include reformatting and/or resizing the digital representation of the image to achieve compatibility with the other operations of the computing device 105 and/or with the generic model 106 or specific model 110 discussed later.
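  • One way to read the minification step is as a resize plus re-encode; a minimal sketch follows, in which the 300-pixel maximum dimension and JPEG output are assumptions rather than values from the disclosure.

```python
import cv2

def minify(image, max_dim=300):
    """Shrink a digital image so its longest side is at most max_dim pixels."""
    h, w = image.shape[:2]
    scale = min(1.0, max_dim / max(h, w))
    return cv2.resize(image, (int(w * scale), int(h * scale)))

frame = cv2.imread("frame_000.jpg")                # still extracted above
cv2.imwrite("frame_000_minified.jpg", minify(frame))
```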
  • the digital images 102 may include a single object and/or a plurality of objects 103-1... 103-N.
  • the digital images 102 may include different types of objects 103-1...103-N in a same digital image.
  • the digital image may include an object such as a bicycle, an object such as a toy car, an object such as a person, and an object such as an animal in the same digital image 102.
  • the digital image 102 may include a plurality of the same type of object.
  • a same digital image may include a plurality of different people (e.g., 103-2 and 103-4).
  • the computing device 105 may acquire and/or receive the digital images 102 of the physical environment including the objects 103-1... 103-N.
  • the acquisition and/or reception of the digital images 102 may be part of the standard course of operations of the computing device 105. However, in some examples, the acquisition and/or reception of the digital images 102 may be performed in response to a command.
  • a person in the physical environment may issue a command, such as a voice command, instructing the computing device 105 to train a specific model of an object, such as a voice command to "follow me," "film me," "meet our new dog," "enroll Mabel," etc.
  • Such a command may trigger the acquisition and/or reception of the digital images 102.
  • Alternatively, such a command may trigger the utilization of the digital images 102, acquired or received in a standard course of operations of the computing device 105, in training the specific model of the object where, prior to the command, they might not have been utilized in that manner. In either case, digital images 102 may be gathered by the computing device 105, and those digital images 102 may include images of a unique real-world physical object 103-1...103-N (such as a specific or particular person such as specific person 103-2 or specific person 103-4, a specific place, or a specific thing) that is able to be captured in a digital image 102 by the image capturing device at the physical environment.
  • the computing device 105 may include a generic model 106 to identify objects 103-1... 103-N in the digital image 102.
  • the generic model 106 may be a model that may be utilized to identify a generic type of object in a digital image.
  • An object type may be a classifier such as "person," "dog," "bike," "cat," "furniture," "machine component," "package," etc. that may be utilized to characterize a type that is applicable to a plurality of individually unique objects that share common characteristics of the type, such as human anatomy, dog anatomy, bike structure, cat anatomy, furniture structure, toy car structure, machine component structure, package markings and/or structure, etc.
  • a generic model 106 may be generic in that it may be utilized to identify objects 103-1...103-N that share the common characteristics with the type. However, the generic model 106 may not be trained to identify individual, specific, or unique objects from other objects of the same type. For example, a generic model 106 may be trained to analyze digital images 102 and identify a plurality of people (e.g., person 103-2 and person 103-4) appearing in the digital images 102 as people, but the generic model 106 may not be trained to identify distinct individuals or a specific person (e.g., person 103-2 versus person 103-4) from the plurality of people. That is, the understanding of the generic model 106 may extend so far as what type an object 103-1...103-N belongs to, but not what subtype within that type the object 103-1...103-N belongs to.
  • the generic model 106 may include a mathematical data model that may utilize a plurality of classification layers arranged as a neural network.
  • the generic model 106 may be trained to process digital images 102 and recognize types of objects 103-1...103-N based on recognizing shapes, colors, shading, textures, structures in the digital image 102 that match the shapes, colors, shading, textures, structures in the classification layers of the generic model 106.
  • the generic model 106 may develop these classification layers by analysis and construction of the model from hundreds, thousands, or more example training images that have been manually labeled by a human as "an X" or "not an X," where X is the type of object that the model is trained for.
  • the generic model 106 may be relatively light-weight (e.g., relatively smaller, relatively less data, relatively fewer classification layers, relatively less ability to accurately recognize shapes, colors, shading, textures, structures in the digital image 102, etc.) as compared to a generic model on a computing device with relatively more computing resources available than are available on the computing device 105.
  • the computing device 105 may be a mobile computing device with computing resources and/or power supply limited by the mobility, weight, price, and purpose constraints of the computing device 105.
  • a generic model on a separate computing device with greater computing resources and/or power supply may be one hundred classification layers deep while the generic model 106 utilized by the computing device 105 may include a ten classification layer deep distillation of the one hundred classification layer generic model.
  • the computing device 105 may utilize the generic model 106 for identifying the type of the object to analyze the digital images 102.
  • the digital images 102 may be analyzed and their appearance and/or the data contained therein compared to the generic model 106.
  • By applying the generic model 106 to the digital images 102, a portion of a unique object in the digital images 102 may be identified and localized within the digital images 102.
  • the term "unique" object may refer to a particular or specific object.
  • the unique object may be a particular or specific object of a plurality of objects of the same type.
  • However, where the digital images 102 contain two objects of a same type, such as person 103-2 and person 103-4, both objects of the same type may be identified by the generic model 106. Therefore, the identification of a particular unique or specific one of the objects of the same type may be incidental to that particular unique or specific one of the objects of the same type belonging to the type identifiable by the generic model 106.
  • object 103-2 is utilized as an example of a unique object.
  • Object 103-2 may be a "person" type of object, but may, for example, also be a unique object such as a particular person.
  • the unique object 103-2 may be the only object of that type in the digital image 102 or the unique object may be one of a plurality of objects of that type in the digital image 102.
  • the digital image 102 may also include a distinct person type object such as person 103-4 that may also be identified by the generic model 106. Either way, the unique object 103-2 may possess unique characteristics that distinguish it from other objects 103-4 of a same or a similar type.
  • the computing device 105 may utilize the generic model 106 to identify a portion of the unique object 103-2 in the digital images 102 by identifying portions of the unique object 103-2 visibly included in the digital images 102 and comparing the visibly included portions to models of corresponding portions of the type of object defined in the generic model 106.
  • For example, where the generic model 106 is a generic model for a person, the computing device 105 may identify the arm, leg, head, neck, eyes, ears, hair, clothing, mouth, entire body, etc. of the unique object 103-2 utilizing matches with the corresponding structural forms of those portions of the person saved in the generic model 106. Again, any other unique objects 103-4 of the same type as unique object 103-2 may also be identified by comparison with the generic model 106.
  • Localizing the object 103-2 within the digital images 102 may include identifying the portions of the digital images 102 that include the object 103-2 and those portions of the image that do not include the object 103-2.
  • a plurality of individually unique objects of the same type may be identified in the same digital image 102 through this process.
  • For example, if the digital images 102 included more than one person (e.g., person 103-2 and person 103-4), all of the people appearing in the digital images 102 may be localized.
  • That is, while the generic model 106 may localize the unique object 103-2, it is not so limited and may localize all of the unique objects of the same type (e.g., person 103-2 and person 103-4) present in the digital images 102.
  • Since the generic model 106 is trained to identify a generic type of an object, it isn't that the generic model 106 is specifically identifying and localizing just the unique object 103-2 or just the unique object 103-4 based on the unique characteristics of the corresponding unique object. Rather, the generic model 106 may be identifying and/or localizing the object 103-2, which also happens to be unique, and the object 103-4, which also happens to be unique, as belonging to the type of object, such as "person," as defined by the generic model 106, which is applicable to identify the generic characteristics shared between the two objects.
  • Localizing objects such as object 103-2 and/or object 103-4, utilizing the generic model 106 may include identifying the position and/or the boundaries of the identified objects in the digital images 102.
  • localizing an object 103-2 identified by the generic model 106 may include defining pixels containing the object 103-2 and/or digital coordinates of boundaries enclosing the object 103-2 within the digital images 102.
  • localizing an object 103-2 identified by the generic model 106 may include utilizing a bounding mechanism 104 to define a bounding box, bounding sphere, a plurality of bounding boxes, a plurality of bounding spheres, a bounding volume, etc. for each of the objects in the digital images that are identified by the generic model 106.
  • the area within the bounding mechanism 104 may be treated as including the object 103-2 while the area outside the bounding mechanism 104 may be treated as not including the object 103-2.
  • a bounding mechanism may be utilized to also localize object 103-4.
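  • As a concrete sketch of generic-model localization, the snippet below uses OpenCV's built-in HOG pedestrian detector as a stand-in for the generic "person" model 106; the input file name is hypothetical.

```python
import cv2

# Run a generic "person" detector to localize people in a still frame.
hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

frame = cv2.imread("frame_000.jpg")                # hypothetical still from earlier
boxes, weights = hog.detectMultiScale(frame, winStride=(8, 8), scale=1.05)

# Each (x, y, w, h) box acts as a bounding mechanism: pixels inside are treated
# as containing a person-type object, pixels outside as not containing it.
for (x, y, w, h) in boxes:
    cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("frame_000_localized.jpg", frame)
```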
  • the computing device 105 may utilize the localized portion of the unique object 103-2 in the digital image 102 to train a new model.
  • the new model may include a specific model 108.
  • a specific model 108 may include a model that is specific to and/or contains specificity of detail to identify a unique, specific, and/or particular object of a plurality of objects of the same type (e.g., person 103-2 versus person 103-4).
  • the specific model 108 may be a model that may be utilized to identify a unique object 103-2 based on the unique characteristics of that object relative to other objects of the same type, such as object 103-4, in the digital image 102.
  • An object classifier such as a person’s name, an animal’s name, a model of bike, a label of a piece of furniture, a serial number of a machine component, a package tracking number, etc. may be utilized to characterize a unique, specific, and/or particular object of a plurality of objects of the same type.
  • a specific model 108 for unique object 103-2 may be a specific model 108 for the unique characteristics of "Mabel," the name of the particular girl that the model is specific to.
  • another specific model for unique object 103-4 may be a specific model for the unique characteristics of "Charlie," the name of the particular boy that the model is specific to.
  • the specific model 108 may be trained to specifically identify a unique object 103-2 appearing in the digital images 102 based on its unique characteristics relative to other objects 103-1... 103-N, including other objects of the same type (e.g., other people such as 103-4, other animals, etc.), appearing in the same digital image 102. That is, the understanding of the specific model 108 may be inclusive of the understanding of the generic model 106 in its identification of a type of object, but may include additional understanding for identifying a unique, specific, and/or particular person based on their unique characteristics.
  • the specific model 108 may include a mathematical data model that may utilize a plurality of image classification layers organized in a neural network.
  • the specific model 108 may be trained to process digital images 102 and recognize individual unique objects based on recognizing shapes, colors, shading, textures, structures in the digital image 102 that match the shapes, colors, shading, textures, structures in the classification layers of the specific model 108.
  • the specific model 108 may develop these classification layers by analysis and construction of the model from a plurality of training images.
  • the training images for the specific model 108 may include the digital images 102 acquired and/or received by the computing device 105 including an indication of the location of the object 103-2 that was identified by the generic model 106 in the digital images.
  • the training images for the specific model 108 may include the digital images 102 acquired and/or received by the computing device 105, where each of the digital images 102 includes a bounding mechanism 104 defining the portion of the digital image 102 that contains and/or the portion of the digital image 102 that does not contain the object 103-2 identified by the generic model 106.
  • the training images for the specific model 108 may include just the portions of the digital images 102 acquired and/or received by the computing device 105 that have been determined to include the object 103-2 identified by the generic model 106.
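  • Under that crop-based reading, each bounding mechanism can simply be cut out of the still and saved as a training image; the sketch below assumes boxes in (x, y, w, h) pixel form and an illustrative file-naming scheme.

```python
import cv2

def crop_training_images(frame, boxes, prefix="unique_object"):
    """Save the pixels inside each bounding mechanism as a training image."""
    paths = []
    for i, (x, y, w, h) in enumerate(boxes):
        crop = frame[y:y + h, x:x + w]
        path = f"{prefix}_{i:03d}.jpg"
        cv2.imwrite(path, crop)
        paths.append(path)
    return paths

frame = cv2.imread("frame_000.jpg")
print(crop_training_images(frame, [(120, 60, 90, 240)]))    # illustrative box
```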
  • the digital images 102 may include a plurality of objects of a same type 103-2 and 103-4 that may be equally identified by the generic model 106 based on common generic characteristics.
  • When training a specific model 108 for a particular unique object, such as 103-2, utilizing training images of another particular unique object 103-4 would degrade and/or destroy the specific model 108, as the inclusion of characteristics that object 103-2 does not have, but that object 103-4 does have (e.g., red shirt, blue pants, blonde hair, green eyes, short hair, etc.), would prevent the specific model 108 from being able to accurately identify object 103-2.
  • the system 100 may include a set of rules to identify which one of a plurality of objects of the same type 103-2 and 103-4 the specific model 108 should be trained to and, as a result, which of the portions of the digital images 102 should be utilized as training images for the specific model 108.
  • the computing device 105 and/or any other device training the specific model 108 may assume that the specific unique object 103-2 of the plurality of objects 103-2 and 103-4 that is closest to the digital image capturing device, largest in the digital image, closest to on-center in the digital image, located in a position consistent with a source of a sound of a voice command, largest object by bounding mechanism, etc. is the specific unique object 103-2 upon which to train the specific model 108.
  • the computing device 105 and/or any other device training the specific model 108 may select, as training images for the specific model 108, the portions of the digital image 102 identified by a bounding mechanism 104, developed by an application of the generic model 106, as containing an object of a type that will be utilized to train the specific model 108, based on those portions indicating the identified object is closest to the digital image capturing device, largest in the digital image, closest to on-center in the digital image, located in a position consistent with a source of a sound of a voice command, largest object by bounding mechanism, etc.
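  • The selection rules above can be approximated with a simple scoring function over the detected boxes; in the sketch below, preferring the largest box and breaking ties by closeness to the image center is an assumed weighting, not one prescribed by the disclosure.

```python
def select_training_box(boxes, frame_w, frame_h):
    """Pick which detected object to train the specific model on."""
    def score(box):
        x, y, w, h = box
        area = w * h
        cx, cy = x + w / 2, y + h / 2
        center_dist = ((cx - frame_w / 2) ** 2 + (cy - frame_h / 2) ** 2) ** 0.5
        return (area, -center_dist)   # larger area first, then more central
    return max(boxes, key=score)

# Example: of two detected people, the larger, more central box is selected.
print(select_training_box([(10, 20, 80, 200), (300, 40, 60, 150)], 640, 480))
```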
  • training images may include the localized portion of the unique object from the digital images 102 that is output by the computing device 105
  • the computing device 105 may produce a plurality of augmented digital images from the digital images 102.
  • the computing device 105 may manipulate the digital images 102 including the localized portion of the unique object 103-N and/or a portion of the digital images 102 including the localized portion of the unique object 103-N. In some examples, the computing device 105 may darken, lighten, stretch, rotate, change a viewing angle of, and/or change the contrast of such images. By manipulating these images to create augmented images, a single image may be turned into a plurality of images with distinct properties that may be utilized as additional training images to train a more robust specific model 108.
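  • A sketch of producing such augmented copies from one localized crop with OpenCV follows; the particular transforms, magnitudes, and the crop file name are illustrative assumptions.

```python
import cv2

crop = cv2.imread("unique_object_000.jpg")           # hypothetical localized crop
h, w = crop.shape[:2]

augmented = {
    "darker":    cv2.convertScaleAbs(crop, alpha=0.6, beta=0),    # darken
    "lighter":   cv2.convertScaleAbs(crop, alpha=1.4, beta=20),   # lighten
    "flipped":   cv2.flip(crop, 1),                                # mirror
    "rotated":   cv2.warpAffine(                                   # small rotation
        crop, cv2.getRotationMatrix2D((w / 2, h / 2), 10, 1.0), (w, h)),
    "stretched": cv2.resize(crop, (int(w * 1.2), h)),              # horizontal stretch
}

for name, image in augmented.items():
    cv2.imwrite(f"unique_object_000_{name}.jpg", image)
```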
  • the computing device 105 may train the specific model 108 at the computing device 105.
  • the computing device 105 may train a neural network that is relatively light-weight (e.g., relatively smaller, relatively less data, relatively fewer classification layers, relatively less ability to accurately recognize shapes, colors, shading, textures, structures in the digital image 102, etc.) as compared to a specific model on a computing device with relatively more computing resources available than are available on the computing device 105.
  • the computing device 105 may be a mobile computing device with computing resources and/or power supply limited by the mobility, weight, price, and purpose constraints of the computing device 105.
  • a specific model on a separate computing device with greater computing resources and/or power supply may be one hundred classification layers deep while the specific model 108 utilized by the computing device 105 may include a ten layer deep distillation of the one hundred layers deep model.
  • the digital images 102 including an indication of the portion of the digital image that contains the object 103-2 identified by the generic model 106, just the portion of the digital image that contains the object 103-2 identified by the generic model 106, and/or the augmented digital images may be communicated from the computing device 105 to a second computing device.
  • the second computing device may include a computing device with relatively more computing resources than the computing device 105.
  • the second computing device may be a single computing device and/or a plurality of computing devices in communication with one another.
  • the second computing device may include a cloud-based service utilizing centralized or distributed processing resources, memory resources, and/or instructions executable by the processing resource that are remote from the computing device 105.
  • the second computing device may train a neural network that is relatively heavy-weight (e.g., relatively larger, relatively more data, relatively more classification layers, relatively more ability to accurately recognize shapes, colors, shading, textures, structures in the digital image 102, etc.) as compared to a specific model 108 on a computing device 105 which has relatively less computing resources available than are available on the second computing device.
  • a specific model on the second computing device with greater computing resources and/or power supply may be one hundred classification layers deep while the specific model 108 utilized by the computing device 105 may include a ten layer deep distillation of the one hundred layers deep model.
  • the specific model trained on the second computing device may be modified to operate on the computing device 105.
  • the heavy-weight version of the specific model trained on the second device may be distilled down to the specific model 108 optimized with fewer classification layers for the computing device 105.
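  • One way to implement the distillation described above is the standard soft-label (teacher-student) approach; the PyTorch sketch below is illustrative, with assumed layer counts, an assumed feature size, and a made-up two-class head ("the unique object" versus "not the unique object").

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

def make_classifier(depth, width=128):
    layers = []
    for _ in range(depth):
        layers += [nn.Linear(width, width), nn.ReLU()]
    layers.append(nn.Linear(width, 2))      # unique object vs. not
    return nn.Sequential(*layers)

teacher = make_classifier(depth=20).eval()  # heavier cloud-side model (illustrative)
student = make_classifier(depth=2)          # lighter on-device model (illustrative)

optimizer = torch.optim.Adam(student.parameters(), lr=1e-3)
temperature = 4.0

features = torch.randn(32, 128)             # stand-in for image features
with torch.no_grad():
    teacher_logits = teacher(features)

# Match the student's softened predictions to the teacher's softened predictions.
student_logits = student(features)
loss = F.kl_div(
    F.log_softmax(student_logits / temperature, dim=1),
    F.softmax(teacher_logits / temperature, dim=1),
    reduction="batchmean",
) * temperature ** 2

optimizer.zero_grad()
loss.backward()
optimizer.step()
```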
  • the specific model 108 may be utilized by the computing device 105 to identify the unique object in digital images captured subsequent to the digital images 102 that the generic model 106 was applied to.
  • the specific model 108 may be able to specifically recognize the unique object 103-2 from the unique characteristics of the unique object. For example, where the unique object 103-2 is a person, the specific model 108 may be able to specifically identify images of the particular person 103-2 based on unique characteristics of the person's face, their clothing, their anatomical dimensions, their gestures, their posture, and/or any other distinguishable characteristics trainable to the specific model 108.
  • the specific model may be utilized to specifically track the presence of the unique object 103-2 in a digital video and/or a series of digital image stills captured after the specific model 108 is trained.
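  • A sketch of that recognition loop follows: the generic detector proposes person boxes in a new frame, and the trained specific model scores each crop. The `specific_model` and `featurize` names are hypothetical stand-ins for the trained classifier and whatever feature extraction it expects.

```python
import cv2
import torch

hog = cv2.HOGDescriptor()
hog.setSVMDetector(cv2.HOGDescriptor_getDefaultPeopleDetector())

def find_unique_object(frame, specific_model, featurize):
    """Return the box most likely to contain the unique object, with its score."""
    boxes, _ = hog.detectMultiScale(frame, winStride=(8, 8))
    best_box, best_score = None, 0.0
    for (x, y, w, h) in boxes:
        crop = frame[y:y + h, x:x + w]
        with torch.no_grad():
            logits = specific_model(featurize(crop))       # hypothetical interfaces
            score = torch.softmax(logits, dim=1)[0, 1].item()
        if score > best_score:
            best_box, best_score = (x, y, w, h), score
    return best_box, best_score
```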
  • the specific model 108 may be trained in response to a command issued to the computing device 105.
  • the computing device may cause the specific model 108 for the particular person to be trained from the particular unique person 103-2 identified and localized from the generic model 106 and then utilize the specific model in subsequent digital images to track the location and coordinate filming of the particular person.
  • the computing device 105 and/or any other device training the specific model 108 may assume that the specific unique object 103-2 of the plurality of objects 103-2 and 103-4 that is closest to the digital image capturing device, largest in the digital image, closest to on-center in the digital image, located in a position consistent with a source of a sound of a voice command, largest object by bounding mechanism, etc. is the specific unique object 103-2 upon which to train the specific model 108.
  • examples of the present disclosure may be progressively fine-tuned.
  • the specific model 108 may be repeatedly updated and/or replaced based on updated data.
  • the generic model 106 may be utilized to identify images of the unique object that will be utilized to train the specific model 108 of the unique object.
  • As that specific model 108 is applied to digital images of the objects 103-1...103-N captured subsequent to training the specific model 108 for the first time, additional data to modify or replace the specific model 108 may be gathered.
  • the computing device 105 may identify the unique object 103-2 that the specific model 108 is specific to in thirty consecutive digital video frames, but then the computing device may not be able to identify the unique object in the thirty-first consecutive digital video frame utilizing the specific model 108.
  • the computing device 105 may determine that it is unlikely that the unique object 103-2 has totally disappeared from the digital images in the intervening second between frames.
  • the computing device 105 may determine that it is relatively more likely that the unique object 103-2 is in a pose or position that is not being recognized by the specific model 108 as identifiable as the unique object 103-2.
  • the computing device 105 may cause these frames to be analyzed utilizing a computationally heavier generic model for the type of object that the unique object 103-2 is.
  • Such a heavier model may be larger and/or more accurate than the generic model 106 and/or the specific model 108 for objects of that type.
  • the computing device 105 may rely on a separate second computing device with relatively greater processing power and/or a relatively larger network of computational resources than the computing device 105.
  • an object of the same type as the unique object 103-2 may be identified in the frames where the unique object 103-2 was not previously detected with the specific model 108.
  • the newly detected object may be indicated with a bounding mechanism and the bounded portion of the frames may be utilized as an additional training image to modify and/or replace the specific model 108.
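  • A sketch of this feedback loop is shown below. The `find_unique_object` helper is the one sketched earlier, and `heavy_detect` is a hypothetical stand-in for the computationally heavier model (possibly running on the second computing device); both interfaces are assumptions.

```python
def harvest_missed_frames(frames, specific_model, featurize, heavy_detect,
                          training_images, threshold=0.5):
    """When the specific model loses the object right after seeing it, escalate the
    frame to the heavier model and keep the bounded crop for retraining."""
    previously_seen = False
    for frame in frames:
        box, score = find_unique_object(frame, specific_model, featurize)
        if box is not None and score >= threshold:
            previously_seen = True
            continue
        if previously_seen:
            # The object is unlikely to have vanished between consecutive frames,
            # so ask the heavier model and harvest the bounded crop as training data.
            heavy_box = heavy_detect(frame)
            if heavy_box is not None:
                x, y, w, h = heavy_box
                training_images.append(frame[y:y + h, x:x + w])
        previously_seen = False
    return training_images
```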
  • the specific model 108 may continue to be trained until a threshold level of unique object identification accuracy has been achieved. At that point, training of the specific model 108 may be discontinued until such time as the accuracy rate falls below the threshold again.
  • the unique object as described above can be a person, place, or thing.
  • the generic model may be a model generic to that type of person, place, or thing.
  • the specific model may be a specific model specific to the uniquely identifiable characteristics of the person, place, or thing among others of the same type.
  • the system 100 is not intended to be limited to any particular example recited herein.
  • the system 100 may utilize and/or include elements of the non-transitory machine-readable medium of Figure 2 and/or the method of Figure 3.
  • Figure 2 illustrates a diagram 220 of a processing resource 222 and a non-transitory machine-readable medium 224 for training recognition models according to the present disclosure.
  • a memory resource, such as the non-transitory machine-readable medium 224, may be used to store instructions (e.g., 226, 228, 230, 232) executed by the processing resource 222 to perform the operations as described herein.
  • the operations are not limited to a particular example described herein and may include additional operations such as those described with regard to the system 100 described in Figure 1 and the method 340 described in Figure 3.
  • a processing resource 222 may execute the instructions stored on the non-transitory machine-readable medium 224.
  • the non-transitory machine- readable medium 224 may be any type of volatile or non-volatile memory or storage, such as random-access memory (RAM), flash memory, read-only memory (ROM), storage volumes, a hard disk, or a combination thereof.
  • the machine-readable medium 224 may store instructions 226 executable by the processing resource 222 to analyze digital images.
  • the digital images may be digital images of a physical environment.
  • the digital images may include a plurality of objects. Some of the objects may be the same type of object.
  • the plurality of objects of the same type may include a unique object that is distinguishable from the other objects of the same type by some distinguishing characteristics.
  • the machine-readable medium 224 may store instructions 228 executable by the processing resource 222 to identify a segment of each one of the digital images that includes the unique object.
  • the segment identification may be performed at a first computing device.
  • the segments may be identified as a boundary superimposed on the digital image. Inside the boundary the unique object may be present and outside the boundary the unique object may not be present.
  • the segments of each one of the digital images that includes the unique object may be identified by utilizing a first model.
  • the first model may be a generic model.
  • the first model may be utilized to identify generic structures of the unique object.
  • Generic structures may include structures that are generic to the type of object.
  • a generic structure may include a human head. While the detailed characteristics and metrics of the features on the head may be unique, the generic human head gross structure is a common structure generic to objects of the human type.
  • the machine-readable medium 224 may store instructions 230 executable by the processing resource 222 to communicate the identified segments of each one of the digital images. The identified segments may be communicated from the first computing device to a second computing device.
  • the second computing device may include distributed computing resources providing a cloud service including a deep learning neural network that may include heavy-weight generic models and may be utilized to train a specific model.
  • the segments may be communicated as a digital image with the segment containing the unique object defined, or the segments may be communicated as the information from the segment containing the unique object.
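  • A minimal sketch of shipping those segments to the second computing device follows; the endpoint URL, payload format, and use of the `requests` library are assumptions, since the disclosure does not specify a transport.

```python
import cv2
import requests

def send_segments(frame, boxes, url="https://example.com/train-specific-model"):
    """POST the pixels inside each identified segment to a (hypothetical) cloud
    training service on the second computing device."""
    files = []
    for i, (x, y, w, h) in enumerate(boxes):
        crop = frame[y:y + h, x:x + w]
        ok, encoded = cv2.imencode(".jpg", crop)
        if ok:
            files.append(("segments", (f"segment_{i}.jpg", encoded.tobytes(), "image/jpeg")))
    response = requests.post(url, files=files, timeout=30)
    response.raise_for_status()
    return response
```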
  • the machine-readable medium 224 may store instructions 232 executable by the processing resource 222 to receive a second model.
  • the second model may be communicated from the second computing device to the first computing device.
  • the second model may be a specific model to recognize unique structures of the unique objects.
  • the second model may be trained to identify a face, clothing, posture, and anatomical dimensions that are unique to a particular person in the digital image.
  • the second model may be trained to recognize the unique structures from the segments of the digital images identified as including the unique object.
  • the second model may be a distilled version of a heavier-weight specific model that is optimized for the computational resources of the first computing device.
  • the first computing device may track the unique object.
  • the first computing device may track the object with a camera, causing the camera to pan, tilt, and/or otherwise move to keep the object in frame.
  • the first computing device may include a robot.
  • the robot may watch, with a camera, and/or physically follow the unique object.
  • the first computing device may track the unique object based on the second specific model received from the second computing device. That is, the first computing device may analyze subsequent digital images of the physical environment utilizing the second model that is specific for unique structures of the unique object. Subsequent digital images may be digital images that are captured and/or analyzed after the initial digital images utilized to train the second model and/or that are captured after the second model is trained and/or being utilized by the first computing device. Tracking the unique object may include repeatedly identifying and/or marking the location of the unique object in the subsequent digital image.
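  • Keeping the object in frame can be reduced to nudging the camera toward the tracked bounding box; the sketch below returns proportional pan/tilt adjustments, where the gain and units are assumptions for illustration.

```python
def pan_tilt_adjustment(box, frame_w, frame_h, gain=0.1):
    """Return (pan, tilt) nudges proportional to how far the box center
    sits from the frame center; positive pan means right, positive tilt down."""
    x, y, w, h = box
    cx, cy = x + w / 2, y + h / 2
    pan = gain * (cx - frame_w / 2) / (frame_w / 2)
    tilt = gain * (cy - frame_h / 2) / (frame_h / 2)
    return pan, tilt

# Example: object sits right of center, so pan right and tilt slightly up.
print(pan_tilt_adjustment((500, 100, 80, 200), 640, 480))
```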
  • the digital image may be communicated to a second computing device where it may be subjected to further analysis with more detailed generic or specific models.
  • the more detailed analysis may determine whether the unique object was present and may train the specific model that will be implemented in the first computing device with those images if they are determined to include the unique object after all. That is, the first computing device may identify a subsequent digital image including the unique object where the unique object was not identified while utilizing the second model to track the unique object.
  • the subsequent digital image may be communicated from the first computing device to the second computing device.
  • the first computing device may receive, from the second computing device, a third model.
  • the third model may include a replacement or refinement for the second model.
  • the third model may also be a model to recognize the unique structures of the unique object. However, in the third model the unique structures may be refined from the second model based on analysis and/or use of the subsequent digital image as example training images.
  • Figure 3 illustrates a diagram of a method 340 for training recognition models according to the present disclosure.
  • the method 340 is not limited to any particular set and/or order of operations.
  • the method 340 may include additional operations such as those described with regard to the system 100 described in Figure 1 , and the machine-readable medium 224 described in Figure 2.
  • the method 340 may include utilizing a first model to identify and place a bounding mechanism around a particular person appearing in a digital image captured in a physical location of a first computing device.
  • the digital image may be captured by a digital image capturing system in communication with and/or incorporated into the first computing device. In some examples, the digital image may include a digital image still that is extracted from a video of the physical location.
  • the first computing device may utilize the first model that is generic to identifying human anatomy in order to identify a bounding mechanism around a person or a plurality of people in the digital image.
  • the particular person, being of the object type "person," may have their location defined by a bounding mechanism.
  • the method 340 may include utilizing, at a second computing device, the digital image of the particular person appearing within the bounding box to train a second model.
  • a particular person may be selected from the plurality to train the second model for.
  • the particular person may be selected from a plurality of people in the physical environment based on the particular person being a largest person of the plurality of people in the physical environment by bounding box area.
  • the second model may be a model that is specific to recognizing digital images of the particular person.
  • examples of the present disclosure may include models that are automatically fed bounded definitions of the training target from application of the generic model to the digital images. The specific model is then able to leverage these images for training while cutting out the time and expense of manual human intervention.
  • the method 340 may include utilizing, at the first device, the second model to identify the particular person in subsequent digital images captured by the first computing device.
  • the first computing device may compare subsequent digital images to the second model to identify the specific characteristics defined in the model that indicate that the particular person is present in the digital image.
  • the method 340 may include communicating, from the first device to the second device, additional images that include the particular person captured in the physical location. That is, as the first computing device identifies the particular person in the additional images and/or assigns bounding mechanisms to the portion of the image where they are located, the first computing device may utilize this data in a feedback loop to further train the second model to achieve greater specificity and/or accuracy of identification.
  • the method 340 may include refining the second model.
  • Refining the second model may include utilizing the additional images that include the particular person that were identified utilizing the second model as example training images to train further specificity into the second model.
  • the second model may improve its understanding of the unique characteristics of the particular person.
  • the first computing device may track the particular person. For example, responsive to receiving a voice command at the first computing device from the particular person, the first computing device may track the particular person. Tracking the person may include performing operations that keep the particular person in frame of a digital image capturing device and/or allow a mobile computing device to stay within a proximity to the particular person.
  • the method 340 may include comparing an accuracy metric of the second model to a threshold accuracy metric.
  • the accuracy metric may be determined based on how frequently the second model was not able to identify the particular person in the digital image when they were, in fact, in the digital picture (false negative).
  • the accuracy metric may be determined based on how frequently the second model identified the particular person in the digital image even though they were not present in the digital picture (false positive).
  • the second model may continue to be refined by training with the additional images until the threshold accuracy metric is reached.
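  • As a sketch of that accuracy check, the helper below scores a list of (predicted, actual) presence pairs for the particular person and compares the result to an illustrative threshold; the 0.95 value and the sample data are assumptions.

```python
def recognition_accuracy(results):
    """`results` holds (predicted_present, actually_present) pairs for the person."""
    false_neg = sum(1 for pred, actual in results if actual and not pred)
    false_pos = sum(1 for pred, actual in results if pred and not actual)
    correct = len(results) - false_neg - false_pos
    return correct / len(results) if results else 0.0

THRESHOLD = 0.95   # illustrative threshold accuracy metric

observations = [(True, True), (True, True), (False, True), (True, False)]
if recognition_accuracy(observations) < THRESHOLD:
    print("keep refining the second model with additional training images")
else:
    print("threshold accuracy reached; pause refinement")
```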

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Medical Informatics (AREA)
  • General Health & Medical Sciences (AREA)
  • Computing Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Image Analysis (AREA)

Abstract

A system may include a processing resource, and a computing device comprising instructions executable by the processing resource to: gather digital images of a unique object in a physical environment; utilize a generic model for identifying a type of the object to localize a portion of the unique object in the digital images; and train, from the portion of the unique object localized in the digital images, a new specific model for recognizing the unique object.

Description

TRAINED RECOGNITION MODELS
Background
[0001] Computer vision (CV) may include acquiring, analyzing, and understanding digital images at a computing device. CV may involve transforming visual images captured of the real world into descriptions, such as numerical and/or symbolic information, that can be processed and elicit an appropriate operation. For example, computing systems may extract information from digital images and apply models to detect operation triggers. CV may utilize object identification to identify objects in the digital image.
Brief Description of the Drawings
[0002] Figure 1 illustrates a diagram of an example of a system for training recognition models according to the present disclosure.
[0003] Figure 2 illustrates a diagram of an example of a processing resource and a non-transitory machine-readable medium for training recognition models according to the present disclosure.
[0004] Figure 3 illustrates a diagram of an example of a method for training recognition models according to the present disclosure.
Detailed Description
[0005] To achieve an understanding of a real world physical environment from a digital image of the physical environment, objects in the digital image may be recognized and/or located with CV systems. Objects appearing in a digital image may be identified through object recognition systems utilizing deep learning techniques. Deep learning techniques may include machine learning methods that learning data representations, as opposed to task-specific algorithm. The machine learning may be supervised. Deep learning architectures may include neural networks such as artificial neural network with multiple layers between an input and an output.
[0006] An artificial neural network may include a computing system that progressively improves performance on tasks by considering examples. For example, in object recognition in digital images, a neural network may learn to identify images that contain a person by analyzing example training images that have been manually labeled by a human as "a person” or“not a person.”
[0007] To develop a computational understanding of what a person looks like, hundreds, thousands, and/or many more example training images may be analyzed by the neural network. Accordingly, hundreds, thousands, or many more example training images may be manually tagged by a human. As such, training a neural network may be a time consuming and expensive endeavor relying on substantial human intervention.
[0008] Being able to pick out individual objects from a plurality of similar objects may involve further training or refining of the neural network. As such, tracking individual objects involves the above described time-consuming and expensive manual intervention to develop specific models that may be utilized to identify the specific object in a digital image. As such, on demand models to identify specific objects are not able to be rapidly deployed due to the time delays and costs associated with producing them.
[0009] In contrast, examples of the present disclosure may train C V systems and/or neural networks to detect unique objects within seconds, avoiding the lengthy, expensive, and manual intervention process. The examples of the present disclosure may utilize generic models for deep learning to identify a generic object in a digital image and use that identified object as example training images to seed a more extensive specific model for recognizing the object, such as a person. The examples of the present disclosure may facilitate rapid specific model development that may be utilized to perform tracking of unique object. For example, an example of the present disclosure may include a system including a processing resource and a computing device. The computing device may include instructions executable by the processing resource to: gather digital images of a unique object in a physical environment, utilize a generic model for identifying a type of the object to localize a portion of the unique object in the digital image; and train, from the portion of the unique object localized in the digital images, a new specific model for recognizing the unique object
[0010] Figure 1 illustrates a diagram of an example of a system 100 for training recognition models according to the present disclosure. The system 100 may include a computing device 105. The computing device 105 may include a processing resource. The computing device 105 may include a memory resource. The memory resource may include a non-iransiiory machine-readable medium. The instructions may include instructions executable by the processing resources to cause the computing device 105 to execute operations, such as operations associated with training recognition models.
[0011] An example of a computing device 105 may include a mobile computing device, a smartphone, a robot, a digital image capturing device, a surveillance device, an internet-of-things device, etc. The computing device 105 may capture digital images 102 and/or be associated with a separate device that may capture digital images 102. In some examples, the digital images may include digital video.
[0012] The system 100 may be utilized to capture digital images 102.
The digital images 102 may include digital video and/or digital still images of objects 103-1...103-N in a rea!-wor!d physical environment. The digital images 102 may include digital images 102 of real world physical objects 103-1...103- N. The objects 103-1...103-N may be people, places, things etc. that are present and/or visible to the digital image capturing device in the physical environment. [0013] In examples where the source of the digital images 102 is a digital video of the physical environment, the computing device may extract digital images 102 as stills from the source video. For example, if the source of the digital images 102 is a ten-second-long digital video, the computing device 105 may extract ten still digital images 102, with one digital image 102 extracted from each one second period of the ten-second-long video.
[0014] In some examples, the digital images 102 may be minified.
Minification of the digital images may include reformatting and/or resize the digital representation of the image to achieve a compatibility with the other operations of the computing device 105 and/or with the generic model 106 or specific model 110 discussed later.
[0015] The digital images 102 may include a single object and/or a plurality of objects 103-1... 103-N. The digital images 102 may include different types of objects 103-1...103-N in a same digital image. For example, the digital image may include an object such as a bicycle, an object such as a toy car, an object such as a person, and an object such as an animal in the same digital image 102. in some examples, the digital Image 102 may include a plurality of the same type of object For example, a same digital image may include a plurality of different people (e.g., 103-2 and 103-4).
[0016] The computing device 105 may acquire and/or receive the digital images 102 of the physical environment including the objects 103-1... 103-N. The acquisition and/or reception of the digital images 102 may be part of the standard course of operations of the computing device 105. However, in some examples, the acquisition and/or reception of the digital Images 102 may be performed in response to a command. For example, a person in the physical environment may issue a command, such as a voice command, instructing the computing device 105 to train a specific model of an object, such as a voice command to“follow me,”“film me,”“meet our new dog,”“enroll Mabel,” etc. Such a command may trigger the acquisition and/or reception of the digital images 102 [0017] Alternatively, such a command may trigger the utilization of the digital images 102, acquired or received in a standard course of operations of the computing device 105, to be utilized in training the specific model of the object where, prior to the command, they might not have been utilized in that manner in either case, digital images 102 may be gathered by the computing device 105, and those digital images 102 may include images of a unique real- world physical object 103-1... 103-N (such as a specific or particular person such as specific person 103-2 or specific person 103-4, a specific place, or a specific thing) that is able to be captured in a digital image 102 by the image capturing device at the physical environment.
[0018] The computing device 105 may include a generic model 106 to identify objects 103-1... 103-N in the digital image 102. The generic model 106 may be a model that may be utilized to identify a generic type of object in a digital image. An object type may be a classifier such as“person,”“dog,”“bike,” “cat,”“furniture,”“machine component,”“package,” etc. that may be utilized to characterize a type that is applicable to a plurality of individually unique objects that share common characteristics of the type such as human anatomy, dog anatomy, bike structure, cat anatomy, furniture structure, toy car structure, machine component structure, package markings and/or structure, etc
[0019] A generic model 106 may be generic in that it may be utilized to identify objects 103-1... 103-N that share the common characteristics with the type. However, the generic model 106 may not be trained to identify individual, specific, or unique objects from other objects of the same type. For example, a generic model 106 may be trained to analyze digital images 102 and identify a plurality of people (e.g , person 103-2 and person 103-4) appearing in the digital images 102 as people, but the generic model 106 may not be trained to identify distinct individuals or a specific person (e.g., person 103-2 versus person 103-4) from the plurality of people. That is, the understanding of the generic model 106 may extend so far as what type an object 103-1... 103-N belongs to, but not what subtype within that type the object 103-1... 103-N belongs to. [0020] The generic mode! 106 may include a mathematical data model that may utilize a plurality of classification layers arranged as a neural network. The generic model 106 may be trained to process digital images 102 and recognize types of objects 103-1... 103-N based on recognizing shapes, colors, shading, textures, structures in the digital image 102 that match the shapes, colors, shading, textures, structures in the classification layers of the generic mode! 106. The generic model 106 may develop these classification layers by analysis and construction of the model from hundreds, thousands, or more example training images that have been manually labeled by a human as“an X” or“not an X,” where X is the type of object that the model is trained for. in some examples, the generic mode! 106 may be a relatively light-weight (e.g., relatively smaller, relatively less data, relatively less classification layers, relatively less ability to accurately, recognize shapes, colors, shading, textures, structures in the digital image 102, etc.) as compared to a generic model on a computing device with relatively more computing resources available than are available on the computing device 105. The computing device 105 may be a mobile computing device with computing resources and/or power supply limited by the mobility, weight, price, and purpose constraints of the computing device 105.
For example, a generic model on a separate computing device with greater computing resources and/or power supply may be one hundred classification layers deep while the generic model 106 utilized by the computing device 105 may include a ten classification layer deep distillation of the one hundred classification layer generic model.
[0021] The computing device 105 may utilize the generic model 106 for identifying the type of the object to analyze the digital images 102. For example, the digital images 102 may be analyzed and their appearance and/or the data contained therein compared to the generic model 106. By applying the generic model 106 to the digital images 102, a portion of a unique object in the digital images 102 may be identified and localized within the digital images 102. As used herein, the term "unique" object may refer to a particular or specific object. In an example, the unique object may be a particular or specific object of a plurality of objects of the same type. However, where the digital images 102 contain two objects of a same type, such as person 103-2 and person 103-4, both objects of the same type may be identified by the generic model 106. Therefore, the identification of a particular unique or specific one of the objects of the same type may be incidental to that particular unique or specific one of the objects of the same type belonging to the type identifiable by the generic model 106.
[0022] In Figure 1, object 103-2 is utilized as an example of a unique object. Object 103-2 may be a "person" type of object, but may, for example, also be a unique object such as a particular person. The unique object 103-2 may be the only object of that type in the digital image 102 or the unique object may be one of a plurality of objects of that type in the digital image 102. For example, the digital image 102 may also include a distinct person type object such as person 103-4 that may also be identified by the generic model 106. Either way, the unique object 103-2 may possess unique characteristics that distinguish it from other objects 103-4 of a same or a similar type.
[0023] The computing device 105 may utilize the generic model 106 to identify a portion of the unique object 103-2 in the digital images 102 by identifying portions of the unique object 103-2 visibly included in the digital images 102 and comparing the visibly included portions to models of corresponding portions of the type of object defined in the generic model 106. For example, where the generic model 106 is a generic model for a person, the computing device 105 may identify the arms, legs, head, neck, eyes, ears, hair, clothing, mouth, entire body, etc. of the unique object 103-2 utilizing matches with the corresponding structural forms of those portions of the person saved in the generic model 106. Again, any other unique objects 103-4 of the same type as unique object 103-2 may also be identified by comparison with the generic model 106.

[0024] Localizing the object 103-2 within the digital images 102 may include identifying the portions of the digital images 102 that include the object 103-2 and those portions of the image that do not include the object 103-2. As described above, a plurality of individually unique objects of the same type (e.g., person 103-2 and person 103-4) may be identified in the same digital image 102 through this process. For example, if the digital images 102 included more than one person (e.g., person 103-2 and person 103-4), all of the people appearing in the digital images 102 may be localized. That is, while the generic model 106 may localize the unique object 103-2, it is not so limited and may localize all of the unique objects of the same type (e.g., person 103-2 and person 103-4) present in the digital images 102. Again, since the generic model 106 is trained to identify a generic type of an object, it is not that the generic model 106 is specifically identifying and localizing just the unique object 103-2 or just the unique object 103-4 based on the unique characteristics of the corresponding unique object. Rather, the generic model 106 may be identifying and/or localizing the object 103-2, which also happens to be unique, and the object 103-4, which also just happens to be unique, as belonging to the type of object, such as "person," as defined by the generic model 106 applicable to identify the generic characteristics shared between the two objects.
[0025] Localizing objects, such as object 103-2 and/or object 103-4, utilizing the generic model 106 may include identifying the position and/or the boundaries of the identified objects in the digital images 102. For example, localizing an object 103-2 identified by the generic model 106 may include defining pixels containing the object 103-2 and/or digital coordinates of boundaries enclosing the object 103-2 within the digital images 102. In an example, localizing an object 103-2 identified by the generic model 106 may include utilizing a bounding mechanism 104 to define a bounding box, bounding sphere, a plurality of bounding boxes, a plurality of bounding spheres, a bounding volume, etc. for each of the objects in the digital images that are identified by the generic model 106. The area within the bounding mechanism 104 may be treated as including the object 103-2 while the area outside the bounding mechanism 104 may be treated as not including the object 103-2. Again, although not illustrated, a bounding mechanism may be utilized to also localize object 103-4.
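For illustration only, the following is a minimal sketch of how such type-level localization might be realized, assuming a pretrained torchvision detector stands in for the generic model 106 and axis-aligned boxes stand in for the bounding mechanism 104; the specific detector, class index, and confidence threshold are illustrative assumptions and not part of the disclosure.

```python
# Illustrative sketch: a pretrained COCO detector plays the role of the
# generic model 106; the returned boxes play the role of bounding mechanism 104.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor

PERSON_CLASS_ID = 1        # "person" index in torchvision's COCO-trained detectors
SCORE_THRESHOLD = 0.7      # assumed confidence cut-off

# weights="DEFAULT" is available in recent torchvision releases.
detector = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
detector.eval()

def localize_people(pil_image):
    """Return [x1, y1, x2, y2] boxes for every person-type object in the image."""
    with torch.no_grad():
        output = detector([to_tensor(pil_image)])[0]
    boxes = []
    for box, label, score in zip(output["boxes"], output["labels"], output["scores"]):
        if label.item() == PERSON_CLASS_ID and score.item() >= SCORE_THRESHOLD:
            boxes.append([float(v) for v in box])
    return boxes
```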
[0026] The computing device 105 may utilize the localized portion of the unique object 103-2 in the digital image 102 to train a new model. The new model may include a specific model 108. A specific model 108 may include a model that is specific to, and/or contains specificity of detail to identify, a unique, specific, and/or particular object of a plurality of objects of the same type (e.g., person 103-2 versus person 103-4). The specific model 108 may be a model that may be utilized to identify a unique object 103-2 based on the unique characteristics of that object relative to other objects of the same type, such as object 103-4, in the digital image 102. An object classifier such as a person's name, an animal's name, a model of bike, a label of a piece of furniture, a serial number of a machine component, a package tracking number, etc. may be utilized to characterize a unique, specific, and/or particular object of a plurality of objects of the same type. For example, a specific model 108 for unique object 103-2 may be a specific model 108 for the unique characteristics of "Mabel," the name of the particular girl that the model is specific to. In another example, another specific model for unique object 103-4 may be a specific model for the unique characteristics of "Charlie," the name of the particular boy that the model is specific to.
[0027] The specific model 108 may be trained to specifically identify a unique object 103-2 appearing in the digital images 102 based on its unique characteristics relative to other objects 103-1... 103-N, including other objects of the same type (e.g., other people such as 103-4, other animals, etc.), appearing in the same digital image 102. That is, the understanding of the specific model 108 may be inclusive of the understanding of the generic model 106 in its identification of a type of object, but may include additional understanding for identifying a unique, specific, and/or particular person based on their unique characteristics.
[0028] The specific model 108 may include a mathematical data model that may utilize a plurality of image classification layers organized in a neural network. The specific model 108 may be trained to process digital images 102 and recognize individual unique objects based on recognizing shapes, colors, shading, textures, and structures in the digital image 102 that match the shapes, colors, shading, textures, and structures in the classification layers of the specific model 108. The specific model 108 may develop these classification layers by analysis and construction of the model from a plurality of training images.
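As a toy illustration only, a relatively light-weight specific model could be realized as a small binary classifier over fixed-size image crops ("is this crop the unique object or not?"); the layer counts and sizes below are arbitrary assumptions, not a description of the disclosed model.

```python
# Toy sketch of a light-weight specific model 108: a few convolutional
# classification layers followed by a two-way classifier head.
import torch.nn as nn

class SpecificModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1),
        )
        self.classifier = nn.Linear(64, 2)  # {unique object, not the unique object}

    def forward(self, x):
        return self.classifier(self.features(x).flatten(1))
```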
[0029] The training images for the specific model 108 may include the digital images 102 acquired and/or received by the computing device 105 including an indication of the location of the object 103-2 that was identified by the generic model 106 in the digital images. For example, the training images for the specific model 108 may include the digital images 102 acquired and/or received by the computing device 105, where each of the digital images 102 includes a bounding mechanism 104 defining the portion of the digital image 102 that contains and/or the portion of the digital image 102 that does not contain the object 103-2 identified by the generic model 106. In some examples, the training images for the specific model 108 may include just the portions of the digital images 102 acquired and/or received by the computing device 105 that have been determined to include the object 103-2 identified by the generic model 106.
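For illustration, one way to obtain training images of the latter kind is to crop the bounded regions out of the digital images; the sketch below reuses the hypothetical `localize_people` helper shown earlier, and the fixed crop size is an assumption.

```python
# Sketch: turn bounded portions of the digital images into training crops
# for the specific model. Resizing to a fixed size is an assumption.
from PIL import Image

def crops_for_training(pil_image, boxes, size=(128, 128)):
    """Crop each bounded region and resize it to a fixed training size."""
    crops = []
    for x1, y1, x2, y2 in boxes:
        crop = pil_image.crop((int(x1), int(y1), int(x2), int(y2)))
        crops.append(crop.resize(size))
    return crops
```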
[0030] As described above, in some examples the digital images 102 may include a plurality of objects of a same type 103-2 and 103-4 that may be equally identified by the generic model 106 based on common generic characteristics. However, in training a specific model 108 for a particular unique object, such as 103-2, utilizing training images of another particular unique object 103-4 would degrade and/or destroy the specific model 108, as the inclusion of characteristics that object 103-2 does not have, but that object 103-4 does have (e.g., red shirt, blue pants, blonde hair, green eyes, short hair, etc.), would prevent the specific model 108 from being able to accurately identify object 103-2. As such, the system 100 may include a set of rules to identify which one of a plurality of objects of the same type 103-2 and 103-4 the specific model 108 should be trained on and, as a result, which of the portions of the digital images 102 should be utilized as training images for the specific model 108. For example, the computing device 105 and/or any other device training the specific model 108 may assume that the specific unique object 103-2 of the plurality of objects 103-2 and 103-4 that is closest to the digital image capturing device, largest in the digital image, closest to the center of the digital image, located in a position consistent with a source of a sound of a voice command, largest object by bounding mechanism, etc. is the specific unique object 103-2 upon which to train the specific model 108. As a result, the computing device 105 and/or any other device training the specific model 108 may select, as training images for the specific model 108, the portions of the digital image 102 identified by a bounding mechanism 104, developed by an application of the generic model 106, as containing an object of a type that will be utilized to train the specific model 108, based on those portions indicating that the identified object is closest to the digital image capturing device, largest in the digital image, closest to the center of the digital image, located in a position consistent with a source of a sound of a voice command, largest object by bounding mechanism, etc.
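A minimal sketch of one such selection rule follows, assuming axis-aligned [x1, y1, x2, y2] boxes: prefer the largest box (a rough proxy for "closest to the capturing device"), breaking ties by distance to the image center. The specific tie-breaking order is an assumption.

```python
# Sketch of a selection rule for choosing which detected object of a shared
# type should seed the specific model's training images.
def select_training_subject(boxes, image_width, image_height):
    def area(b):
        return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])

    def center_distance_sq(b):
        cx, cy = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
        return (cx - image_width / 2) ** 2 + (cy - image_height / 2) ** 2

    # Largest area wins; closer to center wins among equal areas.
    return max(boxes, key=lambda b: (area(b), -center_distance_sq(b)))
```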
[0031] That is, in contrast to the development of the classification layers of the specific model 108 by training on hundreds, thousands, or more example training images that have been manually labeled by a human as "an X" or "not an X," where X is the unique object 103-2 that the model is trained for, examples of the present disclosure may utilize training images that were not manually marked. Instead, the training images may include the localized portion of the unique object from the digital images 102 that is output by the computing device 105.

[0032] In some examples, the computing device 105 may produce a plurality of augmented digital images from the digital images 102. For example, the computing device 105 may manipulate the digital images 102 including the localized portion of the unique object 103-N and/or a portion of the digital images 102 including the localized portion of the unique object 103-N. In some examples, the computing device 105 may darken, lighten, stretch, rotate, change a viewing angle of, or change the contrast of such images. By manipulating these images to create augmented images, a single image may be turned into a plurality of images with distinct properties that may be utilized as additional training images to train a more robust specific model 108.
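For illustration, such augmentation might be produced from a single localized crop as sketched below, assuming PIL is available; the particular brightness/contrast factors and rotation angles are illustrative assumptions.

```python
# Sketch: produce several augmented training images from one localized crop.
from PIL import ImageEnhance

def augment(crop):
    augmented = []
    for factor in (0.6, 1.4):                 # darken / lighten
        augmented.append(ImageEnhance.Brightness(crop).enhance(factor))
    for factor in (0.7, 1.3):                 # lower / raise contrast
        augmented.append(ImageEnhance.Contrast(crop).enhance(factor))
    for angle in (-10, 10):                   # small rotations
        augmented.append(crop.rotate(angle, expand=True))
    return augmented
```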
[0033] In some examples, the computing device 105 may train the specific model 108 at the computing device 105. In such examples, the computing device 105 may train a neural network that is relatively light-weight (e.g., relatively smaller, relatively less data, relatively fewer classification layers, relatively less ability to accurately recognize shapes, colors, shading, textures, and structures in the digital image 102, etc.) as compared to a specific model on a computing device with relatively more computing resources available than are available on the computing device 105. The computing device 105 may be a mobile computing device with computing resources and/or power supply limited by the mobility, weight, price, and purpose constraints of the computing device 105. For example, a specific model on a separate computing device with greater computing resources and/or power supply may be one hundred classification layers deep while the specific model 108 utilized by the computing device 105 may include a ten classification layer deep distillation of the one hundred classification layer model.
[0034] However, in some examples, the digital images 102 including an indication of the portion of the digital image that contains the object 103-2 identified by the generic model 106, just the portion of the digital image that contains the object 103-2 identified by the generic model 106, and/or the augmented digital images may be communicated from the computing device 105 to a second computing device. The second computing device may include a computing device with relatively more computing resources than the computing device 105. The second computing device may be a single computing device and/or a plurality of computing devices in communication with one another.
[0035] In an example, the second computing device may include a cloud-based service utilizing centralized or distributed processing resources, memory resources, and/or instructions executable by the processing resource that are remote from the computing device 105. The second computing device may train a neural network that is relatively heavy-weight (e.g., relatively larger, relatively more data, relatively more classification layers, relatively more ability to accurately recognize shapes, colors, shading, textures, and structures in the digital image 102, etc.) as compared to a specific model 108 on a computing device 105 which has relatively fewer computing resources available than are available on the second computing device. For example, a specific model on the second computing device with greater computing resources and/or power supply may be one hundred classification layers deep while the specific model 108 utilized by the computing device 105 may include a ten classification layer deep distillation of the one hundred classification layer model.
[0036] In the above described examples, the specific model trained on the second computing device may be modified to operate on the computing device 105. For example, the heavy-weight version of the specific model trained on the second device may be distilled down to the specific model 108 optimized with fewer classification layers for the computing device 105.
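One common way to distill a heavy-weight model into a lighter one is to train the small "student" network to match the large "teacher" network's softened outputs; the sketch below shows such a distillation loss purely for illustration, and the temperature and weighting values are assumptions rather than parameters taken from the disclosure.

```python
# Sketch of a standard knowledge-distillation loss: the light-weight student
# (e.g., the specific model 108) learns from the heavy-weight teacher trained
# on the second computing device.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=1),
        F.softmax(teacher_logits / T, dim=1),
        reduction="batchmean",
    ) * (T * T)                                  # scale by T^2 as is conventional
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard
```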
[0037] The specific model 108 may be utilized by the computing device 105 to identify the unique object in digital images captured subsequent to the digital images 102 that the generic model 106 was applied to. The specific model 108 may be able to specifically recognize the unique object 103-2 from the unique characteristics of the unique object. For example, where the unique object 103-2 is a person, the specific model 108 may be able to specifically identify images of the particular person 103-2 based on unique characteristics of the person's face, their clothing, their anatomical dimensions, their gestures, their posture, and/or any other distinguishable characteristics trainable to the specific model 108.
[0038] In some examples, the specific model may be utilized to specifically track the presence of the unique object 103-2 in a digital video and/or a series of digital image stills captured after the specific model 108 is trained. As described above, in some examples the specific model 108 may be trained in response to a command issued to the computing device 105. In response to a particular person telling the computing device 105 to film them, the computing device may cause the specific model 108 for the particular person to be trained from the particular unique person 103-2 identified and localized by the generic model 106 and then utilize the specific model in subsequent digital images to track the location of, and coordinate filming of, the particular person.
[0039] Again, in examples where the digital images 102 contain a plurality of objects of the same type (e.g., object 103-2 and object 103-4) that are identified by the generic model 106, the computing device 105 and/or any other device training the specific model 108 may assume that the specific unique object 103-2 of the plurality of objects 103-2 and 103-4 that is closest to the digital image capturing device, largest in the digital image, closest to the center of the digital image, located in a position consistent with a source of a sound of a voice command, largest object by bounding mechanism, etc. is the specific unique object 103-2 upon which to train the specific model 108.
[0040] In contrast to static specific models that may be defined by an initial training set of digital images manually characterized by humans, examples of the present disclosure may be progressively fine-tuned. For example, the specific model 108 may be repeatedly updated and/or replaced based on updated data. For example, the generic model 106 may be utilized to identify images of the unique object that will be utilized to train the specific model 108 of the unique object. However, while that specific model 108 is applied to digital images of the objects 103-1... 103-N captured subsequent to training the specific model 108 for the first time, additional data to modify or replace the specific model 108 may be collected. In an example, the computing device 105 may identify the unique object 103-2 that the specific model 108 is specific to in thirty consecutive digital video frames, but then the computing device may not be able to identify the unique object in the thirty-first consecutive digital video frame utilizing the specific model 108.
[0041] The computing device 105 may determine that it is unlikely that the unique object 103-2 has totally disappeared from the digital images in the intervening second between frames. The computing device 105 may determine that it is relatively more likely that the unique object 103-2 is in a pose or position that is not being recognized by the specific model 108 as identifiable as the unique object 103-2. As a result, the computing device 105 may cause these frames to be analyzed utilizing a computationally heavier generic model for the type of object that the unique object 103-2 is. Such a heavier model may be larger and/or more accurate than the generic model 106 and/or the specific model 108 for objects of that type. In some examples, the computing device 105 may rely on a separate second computing device with relatively greater processing power and/or a relatively larger network of computational resources than the computing device 105. From application of the heavier generic model, perhaps on the separate second computing device, an object of the same type as the unique object 103-2 may be identified in the frames where the unique object 103-2 was not previously detected with the specific model 108. The newly detected object may be indicated with a bounding mechanism and the bounded portion of the frames may be utilized as an additional training image to modify and/or replace the specific model 108.
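A minimal sketch of this fallback flow is shown below; `specific_detect`, `heavy_generic_detect`, and `training_buffer` are hypothetical helpers assumed only for illustration (for example, `heavy_generic_detect` could be a call out to a heavier model on a second computing device).

```python
# Sketch: when the specific model stops finding the unique object mid-sequence,
# escalate the frame to a heavier generic model and keep any detection as an
# additional training example for refining or replacing the specific model.
def track_with_fallback(frames, specific_detect, heavy_generic_detect, training_buffer):
    previously_seen = False
    for frame in frames:
        box = specific_detect(frame)
        if box is None and previously_seen:
            # Unexpected disappearance: re-check with the heavier generic model.
            box = heavy_generic_detect(frame)
            if box is not None:
                training_buffer.append((frame, box))  # future fine-tuning data
        previously_seen = box is not None
        yield frame, box
```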
[0042] In addition to frames where the unique object 103-2 is unexpectedly absent, digital images where the unique object 103-2 was successfully identified with the specific model 108 may be fed back in as training images to further train the specific model 108. The location of the unique object 103-2 identified in the digital images by applying the specific model 108 may be utilized as an example training image of the unique object 103-2 for further refining or replacing the specific model 108. As such, a new specific model may be developed by training a neural network, which may be remote from the physical environment where the digital images are captured, with the unique object 103-2 identified by the bounding boxes in digital images regardless of whether the bounding boxes are defined by the generic model 106, a relatively more heavy-weight generic model, and/or the specific model 108. The specific model 108 may continue to be trained until a threshold level of unique object identification accuracy has been achieved. At that point, training of the specific model 108 may be discontinued until such time as the accuracy rate falls below the threshold again.
[0043] Some of the examples provided above describe a person as the unique object, a generic person model as the generic model, and a specific model specific to the unique person. However, the system 100 is not intended to be so limited. The unique object, as described above, can be a person, place, or thing. The generic model may be a model generic to that type of person, place, or thing. The specific model may be a specific model specific to the uniquely identifiable characteristics of the person, place, or thing among others of the same type. The system 100 is not intended to be limited to any particular example recited herein. The system 100 may utilize and/or include elements of the non-transitory machine-readable medium of Figure 2 and/or the method of Figure 3.
[0044] Figure 2 illustrates a diagram 220 of a processing resource 222 and a non-transitory machine-readable medium 224 for training recognition models according to the present disclosure. A memory resource, such as the non-transitory machine-readable medium 224, may be used to store instructions (e.g., 226, 228, 230, 232) executed by the processing resource 222 to perform the operations as described herein. The operations are not limited to a particular example described herein and may include additional operations such as those described with regard to the system 100 described in Figure 1, and the method 340 described in Figure 3.
[0045] A processing resource 222 may execute the instructions stored on the non-transitory machine-readable medium 224. The non-transitory machine-readable medium 224 may be any type of volatile or non-volatile memory or storage, such as random-access memory (RAM), flash memory, read-only memory (ROM), storage volumes, a hard disk, or a combination thereof.
[0046] The machine-readable medium 224 may store instructions 226 executable by the processing resource 222 to analyze digital images. The digital images may be digital images of a physical environment. The digital images may include a plurality of objects. Some of the objects may be the same type of object. The plurality of objects of the same type may include a unique object that is distinguishable from the other objects of the same type by some distinguishing characteristics.
[0047] The machine-readable medium 224 may store instructions 228 executable by the processing resource 222 to identify a segment of each one of the digital images that includes the unique object. The segment identification may be performed at a first computing device. The segments may be identified as a boundary superimposed on the digital image. Inside the boundary the unique object may be present and outside the boundary the unique object may not be present.
[0048] The segments of each one of the digital images that includes the unique object may be identified by utilizing a first model. The first model may be a generic model. For example, the first model may be utilized to identify generic structures of the unique object. Generic structures may include structures that are generic to the type of object. In an example, a generic structure may include a human head. While the detailed characteristics and metrics of the features on the head may be unique, the generic human head gross structure is a common structure generic to objects of the human type.

[0049] The machine-readable medium 224 may store instructions 230 executable by the processing resource 222 to communicate the identified segments of each one of the digital images. The identified segments may be communicated from the first computing device to a second computing device. The second computing device may include distributed computing resources providing a cloud service including a deep learning neural network that may include heavy-weight generic models and may be utilized to train a specific model. The segments may be communicated as a digital image with the segment containing the unique object defined, or the segments may be communicated as the information from the segment containing the unique object.
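For illustration, communicating the identified segments to a second computing device might look like the sketch below; the endpoint URL and payload format are hypothetical assumptions, and the `requests` library is assumed to be available on the first computing device.

```python
# Sketch: upload localized segments (PIL crops) to a hypothetical cloud
# training endpoint on the second computing device.
import io
import requests

def send_segments(crops, endpoint="https://example.invalid/train-specific-model"):
    files = []
    for i, crop in enumerate(crops):
        buffer = io.BytesIO()
        crop.save(buffer, format="PNG")
        files.append(("segments", (f"segment_{i}.png", buffer.getvalue(), "image/png")))
    response = requests.post(endpoint, files=files, timeout=30)
    response.raise_for_status()   # surface transport or server errors
    return response
```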
[0050] The machine-readable medium 224 may store instructions 232 executable by the processing resource 222 to receive a second model. The second model may be communicated from the second computing device to the first computing device. The second model may be a specific model to recognize unique structures of the unique object. For example, the second model may be trained to identify a face, clothing, posture, and anatomical dimensions that are unique to a particular person in the digital image. The second model may be trained to recognize the unique structures from the segments of the digital images identified as including the unique object. The second model may be a distilled version of a heavier-weight specific model that is optimized for the computational resources of the first computing device.
[0051] The first computing device may track the unique object. For example, the first computing device may track the object with a camera, causing the camera to pan, tilt, and/or otherwise move to keep the object in frame. The first computing device may include a robot. The robot may watch, with a camera, and/or physically follow the unique object.
[0052] The first computing device may track the unique object based on the second specific model received from the second computing device. That is, the first computing device may analyze subsequent digital images of the physical environment utilizing the second model that is specific for unique structures of the unique object. Subsequent digital images may be digital images that are captured and/or analyzed after the initial digital images utilized to train the second model and/or that are captured after the second model is trained and/or being utilized by the first computing device. Tracking the unique object may include repeatedly identifying and/or marking the location of the unique object in the subsequent digital image.
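As a purely illustrative sketch of keeping the unique object in frame once the second model can locate it, the camera could be nudged toward the detected box center; `pan_camera` and `tilt_camera` are hypothetical actuator calls assumed for illustration and are not part of the disclosure.

```python
# Sketch: simple proportional "keep in frame" behavior driven by the box
# returned by the specific (second) model.
def keep_in_frame(box, frame_width, frame_height, pan_camera, tilt_camera, deadband=0.1):
    cx = (box[0] + box[2]) / 2 / frame_width    # normalized horizontal center, 0..1
    cy = (box[1] + box[3]) / 2 / frame_height   # normalized vertical center, 0..1
    if abs(cx - 0.5) > deadband:
        pan_camera(direction="right" if cx > 0.5 else "left")
    if abs(cy - 0.5) > deadband:
        tilt_camera(direction="down" if cy > 0.5 else "up")
```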
[0053] As described above, in instances where the unique object is unexpectedly not identified in a subsequent digital image, the digital image may be communicated to the second computing device where it may be subjected to further analysis with more detailed generic or specific models. The more detailed analysis may determine whether the unique object was present and may train the specific model that will be implemented in the first computing device with the images if they are determined to include the unique object after all. That is, the first computing device may identify a subsequent digital image including the unique object where the unique object was not identified while utilizing the second model to track the unique object. The subsequent digital image may be communicated from the first computing device to the second computing device. Then the first computing device may receive, from the second computing device, a third model. The third model may include a replacement or refinement for the second model. The third model may also be a model to recognize the unique structures of the unique object. However, in the third model the unique structures may be refined from the second model based on analysis and/or use of the subsequent digital image as example training images.
[0054] Figure 3 illustrates a diagram of a method 340 for training recognition models according to the present disclosure. The method 340 is not limited to any particular set and/or order of operations. The method 340 may include additional operations such as those described with regard to the system 100 described in Figure 1, and the machine-readable medium 224 described in Figure 2.

[0055] At 342, the method 340 may include utilizing a first model to identify and place a bounding mechanism around a particular person appearing in a digital image captured in a physical location of a first computing device.
The digital image may be captured by a digital image capturing system in communication with and/or incorporated into the first computing device. In some examples, the digital image may include a digital image still that is extracted from a video of the physical location.
[0056] The first computing device may utilize the first model that is generic to identifying human anatomy in order to identify and place a bounding mechanism around a person or a plurality of people in the digital image. The particular person, being of the object type "person," may have their location defined by a bounding mechanism.
[0057] At 344, the method 340 may include utilizing, at a second computing device, the digital image of the particular person appearing within the bounding box to train a second model. Where the digital image includes a plurality of people, a particular person may be selected from the plurality to train the second model for. The particular person may be selected from a plurality of people in the physical environment based on the particular person being a largest person of the plurality of people in the physical environment by bounding box area.
[0058] The second model may be a model that is specific to recognizing digital images of the particular person. In contrast to specific models that are trained by analyzing a large amount of manually tagged training images, examples of the present disclosure may include models that are automatically fed bounded definitions of the training target from application of the generic model to the digital images. These bounded images may then be leveraged to train the specific model while cutting out the time and expense of manual human intervention.
[0059] At 346, the method 340 may include utilizing, at the first device, the second model to identify the particular person in subsequent digital images captured by the first computing device. The first computing device may compare subsequent digital images to the second model to identify the specific characteristics defined in the model that indicate that the particular person is present in the digital image.
[0060] At 348, the method 340 may include communicating, from the first device to the second device, additional images that include the particular person captured in the physical location. That is, as the first computing device identifies the particular person in the additional images and/or assigns bounding mechanisms to the portion of the image where they are located, the first computing device may utilize this data in a feedback loop to further train the second model to achieve greater specificity and/or accuracy of identification.
[0061] At 350, the method 340 may include refining the second model. Refining the second model may include utilizing the additional images that include the particular person that were identified utilizing the second model as example training images to train further specificity into the second model. By training the second model with additional images, the second model may improve its understanding of the unique characteristics of the particular person.
[0062] In some examples, the first computing device may track the particular person. For example, responsive to receiving a voice command at the first computing device from the particular person, the first computing device may track the particular person. Tracking the person may include performing operations that keep the particular person in frame of a digital image capturing device and/or allow a mobile computing device to stay within a proximity to the particular person.
[0063] Further, the method 340 may include comparing an accuracy metric of the second model to a threshold accuracy metric. The accuracy metric may be determined based on how frequently the second model was not able to identify the particular person in the digital image when they were, in fact, in the digital image (a false negative). Alternatively, the accuracy metric may be determined based on how frequently the second model identified the particular person in the digital image even though they were not present in the digital image (a false positive). The second model may continue to be refined by training with the additional images until the threshold accuracy metric is reached.
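A minimal sketch of such a check follows, assuming labeled evaluation frames for which ground-truth presence of the particular person is known; the 0.95 threshold is an illustrative assumption.

```python
# Sketch: decide whether refinement of the second model should continue,
# based on false negatives and false positives over evaluation frames.
def should_keep_refining(false_negatives, false_positives, total_frames,
                         accuracy_threshold=0.95):
    errors = false_negatives + false_positives
    accuracy = 1.0 - errors / max(total_frames, 1)
    return accuracy < accuracy_threshold  # keep refining until the threshold is met
```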
[0064] In the foregoing detailed description of the present disclosure, reference is made to the accompanying drawings that form a part hereof, and in which is shown by way of illustration how examples of the disclosure may be practiced. These examples are described in sufficient detail to enable those of ordinary skill in the art to practice the examples of this disclosure, and it is to be understood that other examples may be utilized and that process, electrical, and/or structural changes may be made without departing from the scope of the present disclosure.
[0065] The figures herein follow a numbering convention in which the first digit corresponds to the drawing figure number and the remaining digits identify an element or component in the drawing. Elements shown in the various figures herein can be added, exchanged, and/or eliminated so as to provide a number of additional examples of the present disclosure. In addition, the proportion and the relative scale of the elements provided in the figures are intended to illustrate the examples of the present disclosure, and should not be taken in a limiting sense. Further, as used herein, "a" element and/or feature can refer to one or more of such elements and/or features.

Claims

What is claimed:
1. A system comprising:
a processing resource;
a computing device comprising instructions executable by the processing resource to:
gather digital images of a unique object in a physical environment;
utilize a generic model for identifying a type of the object to localize a portion of the unique object in the digital images; and
train, from the portion of the unique object localized in the digital images, a new specific model for recognizing the unique object.
2. The system of claim 1, wherein the object is a specific person of a plurality of people.
3. The system of claim 2, wherein the generic model is a model trained to identify a generic portion of a human body.
4. The system of claim 1, wherein the instructions to gather the digital images include instructions executable by the processing resource to extract and minify the digital images from a digital video.
5. The system of claim 1, including instructions executable by the processing resource to produce a plurality of augmented digital images from the digital images and utilize the augmented digital images to train the new specific model.
6. The system of claim 1, wherein the new specific model is developed by training a neural network with the unique object localized in the digital images.
7. A non-transitory machine-readable medium containing instructions executable by a processor to cause the processor to:
analyze digital images including a unique object in a physical environment;
identify, at a first computing device, a segment of each one of the digital images that includes the unique object by utilizing a first model of generic structures of the unique object;
communicate the segments of each one of the digital images from the first computing device to a second computing device; and
receive, from the second computing device, a second model to recognize unique structures of the unique object, wherein the second model is trained from the segments of the digital images including the unique object.
8. The non-transitory machine-readable medium of claim 7, comprising instructions executable by the processor to cause the processor to track, at the first computing device, the unique object based on the received second model of the unique structures of the unique object.
9. The non-transitory machine-readable medium of claim 8, comprising instructions executable by the processor to cause the processor to identify, at the first computing device, a subsequent digital image including the unique object where the unique object was not identified while utilizing the second model to track the unique object.
10. The non-transitory machine-readable medium of claim 9, comprising instructions executable by the processor to cause the processor to:
communicate the subsequent digital image from the first computing device to the second computing device; and
receive, from the second computing device, a third model to recognize unique structures of the unique object that has been refined from the second model based on the subsequent digital image.
11. A method, comprising:
at a first computing device, utilizing a first model, generic to identifying human anatomy, to identify and place a bounding mechanism around a particular person appearing in a digital image captured in a physical location of the first computing device;
at a second computing device, utilizing the digital image of the particular person appearing within the bounding mechanism to train a second model specific to recognize the particular person;
at the first device, utilizing the second model to identify the particular person; and
communicating, from the first device to the second device, additional images including the particular person captured in the physical location; and
refining the second model based on the additional images including the particular person.
12. The method of claim 11, comprising tracking the particular person with the first computing device responsive to a voice command from the particular person received at the first computing device.
13. The method of claim 11, comprising:
comparing an accuracy metric of the second model to a threshold accuracy metric; and
continuing the refining of the second model until the threshold accuracy metric is reached.
14. The method of claim 11, comprising extracting the digital image as a still image from a video of the physical location captured at a first computing device.
15. The method of claim 11, comprising:
selecting the particular person from a plurality of people in the physical environment based on the particular person being a largest person of the plurality of people in the physical environment by bounding mechanism area.
