EP3762855A1 - Method and device for recognizing an object - Google Patents

Method and device for recognizing an object

Info

Publication number
EP3762855A1
Authority
EP
European Patent Office
Prior art keywords
output
machine learning
learning model
image
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
EP18723591.6A
Other languages
German (de)
English (en)
Inventor
Yukiko Yanagawa
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Omron Corp
Original Assignee
Omron Corp
Omron Tateisi Electronics Co
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Omron Corp, Omron Tateisi Electronics Co filed Critical Omron Corp
Publication of EP3762855A1 (fr)

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/103Static body considered as a whole, e.g. static pedestrian or occupant recognition
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217Validation; Performance evaluation; Active pattern learning techniques
    • G06F18/2178Validation; Performance evaluation; Active pattern learning techniques based on feedback of a supervisor
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/243Classification techniques relating to the number of classes
    • G06F18/24317Piecewise classification, i.e. whereby each classification requires several discriminant rules
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/10Terrestrial scenes

Definitions

  • the present disclosure relates to a method and device for recognizing an object.
  • Object recognition using a Convolutional Neural Network (CNN) is described in JP2017-538999A (Patent Document 1).
  • the CNN outputs a recognition result upon receiving an input such as an image, as a result of processing in the CNN with the optimized parameters set through training.
  • Such a conventional configuration prevents a person from knowing the judgment process performed by the CNN. Therefore, whether the object is successfully recognized and the cause of that outcome remain unknown, which makes it difficult to improve the accuracy of the object recognition.
  • Patent document 1 JP2017-538999A
  • a method for recognizing an object includes: acquiring an image of the object; inputting the image into a first machine learning model, and acquiring, by means of the first machine learning model, one or more first probabilities respectively corresponding to one or more features of the object in the image, as a first output of the first machine learning model; inputting at least the first output into a second machine learning model, and acquiring, by means of the second machine learning model, a recognition result of the object in the image, as a second output of the second machine learning model; and providing the first output and the second output corresponding to the first output.
  • the features of the object recognized from the image are output by the first machine learning model and input to the second machine learning model so as to output the recognition result in relation to the recognized features in the image.
  • the second output may include a second probability corresponding to a class of the object in the image.
  • inputting at least the first output into the second machine learning model may include: inputting the first output and the image into the second machine learning model. Further, inputting the first output and the image into the second machine learning model may include: associating the one or more first probabilities with the image; and inputting the one or more first probabilities and the image associated with each other into the second machine learning model.
  • the image is input into the second machine learning model, so that recognition accuracy of the second machine learning model may be improved.
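  • As a minimal illustrative sketch (not part of the original disclosure; all function and variable names are hypothetical), the two-stage recognition described above may be expressed in Python as follows, where first_model and second_model stand for the already-trained first and second machine learning models:

    # Illustrative Python sketch of the two-stage recognition flow (an assumption,
    # not an authoritative implementation). first_model and second_model are assumed
    # to be already-trained callables, e.g. neural networks.
    PREDEFINED_FEATURES = ["there are tires", "the shape is a quadrangle",
                           "there are two legs", "the shape is a cylinder"]
    CLASSES = ["pedestrian", "building", "vehicle", "tree"]

    def recognize(image, first_model, second_model, use_image_as_input=False):
        # First output: one probability per predefined feature of the object in the image.
        first_output = first_model(image)
        # Second input: the first output alone, or the first output associated with the image.
        second_input = (first_output, image) if use_image_as_input else first_output
        # Second output: the recognition result, e.g. one probability per object class.
        second_output = second_model(second_input)
        # Provide both outputs together so the user can relate the features to the result.
        return {"features": dict(zip(PREDEFINED_FEATURES, first_output)),
                "classes": dict(zip(CLASSES, second_output))}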
  • the method may further include: acquiring the first machine learning model and the second machine learning model which have been trained by means of a training method including: pre-acquiring a training image of the object, a class of the object as label data, and one or more predefined features of the object, the predefined features being defined in association with the class of the object; using the training image and the one or more predefined features to train the first machine learning model; and using an output of the first machine learning model and the label data to train the second machine learning model.
  • pre-acquiring the one or more predefined features may include: acquiring the one or more predefined features from a table in which the one or more predefined features are associated with the class of the object.
  • the first machine learning model acquires ability to recognize the features of the object
  • the second machine learning model acquires ability to recognize the object
  • the foregoing method may further include: determining whether the correlation between the first output and the second output corresponding to the first output satisfies a predetermined standard; and, if it is determined that the correlation does not satisfy the predetermined standard, notifying a user.
  • the association between the features recognized from the image and the recognition result of the object may be established and presented, so that the user may estimate whether recognition succeeds or not on the basis of the features recognized from the image; in particular, if the object fails to be recognized, the user may identify the specific feature whose imperfect recognition is a main cause of the recognition failure.
  • the foregoing method may further include: determining whether the first output is not consistent with the second output; and, if it is determined that the first output is not consistent with the second output, notifying a user.
  • the user is notified that there may be an error in the recognition result.
  • the user can easily determine what features affect object recognition, so that the factors that make the object recognition accuracy low can be effectively estimated.
  • the one or more features of the object may include the color feature, texture feature, shape feature or spatial relationship feature of the class of the object.
  • a device for recognizing an object includes: an acquisition unit configured to acquire an image of the object; a processing unit configured to receive the image of the object, input the image into a first machine learning model and acquire, by means of the first machine learning model, one or more first probabilities respectively corresponding to one or more features of the object in the image, as a first output of the first machine learning model, and input at least the first output into a second machine learning model and acquire, by means of the second machine learning model, a recognition result of the object in the image, as a second output of the second machine learning model; and a providing unit configured to provide the first output and the second output corresponding to the first output.
  • a user can estimate the features of the object that affect success or failure in the object recognition, as well as the training data to be used in additional training required for improving the accuracy of the object recognition.
  • the second output may include a second probability of a class of the object in the image.
  • the processing unit may be configured to input the first output and the image into the second machine learning model. Further, the processing unit may be configured to: associate the one or more first probabilities with the image; and input the one or more first probabilities and the image associated with each other into the second machine learning model.
  • the foregoing device may further include: a determination unit configured to determine whether the correlation between the first output and the second output corresponding to the first output satisfies a predetermined standard; and a notification unit configured to, if it is determined that the correlation does not satisfy the predetermined standard, notify a user.
  • the foregoing device may further include: a determination unit configured to determine whether the first output is not consistent with the second output; and a notification unit configured to, if it is determined that the first output is not consistent with the second output, notify a user.
  • a device for training the above device includes: an acquisition unit configured to pre-acquire a training image of the object, a class of the object as label data, and one or more predefined features of the object, wherein the predefined features are defined in association with the class of the object; and a training unit configured to train the first machine learning model using the training image and the one or more predefined features, and train the second machine learning model using an output of the first machine learning model and the label data.
  • the device may further include: a storage unit configured to store a table in which the one or more predefined features are associated with the class of the object, wherein the acquisition unit is configured to acquire the one or more predefined features from the table.
  • the first machine learning model acquires ability to recognize the features of the object
  • the second machine learning model acquires ability to recognize the object.
  • the first machine learning model and the second machine learning model can be constructed to be utilized in recognizing the object according to the present disclosure.
  • a program for recognizing an object includes instructions which, when the program is executed by a computer, cause the computer to perform the foregoing method.
  • a storage medium stores a program for recognizing an object, the program including instructions which, when the program is executed by a computer, cause the computer to perform the foregoing method.
  • Fig. 1 is a hardware structure of a recognition system according to an implementation mode of the present disclosure
  • Fig. 2 is a functional block diagram of a recognition system according to an implementation mode of the present disclosure
  • Fig. 3 is an exemplary block diagram of output of a first machine learning model and second machine learning model of a recognition system according to an implementation mode of the present disclosure
  • Fig. 4 is an exemplary block diagram of a content stored in a storage unit of a recognition system according to an implementation mode of the present disclosure
  • Fig. 5 is a flowchart of a recognition phase in a recognition method according to an implementation mode of the present disclosure
  • Fig. 6 is a flowchart of a learning phase in a recognition method according to an implementation mode of the present disclosure
  • Fig. 7 is a functional block diagram of a recognition system according to another implementation mode of the present disclosure.
  • Fig. 8 is a flowchart of a recognition phase in a recognition method according to another implementation mode of the present disclosure.
  • Fig. 9 is a functional block diagram of a device for training according to another implementation mode of the present disclosure.
  • Fig. 10 is a flowchart of a learning phase in a recognition method according to another implementation mode of the present disclosure.
  • Fig. 1 is a schematic diagram of a hardware structure of a recognition system 100 according to an implementation mode of the present disclosure.
  • the recognition system 100 may be implemented by a general-purpose computer having a general-purpose computer architecture.
  • the recognition system 100 may include a processor 110, a main memory 112, a memory 114, an input interface 116, a display interface 118 and a communication interface 120. These parts may, for example, communicate with one another through an internal bus 122.
  • the processor 110 loads a program stored in the memory 114 into the main memory 112 and executes it, thereby realizing the functions and processing described hereinafter.
  • the main memory 112 may be a volatile memory and serves as the working memory required for program execution by the processor 110.
  • the input interface 116 may be connected with an input unit such as a mouse and a keyboard, and receives an instruction input by an operator operating the input unit.
  • the display interface 118 may be connected with a display, and may output various processing results generated by program execution of the processor 110 to the display.
  • the communication interface 120 is configured to communicate with a Programmable Logic Controller (PLC), a database device and the like through a network 200.
  • the memory 114 may store programs that cause a computer to function as the recognition system 100, for example, an object recognition program and an Operating System (OS).
  • the object recognition program stored in the memory 114 may be installed in the recognition system 100 through an optical recording medium such as a Digital Versatile Disc (DVD) or a semiconductor recording medium such as a Universal Serial Bus (USB) memory. Alternatively, the object recognition program may be downloaded from a server device or the like on the network.
  • the object recognition program according to the implementation mode may also be provided in combination with another program. Under such a condition, the object recognition program does not itself include a module included in the other program, but cooperates with the other program for processing. Therefore, the object recognition program according to the implementation mode may also take the form of a combination with another program.
  • Fig. 1 shows an example of implementing the recognition system 100 by virtue of a general-purpose computer.
  • the present disclosure is not limited thereto, and all or part of the functions thereof may be realized by a dedicated circuit, for example, an Application Specific Integrated Circuit (ASIC) or a Field-Programmable Gate Array (FPGA).
  • part of processing of the recognition system 100 may also be implemented on an external device connected with the network.
  • Fig. 2 is a functional block diagram of a recognition system 200 for recognizing an object according to an implementation mode of the present disclosure. Each part of the recognition system 200 will be described below in detail, taking as an example the case where the recognition system 200 is a system for recognizing an object during driving assistance or self-driving.
  • the recognition system 200 may include a storage unit 218, an acquisition unit 202, a processing unit including a first machine learning model 204 and a second machine learning model 206, a display unit 208, a determination unit 210, a prompting unit 212, a first training unit 214 and a second training unit 216. These units may be implemented by the above recognition system 100, and the division or combination of these units is not limited to that described here.
  • the acquisition unit 202 is, for example, a camera mounted on a vehicle.
  • the acquisition unit 202 may also be an image acquisition unit configured on a mobile phone, a smart phone or other similar mobile equipment.
  • when the recognition system 200 is used to recognize an object while the vehicle is running, the mobile equipment is equipment capable of being attached to the vehicle.
  • the acquisition unit acquires an image of a recognition target object such as a movable object (for example, a pedestrian, an animal and a vehicle) or still object (for example, a still obstacle, a road sign and a traffic light) appearing in the vicinity (for example, in front) of the vehicle.
  • the acquisition unit 202 is not required to acquire an image in real time.
  • the acquisition unit 202 may be a device pre-storing the image, or the acquisition unit 202 may acquire image data from another source (for example, a server and a memory).
  • the acquisition unit 202 may also be arranged outside the recognition system 200.
  • the acquisition unit 202 communicates with the recognition system 200 through a network.
  • the acquisition unit 202 may further preprocess the acquired image data. For example, contrast adjustment and brightness balancing may be performed on the image data to broaden the dynamic range presented in the captured image.
  • the image data is further scaled to a bit depth suitable for feeding into an image recognition algorithm.
  • the acquisition unit 202 inputs the acquired image into the first machine learning model 204 that has been trained, and the first machine learning model 204 outputs one or more first probabilities respectively corresponding to one or more predefined features of the recognition target object in the image as a first output.
  • the first probability may refer to probability that the corresponding predefined feature is included in the image.
  • the one or more features of the object characterize a color feature, texture feature, shape feature or spatial relationship feature corresponding to a class of the object.
  • the one or more predefined features are defined in association with the class of the recognition target objects, as shown in Fig.4.
  • the recognition target object includes, but is not limited to, a pedestrian, a vehicle and the like. If the object is a pedestrian, the predefined features include, for example, "there are two legs" and "the shape is a cylinder". If the object is a vehicle, the predefined features include, for example, "there are tires" and "the shape is a quadrangle".
  • An exemplary block diagram of the first machine learning model 204 of a recognition system 200 according to an implementation mode of the present disclosure is shown in Fig. 3.
  • the first machine learning model 204 outputs the first probabilities respectively corresponding to the predefined features in the image. For example, the probability corresponding to the feature "there are tires" in the image is 60%; the probability corresponding to the feature "the shape is a quadrangle" in the image is 32%; the probability corresponding to the feature "there are two legs" in the image is 8%; and the probability corresponding to the feature "the shape is a cylinder" in the image is 10%.
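  • As an illustration only (the disclosure does not fix a specific network architecture), the first machine learning model 204 could be a small convolutional network whose head applies an independent sigmoid per predefined feature, which is why the example probabilities above (60%, 32%, 8%, 10%) need not sum to 100%. A sketch in PyTorch, under these assumptions:

    import torch
    import torch.nn as nn

    class FeatureModel(nn.Module):
        """Hypothetical first model: one independent probability per predefined feature."""
        def __init__(self, num_features=4):
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Conv2d(3, 16, kernel_size=3, padding=1), nn.ReLU(),
                nn.MaxPool2d(2),
                nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
                nn.AdaptiveAvgPool2d(1),
            )
            self.head = nn.Linear(32, num_features)

        def forward(self, x):                   # x: (batch, 3, H, W)
            h = self.backbone(x).flatten(1)     # (batch, 32)
            return torch.sigmoid(self.head(h))  # (batch, num_features), each value in [0, 1]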
  • the second machine learning model 206 that has been trained takes one or more first probabilities output by the first machine learning model 204 as input, and outputs a recognition result of the object in the image as second output.
  • the recognition result may indicate whether the recognition target object is included in the image.
  • the recognition target object includes: a pedestrian and a vehicle.
  • An exemplary block diagram of the second machine learning model 206 of a recognition system according to an implementation mode of the present disclosure is shown in Fig. 3.
  • the second machine learning model 206 outputs the recognition result of the object in the image. For example, a probability that the object in the image is a "pedestrian" is 3%; a probability that the object in the image is a "building" is 5%; a probability that the object in the image is a "vehicle" is 90%; and a probability that the object in the image is a "tree" is 2%.
  • the second machine learning model 206 may further take both one or more first probabilities output by the first machine learning model 204 and the image, acquired by the acquisition unit 202, as the input, and outputs the recognition result of the object in the image as the second output.
  • the acquired image of the object is also taken as the input, so that the recognition accuracy of the second machine learning model for the object may be improved.
  • the first probabilities may be associated with the image acquired by the acquisition unit 202.
  • the vector or matrix representing the first probabilities and the three-dimensional matrix representing the image may be combined into one matrix, which is then input into the second machine learning model 206 as the object to be recognized.
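  • One possible way to perform this combination (an assumption, not mandated by the disclosure) is to broadcast each first probability into an extra constant channel and concatenate it with the image channels, for example:

    import torch

    def combine(image, first_probs):
        # image: (batch, 3, H, W); first_probs: (batch, num_features)
        b, _, h, w = image.shape
        prob_planes = first_probs[:, :, None, None].expand(b, first_probs.shape[1], h, w)
        # One combined tensor of shape (batch, 3 + num_features, H, W) for the second model.
        return torch.cat([image, prob_planes], dim=1)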
  • if the second machine learning model 206 takes only the first probabilities output by the first machine learning model 204 as the input, calculation resources are saved. The user may select the specific data to be taken as the input of the second machine learning model 206 according to the requirement on recognition accuracy.
  • the display unit 208 may be a liquid crystal display.
  • the display unit 208 displays the first output of the first machine learning model 204 and the second output, corresponding to the first output, of the second machine learning model 206 to the user.
  • the display unit may display the output of the first machine learning model 204 and the second machine learning model 206 in a manner shown in Fig. 3.
  • the display unit 208 may perform displaying in a manner that associates the class of the object with the corresponding features of the object. For example, the class "pedestrian" and the corresponding features "there are two legs" and "the shape is a cylinder" may be displayed in the same color or the same text format. Under such a condition, since the class of the object and the corresponding features are displayed in a unified manner, it is easier for the user to recognize them.
  • the display unit 208 may be regarded as a specific example of the "providing unit" in the present disclosure. However, the present disclosure is not limited thereto.
  • the providing unit may also be implemented in other manners.
  • the first output and the second output may be provided to the user through a paper document, a cloud download, or an email.
  • the determination unit 210 judges whether a corresponding relationship between the first output of the first machine learning model 204 and the second output, corresponding to the first output, of the second machine learning model 206 satisfies a predetermined standard or not, based on the content displayed by the display unit 208. For example, if the object is a pedestrian, the probability of the corresponding feature "there are two legs" is only 8%.
  • the predetermined standard is that, under the condition that the object is a pedestrian, the probability of the corresponding feature "there are two legs" is at least 70%. Therefore, the determination unit 210 determines that the corresponding relationship between the first output and the second output corresponding to the first output does not satisfy the predetermined standard.
  • the determination unit 210 outputs a judgment result to the prompting unit 212.
  • the prompting unit 212 prompts the user.
  • the prompting unit 212 may be a buzzer, that is, the user is prompted with a sound.
  • the prompting unit 212 may also be a visual prompting unit, for example, a Light-Emitting Diode (LED) lamp.
  • the prompting unit 212 may further be a vibrator, or a combination of the abovementioned types.
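  • The determination and prompting described above may be sketched as follows (an illustration only; the per-feature thresholds and the notification call are assumptions, and the outputs are represented as dictionaries of names to probabilities):

    # Predetermined standard: for the class indicated by the second output, the probabilities
    # of its associated features in the first output must reach a per-feature threshold.
    STANDARD = {"pedestrian": {"there are two legs": 0.70, "the shape is a cylinder": 0.70}}

    def satisfies_standard(first_output, second_output):
        # first_output / second_output: dicts mapping feature / class names to probabilities.
        predicted_class = max(second_output, key=second_output.get)
        for feature, required in STANDARD.get(predicted_class, {}).items():
            if first_output.get(feature, 0.0) < required:
                return False   # e.g. "there are two legs" at 8% is below the 70% standard
        return True

    def check_and_prompt(first_output, second_output, prompt_user):
        if not satisfies_standard(first_output, second_output):
            prompt_user()      # e.g. buzzer, LED lamp or vibrator of the prompting unit 212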
  • the recognition system 200 may further include a storage unit 218; in another implementation mode, the storage unit 218 may also be arranged outside the recognition system 200.
  • the first output and the second output may be provided to and stored in the storage unit 218.
  • the storage unit 218 may also be regarded as a specific example of the "providing unit" in the present disclosure. The user may read the stored outputs of the first machine learning model 204 and the second machine learning model 206 from the storage unit 218, and analyze or operate on the outputs.
  • the storage unit 218 maps both the features "there are tires" and "the shape is a quadrangle" to the class "vehicle" of the object in a many-to-one manner, and maps both the features "there are two legs" and "the shape is a cylinder" to the class "person" of the object in the same many-to-one manner.
  • one feature may also be mapped into multiple categories of the object in a one-to-many manner.
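  • As an illustration of the stored mapping (the dictionary layout is an assumption; the contents follow the example above and Fig. 4), the table could be represented as:

    # Many-to-one mapping from object classes to their predefined features, as in Fig. 4.
    FEATURE_TABLE = {
        "vehicle": ["there are tires", "the shape is a quadrangle"],
        "person":  ["there are two legs", "the shape is a cylinder"],
    }

    def features_for_class(object_class):
        return FEATURE_TABLE.get(object_class, [])

    def classes_for_feature(feature):
        # A feature may also be mapped to multiple classes (one-to-many).
        return [c for c, feats in FEATURE_TABLE.items() if feature in feats]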
  • the features of the object recognized from the image are output by the first machine learning model and input to the second machine learning model so as to output the recognition result in relation to the recognized features in the image.
  • the failure in recognizing a specific feature may be estimated as the factor of failure in object recognition. Based on such estimation, the user can perform additional learning or augmented training to improve the accuracy in recognizing the specific feature, and thereby improve the accuracy in recognizing the object.
  • the recognition system 200 may further include a first training unit 214 and a second training unit 216.
  • the first training unit 214 and the second training unit 216 may also be arranged outside the recognition system 200. Under such a condition, for example, the first training unit 214 and the second training unit 216 communicate with the recognition system 200 through the network, and the first training unit 214 and the second training unit 216 may correspond to the "device for training" in the present disclosure.
  • the first training unit 214 trains the first machine learning model 204, wherein the first training unit 214 trains the first machine learning model 204 in the following manner: pre-acquiring a training image, a class of an object in the training image and one or more features of the object; respectively storing the class of the object and the one or more features of the object corresponding to the training image as the class of the object and the one or more features of the object which are predefined; and taking the class of the object as label data, and using the training image and the one or more predefined features of the object corresponding to the label data to train the first machine learning model.
  • the first training unit 214 captures an image through, for example, a camera, thereby collecting a large amount of image data including the recognition target object.
  • the user defines the name of an object included in the captured image as the label data, and also defines features of the object.
  • the required recognition target objects include: a pedestrian, a vehicle and the like. If the object is a pedestrian, the user defines one or more features of the object, for example, "there are two legs" and "the shape is a cylinder". If the object is a vehicle, the user defines one or more features of the object, for example, "there are tires" and "the shape is a quadrangle".
  • the features "there are two legs" and "the shape is a cylinder" are mapped to the class "person" of the object in the many-to-one manner.
  • one feature may also be mapped to multiple classes of the object in the one-to-many manner. Under such a condition, the one or more features of the object are stored in the table in association with the class of the object. Then, the captured image and the one or more features of the object corresponding to the label data are stored as training data.
  • the storage unit 218 may store the one or more predefined features of the object in a table in a manner of association with the class of the object.
  • Fig. 4 is an exemplary block diagram of a content stored in a storage unit of a recognition system according to an implementation mode of the present disclosure. In this case, the one or more predefined features of the object may be acquired from the table in the storage unit 218.
  • the storage unit 218 may also be arranged outside the recognition system 200, and communicates with the recognition system 200 through the Internet.
  • the first training unit 214 uses the training data to train the first machine learning model 204.
  • the algorithm parameters of the neural network may be adjusted so as to obtain a learned neural network.
  • the first machine learning model 204 may be implemented not only by the above neural network, but also by other common learning machines.
  • the second training unit 216 is configured to train the second machine learning model 206, wherein the second training unit 216 uses the first probabilities output by the first machine learning model 204 and the above label data as training data to train the second machine learning model 206.
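  • A minimal training sketch under stated assumptions (the optimizers, losses and number of epochs are not specified by the disclosure, and the second model is assumed to output class scores): the first model is fitted to the predefined feature vectors, and the second model is fitted to the class labels using the first model's outputs as its input.

    import torch
    import torch.nn as nn

    def train_models(first_model, second_model, images, feature_targets, class_labels, epochs=10):
        # images: (N, 3, H, W); feature_targets: (N, num_features), float values in {0, 1};
        # class_labels: (N,) integer class indices used as label data.
        opt1 = torch.optim.Adam(first_model.parameters())
        opt2 = torch.optim.Adam(second_model.parameters())
        bce, ce = nn.BCELoss(), nn.CrossEntropyLoss()

        for _ in range(epochs):                  # first training unit: feature recognition
            opt1.zero_grad()
            loss1 = bce(first_model(images), feature_targets)
            loss1.backward()
            opt1.step()

        for _ in range(epochs):                  # second training unit: object recognition
            opt2.zero_grad()
            with torch.no_grad():
                first_out = first_model(images)  # first output used as the training input
            loss2 = ce(second_model(first_out), class_labels)
            loss2.backward()
            opt2.step()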
  • the second machine learning model 206 in the present disclosure may be implemented not only by the above neural network, but also by other common learning machines.
  • the user may determine whether the object is successfully recognized or not as well as a cause thereof.
  • the user may know what additional learning data should be prepared for constructing a CNN capable of accurately recognizing the object. Therefore, compared with the conventional art, machine-learning-based recognition may be trained efficiently.
  • Fig. 5 is a flowchart of a recognition phase in a recognition method according to an implementation mode of the present disclosure.
  • the recognition phase starts with Step 502, and in the step, an acquisition unit 202 acquires an image of a recognition target object.
  • the acquisition unit 202 may acquire an image of a movable object (for example, a pedestrian, an animal and a vehicle) or still object (for example, a still obstacle, a road sign and a traffic light) appearing in the vicinity (for example, in front) of the vehicle.
  • the acquisition unit 202 is not required to acquire the image in real time.
  • the acquisition unit 202 may acquire image data from another source (for example, a server and a memory).
  • in Step 504, the first machine learning model 204 outputs one or more first probabilities respectively corresponding to one or more predefined features of the recognition target object in the image. For example, the probability corresponding to the feature "there are tires" in the image is 60%; the probability corresponding to the feature "the shape is a quadrangle" in the image is 32%; the probability corresponding to the feature "there are two legs" in the image is 8%; and the probability corresponding to the feature "the shape is a cylinder" in the image is 10%.
  • in Step 506, the second machine learning model 206 outputs the recognition result of the object in the image. For example, a probability that the object in the image is a "pedestrian" is 3%; a probability that the object in the image is a "building" is 5%; a probability that the object in the image is a "vehicle" is 90%; and a probability that the object in the image is a "tree" is 2%.
  • the second machine learning model 206 may further take both one or more first probabilities output by the first machine learning model 204 in Step 504 and the image, acquired by the acquisition unit 202 in Step 502, of the object as the input, and outputs the recognition result of the object in the image as the second output. Under such a condition, the acquired image of the object is also taken as the input, so that the recognition accuracy of the second machine learning model 206 for the object may be improved.
  • in Step 508, a display unit 208 displays the first output of the first machine learning model 204 obtained in Step 504 and the second output of the second machine learning model 206 obtained in Step 506.
  • the display unit 208 displays the first output of the first machine learning model 204 and the second output, corresponding to the first output, of the second machine learning model 206 to the user.
  • in Step 510, a determination unit 210 judges whether a corresponding relationship between the first output obtained in Step 504 and the second output obtained in Step 506 satisfies a predetermined standard or not, based on the content displayed by the display unit 208 in Step 508. For example, if the object is a pedestrian, the probability of the corresponding predefined feature "there are two legs" is only 8%.
  • the predetermined standard is that, under the condition that the object is a pedestrian, the probability of the corresponding predefined feature "there are two legs" is at least 70%. Therefore, the determination unit 210 determines that the corresponding relationship between the first output and the second output corresponding to the first output does not satisfy the predetermined standard.
  • in this case, processing goes to Step 512.
  • in Step 512, a prompting unit 212 prompts the user. Then, processing is ended.
  • in Step 510, if the determination unit 210 determines that the corresponding relationship between the first output and the second output corresponding to the first output satisfies the predetermined standard, processing is ended.
  • an association between the features recognized from the image and the recognition result of the object may be established and prompted, so that the user may estimate whether recognition succeeds or not on the basis of the features recognized from the image; in particular, if the object fails to be recognized, the user may determine the specific feature whose imperfect recognition is a main cause of the recognition failure.
  • the user may estimate which feature has a low recognition rate, so that additional learning can be performed to further increase the recognition rate of that feature, thereby improving the recognition rate of the object.
  • Fig. 6 is a flowchart of a learning phase in a recognition method according to an implementation mode of the present disclosure.
  • in Step 602, a first training unit 214 trains the first machine learning model 204, wherein the first training unit 214 trains the first machine learning model 204 in the following manner: pre-acquiring a training image, a class of an object in the training image, and one or more features of the object; respectively storing the class of the object and the one or more features of the object, acquired in association with the training image, as the class of the object and the one or more features of the object which are predefined; and taking the class of the object in the training image as label data, and using the training image and the one or more predefined features of the object corresponding to the label data to train the first machine learning model 204.
  • in Step 604, a second training unit 216 trains the second machine learning model 206, wherein the second training unit 216 uses the output of the first machine learning model 204 and the class, taken as the label data, of the object in the training image to train the second machine learning model 206.
  • the learning phase may further include the following step.
  • the storage unit 218 stores the one or more predefined features of the object in a table in a manner of association with the class of the object, as shown in Fig. 4. This step may be implemented in advance of Steps 602 and 604.
  • after Step 604, processing is ended.
  • the initial training for constructing the first machine learning model 204 and the second machine learning model 206 has been completed, and the constructed first machine learning model 204 and second machine learning model 206 may be used in the recognition system 200.
  • the first training unit 214 may retrain the first machine learning model 204 with specific features of the object. For example, if the recognition accuracy of the first machine learning model 204 for a specific feature is low, an image with the specific feature may be added as training data to implement augmented training over the first machine learning model 204 for the specific feature, thereby improving the recognition accuracy for the specific feature. For example, if the recognition accuracy of the first machine learning model 204 for a feature (for example, there are two feet) of the pedestrian is low, the user may add an image including two feet of a person as a training image.
  • the second training unit 216 may retrain the second machine learning model 206 with specific features of the object. For example, if the accuracy of the second machine learning model 206 for an object of a specific type is low, an image including the object of the specific type may be added as training data to implement augmented training over the second machine learning model 206 for the specific type, thereby improving the recognition accuracy for the specific type.
  • Fig. 7 shows a recognition system 300 according to an implementation mode of the present disclosure.
  • an industrial robot that performs, for example, pick-and-place operations is installed at a factory's production site
  • the industrial robot needs to stop its operation to prevent a safety accident if an object such as a person appears within a predetermined range of the industrial robot.
  • an object such as a mobile robot that performs, for example, transportation operations may appear in the predetermined range of the industrial robot and interact with the industrial robot, and therefore, the industrial robot does not need to stop operation in that case.
  • the recognition system 300 may be a system that recognizes these objects (recognition target objects) at the production site, that is, recognizes a person and a mobile robot from images (including video), and controls the industrial robot according to the recognized object.
  • the recognition system 300 may be part of a robotic safety system (not shown).
  • the robotic safety system may further include a surveillance camera, and the image captured by the surveillance camera may be provided to the recognition system 300 for use in recognizing objects in the image.
  • the recognition system 300 may include an acquisition unit 302, a processing unit 304, and a prompting unit 306.
  • the acquisition unit 302 may acquire an image of the predetermined range of the industrial robot from the surveillance camera and provide the image to the processing unit 304.
  • the processing unit 304 may include a first machine learning model 3041 and a second machine learning model 3042 for outputting a recognition result based on the image provided by the acquisition unit 302.
  • the image is inputted into the first machine learning model 3041, and the first machine learning model 3041 outputs first probabilities corresponding to the predefined features of the recognition target object in the image, as a first output that is provided to the second machine learning model 3042.
  • the predefined features here are features defined in association with the recognition target objects.
  • the second machine learning model 3042 outputs the recognition result of the object in the image as a second output.
  • the prompting unit 306 displays the first output and the second output correspondingly on a management purpose display.
  • Part or all of the recognition system 300 may be implemented using a general purpose computer, but the present disclosure is not limited thereto.
  • all or a part of the functions may be implemented by a dedicated circuit such as an application specific integrated circuit (ASIC) or a field-programmable gate array (FPGA).
  • the first machine learning model 3041 and the second machine learning model 3042 may be implemented by the same processor or by separate processors.
  • the acquisition unit 302 acquires an image of the predetermined range of the industrial robot from the surveillance camera.
  • the image is inputted into the processing unit 304.
  • the image is inputted into the first machine learning model 3041 to obtain the first output.
  • the first output may represent the probability/likelihood that a particular feature (each predefined feature) is included in the image.
  • An example of the first output is shown in Table 1.
  • the second machine learning model 3042 receives the first output as an input, and outputs the object recognition result as a second output.
  • the second output may represent the probability/likelihood that a particular object (each recognition target object) is included in the image.
  • An example of the second output is shown in Table 1.
  • the second machine learning model 3042 may also receive both the first output and the image as inputs, so as to improve recognition accuracy.
  • the processing unit 304 may send the object recognition result as the second output to a controller (not shown) of the industrial robot. Based on the object recognition result, the controller determines whether the recognized object is a person or a mobile robot, and sends a stop signal to the industrial robot when the recognized object is a person.
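  • As an illustrative sketch of this control decision (the function names and the 0.5 threshold are assumptions), the controller logic could look like:

    def on_recognition_result(second_output, send_stop_signal, threshold=0.5):
        # second_output: e.g. {"person": 0.9, "mobile robot": 0.1}
        recognized = max(second_output, key=second_output.get)
        if recognized == "person" and second_output[recognized] >= threshold:
            send_stop_signal()   # stop the industrial robot to prevent a safety accident
        # if the recognized object is a mobile robot, the industrial robot keeps operating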
  • the prompting unit 306 displays the first output and the corresponding second output on a management purpose display (not shown) external to the recognition system 300.
  • the management purpose display can be placed near the industrial robot.
  • the prompting unit 306 may also send the first output and the corresponding second output to a management purpose computer, or store the first output and the corresponding second output in the memory or in the cloud. In this case, by analyzing the stored data, a factor that makes the recognition accuracy lower can be estimated more reliably, and the first machine learning model 3041 and the second machine learning model 3042 may be subjected to additional learning so as to improve recognition accuracy.
  • in step 812, if the first output is not consistent with the second output, the prompting unit 306 notifies the user that there may be an error in the recognition result.
  • non-consistency includes a case where the features corresponding to the recognized object determined by the second output do not match the features (the recognized features) determined by the first output; more specifically, a feature not included in the predefined features is included in the recognized features, or a feature included in the predefined features is not included in the recognized features.
  • Table 2 shows an example in which the recognized object is determined to be a "person" based on the second output (e.g., based on the result of a comparison of the second output with a predetermined threshold).
  • the predefined features include “head”, “cylindrical body”, “two legs”, and “two arms.”
  • the recognized features include "head", "cylindrical body" and "box-shaped body" based on the first output (e.g., based on the result of a comparison of the first output with a predetermined threshold). That is, the recognized features do not include "two legs" and "two arms", while the predefined features do not include "box-shaped body".
  • a sound or voice message may be outputted to alert the user when the non-consistency is detected, or the outputs with the non-consistency are highlighted on the management purpose display to alert the user.
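  • The non-consistency check described above may be sketched as follows (the 0.5 threshold and the feature sets are illustrative assumptions): the features recognized from the first output are compared with the features predefined for the object recognized from the second output.

    PREDEFINED = {"person": {"head", "cylindrical body", "two legs", "two arms"}}

    def is_consistent(first_output, recognized_object, threshold=0.5):
        # first_output: dict mapping feature names to probabilities from the first model.
        recognized_features = {f for f, p in first_output.items() if p >= threshold}
        predefined_features = PREDEFINED.get(recognized_object, set())
        missing = predefined_features - recognized_features      # e.g. {"two legs", "two arms"}
        unexpected = recognized_features - predefined_features   # e.g. {"box-shaped body"}
        return not missing and not unexpected

    # if not is_consistent(first_output, "person"): notify the user of a possible recognition error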
  • the user can easily determine what features affect object recognition, so that the factors that make the object recognition accuracy low can be effectively estimated.
  • in step 808, the object recognition result as the second output is displayed on the management purpose display through the prompting unit 306.
  • the non-consistency between the first output and the second output mentioned in step 812 may be determined before step 810, and the first output and the second output may be displayed on the management purpose display unless the non-consistency is detected.
  • Fig. 9 shows a functional block diagram of a training device 400 according to an implementation mode of the present disclosure
  • Fig. 10 shows a flowchart of training the recognition system 300 using the training device 400.
  • the training device 400 may include, for example, an acquisition unit 402 and a training unit 404.
  • the acquisition unit 402 is used for acquiring training data.
  • the training unit 404 uses the training data to train the first machine learning model 3041 and the second machine learning model 3042 included in the recognition system 300.
  • the training unit 404 includes, for example, a first training unit 4041 that performs training on the first machine learning model 3041 and a second training unit 4042 that performs training on the second machine learning model 3042.
  • the first machine learning model 3041 and the second machine learning model 3042 may also be trained by the same training unit.
  • the learning phase may be an initial training for constructing the first machine learning model 3041 and the second machine learning model 3042.
  • as the machine learning model, for example, a neural network can be used.
  • Part or all of the training device 400 may be implemented using a general purpose computer, but the present disclosure is not limited thereto.
  • all or part of the functions may be implemented by a dedicated circuit such as an application specific integrated circuit (ASIC) or a field-programmable gate array (FPGA).
  • the first training unit 4041 and the second training unit 4042 may be implemented by the same processor or by separate processors.
  • the acquisition unit 402 acquires the image data and the label data corresponding to the image data.
  • an image including the recognition target object, i.e., a person or a mobile robot, may be provided as image data.
  • label data "person” is given to an image (or image region) including a person
  • label data "mobile robot” is given to an image (or image region) including a mobile robot.
  • the label data can be viewed as the correct value (the desired output value of the machine learning model) for the corresponding input value of the machine learning model.
  • the acquisition unit 402 acquires one or more predefined features defined in association with each recognition target object, and examples of the predefined features are shown in Table 3.
  • the object "person” has features such as “head”, “cylindrical body”, “two legs”, and “two arms.”
  • the acquisition unit 402 may read the predefined features corresponding to "person" and "mobile robot" from a database or table prepared in advance. Alternatively, the acquisition unit 402 may also acquire the predefined features based on the user's input.
  • the first training unit 4041 trains the first machine learning model 3041.
  • the training data used includes the above-described image data and the predefined features associated with the object corresponding to the label data of the image data.
  • the first machine learning model 3041 is trained by supervised learning so as to obtain a first machine learning model 3041 capable of recognizing one or more features of an object included in the input image.
  • the second training unit 4042 trains the second machine learning model 3042.
  • the training data used includes the output of the first machine learning model 3041 and the label data of the image data.
  • the second machine learning model 3042 is trained by supervised learning so as to obtain a second machine learning model 3042 capable of recognizing the object included in the image based on the output of the first machine learning model 3041.
  • the training data used may further include the image data corresponding to the output of the first machine learning model 3041.
  • the second machine learning model 3042 may output the recognition result using features other than the features included in the output of the first machine learning model 3041, so that recognition accuracy may be improved.
  • An image as an input may be a still image or a moving image of any type including a thermal image, an infrared image, and a range image (depth image).
  • an object to be recognized may be an object of any type, including a movable object and a still object.
  • All or part of the device and system for recognizing the object may be implemented in the form of a software functional unit.
  • the software functional unit When being sold or used as an independent product, the software functional unit may be stored in a computer-readable storage medium.
  • the technical solutions of the present disclosure essentially, or the part thereof contributing to the prior art, or all or part of the technical solutions, may be implemented in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a piece of computer equipment (which may be a personal computer, a server or network equipment) to execute all or part of the steps of the method according to each example of the present disclosure.
  • the foregoing storage medium includes various media capable of storing program codes such as a USB disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, or an optical disk, and moreover, may further include a data stream downloaded from a server or a cloud.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Multimedia (AREA)
  • Data Mining & Analysis (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Artificial Intelligence (AREA)
  • Medical Informatics (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Image Analysis (AREA)

Abstract

The present invention relates to a method and device for recognizing an object. The method includes the steps of: acquiring an image of the object; inputting the image into a first machine learning model, and acquiring, by means of the first machine learning model, one or more first probabilities respectively corresponding to one or more features of the object in the image, as a first output of the first machine learning model; inputting at least the first output into a second machine learning model, and acquiring, by means of the second machine learning model, a recognition result of the object in the image, as a second output of the second machine learning model; and providing the first output and the second output corresponding to the first output.
EP18723591.6A 2018-03-05 2018-03-05 Procédé et dispositif de reconnaissance d'objet Pending EP3762855A1 (fr)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/IB2018/051387 WO2019171116A1 (fr) 2018-03-05 2018-03-05 Procédé et dispositif de reconnaissance d'objet

Publications (1)

Publication Number Publication Date
EP3762855A1 true EP3762855A1 (fr) 2021-01-13

Family

ID=62143424

Family Applications (1)

Application Number Title Priority Date Filing Date
EP18723591.6A Pending EP3762855A1 (fr) 2018-03-05 2018-03-05 Procédé et dispositif de reconnaissance d'objet

Country Status (2)

Country Link
EP (1) EP3762855A1 (fr)
WO (1) WO2019171116A1 (fr)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP7467157B2 (ja) 2020-02-19 2024-04-15 キヤノン株式会社 学習装置、画像認識装置、学習方法、画像認識装置の制御方法およびプログラム
DE112020006315B4 (de) 2020-02-27 2023-12-14 Mitsubishi Electric Corporation Robotersteuervorrichtung, robotersteuerverfahren und vorrichtung zur erzeugung von lernmodellen
CN113627449A (zh) * 2020-05-07 2021-11-09 阿里巴巴集团控股有限公司 模型训练方法及装置、标签确定方法及装置

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3234867A4 (fr) 2014-12-17 2018-08-15 Nokia Technologies Oy Détection d'objet avec un réseau neuronal
CN106874921B (zh) * 2015-12-11 2020-12-04 清华大学 图像分类方法和装置
US20170206426A1 (en) * 2016-01-15 2017-07-20 Ford Global Technologies, Llc Pedestrian Detection With Saliency Maps
US10255522B2 (en) * 2016-06-17 2019-04-09 Facebook, Inc. Generating object proposals using deep-learning models

Also Published As

Publication number Publication date
WO2019171116A1 (fr) 2019-09-12

Similar Documents

Publication Publication Date Title
CN111108507B (zh) 根据二维图像和点云数据生成三维边界框
CN109426801B (zh) 一种车道线实例检测方法和装置
EP3828784A1 (fr) Machine de défense de systèmes d'apprentissage automatique contre des attaques contradictoires
US10650259B2 (en) Human face recognition method and recognition system based on lip movement information and voice information
US11557208B2 (en) Facial recognition technology for improving motor carrier regulatory compliance
KR102008290B1 (ko) 영상에서 객체의 행동을 인식하는 방법 및 그 장치
JP6814673B2 (ja) 移動経路予測装置、及び移動経路予測方法
JP6030240B2 (ja) 顔認識のための方法および装置
CN110826370B (zh) 车内人员的身份识别方法、装置、车辆及存储介质
CN114258559A (zh) 用于标识具有不受控制的光照条件的图像中的肤色的技术
US11157723B1 (en) Facial recognition for drivers
EP3762855A1 (fr) Procédé et dispositif de reconnaissance d'objet
WO2018036277A1 (fr) Procédé, dispositif, serveur et support d'informations pour détection de véhicules
US10945888B2 (en) Intelligent blind guide method and apparatus
US11250279B2 (en) Generative adversarial network models for small roadway object detection
KR20180054407A (ko) 로봇 시스템
US20220067405A1 (en) System and method for road sign ground truth construction with a knowledge graph and machine learning
KR20220063127A (ko) 얼굴 생체 검출 방법, 장치, 전자 기기, 저장 매체, 및 컴퓨터 프로그램
CN113283347B (zh) 装配作业指导方法、装置、系统、服务器及可读存储介质
US11887331B2 (en) Information processing apparatus, control method, and non-transitory storage medium
CN108197628B (zh) 基于深度神经网络的图像特征的联合判断方法
US11551379B2 (en) Learning template representation libraries
JP2013029933A (ja) パターン認識装置
WO2020155998A1 (fr) Procédé et appareil d'identification d'orientation de véhicule
CN113127058A (zh) 数据标注方法、相关装置及计算机程序产品

Legal Events

Date Code Title Description
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: UNKNOWN

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: THE INTERNATIONAL PUBLICATION HAS BEEN MADE

PUAI Public reference made under article 153(3) epc to a published international application that has entered the european phase

Free format text: ORIGINAL CODE: 0009012

STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: REQUEST FOR EXAMINATION WAS MADE

17P Request for examination filed

Effective date: 20200907

AK Designated contracting states

Kind code of ref document: A1

Designated state(s): AL AT BE BG CH CY CZ DE DK EE ES FI FR GB GR HR HU IE IS IT LI LT LU LV MC MK MT NL NO PL PT RO RS SE SI SK SM TR

AX Request for extension of the european patent

Extension state: BA ME

DAV Request for validation of the european patent (deleted)
DAX Request for extension of the european patent (deleted)
STAA Information on the status of an ep patent application or granted ep patent

Free format text: STATUS: EXAMINATION IS IN PROGRESS

17Q First examination report despatched

Effective date: 20221108