CN111079468B - Method and device for identifying object by robot - Google Patents
- Publication number
- CN111079468B, CN201811216932.4A
- Authority
- CN
- China
- Prior art keywords
- video segment
- segment data
- label
- classification model
- sample
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/40—Scenes; Scene-specific elements in video content
- G06V20/41—Higher-level, semantic clustering, classification or understanding of video scenes, e.g. detection, labelling or Markovian modelling of sport events or news items
Abstract
The application provides a method and a device for a robot to identify an object. The method includes: the robot establishes a zero-sample classification model from the following information: sample video segment data, the labels to which the sample video segment data belong, and semantic auxiliary information for each label; the zero-sample classification model derives the specified feature of each label from this information; a first specified feature of the current video segment data to be identified is obtained through the zero-sample classification model; and the label to which the video segment data to be identified belongs is determined according to the degree of matching between the first specified feature and the specified feature of each label. This technical scheme solves the problem in the related art that the range of objects a robot can identify is small: by applying the zero-sample classification model, the robot can recognize video segments, and its recognition capability is greatly improved.
Description
Technical Field
The present application relates to the field of robots, and in particular, but not exclusively, to a method and apparatus for identifying objects by a robot.
Background
In the related art, as technology has developed, robots have gradually become assistants and tools in people's daily lives. Auxiliary robots in particular let users learn and entertain themselves, improving people's production and living standards. Take a recognition robot as an example: a user only needs to place an object in front of the robot's camera for the robot to capture a video segment of it; the robot then identifies the object using a video-segment recognition technique stored in advance. In this way a user can recognize and learn about an unfamiliar object through the robot, which is especially valuable for children or for people with poor memories.
However, the video-segment recognition system in an existing robot can, after being trained on video segments in advance, only recognize the specific types of objects it has stored. For an object outside that training, the robot cannot perform recognition at all and completely loses this function.
For this problem in the related art — that a robot can identify only a small range of objects — no effective solution has yet been proposed.
Disclosure of Invention
The embodiments of the present application provide a method and a device for a robot to identify an object, which at least solve the problem in the related art that the range of objects a robot can identify is small.
According to an embodiment of the present application, there is provided a method for a robot to identify an object, including: the robot establishes a zero-sample classification model from the following information: sample video segment data, the labels to which the sample video segment data belong, and semantic auxiliary information for each label; the zero-sample classification model derives the specified feature of each label from this information; a first specified feature of the current video segment data to be identified is obtained through the zero-sample classification model; and the label to which the video segment data to be identified belongs is determined according to the degree of matching between the first specified feature and the specified feature of each label.
According to another embodiment of the present application, there is also provided an apparatus for identifying an object, applied to a robot, including: a modeling module, configured to establish a zero-sample classification model from the following information: sample video segment data, the labels to which the sample video segment data belong, and semantic auxiliary information for each label, the zero-sample classification model deriving the specified feature of each label from this information; an acquisition module, configured to obtain, through the zero-sample classification model, a first specified feature of the current video segment data to be identified; and a determining module, configured to determine, according to the degree of matching between the first specified feature and the specified feature of each label, the label to which the video segment data to be identified belongs.
According to a further embodiment of the application, there is also provided a storage medium having stored therein a computer program, wherein the computer program is arranged to perform the steps of any of the method embodiments described above when run.
According to a further embodiment of the application, there is also provided an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
According to the application, the robot establishes a zero-sample classification model from the following information: sample video segment data, the labels to which the sample video segment data belong, and semantic auxiliary information for each label; the zero-sample classification model derives the specified feature of each label from this information; a first specified feature of the current video segment data to be identified is obtained through the zero-sample classification model; and the label to which the video segment data to be identified belongs is determined according to the degree of matching between the first specified feature and the specified feature of each label. This technical scheme solves the problem in the related art that the range of objects a robot can identify is small: by applying the zero-sample classification model, the robot can recognize video segments, and its recognition capability is greatly improved.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this specification, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute a limitation on the application. In the drawings:
Fig. 1 is a block diagram of a hardware structure of a mobile terminal of a method for recognizing an object by a robot according to an embodiment of the present application;
FIG. 2 is a flow chart of a method of robotically recognizing an object in accordance with an embodiment of the present application;
Fig. 3 is a block diagram of an image recognition apparatus according to another embodiment of the present application.
Detailed Description
The application will be described in detail hereinafter with reference to the drawings in conjunction with embodiments. It should be noted that, without conflict, the embodiments of the present application and features of the embodiments may be combined with each other.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order.
Example 1
The technical solutions in this document may be applied in a processor, which may be integrated in a robot, or in a computer terminal, which may likewise be integrated in a robot.
The method according to the first embodiment of the present application may be executed in a computer terminal or a similar computing device. Taking a mobile terminal as an example, fig. 1 is a block diagram of the hardware structure of a mobile terminal for a method of identifying an object by a robot according to an embodiment of the present application. As shown in fig. 1, the terminal may include one or more processors 102 (only one is shown in fig. 1; the processor 102 may include, but is not limited to, a microprocessor (MCU), a programmable logic device (FPGA), or the like) and a memory 104 for storing data. Optionally, the mobile terminal may further include a transmission device 106 for communication and an input/output device 108. Those skilled in the art will appreciate that the structure shown in fig. 1 is merely illustrative and does not limit the structure of the mobile terminal: it may include more or fewer components than shown in fig. 1, or have a different configuration.
The memory 104 may be used to store software programs and modules of application software, such as program instructions/modules corresponding to the method for recognizing an object by the robot in the embodiment of the present application, and the processor 102 executes the software programs and modules stored in the memory 104 to perform various functional applications and data processing, that is, implement the above-mentioned method. Memory 104 may include high-speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory remotely located relative to the processor 102, which may be connected to the computer terminal via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission means 106 is arranged to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of a computer terminal. In one example, the transmission device 106 includes a network adapter (Network Interface Controller, NIC) that can connect to other network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module for communicating with the internet wirelessly.
In this embodiment, a method for a robot to identify an object, running on the above computer terminal, is provided. Fig. 2 is a flowchart of a method for a robot to identify an object according to an embodiment of the present application. As shown in fig. 2, the flow includes the following steps:
Step S202: the robot establishes a zero-sample classification model from the following information: sample video segment data, the labels to which the sample video segment data belong, and semantic auxiliary information for each label; the zero-sample classification model derives the specified feature of each label from this information.
The semantic auxiliary information may be input by a human operator to assist the robot in identifying the label.
Step S204: obtain, through the zero-sample classification model, a first specified feature of the current video segment data to be identified.
Step S206: determine, according to the degree of matching between the first specified feature and the specified feature of each label, the label to which the video segment data to be identified belongs.
Identifying the label to which a video segment belongs means identifying what object appears in the segment or what is happening in it, so that the robot can respond. For example, if the user instructs the robot to fetch an apple, the robot detects a basket of fruit, identifies which item is the apple — that is, which item belongs to the apple label — and then picks it up to perform the subsequent operation.
Through the above steps, the robot establishes a zero-sample classification model from the sample video segment data, the labels to which the sample video segment data belong, and the semantic auxiliary information for each label; the zero-sample classification model derives the specified feature of each label from this information; a first specified feature of the current video segment data to be identified is obtained through the zero-sample classification model; and the label to which the video segment data to be identified belongs is determined according to the degree of matching between the first specified feature and the specified feature of each label. This solves the problem in the related art that the range of objects a robot can identify is small, enables the robot to recognize video segments by applying the zero-sample classification model, and greatly improves its recognition capability.
Optionally, determining the label according to the degree of matching between the first specified feature and the specified feature of each label includes: obtaining a plurality of matching degrees between the first specified feature and the specified feature of each label; and determining that the video segment data to be identified belongs to the label with the highest of the plurality of matching degrees.
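This optional selection can be sketched as follows. Cosine similarity is only an assumed instantiation of the matching degree (the patent does not fix the metric), and the function name and toy data are hypothetical:

```python
import numpy as np

def best_matching_label(first_feature, label_features):
    """Return the index of the label whose specified feature has the
    highest matching degree with the first specified feature.
    Cosine similarity stands in for the unspecified matching degree."""
    f = first_feature / np.linalg.norm(first_feature)
    scores = [f @ (v / np.linalg.norm(v)) for v in label_features]
    return int(np.argmax(scores))

# toy data: the segment's feature is closest to label 1
label_features = [np.array([1.0, 0.0]),
                  np.array([0.9, 0.1]),
                  np.array([0.0, 1.0])]
first = np.array([0.9, 0.1])
```

Calling `best_matching_label(first, label_features)` here returns `1`, the label with the highest of the three matching degrees.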
Optionally, the zero-sample classification model is established by: performing feature extraction on the sample video segment data and the labels to which they belong to obtain video-segment sample features; obtaining, from those labels and the semantic auxiliary information, a semantic embedded representation of each label and a semantic difference metric between labels; and establishing the zero-sample classification model from the video-segment sample features and the semantic embedded representations.
Optionally, after the zero-sample classification model is established from the video-segment sample features and the semantic embedded representations, semantic consistency regularization is applied to the model according to the semantic difference metric.
Optionally, the zero-sample classification model derives the specified feature of each label from the information in at least one of the following ways: extracting a first feature of the sample video segment data and taking the specified feature of the label to which that data belongs to be the first feature; or extracting a second feature described by the semantic auxiliary information of each label and taking the specified feature of that label to be the second feature. This gives two schemes for determining label features and can accurately identify the features common to each label or class.
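The two schemes could be realized, for instance, as below. Taking the mean of the per-sample features and reading an attribute vector are both assumptions, since the patent does not fix how the specified feature is computed:

```python
import numpy as np

def label_feature_from_samples(sample_features):
    """Scheme one: derive a label's specified feature from its sample
    video segment data (here, the mean of the per-sample features)."""
    return np.mean(np.asarray(sample_features, dtype=float), axis=0)

def label_feature_from_semantics(attribute_vector):
    """Scheme two: derive a label's specified feature from the second
    feature described in its semantic auxiliary information (here,
    simply an attribute vector supplied by an operator)."""
    return np.asarray(attribute_vector, dtype=float)
```

For example, `label_feature_from_samples([[0, 2], [2, 0]])` gives the averaged feature `[1, 1]`.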
The following description is made in connection with another embodiment of the present application.
Another embodiment of the present application aims to solve the technical problem that a robot cannot recognize an object it has not been trained on.
In this embodiment, a zero-sample-based service robot is provided, which can effectively identify new objects and improve the interaction with users. The method specifically includes the following steps:
Step one: perform feature extraction on the video segment data of the known categories and their labels to obtain video-segment sample features usable for machine learning. Input the semantic auxiliary information of the known-category video-segment labels and of the target video-segment category labels to obtain a vectorized semantic embedded representation of each label and a semantic difference metric between labels.
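Step one might look like the following sketch, using principal component analysis (named later in the description) for the feature extraction and Euclidean distance as an assumed semantic difference metric; the dimensions and random data are purely illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
raw = rng.normal(size=(20, 8))          # 20 known-category samples, 8 raw dims

# feature extraction by principal component analysis
centered = raw - raw.mean(axis=0)
_, _, vt = np.linalg.svd(centered, full_matrices=False)
sample_features = centered @ vt[:3].T   # keep the top 3 components

# vectorized semantic embedded representations of 4 labels
# (e.g. hand-given attribute vectors)
embeddings = rng.normal(size=(4, 3))

# semantic difference metric between labels: pairwise Euclidean distance
diff = np.linalg.norm(embeddings[:, None, :] - embeddings[None, :, :], axis=-1)
```

The resulting `diff` matrix is symmetric with a zero diagonal, which is what the semantic consistency regularization in step two consumes.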
The video segments may consist of images and may be classified according to the content of each video frame.
Step two: establish a zero-sample classification model based on the known-category video-segment sample features and the category-label semantic embedded representations obtained in step one.
Based on the semantic difference metric between category labels obtained in step one, apply semantic consistency regularization to the model so that its output is consistent with the semantic neighbor relations between the categories.
Specifically: model the matching score between a video-segment sample and a category label from the known video-segment sample features and the semantic embedded representation of the category label, and establish the zero-sample classification model based on the maximum-margin principle. A semantic consistency regularization constraint is then applied according to the semantic difference metric between category labels. The effect of this constraint is that, for each training sample, a target video-segment category semantically similar to the sample's true category should obtain a higher matching score, while a target category semantically far from it should obtain a lower one. The matching score corresponds to the matching degree in the above embodiment.
Step three: iteratively update the parameters of the semantically regularized zero-sample classification model obtained in step two until the parameters converge.
Specifically: solve for the model parameters iteratively using a convex optimization method (e.g., gradient descent or a quasi-Newton method) until the parameters converge.
Finally, with the learned zero-sample classification model, predict the newly input test video segments belonging to the target categories and output their category labels.
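The prediction step then reduces to scoring a test segment's feature against every target-category embedding and taking the best, e.g. (continuing the assumed bilinear score):

```python
import numpy as np

def predict_category(x, W, target_embeddings):
    """Output the target-category label whose matching score with the
    newly input test segment's feature x is highest."""
    scores = np.array([x @ W @ s for s in target_embeddings])
    return int(np.argmax(scores))

# toy check: with an identity W the score is a plain dot product,
# so a feature aligned with embedding 1 is assigned category 1
targets = np.array([[1.0, 0.0], [0.0, 1.0]])
```

Here `predict_category(np.array([0.2, 0.9]), np.eye(2), targets)` yields category `1`.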
With this scheme the application can be completed: for example, by analyzing a person's actions or the dynamics of an object, the service robot can respond appropriately, such as supporting an elderly person or a child who is falling, or picking up an object detected falling from a table onto the floor.
In this scheme, on top of a maximum-margin classifier, the matching scores of the known video-segment sample categories against the target video-segment categories are constrained to align with the semantic neighbor relations between the known and target categories, so that the classifier attends both to its discriminative power on the training samples and to its fit to the semantic structure of the target video-segment categories, striking a balance between the two. This improves the robot's recognition rate on target video segments.
Fig. 3 is a schematic block diagram of an image recognition apparatus according to another embodiment of the present application. As shown in fig. 3, it includes an image data preprocessing module and an image label preprocessing module, which together establish the zero-sample classification model; a regularization modeling module then establishes the matching score between image samples and category labels, a parameter updating module iteratively updates the parameters of the zero-sample classification model, and the model is finally output to a classification prediction module.
In robot video-segment recognition, a zero-sample machine learning algorithm calibrates the semantic neighbor relations between the known video-segment samples and the target video-segment samples, so that the classifier attends both to its discriminative power on the training samples and to its fit to the semantic structure of the target video-segment categories. The specific steps include:
Select samples of the known video segments, their labels, and samples of the target video segments, and perform feature extraction on the known video-segment samples; the feature extraction method may be principal component analysis, linear analysis, or the like. The semantic auxiliary information of a video segment may be an attribute of it — for example, whether the segment shows a red apple or a green apple — so that the difference between the semantic labels is a difference in color. Suppose the apple video segments cover apples of all colors, from unripe to fully ripe. Some of them are the known video segments, e.g. red apples and green apples, and the target video segments are the remainder, e.g. half-ripe (cyan-red) apples.
Model the matching score between the video segment data and the labels (in this example, the colors of the apples) and establish a zero-sample classification model based on the maximum-margin principle. According to the color differences, apply a semantic consistency regularization constraint to the model. The effect of this constraint is that, for each training sample, a known category that is semantically similar obtains a higher matching score, while one that differs more obtains a lower score. For example, a target video segment semantically close to the red apple obtains a higher matching score against the red-apple label and a lower matching score against the green-apple label. In this way an object that has never been trained on can be accurately identified.
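The apple example can be made concrete with made-up color attribute vectors; a half-ripe (cyan-red) apple should match the red-apple label more strongly than the green-apple label. Cosine similarity again stands in for the matching score:

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity, standing in for the matching score."""
    return a @ b / (np.linalg.norm(a) * np.linalg.norm(b))

# illustrative [redness, greenness] attribute vectors
red_apple   = np.array([1.0, 0.0])
green_apple = np.array([0.0, 1.0])
half_ripe   = np.array([0.7, 0.3])   # the unseen target category
```

With these values the half-ripe apple scores higher against the red-apple label than against the green-apple label, so the unseen category is resolved toward the semantically closer known label.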
By adopting the scheme, the recognition accuracy of the robot is improved, and the interaction effect with the user is improved.
From the description of the above embodiments it will be clear to those skilled in the art that the methods may be implemented with software plus the necessary general-purpose hardware platform, or with hardware, though in many cases the former is the preferred embodiment. On this understanding, the technical solution of the present application — or the part of it that contributes over the prior art — may be embodied as a software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, or optical disk) containing instructions that cause a terminal device (a mobile phone, a computer, a server, a network device, or the like) to perform the methods of the embodiments of the present application.
Example two
In this embodiment, a device for identifying an object is further provided, and the device is used to implement the foregoing embodiments and preferred embodiments, and will not be described in detail. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. While the means described in the following embodiments are preferably implemented in software, implementation in hardware, or a combination of software and hardware, is also possible and contemplated.
According to another embodiment of the present application, there is provided an apparatus for identifying an object, applied to a robot, including:
a modeling module, configured to establish a zero-sample classification model from the following information: sample video segment data, the labels to which the sample video segment data belong, and semantic auxiliary information for each label; the zero-sample classification model derives the specified feature of each label from this information;
an acquisition module, configured to obtain, through the zero-sample classification model, a first specified feature of the current video segment data to be identified; and
a determining module, configured to determine, according to the degree of matching between the first specified feature and the specified feature of each label, the label to which the video segment data to be identified belongs.
According to the application, the robot establishes a zero-sample classification model from the sample video segment data, the labels to which the sample video segment data belong, and the semantic auxiliary information for each label; the zero-sample classification model derives the specified feature of each label from this information; a first specified feature of the current video segment data to be identified is obtained through the zero-sample classification model; and the label to which the video segment data to be identified belongs is determined according to the degree of matching between the first specified feature and the specified feature of each label. This solves the problem in the related art that the range of objects a robot can identify is small, enables the robot to recognize video segments by applying the zero-sample classification model, and greatly improves its recognition capability.
Optionally, the determining module is further configured to obtain a plurality of matching degrees between the first specified feature and the specified feature of each label, and to determine that the video segment data to be identified belongs to the label with the highest of the plurality of matching degrees.
Optionally, the modeling module is further configured to derive the specified feature of each label from the information in one of the following ways:
extracting a first feature of the sample video segment data and taking the specified feature of the label to which that data belongs to be the first feature;
or extracting a second feature described by the semantic auxiliary information of each label and taking the specified feature of that label to be the second feature.
It should be noted that each of the above modules may be implemented by software or hardware, and for the latter, it may be implemented by, but not limited to: the modules are all located in the same processor; or the above modules may be located in different processors in any combination.
Example III
The embodiment of the application also provides a storage medium. Alternatively, in the present embodiment, the above-described storage medium may be configured to store program code for performing the steps of:
S1: the robot establishes a zero-sample classification model from the following information: sample video segment data, the labels to which the sample video segment data belong, and semantic auxiliary information for each label; the zero-sample classification model derives the specified feature of each label from this information;
S2: obtain, through the zero-sample classification model, a first specified feature of the current video segment data to be identified;
S3: determine, according to the degree of matching between the first specified feature and the specified feature of each label, the label to which the video segment data to be identified belongs.
Alternatively, in the present embodiment, the storage medium may include, but is not limited to: a USB flash drive, a Read-Only Memory (ROM), a Random Access Memory (RAM), a removable hard disk, a magnetic disk, an optical disk, or other media capable of storing program code.
An embodiment of the application also provides an electronic device comprising a memory having stored therein a computer program and a processor arranged to run the computer program to perform the steps of any of the method embodiments described above.
Optionally, the electronic device may further include a transmission device and an input/output device, where the transmission device is connected to the processor, and the input/output device is connected to the processor.
Optionally, in this embodiment, the processor may be configured to perform, by means of the computer program, the following steps:
S1, the robot establishes a zero-sample classification model according to the following information: sample video segment data, the labels to which the sample video segment data belong, and semantic auxiliary information of each label; the zero-sample classification model obtains the designated feature of each label according to this information;
S2, a first designated feature of the current video segment data to be identified is obtained through the zero-sample classification model;
S3, the label to which the video segment data to be identified belongs is determined according to the degree of matching between the first designated feature and the designated feature of each label.
Optionally, specific examples in this embodiment may refer to the examples described in the foregoing embodiments and optional implementations, which are not repeated here.
It will be appreciated by those skilled in the art that the modules or steps of the application described above may be implemented on a general-purpose computing device. They may be concentrated on a single computing device or distributed across a network of computing devices. Optionally, they may be implemented in program code executable by a computing device, so that they may be stored in a storage device and executed by the computing device; in some cases, the steps shown or described may be performed in an order different from that given here. Alternatively, they may be fabricated as individual integrated circuit modules, or multiple modules or steps among them may be fabricated as a single integrated circuit module. Thus, the present application is not limited to any specific combination of hardware and software.
The above description covers only the preferred embodiments of the present application and is not intended to limit it; those skilled in the art may make various modifications and variations. Any modification, equivalent replacement, improvement, etc. made within the spirit and principle of the present application shall be included in its protection scope.
Claims (6)
1. A method for a robot to identify objects, comprising:
the robot establishes a zero-sample classification model according to the following information: sample video segment data, the labels to which the sample video segment data belong, and semantic auxiliary information of each label; the zero-sample classification model obtains the designated feature of each label according to this information;
obtaining a first designated feature of the current video segment data to be identified through the zero-sample classification model;
determining the label to which the video segment data to be identified belongs according to the degree of matching between the first designated feature and the designated feature of each label;
wherein determining the label according to the degree of matching between the first designated feature and the designated feature of each label comprises:
obtaining a plurality of matching degrees between the first designated feature and the designated feature of each label;
determining that the video segment data to be identified belongs to the label with the highest matching degree among the plurality of matching degrees;
wherein the zero-sample classification model obtains the designated feature of each label according to the information in at least one of the following ways:
extracting a first feature of the sample video segment data, and determining the designated feature of the label to which the sample video segment data belongs to be the first feature;
extracting a second feature described by the semantic auxiliary information of each label, and determining the designated feature of that label to be the second feature;
wherein the method for the robot to identify objects is used at least for analyzing the actions of a person or the dynamic state of an object and performing corresponding processing.
2. The method of claim 1, wherein the zero-sample classification model is established by:
extracting features from the sample video segment data and the labels to which the sample video segment data belong, to obtain video segment data sample features;
obtaining a semantic embedded representation of each label and a semantic difference measure between labels according to the labels to which the sample video segment data belong and the semantic auxiliary information;
and establishing the zero-sample classification model according to the video segment data sample features and the semantic embedded representations.
3. The method of claim 2, wherein after establishing the zero-sample classification model from the video segment data sample features and the semantic embedded representations, the method further comprises:
performing semantic consistency regularization on the zero-sample classification model according to the semantic difference measure.
4. An apparatus for identifying objects, applied to a robot, comprising:
a modeling module, configured to establish a zero-sample classification model according to the following information: sample video segment data, the labels to which the sample video segment data belong, and semantic auxiliary information of each label; the zero-sample classification model obtains the designated feature of each label according to this information;
an acquisition module, configured to obtain a first designated feature of the current video segment data to be identified through the zero-sample classification model;
a determining module, configured to determine the label to which the video segment data to be identified belongs according to the degree of matching between the first designated feature and the designated feature of each label;
wherein the determining module is further configured to obtain a plurality of matching degrees between the first designated feature and the designated feature of each label, and to determine that the video segment data to be identified belongs to the label with the highest matching degree among the plurality of matching degrees;
wherein the modeling module is further configured to obtain the designated feature of each label according to the information in one of the following ways:
extracting a first feature of the sample video segment data, and determining the designated feature of the label to which the sample video segment data belongs to be the first feature;
extracting a second feature described by the semantic auxiliary information of each label, and determining the designated feature of that label to be the second feature;
wherein the apparatus for identifying objects is used at least for analyzing the actions of a person or the dynamic state of an object and performing corresponding processing.
5. A storage medium having a computer program stored therein, wherein the computer program is arranged to perform the method of any one of claims 1 to 3 when run.
6. An electronic device comprising a memory and a processor, characterized in that the memory has stored therein a computer program, and the processor is arranged to run the computer program to perform the method of any one of claims 1 to 3.
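One possible reading of claims 2 and 3 can be sketched as follows. The sketch assumes a linear ridge-regression map as the model, Euclidean pairwise distances as the semantic difference measure, and squared distortion of those distances as the semantic consistency penalty; all of these are illustrative assumptions, since the claims do not fix a model form. The last lines show the zero-sample aspect: a label absent from training is recognized purely from its semantic embedding.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy setup (all names, dimensions, and the linear form are assumptions).
n, d = 60, 6                        # labelled sample count, video feature dim
S = np.eye(3)                       # semantic embedded representation of 3 seen labels
y = rng.integers(0, 3, size=n)      # label of each sample video segment
A = rng.normal(size=(3, d))         # hidden generator of toy video features
X = S[y] @ A + 0.05 * rng.normal(size=(n, d))  # video segment sample features

# Claim 2: establish the model from the sample features and the semantic
# embedded representations -- here a ridge-regression map W with X @ W ~ S[y].
W = np.linalg.solve(X.T @ X + 1e-2 * np.eye(d), X.T @ S[y])

# Claim 2: semantic difference measure between labels (pairwise distances).
D = np.linalg.norm(S[:, None, :] - S[None, :, :], axis=-1)

# Claim 3, one possible form of semantic consistency regularization: penalize
# the model when projected label prototypes distort the semantic differences.
P = np.stack([(X[y == k] @ W).mean(axis=0) for k in range(3)])
D_proj = np.linalg.norm(P[:, None, :] - P[None, :, :], axis=-1)
penalty = float(((D_proj - D) ** 2).mean())  # would be added to the training loss

# Zero-sample inference: a label never seen in training is recognized purely
# from its semantic embedding.
unseen = np.array([0.5, 0.5, 0.0])  # hypothetical new label's embedding
x_new = unseen @ A                  # a video feature from the unseen class
candidates = np.vstack([S, unseen])
pred = int(np.argmin(np.linalg.norm(x_new @ W - candidates, axis=1)))
print(pred)  # prints 3: the unseen label wins
```

In this toy example the penalty is merely computed; in a real training loop it would be weighted and minimized jointly with the fitting loss.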
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811216932.4A CN111079468B (en) | 2018-10-18 | 2018-10-18 | Method and device for identifying object by robot |
Publications (2)
Publication Number | Publication Date |
---|---|
CN111079468A CN111079468A (en) | 2020-04-28 |
CN111079468B true CN111079468B (en) | 2024-05-07 |
Family
ID=70308047
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811216932.4A Active CN111079468B (en) | 2018-10-18 | 2018-10-18 | Method and device for identifying object by robot |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN111079468B (en) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112607100A (en) * | 2020-11-22 | 2021-04-06 | 泰州市华仕达机械制造有限公司 | Compatible fruit conveying and distributing system |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN105718940A (en) * | 2016-01-15 | 2016-06-29 | 天津大学 | Zero-sample image classification method based on multi-group factor analysis |
CN106096661A (en) * | 2016-06-24 | 2016-11-09 | 中国科学院电子学研究所苏州研究院 | Zero sample image sorting technique based on relative priority random forest |
CN107292349A (en) * | 2017-07-24 | 2017-10-24 | 中国科学院自动化研究所 | The zero sample classification method based on encyclopaedic knowledge semantically enhancement, device |
CN107563444A (en) * | 2017-09-05 | 2018-01-09 | 浙江大学 | A kind of zero sample image sorting technique and system |
CN108376267A (en) * | 2018-03-26 | 2018-08-07 | 天津大学 | A kind of zero sample classification method based on classification transfer |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10074041B2 (en) * | 2015-04-17 | 2018-09-11 | Nec Corporation | Fine-grained image classification by exploring bipartite-graph labels |
2018-10-18: application CN201811216932.4A filed, granted as patent CN111079468B (en), status Active.
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110263681B (en) | Facial expression recognition method and device, storage medium and electronic device | |
CN110147722A (en) | A kind of method for processing video frequency, video process apparatus and terminal device | |
CN107679466B (en) | Information output method and device | |
US20130071816A1 (en) | Methods and systems for building a universal dress style learner | |
CN110070029B (en) | Gait recognition method and device | |
KR102002024B1 (en) | Method for processing labeling of object and object management server | |
CN107957940B (en) | Test log processing method, system and terminal | |
CN110765882B (en) | Video tag determination method, device, server and storage medium | |
CN104200249A (en) | Automatic clothes matching method, device and system | |
CN110610169B (en) | Picture marking method and device, storage medium and electronic device | |
CN114861836B (en) | Model deployment method based on artificial intelligence platform and related equipment | |
CN110321892B (en) | Picture screening method and device and electronic equipment | |
CN113052295B (en) | Training method of neural network, object detection method, device and equipment | |
WO2023124278A1 (en) | Image processing model training method and apparatus, and image classification method and apparatus | |
CN108334895A (en) | Sorting technique, device, storage medium and the electronic device of target data | |
CN114119948A (en) | Tea leaf identification method and device, electronic equipment and storage medium | |
CN111079468B (en) | Method and device for identifying object by robot | |
CN112380955B (en) | Action recognition method and device | |
CN112200862B (en) | Training method of target detection model, target detection method and device | |
CN113627334A (en) | Object behavior identification method and device | |
CN111368045B (en) | User intention recognition method, device, equipment and computer readable storage medium | |
CN110414322B (en) | Method, device, equipment and storage medium for extracting picture | |
CN115346110A (en) | Service plate identification method, service plate identification system, electronic equipment and storage medium | |
CN111680708A (en) | Method and system for tagging pictures | |
CN110399782A (en) | Children look after scene reply device and method automatically |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||