CN110084187B - Position identification method, device, equipment and storage medium based on computer vision - Google Patents
- Publication number
- CN110084187B (grant publication) · CN201910338374A / CN201910338374.7A (application)
- Authority
- CN
- China
- Prior art keywords
- target
- target object
- objects
- identifying
- user
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T7/00—Image analysis
- G06T7/70—Determining position or orientation of objects or cameras
- G06T7/73—Determining position or orientation of objects or cameras using feature-based methods
- G06V—IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
- G06V20/00—Scenes; Scene-specific elements
- G06V20/10—Terrestrial scenes
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
- G06T2207/00—Indexing scheme for image analysis or image enhancement
- G06T2207/10—Image acquisition modality
- G06T2207/10004—Still image; Photographic image
- G06T2207/10016—Video; Image sequence
Abstract
The embodiments of the invention disclose a position identification method, device, equipment and storage medium based on computer vision. The method comprises: acquiring a target picture within a target-object search range; identifying a target object and at least two other objects from the target picture; selecting a reference object from the other objects according to attention indices that a user applies to them; and identifying, from the positions of the target object and the reference object, the relative position information of the target object with respect to the reference object within the search range. The technical solution of the embodiments quickly and accurately identifies the specific position of a target object in the actual environment, saving the user time and effort.
Description
Technical Field
The embodiment of the invention relates to the technical field of computer vision recognition, in particular to a position recognition method, device, equipment and storage medium based on computer vision.
Background
Computer vision refers to using a camera and a computer in place of human eyes to perform machine-vision tasks such as recognition, tracking and measurement of targets, and to further process the captured images into a form better suited for human observation or for transmission to an instrument for detection.
Existing computer vision technology generally detects and identifies a target object in an image according to a preset deep learning model, i.e., it determines whether the image contains the target object. In a practical environment, however, people need to locate a target object in a cluttered space, and merely detecting and identifying it is not sufficient. How to quickly and accurately locate a target object in a practical environment by means of computer recognition technology is a technical problem in urgent need of a solution.
Disclosure of Invention
The embodiments of the invention provide a position identification method, device, equipment and storage medium based on computer vision, used to quickly and accurately identify the specific position of a target object in an actual environment while saving the user time and effort.
In a first aspect, an embodiment of the present invention provides a position identifying method based on computer vision, including:
acquiring a target picture in a target object searching range;
identifying a target object and at least two other objects from the target picture;
selecting a reference object from other objects according to attention indexes applied to other objects by a user;
and identifying relative position information of the target object relative to the reference object in a target object searching range according to the position of the target object and the position of the reference object.
Optionally, the attention index that the user applies to each other object comprises at least one of: the size of the other object, the recognizability of the other object, the distance between the other object and the target object, and how commonly used the other object is;
the selecting a reference object from other objects according to the attention index applied by the user to each other object comprises the following steps:
obtaining reference values of other objects according to attention indexes applied to the other objects by a user;
and selecting an object with a reference value larger than or equal to the reference threshold value as a reference object, or selecting an object with the largest reference value as a reference object.
Optionally, the method further comprises:
and if the number of the objects with the maximum reference value is at least two, selecting the object closest to the target object as the reference object.
Optionally, the obtaining the target picture in the target object searching range includes:
and responding to a target object searching request input by a user, and shooting a target picture in the searching range of the target object.
Optionally, the responding to the target object searching request input by the user captures a target picture in the searching range of the target object, including:
receiving a voice signal input by a user, and identifying information of a target object from the voice signal;
controlling a camera to shoot a target picture in a visual range of the camera, or identifying a searching range of a target object from the voice signal, and controlling the camera to shoot the target picture in the searching range of the target object;
the identifying the target object from the target picture includes:
and identifying the target object from the target picture according to the information of the target object.
Optionally, after identifying the relative position information of the target object relative to the reference object within the target object searching range according to the position of the target object and the position of the reference object, the method further includes:
and outputting relative position information of the target object relative to the reference object in the target object searching range.
Optionally, before the identifying the target object and other objects from the target picture, the method further includes:
acquiring the computing performance of local equipment;
if the computing performance meets the position identification requirement, executing the operation of identifying the target object and other objects from the target picture through the local equipment and the subsequent operation;
and if the computing performance does not meet the position identification requirement, sending the target picture to a cloud server, and executing the operation of identifying the target object and other objects from the target picture through the cloud server and the subsequent operation.
In a second aspect, an embodiment of the present invention further provides a position identifying device based on computer vision, including:
the target picture acquisition module is used for acquiring a target picture in a target object searching range;
the object identification module is used for identifying a target object and at least two other objects from the target picture;
the reference object selection module is used for selecting a reference object from other objects according to attention indexes applied to other objects by a user;
and the position identification module is used for identifying relative position information of the target object relative to the reference object in the target object searching range according to the position of the target object and the position of the reference object.
In a third aspect, an embodiment of the present invention further provides a computer apparatus, including:
one or more processors;
storage means for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement a computer vision based location identification method as provided by any embodiment of the present invention.
In a fourth aspect, embodiments of the present invention further provide a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements a computer vision-based location recognition method as provided by any embodiment of the present invention.
According to the embodiments of the invention, a target picture is acquired within the target-object search range, and a target object and at least two other objects are identified from it. A reference object that easily draws the user's attention in the actual environment is selected from the other objects according to preset attention indices, and the relative position information of the target object with respect to the reference object is identified from the positions of the two. Because the relative position relationship between the target object and the reference object is the same in the picture as in the actual environment, the position of the target object can be stated relative to a reference object the user notices easily, so that the user can quickly and accurately locate the sought object in the actual environment, saving time and effort.
Drawings
FIG. 1 is a flow chart of a computer vision based position identification method in accordance with a first embodiment of the present invention;
FIG. 2 is a flow chart of a computer vision based position recognition method in a second embodiment of the invention;
FIG. 3 is a schematic structural diagram of a position recognition device based on computer vision in a third embodiment of the present invention;
fig. 4 is a schematic structural view of an apparatus according to a fourth embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings.
Example 1
Fig. 1 is a flowchart of a position identification method based on computer vision according to a first embodiment of the present invention. The method is applicable to identifying the position of a target object in an actual environment based on a target picture. It may be performed by a computer-vision-based position identification device, which may be implemented in software and/or hardware and is generally integrated in any computer device capable of identifying the position of a target object in an actual environment. Specifically, referring to fig. 1, the method may include the following steps:
step 110, obtaining a target picture in a target object searching range.
In this embodiment, the target object is the object the user wants to find. It may be a static object commonly found in a home or office, such as a hair dryer, a towel, a USB flash drive or a mobile phone, or a movable object such as a sweeping robot. The target-object search range is an effective area that contains the target object. The target picture may be a depth image (also called a range image) captured by a camera, in which the gray value of each pixel represents the distance between a point in the scene and the camera and thus directly reflects the geometry of the visible surfaces of the photographed objects; it may also be an ordinary two-dimensional image captured by a camera.
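The depth-image encoding described above can be sketched in a few lines. This is an illustrative decoding only: the 8-bit pixel format and the 5-meter maximum range are assumed calibration details, not taken from the patent.

```python
def depth_to_meters(gray_value: int, max_range_m: float = 5.0, bits: int = 8) -> float:
    """Map a pixel's gray value (0 .. 2**bits - 1) to a distance in meters,
    assuming the gray value is proportional to the distance from the camera."""
    max_gray = (1 << bits) - 1
    return gray_value / max_gray * max_range_m

# A pixel of gray value 51 in an 8-bit depth image spanning 0-5 m
# corresponds to 51 / 255 * 5.0 meters from the camera.
distance = depth_to_meters(51)
```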
Optionally, the target object may be a preset object to be searched, or an object input by the user according to the searching requirement; the target picture in the target object searching range can be periodically acquired, and the target picture can be shot in the target object searching range in response to a target object searching request input by a user.
Optionally, in response to a target object search request input by a user, shooting a target picture in a search range of a target object, including: receiving a voice signal input by a user, and identifying information of a target object from the voice signal; and controlling the camera to shoot the target picture in the visual range, or identifying the searching range of the target object from the voice signal, and controlling the camera to shoot the target picture in the searching range of the target object.
Specifically, after receiving a voice signal that the user inputs through the camera's built-in microphone or the microphone of another device connected to the camera over a network, the camera converts the voice signal into text using speech recognition, and then parses the information of the target object, such as its name, shape and size, from the text using semantic analysis. The camera then either takes its own visual range as the search range and shoots a target picture within it, or identifies the search range of the target object from the voice signal and shoots the target picture within that range. Finally, the target object is identified from the target picture according to the information of the target object.
The target object search request is not limited to a voice signal; the user may also input it in other forms, such as a text signal or a picture signal. According to the type of the request, this embodiment acquires the information of the target object from it using the corresponding processing technology.
For example, a user inputs the voice signal "find a mobile phone in an office" through the camera's built-in microphone. Using speech recognition and semantic analysis, the camera recognizes the name of the target object as "mobile phone" and the search range as "office" from the signal, and is therefore controlled to shoot a target picture in the office. The target object is then identified from the target picture according to its name.
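The request-parsing step in this example can be sketched as follows. The keyword matching below is a toy stand-in for the speech-recognition and semantic-analysis technologies the embodiment relies on, and the object and range vocabularies are invented for illustration.

```python
# Hypothetical vocabularies; a real system would use an NLU model, not lists.
KNOWN_OBJECTS = {"mobile phone", "towel", "hair dryer"}
KNOWN_RANGES = {"office", "living room", "bedroom"}

def parse_search_request(text: str):
    """Return (target_name, search_range) extracted from recognized text;
    search_range is None when the request names no search range."""
    text = text.lower()
    target = next((o for o in KNOWN_OBJECTS if o in text), None)
    scope = next((r for r in KNOWN_RANGES if r in text), None)
    return target, scope

# The example from the text: both a target and a search range are named.
target, scope = parse_search_request("find a mobile phone in an office")
```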
Step 120, identifying the target object and at least two other objects from the target picture.
After the target picture is obtained, it is analyzed to extract feature information for each object it contains. Each object is identified by comparing its feature information with the feature information of common objects stored in a pre-established database, and the target object and at least two other objects are then picked out according to the information of the target object. The camera may analyze the target picture using depth-image segmentation, depth-image edge detection, three-dimensional target recognition based on depth images, and similar techniques.
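Under the assumption that each detected object has already been reduced to a feature vector, the database comparison above can be sketched as a nearest-neighbor lookup. The feature vectors, the Euclidean metric and the match threshold are all illustrative choices, not specified by the patent.

```python
import math

# Hypothetical pre-established feature database; the vectors are invented.
FEATURE_DB = {
    "mobile phone": [0.9, 0.1, 0.3],
    "towel": [0.2, 0.8, 0.5],
}

def classify(features, threshold=0.5):
    """Return the database label whose feature vector is nearest to
    `features` (Euclidean distance), or None if nothing is close enough."""
    best_label, best_dist = None, float("inf")
    for label, ref in FEATURE_DB.items():
        dist = math.dist(features, ref)
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label if best_dist <= threshold else None
```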
Optionally, before identifying the target object and other objects from the target picture, the method further includes: acquiring the computing performance of the local device; if the computing performance meets the position identification requirement, identifying the target object and other objects from the target picture, and performing the subsequent operations, on the local device; if it does not, sending the target picture to a cloud server, which identifies the target object and other objects from the target picture and performs the subsequent operations. Computing performance may include computing response time, computing accuracy and the like; correspondingly, the position identification requirement includes the response time, accuracy and so on required to identify the position of the target object. In this embodiment, whether or not the local device meets the requirement, a device that does meet it executes the corresponding operations, ensuring the scheme can proceed smoothly.
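A minimal sketch of this local-versus-cloud dispatch might look as follows. The performance fields (response time, accuracy) follow the text, but the numeric thresholds are invented placeholders.

```python
def choose_executor(local_response_ms: float, local_accuracy: float,
                    required_ms: float = 500.0,
                    required_accuracy: float = 0.9) -> str:
    """Return "local" when the local device meets the position identification
    requirement (fast enough and accurate enough), otherwise "cloud"."""
    if local_response_ms <= required_ms and local_accuracy >= required_accuracy:
        return "local"
    return "cloud"

# A device that is fast and accurate runs the pipeline itself; a device that
# misses either requirement forwards the target picture to the cloud server.
executor = choose_executor(200.0, 0.95)
```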
Step 130, selecting a reference object from other objects according to the attention index applied by the user to each other object.
In real life, when facing a variety of objects, a user's attention is most easily drawn to objects that are relatively large, brightly colored or commonly used. Attention indices can therefore be used to characterize the attention a user would pay to each object, in order to select a reference object the user can easily find in the actual environment. Optionally, the attention index applied to each other object comprises at least one of: the size of the other object, its recognizability, its distance from the target object, and how commonly used it is.
Optionally, selecting the reference object from the other objects according to the attention index applied by the user to each other object includes: obtaining reference values of other objects according to attention indexes applied to the other objects by a user; and selecting an object with a reference value larger than or equal to the reference threshold value as a reference object, or selecting an object with the largest reference value as a reference object. The reference threshold may be twenty, thirty, or other preset values.
For example, suppose the attention indices applied to each other object are the size of the object, its recognizability and how commonly used it is, that the full score of each attention index is ten points, and that the initial reference value of each object is zero. The position identification device may then calculate the reference value of each other object in either of two ways: for each attention index, if the current object satisfies the index, add ten points to its reference value, and otherwise leave the value unchanged; or divide each attention index into several levels, each with a corresponding score, and for each index determine the level of the current object and add the corresponding score to its reference value. This embodiment is not limited to these two ways of calculating the reference values; other calculation methods may be used.
After the reference value of each other object is obtained, the values are sorted in descending order. If the reference threshold is, say, twenty, all objects whose reference value is greater than or equal to the threshold are selected as reference objects, in order from highest to lowest; alternatively, the object with the largest reference value is selected directly.
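Combining the scoring scheme of the previous paragraph with this selection rule (and the distance tie-break described below) gives roughly the following sketch. The ten-point per-index score, the threshold of twenty and the boolean satisfied/unsatisfied model of each attention index are simplifying assumptions for illustration.

```python
def reference_value(satisfied, points_per_index=10.0):
    """Sum a fixed score for every attention index the object satisfies."""
    return sum(points_per_index for s in satisfied if s)

def pick_reference(objects, distances, threshold=20.0):
    """objects: name -> per-index booleans; distances: name -> distance to
    the target object. Return the name with the highest reference value,
    provided it reaches the threshold; ties are broken by nearest distance."""
    scored = {name: reference_value(flags) for name, flags in objects.items()}
    best = max(scored.values())
    if best < threshold:
        return None
    tied = [name for name, value in scored.items() if value == best]
    return min(tied, key=lambda name: distances[name])

objs = {"sofa": [True, True, False],   # large and recognizable: 20 points
        "cup": [True, False, False],   # 10 points
        "lamp": [True, True, False]}   # 20 points
dists = {"sofa": 0.4, "cup": 0.1, "lamp": 1.2}
# sofa and lamp tie at 20 points; sofa is nearer the target object.
```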
Alternatively, when the object with the largest reference value is to be selected as the reference object and at least two objects share that largest value, the distance between the target object and each of them is calculated, and the one closest to the target object is chosen. The distances between the target object and the other objects may also be calculated at the time the camera recognizes the target object and the at least two other objects in the target picture.
And 140, identifying relative position information of the target object relative to the reference object in the target object searching range according to the position of the target object and the position of the reference object.
After the reference object is determined, the relative position information of the target object with respect to the reference object is first identified within the target picture, from their positions in the picture; the relative position within the search range, i.e., in the actual environment, is then derived from it. For example, if the target object in the picture lies 0.1 meters to the right of the reference object and the scale between the target picture and the search range is 1:10, the target object is located 1 meter to the right of the reference object within the search range.
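The worked example above amounts to a single multiplication by the picture-to-scene scale; a sketch, assuming the offset is already expressed in meters within the picture's coordinate frame:

```python
def real_world_offset(picture_offset_m: float, scale: float = 10.0) -> float:
    """Scale an offset measured in the target picture into the actual
    environment, given the picture-to-scene ratio (1:10 by default here)."""
    return picture_offset_m * scale

# 0.1 m to the right of the reference object in the picture, at 1:10 scale,
# becomes 1 m to the right of the reference object in the real scene.
offset = real_world_offset(0.1)
```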
Optionally, after identifying the relative position information of the target object with respect to the reference object within the search range, the method further includes outputting that information. Using semantic analysis and speech synthesis, the camera may output the relative position of the target object with respect to the reference object in the actual environment as a voice signal, through its built-in speaker or the speaker of another device connected to it over a network. The camera may of course also inform the user of this relative position as text, as a picture or in other ways, so that the user can quickly find the target object starting from the reference object.
According to the embodiments of the invention, a target picture is acquired within the target-object search range, and a target object and at least two other objects are identified from it. A reference object that easily draws the user's attention in the actual environment is selected from the other objects according to preset attention indices, and the relative position information of the target object with respect to the reference object is identified from the positions of the two. Because the relative position relationship between the target object and the reference object is the same in the picture as in the actual environment, the position of the target object can be stated relative to a reference object the user notices easily, so that the user can quickly and accurately locate the sought object in the actual environment, saving time and effort.
Example two
Fig. 2 is a flowchart of a computer vision-based position recognition method according to a second embodiment of the present invention, which may be combined with each of the alternatives of one or more of the above embodiments. Specifically, referring to fig. 2, the method may include the steps of:
step 210, the local device builds an object feature database in advance.
In this embodiment, for the most common objects in daily life, especially those in homes or offices, the local device uses machine learning and big-data training to pre-establish an object feature database used to identify and label objects within the search range of the target object.
Step 220, the local device receives a target object search request input by a user.
Alternatively, the user may input the target object voice search request through the built-in microphone of the camera, in this embodiment, the target object search request is not limited to the form of voice signals, and the user may also input the target object search request in the form of text signals, picture signals, or even video signals.
Step 230, the local device controls the camera to shoot the target picture according to the target object searching request.
Optionally, after receiving the target object search request, the local device processes it with the technology appropriate to its type to obtain information such as the name, shape and size of the target object. It then tries to identify the search range of the target object from the request: if this succeeds, the camera is controlled to shoot a target picture within that search range; if it fails, the camera's visual range is taken as the search range and the camera shoots the target picture within its visual range.
Step 240, judging whether the computing performance of the local device meets the position identification requirement, if so, jumping to step 250, and if not, jumping to step 251.
Optionally, after the camera shoots the target picture, to ensure that the subsequent operations can meet the position identification requirement of this embodiment, whether the computing performance of the local device meets the requirement is judged first. Specifically, the computing performance of the local device is obtained. If it meets the position identification requirement, the local device is the executing device and performs the identification of the target object and other objects from the target picture, along with the subsequent operations. If it does not, the local device sends the target picture to the cloud server, which performs the identification and the subsequent operations.
Step 250, the local device identifies a target object and a reference object from the target picture, and identifies relative position information of the target object relative to the reference object according to the position of the target object and the position of the reference object. Execution continues with step 260.
Step 251, the local device sends the target image to the cloud server, so that the cloud server can identify the target object and the reference object from the target image, and identify the relative position information of the target object relative to the reference object according to the position of the target object and the position of the reference object. Execution continues with step 270.
Optionally, after the target picture is obtained, the local device or the cloud server performs image analysis on the target picture, identifies each object in the target picture, and further identifies the target object and at least two other objects in the target picture according to the information of the target object. Then, according to the attention index applied by the user to other objects, obtaining the reference value of the other objects; and selecting an object with a reference value larger than or equal to the reference threshold value as a reference object, or selecting an object with the largest reference value as a reference object.
After the reference object is determined, the local device or the cloud server identifies the relative position information of the target object relative to the reference object in the actual environment according to the position of the target object in the target picture and the position of the reference object, and outputs the relative position information of the target object relative to the reference object to a user through a preset output mode.
Step 260, the local device receives the recognition result returned by the user for the target object; if the result indicates a recognition error, the misidentified object is marked and learned again. The procedure then ends.
Step 270, the cloud server receives the recognition result returned by the user for the target object; if the result indicates a recognition error, the misidentified object is marked and learned again. The procedure then ends.
Optionally, the user searches for the target object within the target object search range according to the relative position, output by the local device or the cloud server, of the target object with respect to the reference object. If the object found at that relative position turns out to be wrong, the user returns a recognition-error result to the local device or the cloud server. On receiving this result, the local device or the cloud server marks the current target object as misidentified in the object feature database and learns it again.
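The feedback loop above can be sketched with a toy stand-in for the object feature database. The class layout and the `learner` callback are assumptions; the document only specifies that misidentified objects are marked in the database and learned again.

```python
class ObjectFeatureDatabase:
    """Illustrative stand-in for the object feature database: objects the
    user reports as misidentified are flagged and later learned again."""

    def __init__(self):
        self._entries = {}  # object name -> {"features": ..., "flagged": bool}

    def add(self, name, features):
        self._entries[name] = {"features": features, "flagged": False}

    def mark_error(self, name):
        """Record the user's feedback that this object was misidentified."""
        if name in self._entries:
            self._entries[name]["flagged"] = True

    def relearn(self, learner):
        """Re-run learning (here, a caller-supplied function standing in for
        e.g. retraining a feature extractor) on every flagged entry."""
        for name, entry in self._entries.items():
            if entry["flagged"]:
                entry["features"] = learner(name)
                entry["flagged"] = False

    def features(self, name):
        return self._entries[name]["features"]
```

In a real system `relearn` would be triggered after user feedback arrives, so that subsequent searches use the corrected features.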
In the embodiment of the invention, a target picture within the target object search range is acquired; the target object and at least two other objects are identified from the target picture; a reference object that easily draws the user's attention in the actual environment is selected from the other objects according to preset attention indexes; and the relative position information of the target object with respect to the reference object is identified according to the position of the target object and the position of the reference object. Based on the position of the reference object, and relying on the principle that the relative position relation between the target object and the reference object is the same in the picture as in the actual environment, the position of the target object relative to the reference object is obtained, so that the user can quickly and accurately locate the sought object in the actual environment, saving the user's time and energy.
Example III
Fig. 3 is a schematic structural diagram of a position recognition device based on computer vision according to a third embodiment of the present invention. As shown in fig. 3, the computer-vision-based position recognition device includes a target picture acquisition module, an object recognition module, a reference object selection module, and a position recognition module:
a target picture obtaining module 310, configured to obtain a target picture within a target object searching range;
an object recognition module 320, configured to recognize a target object and at least two other objects from the target picture;
a reference object selection module 330, configured to select a reference object from other objects according to attention indexes applied by a user to each other object;
the position identifying module 340 is configured to identify relative position information of the target object with respect to the reference object within the target object searching range according to the position of the target object and the position of the reference object.
In the embodiment of the invention, a target picture within the target object search range is acquired; the target object and at least two other objects are identified from the target picture; a reference object that easily draws the user's attention in the actual environment is selected from the other objects according to preset attention indexes; and the relative position information of the target object with respect to the reference object is identified according to the position of the target object and the position of the reference object. Based on the position of the reference object, and relying on the principle that the relative position relation between the target object and the reference object is the same in the picture as in the actual environment, the position of the target object relative to the reference object is obtained, so that the user can quickly and accurately locate the sought object in the actual environment, saving the user's time and energy.
Further, the attention indexes applied by the user to each other object comprise at least one of: the size of the other object, the recognition degree of the other object, the distance between the other object and the target object, and the commonality of the other object;
accordingly, the reference object selection module 330 includes: the reference value acquisition unit is used for acquiring the reference value of each other object according to the attention index applied by the user to each other object; and the reference object selection unit is used for selecting an object with a reference value larger than or equal to the reference threshold value as a reference object or selecting an object with the largest reference value as a reference object.
Further, the reference object selection unit is further configured to select, if at least two objects have the largest reference value, the object closest to the target object as the reference object.
Further, the target picture obtaining module 310 includes: and the target picture shooting unit is used for responding to a target object searching request input by a user and shooting a target picture in the searching range of the target object.
Further, the target picture shooting unit is specifically configured to receive a voice signal input by a user, and identify information of a target object from the voice signal; controlling the camera to shoot a target picture in the visual range of the camera, or identifying the searching range of the target object from the voice signal, and controlling the camera to shoot the target picture in the searching range of the target object;
the object identifying module 320 is specifically configured to identify, from the target picture, a target object according to the information of the target object.
Further, the location identification module 340 further includes: and the output unit is used for outputting the relative position information of the target object relative to the reference object in the target object searching range.
Further, the object recognition module 320 is further configured to: acquiring the computing performance of local equipment; if the computing performance meets the position identification requirement, executing the operation of identifying the target object and other objects from the target picture through the local equipment and the subsequent operation; and if the computing performance does not meet the position identification requirement, sending the target picture to a cloud server, and executing the operation of identifying the target object and other objects from the target picture through the cloud server and the subsequent operation.
The computer-vision-based position recognition device provided by the embodiment of the invention can execute the computer-vision-based position recognition method provided by any embodiment of the invention, and has the functional modules corresponding to the executed method and its beneficial effects.
Example IV
Referring to fig. 4, fig. 4 is a schematic structural diagram of a device according to a fourth embodiment of the present invention. As shown in fig. 4, the device includes a processor 410, a memory 420, an input device 430, and an output device 440. The number of processors 410 in the device may be one or more; one processor 410 is taken as an example in fig. 4. The processor 410, memory 420, input device 430, and output device 440 in the device may be connected by a bus or in another way; connection by a bus is taken as an example in fig. 4.
The memory 420, as a computer-readable storage medium, is used for storing software programs, computer-executable programs, and modules, such as the program instructions/modules corresponding to the computer-vision-based position recognition method in the embodiment of the present invention (e.g., the target picture acquisition module 310, the object recognition module 320, the reference object selection module 330, and the position recognition module 340 in the computer-vision-based position recognition device). By running the software programs, instructions, and modules stored in the memory 420, the processor 410 executes the various functional applications and data processing of the device, i.e., implements the computer-vision-based position recognition method described above.
The processor 410 implements a computer vision based location identification method comprising:
acquiring a target picture in a target object searching range;
identifying a target object and at least two other objects from the target picture;
selecting a reference object from other objects according to attention indexes applied to other objects by a user;
and identifying relative position information of the target object relative to the reference object in a target object searching range according to the position of the target object and the position of the reference object.
Memory 420 may mainly include a program storage area and a data storage area: the program storage area may store an operating system and at least one application program required by a function; the data storage area may store data created according to the use of the terminal, etc. In addition, memory 420 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid-state storage device. In some examples, memory 420 may further include memory located remotely from processor 410, which may be connected to the device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The input device 430 may be used to receive entered numeric or character information and to generate key signal inputs related to user settings and function control of the device. The output device 440 may include a display device such as a display screen.
Example five
A fifth embodiment of the present invention provides a computer-readable storage medium having stored thereon computer instructions which, when executed by a processor, implement a computer vision-based position identification method, the method comprising:
acquiring a target picture in a target object searching range;
identifying a target object and at least two other objects from the target picture;
selecting a reference object from other objects according to attention indexes applied to other objects by a user;
and identifying relative position information of the target object relative to the reference object in a target object searching range according to the position of the target object and the position of the reference object.
Of course, the computer instructions of the computer-readable storage medium provided in the embodiments of the present invention are not limited to the method operations described above, and may also perform related operations in the computer-vision-based position recognition method provided in any embodiment of the present invention.
From the above description of embodiments, it will be clear to a person skilled in the art that the present invention may be implemented by means of software and necessary general purpose hardware, but of course also by means of hardware, although in many cases the former is a preferred embodiment. Based on such understanding, the technical solution of the present invention may be embodied essentially or in a part contributing to the prior art in the form of a software product, which may be stored in a computer readable storage medium, such as a floppy disk, a Read-Only Memory (ROM), a random access Memory (Random Access Memory, RAM), a FLASH Memory (FLASH), a hard disk or an optical disk of a computer, etc., including several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to execute the method of the embodiments of the present invention.
It should be noted that, in the embodiment of the position identifying device based on computer vision, each unit and module included are only divided according to the functional logic, but not limited to the above division, so long as the corresponding functions can be implemented; in addition, the specific names of the functional units are also only for distinguishing from each other, and are not used to limit the protection scope of the present invention.
Note that the above is only a preferred embodiment of the present invention and the technical principle applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to the embodiments, but may be embodied in many other equivalent forms without departing from the spirit or scope of the invention, which is set forth in the following claims.
Claims (10)
1. A computer vision-based position recognition method, comprising:
acquiring a target picture in a target object searching range;
identifying a target object and at least two other objects from the target picture;
selecting a reference object from other objects according to attention indexes applied to other objects by a user;
identifying relative position information of a target object relative to a reference object in a target object searching range according to the position of the target object and the position of the reference object;
the attention index applied by the user to each other object comprises at least one of the size of the other object, the recognition degree of the other object, the distance between the other object and the target object, and the commonality of the other object.
2. The method of claim 1, wherein selecting the reference object from the other objects based on the attention index applied by the user to each of the other objects comprises:
obtaining reference values of other objects according to attention indexes applied to the other objects by a user;
and selecting an object with a reference value larger than or equal to the reference threshold value as a reference object, or selecting an object with the largest reference value as a reference object.
3. The method according to claim 2, wherein the method further comprises:
and if the number of the objects with the maximum reference value is at least two, selecting the object closest to the target object as the reference object.
4. The method according to claim 1, wherein the obtaining the target picture within the target object search range includes:
and responding to a target object searching request input by a user, and shooting a target picture in the searching range of the target object.
5. The method according to claim 4, wherein the capturing a target picture within a search range of the target object in response to a target object search request input by a user comprises:
receiving a voice signal input by a user, and identifying information of a target object from the voice signal;
controlling a camera to shoot a target picture in a visual range of the camera, or identifying a searching range of a target object from the voice signal, and controlling the camera to shoot the target picture in the searching range of the target object;
the identifying the target object from the target picture includes:
and identifying the target object from the target picture according to the information of the target object.
6. The method according to claim 1, further comprising, after identifying relative position information of the target object with respect to the reference object within a target object search range based on a position of the target object and a position of the reference object:
and outputting relative position information of the target object relative to the reference object in a target object searching range.
7. The method according to any one of claims 1-6, further comprising, prior to said identifying the target object and other objects from the target picture:
acquiring the computing performance of local equipment;
if the computing performance meets the position identification requirement, executing the operation of identifying the target object and other objects from the target picture through the local equipment and the subsequent operation;
and if the computing performance does not meet the position identification requirement, sending the target picture to a cloud server, and executing the operation of identifying the target object and other objects from the target picture through the cloud server and the subsequent operation.
8. A computer vision-based position recognition apparatus, comprising:
the target picture acquisition module is used for acquiring a target picture in a target object searching range;
the object identification module is used for identifying a target object and at least two other objects from the target picture;
the reference object selection module is used for selecting a reference object from other objects according to attention indexes applied to other objects by a user;
the position identification module is used for identifying relative position information of the target object relative to the reference object in a target object searching range according to the position of the target object and the position of the reference object;
the attention index applied by the user to each other object comprises at least one of the size of the other object, the recognition degree of the other object, the distance between the other object and the target object, and the commonality of the other object.
9. A computer device, the computer device comprising:
one or more processors;
storage means for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the computer vision-based location identification method of any of claims 1-7.
10. A computer-readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the computer vision-based position recognition method according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910338374.7A CN110084187B (en) | 2019-04-25 | 2019-04-25 | Position identification method, device, equipment and storage medium based on computer vision |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110084187A CN110084187A (en) | 2019-08-02 |
CN110084187B true CN110084187B (en) | 2023-08-11 |
Family
ID=67416692
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910338374.7A Active CN110084187B (en) | 2019-04-25 | 2019-04-25 | Position identification method, device, equipment and storage medium based on computer vision |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110084187B (en) |
Families Citing this family (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111027376A (en) * | 2019-10-28 | 2020-04-17 | 中国科学院上海微系统与信息技术研究所 | Method and device for determining event map, electronic equipment and storage medium |
CN111462226A (en) * | 2020-01-19 | 2020-07-28 | 杭州海康威视系统技术有限公司 | Positioning method, system, device, electronic equipment and storage medium |
CN114297459A (en) * | 2022-01-22 | 2022-04-08 | 平安国际智慧城市科技股份有限公司 | Information processing method, information processing apparatus, electronic device, and medium |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9747680B2 (en) * | 2013-11-27 | 2017-08-29 | Industrial Technology Research Institute | Inspection apparatus, method, and computer program product for machine vision inspection |
CN108805043A (en) * | 2018-05-25 | 2018-11-13 | 上海与德科技有限公司 | Target object method for tracing, device, terminal and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||