CN111866375A - Target action recognition method and device and camera system - Google Patents


Info

Publication number
CN111866375A
Authority
CN
China
Prior art keywords
image data
target
action
classification label
camera
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010574823.0A
Other languages
Chinese (zh)
Inventor
梁峰
欧金超
刘煜
浦汉来
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Moxiang Network Technology Co ltd
Original Assignee
Shanghai Moxiang Network Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Moxiang Network Technology Co., Ltd.
Priority to CN202010574823.0A
Publication of CN111866375A
Legal status: Pending

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/80Camera processing pipelines; Components thereof
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/45Cameras or camera modules comprising electronic image sensors; Control thereof for generating image signals from two or more image sensors being of different type or operating in different modes, e.g. with a CMOS sensor for moving images in combination with a charge-coupled device [CCD] for still images

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Human Computer Interaction (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Evolutionary Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Studio Devices (AREA)

Abstract

The invention provides a target action recognition method, a target action recognition device, and a camera system. The method is applied to a dual-camera system comprising a main camera and an auxiliary camera, where at least one imaging parameter of the main camera (field of view, resolution, or aperture) is superior to that of the auxiliary camera. The method comprises: acquiring first image data collected by the main camera and second image data collected by the auxiliary camera; performing action recognition according to the second image data to obtain an action recognition result, wherein the action recognition result comprises an action classification label; and adding the action classification label to the first image data. Embodiments of the invention improve processing efficiency and add labels to the first image data, making it easier for users to manage their image data and meeting user needs.

Description

Target action recognition method and device and camera system
Technical Field
The invention relates to the technical field of computer vision, in particular to a target action recognition method, a target action recognition device and a camera system.
Background
Target detection and recognition is a rapidly developing research direction in the field of computer vision. With advances in visual processing and artificial intelligence, smart cameras can track the subject being shot and apply intelligent processing such as target recognition to the captured images, making it easier for users to manage their photos and videos.
Existing camera systems usually aim to maximize viewfinder image quality, so the camera generates an enormous volume of data. Limited by the processor's capacity, the camera's images must be compressed before being sent to the processor for target recognition; this consumes substantial resources, yields low processing efficiency, and cannot achieve a satisfactory target action recognition effect.
Target action recognition is the basis on which users manage captured data; for the above reasons, existing cameras cannot meet users' needs for managing such data.
Disclosure of Invention
The invention addresses the problem that existing cameras cannot meet users' needs for managing captured data.
To solve the above problem, the invention provides a target action recognition method applied to a dual-camera system comprising a main camera and an auxiliary camera, where at least one imaging parameter of the main camera (field of view, resolution, or aperture) is superior to that of the auxiliary camera. The method comprises: acquiring first image data collected by the main camera and second image data collected by the auxiliary camera; performing action recognition according to the second image data to obtain an action recognition result, wherein the action recognition result comprises an action classification label; and adding the action classification label to the first image data.
Optionally, adding the action classification label to the first image data comprises: determining the acquisition time of the image frame corresponding to the action classification label; determining a target image frame in the first image data according to the acquisition time; and adding the action classification label to the target image frame.
Optionally, adding the action classification label to the first image data comprises: determining the acquisition time of the image frame corresponding to the action classification label and a target object; determining a target image frame in the first image data according to the acquisition time; performing target recognition according to the target image frame to determine the position of the target object in the target image frame; and adding the action classification label to that position in the target image frame.
Optionally, the method further comprises: and if the target object does not exist in the target image frame, adding the action classification label to the second image data, and saving the second image data added with the action classification label.
Optionally, the method further comprises: performing target recognition according to the first image data, and determining the acquisition time of a first image frame in which at least one target object is located; determining a second image frame in the second image data corresponding to the acquisition time; if the target object does not exist in the second image frame, performing action recognition according to the first image frame to obtain an action classification label corresponding to the first image data; and adding the action classification label corresponding to the first image data.
Optionally, the method further comprises: if a target action classification label retrieval operation is received, retrieving the first image data according to the target action classification label; and outputting the image and/or video including the target action classification label in the first image data.
Optionally, the method further comprises: if a target action classification label retrieval operation is received, retrieving the second image data according to the target action classification label; and outputting the image and/or video including the target action classification label in the second image data.
Optionally, the action classification label includes at least one of: a facial motion classification label, a hand motion classification label, a leg motion classification label, and a whole body motion classification label.
The invention also provides a target action recognition device applied to a dual-camera system, wherein the dual-camera system comprises a main camera and an auxiliary camera; at least one imaging parameter of the main camera (field of view, resolution, or aperture) is superior to that of the auxiliary camera. The device comprises:
an image acquisition module, configured to acquire first image data collected by the main camera and second image data collected by the auxiliary camera;
an action recognition module, configured to perform action recognition according to the second image data to obtain an action recognition result, wherein the action recognition result comprises an action classification label;
a tag adding module, configured to add the action classification label to the first image data.
The invention also provides a camera system, including: a main camera, an auxiliary camera, and a processor; at least one imaging parameter of the main camera (field of view, resolution, or aperture) is superior to that of the auxiliary camera; the processor is configured to execute any of the target action recognition methods described above.
According to the target action recognition method provided by this embodiment, first image data can be collected by the main camera and second image data by the auxiliary camera; action recognition is performed according to the second image data, and the resulting action classification label is added to the first image data. Because the imaging quality of the second image data is lower than that of the first image data, it occupies less storage space, needs no compression, and consumes fewer computing resources during action recognition, which improves processing efficiency. Through target action recognition, labels can be added to the first image data, making it easier for users to manage their image data and meeting user needs.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in their description are briefly introduced below. Obviously, the drawings described below show only embodiments of the present invention; those skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a schematic flow chart of a method for target action recognition in one embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a target motion recognition apparatus according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a dual-camera system according to an embodiment of the present invention.
Description of reference numerals:
201-an image acquisition module; 202-an action recognition module; 203-label adding module; 31-a handle; 32-a three-axis pan-tilt; 33-a main camera; 34-a secondary camera; 35-display screen.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, embodiments accompanied with figures are described in detail below. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In an existing camera with only one camera, the processor must both process the captured data to guarantee high-quality output and perform real-time target action recognition, so the two functions are executed in a mixed fashion on the same resources. The target action recognition algorithm does not need high-definition image data, so the frames must first undergo resolution conversion and compression, which lowers the algorithm's processing efficiency.
In the embodiment of the invention, the functions of the main camera and the auxiliary camera are separated, the main camera is used for collecting high-definition image data, the auxiliary camera is used for collecting general-quality image data, and the processor performs target action identification operation according to the image data of the auxiliary camera so as to improve the operation efficiency.
The target action recognition method provided by the embodiment of the invention can be applied to a double-camera system, wherein the double-camera system comprises a main camera and an auxiliary camera. Wherein, at least one imaging parameter of the main camera is better than that of the auxiliary camera, and the imaging parameter can comprise: angle of view, resolution, aperture, etc.
It should be noted that an imaging parameter of the main camera being superior to that of the auxiliary camera means the main camera's final imaging quality is higher: for example, the main camera's field of view is larger than the auxiliary camera's, its resolution is higher, and its aperture f-number is smaller (that is, its aperture is larger and gathers more light).
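Since "superior" points in a different direction for each parameter (larger field of view and higher resolution, but a smaller f-number for the aperture), a small comparison helper can make the convention explicit. This is an illustrative sketch, not from the patent; the dictionary keys and the sample values are assumptions:

```python
def primary_is_superior(primary, secondary):
    """True if the primary camera beats the secondary on at least one parameter.

    Field of view (degrees) and resolution (total pixels): larger is better.
    Aperture is quoted as an f-number: smaller means a larger aperture.
    """
    return (primary["fov_deg"] > secondary["fov_deg"]
            or primary["resolution"] > secondary["resolution"]
            or primary["f_number"] < secondary["f_number"])

# Hypothetical parameters: a 4K f/1.8 main camera vs. a 720p f/2.4 auxiliary one.
main = {"fov_deg": 84, "resolution": 3840 * 2160, "f_number": 1.8}
aux = {"fov_deg": 84, "resolution": 1280 * 720, "f_number": 2.4}
print(primary_is_superior(main, aux))  # True
```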
Fig. 1 is a schematic flow chart of a target action recognition method in one embodiment of the invention, the method comprising:
S102, acquiring first image data collected by the main camera and second image data collected by the auxiliary camera.
The imaging quality of the first image data collected by the main camera is higher than that of the second image data collected by the auxiliary camera. The image data may be a video or a photograph.
S104, performing action recognition according to the second image data to obtain an action recognition result.
In this embodiment, the motion recognition process is performed by using a preset motion recognition algorithm, and a suitable motion recognition algorithm may be determined based on a usage scenario, which is not limited in this embodiment.
The action recognition result may include an action classification label corresponding to the target object. The action classification label may include: a facial action classification label, a hand action classification label, a leg action classification label, a whole-body action classification label, and the like. For example, the facial action classification label may be an expression classification label, the hand action classification label may be a gesture classification label, and the leg action classification label may be a walking-mode label (slow walk, jump, run, etc.).
It should be noted that each action classification label may include one or more action keywords. For example, the facial action classification label may be an expression classification label of the target object, and one expression classification label may include several keywords such as "facial movements", "expression", and "laughter".
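As an illustrative data sketch (the patent prescribes no storage format), a label carrying its category and several searchable keywords might be represented as follows; every name below is an assumption:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class ActionLabel:
    """Hypothetical container for one action classification label."""
    category: str                              # e.g. "facial", "hand", "leg", "whole-body"
    name: str                                  # e.g. "smile"
    keywords: List[str] = field(default_factory=list)

# One expression classification label holding several searchable keywords.
smile = ActionLabel("facial", "smile", ["facial movements", "expression", "laughter"])
print("expression" in smile.keywords)  # True
```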
Because the imaging quality of the second image data is lower than that of the first image data, the second image data occupies smaller storage space, does not need to be compressed, and occupies smaller computing resources when action recognition is carried out, so that the operation efficiency can be improved.
S106, adding an action classification label to the first image data.
Optionally, the action classification label may be added based on the acquisition time in the first image data corresponding to the second image data, or based on the position in the first image data where the recognized action occurs.
As an embodiment, the action classification label adding operation may be performed as follows:
A1, determining the acquisition time of the image frame corresponding to the action classification label.
Generally, in an image frame that includes the target object, the action classification of the target object can be recognized to obtain the action classification label; the acquisition time of the action corresponding to the action classification label is therefore the acquisition time of that image frame.
A2, determining the target image frame in the first image data according to the acquisition time.
It can be understood that the main camera and the auxiliary camera are arranged in parallel facing the same direction, their imaging ranges at least partially overlap, and they operate synchronously, that is, they capture images at the same moments. Therefore, the target image frame in the first image data corresponding to the acquisition time can be obtained from that acquisition time.
A3, adding the action classification label to the target image frame.
Specifically, the action classification label may be associated with the target image frame and the association relationship stored; when the label is searched, the target image frame corresponding to it can be obtained from the association relationship. Alternatively, the action classification label may be added at the target image frame's position on the video timeline; when the label is searched, the corresponding target image frame can be determined from the label's position on the timeline.
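Steps A1 through A3 can be sketched as a nearest-timestamp lookup, assuming the two streams share a common clock as described above; all function and variable names are illustrative, not from the patent:

```python
from bisect import bisect_left

def add_label_by_time(primary_frames, label, capture_time, tolerance=0.02):
    """Attach `label` to the primary-stream frame nearest `capture_time`.

    primary_frames: list of (timestamp, tag_list) pairs sorted by timestamp.
    Returns True if a frame within `tolerance` seconds was found.
    """
    times = [t for t, _ in primary_frames]
    i = bisect_left(times, capture_time)
    # Check the neighbours on either side of the insertion point.
    candidates = [j for j in (i - 1, i) if 0 <= j < len(times)]
    best = min(candidates, key=lambda j: abs(times[j] - capture_time), default=None)
    if best is not None and abs(times[best] - capture_time) <= tolerance:
        primary_frames[best][1].append(label)
        return True
    return False

# Three primary frames captured at 25 fps; the secondary stream saw a smile at t=0.041 s.
frames = [(0.00, []), (0.04, []), (0.08, [])]
add_label_by_time(frames, "smile", 0.041)
print(frames[1])  # (0.04, ['smile'])
```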
As another embodiment, the action category label adding operation may be performed as follows:
B1, determining the acquisition time of the image frame corresponding to the action classification label, and the target object.
Similar to the embodiment above, the acquisition time of the image frame corresponding to the action classification label can be determined. To add the action classification label to the first image data accurately, it may be attached to the target object in the first image data, that is, the object that performed the action from which the label was recognized. Therefore, when the acquisition time is determined, the target object corresponding to the action classification label is determined as well.
B2, determining the target image frame in the first image data according to the acquisition time.
And B3, performing target recognition according to the target image frame, and determining the position of the target object in the target image frame.
The target recognition may be performed by using a preset target recognition algorithm, and a suitable target recognition algorithm may be determined based on the usage scenario, which is not limited in this embodiment.
B4, adding the action classification label to the position in the target image frame.
In this embodiment, the action classification label is not merely added to some image frame in the first image data; it is attached more precisely to the position of the target object within that frame. Optionally, the label may be associated both with the target image frame and with the target object's position in it. When displayed, the action classification label appears at the target object's location in the target image frame.
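Steps B1 through B4 add the label at the target's detected position; in this sketch the target recognition algorithm is stubbed out as a callable, since the patent leaves it unspecified, and all names are illustrative:

```python
def add_label_at_position(frame_tags, label, detect_fn, frame_image):
    """Attach (label, bounding_box) if the target is found in the frame.

    detect_fn: callable returning an (x, y, w, h) box or None; it stands in
    for the patent's unspecified target recognition algorithm.
    """
    box = detect_fn(frame_image)
    if box is None:
        return False          # target absent: caller falls back to the secondary stream
    frame_tags.append((label, box))
    return True

# Stub detector that always "finds" the target at a fixed box.
tags = []
add_label_at_position(tags, "wave", lambda img: (120, 40, 64, 64), object())
print(tags)  # [('wave', (120, 40, 64, 64))]
```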
According to the target action recognition method provided by this embodiment, first image data can be collected by the main camera and second image data by the auxiliary camera; action recognition is performed according to the second image data, and the resulting action classification label is added to the first image data. Because the imaging quality of the second image data is lower than that of the first image data, it occupies less storage space, needs no compression, and consumes fewer computing resources during action recognition, which improves processing efficiency. Through target action recognition, labels can be added to the first image data, making it easier for users to manage their image data and meeting user needs.
Because the main camera and the auxiliary camera differ in shooting parameters such as field of view and focal length, at a given moment the target object may appear in the image captured by only one of the cameras. Based on this, if the target object appears only in the image captured by the auxiliary camera, the method may further include the following step:
If the target object does not exist in the target image frame, adding the action classification label to the second image data, and saving the second image data with the action classification label added.
When the target object appears only in the image collected by the auxiliary camera and does not appear in the image collected by the main camera, the auxiliary camera's image can serve as the final output image, compensating for the gap in the main camera's footage. Specifically, the action classification label may be added to the second image data, and the second image data with the label added may be saved. The adding process is similar to that for the first image data and is not repeated here.
If the target object only appears in the image collected by the main camera, the method can further comprise the following steps:
C1, performing target recognition according to the first image data, and determining the acquisition time of a first image frame in which at least one target object is located.
C2, determining a second image frame in the second image data corresponding to the acquisition time.
C3, if there is no target object in the second image frame, performing motion recognition based on the first image frame to obtain a motion classification label corresponding to the first image data.
When the target object appears only in the image collected by the main camera and is absent from the auxiliary camera's image at that moment, the first image data serves both as the final output image and as the input for target action recognition: action recognition is performed on the first image frame of the first image data to obtain the action classification label.
C4, adding the action classification label corresponding to the first image data.
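Taken together, the normal path and the two fallback branches reduce to choosing which stream receives the label; the following sketch summarizes that decision (all names are illustrative, not from the patent):

```python
def tag_streams(in_primary, in_secondary, label, primary_tags, secondary_tags):
    """Decide where the action label lands, per the two fallback branches.

    in_primary / in_secondary: whether the target object was found in the
    time-aligned frame of each stream.
    Returns the name of the stream that received the label, or None.
    """
    if in_primary:
        primary_tags.append(label)      # normal path and C1-C4: tag the HD stream
        return "primary"
    if in_secondary:
        secondary_tags.append(label)    # target only in the auxiliary stream
        return "secondary"
    return None                         # target in neither stream: nothing to tag

p, s = [], []
print(tag_streams(False, True, "run", p, s))  # secondary
```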
According to the target recognition method provided by this embodiment, the main camera outputs high-definition images while the auxiliary camera supplies images for recognizing specific target actions; once a label is obtained through recognition, it is embedded into the images shot by the main camera. The videos and photos output by the main camera are therefore stable, clear, and labeled, which is convenient for users in later video editing.
A user can search the server (local or cloud) that stores the pictures and videos collected by the camera system, for example by entering the keyword "smiling face", to retrieve the pictures carrying that label, or the image frames in a video that contain a smiling face; for videos, the search results also indicate where on the timeline the smiling-face frames occur. Based on this, the method may further include the following two retrieval modes:
(1) if receiving a target action classification label retrieval operation, retrieving first image data according to the target action classification label; and outputting the image and/or video including the target action classification label in the first image data.
(2) If the target action classification label retrieval operation is received, retrieving second image data according to the target action classification label; and outputting the image and/or the video comprising the target action classification label in the second image data.
Since at some moments the target object may appear only in images collected by the auxiliary camera, retrieval can also be performed on the second image data: the images and/or videos containing the target action classification label are retrieved from the stored second image data.
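Either retrieval mode reduces to a keyword match over the stored labels; a minimal sketch, assuming labels are stored as (timestamp, keyword set) pairs (a layout the patent does not specify):

```python
def search_by_label(tagged_frames, query):
    """Return timestamps of frames whose label keywords contain `query`.

    tagged_frames: list of (timestamp, set_of_keywords) pairs.
    """
    return [t for t, kws in tagged_frames if query in kws]

# Hypothetical stored library: two smile frames and one wave frame.
library = [
    (12.4, {"facial movements", "expression", "smile"}),
    (30.1, {"hand", "wave"}),
    (55.0, {"expression", "smile"}),
]
print(search_by_label(library, "smile"))  # [12.4, 55.0]
```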
Fig. 2 is a schematic structural diagram of a target motion recognition device in an embodiment of the present invention, where the target motion recognition device is applied to a dual-camera system, and the dual-camera system includes a main camera and a sub-camera; at least one imaging parameter of the primary camera is superior to the secondary camera, the imaging parameters including: view angle, resolution, aperture; the device comprises:
an image obtaining module 201, configured to obtain first image data collected by the main camera and second image data collected by the auxiliary camera;
the action recognition module 202 is configured to perform action recognition according to the second image data to obtain an action recognition result; wherein the action recognition result comprises an action classification label;
A tag adding module 203, configured to add the action classification tag to the first image data.
In the target action recognition device provided by this embodiment, the imaging quality of the second image data is lower than that of the first image data, the occupied storage space is smaller, compression is not required, and the occupied computing resources are smaller when action recognition is performed, so that the operation efficiency can be improved; through target action recognition, a label can be added in the first image data, so that a user can manage the image data conveniently, and the user requirements are met.
Optionally, as an embodiment, the tag adding module 203 is specifically configured to: determining the acquisition time of the image frame corresponding to the action classification label; determining a target image frame in the first image data according to the acquisition time; adding the action classification label to the target image frame.
Optionally, as an embodiment, the tag adding module 203 is specifically configured to: determining the acquisition time of the image frame corresponding to the action classification label and a target object; determining a target image frame in the first image data according to the acquisition time; performing target identification according to the target image frame, and determining the position of the target object in the target image frame; adding the action classification label to the location in the target image frame.
Optionally, as an embodiment, the apparatus further includes a saving module, specifically configured to: and if the target object does not exist in the target image frame, adding the action classification label to the second image data, and saving the second image data added with the action classification label.
Optionally, as an embodiment, the tag adding module 203 is further configured to: performing target recognition according to the first image data, and determining the acquisition time of a first image frame in which at least one target object is located; determining a second image frame in the second image data corresponding to the acquisition time; if the target object does not exist in the second image frame, performing action recognition according to the first image frame to obtain an action classification label corresponding to the first image data; and adding the action classification label corresponding to the first image data.
Optionally, as an embodiment, the apparatus further includes a retrieving module, specifically configured to: if a target action classification label retrieval operation is received, retrieving the first image data according to the target action classification label; and outputting the image and/or video including the target action classification label in the first image data.
Optionally, as an embodiment, the retrieving module is further configured to: if a target action classification label retrieval operation is received, retrieving the second image data according to the target action classification label; and outputting the image and/or video including the target action classification label in the second image data.
Optionally, as an embodiment, the action classification label includes at least one of: a facial motion classification label, a hand motion classification label, a leg motion classification label, and a whole body motion classification label.
The present embodiment further provides a camera system, including: the system comprises a main camera, an auxiliary camera and a processor; at least one imaging parameter of the primary camera is superior to the secondary camera, the imaging parameters including: view angle, resolution, aperture; the processor is used for executing the target action recognition method.
Referring to fig. 3, a schematic structural diagram of a dual-camera system includes a handle 31 and a three-axis pan-tilt 32 mounted on the handle 31, and the three-axis pan-tilt 32 is provided with a dual-camera system, which includes a main camera 33 and a sub-camera 34.
A display screen 35 for displaying the shot content of the dual-camera system is provided on the handle 31.
By providing the display screen 35 on the handle 31, the content shot by the main camera 33 can be displayed, allowing the user to quickly browse pictures or videos taken by the main camera 33 through the display screen 35. This improves the interactivity and appeal of the dual-camera system and meets users' diverse needs.
This embodiment further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements each process of the target action recognition method embodiments above and achieves the same technical effect; to avoid repetition, details are not given here. The computer-readable storage medium may be a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
Of course, those skilled in the art will understand that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing relevant hardware; the program may be stored in a computer-readable storage medium and, when executed, may include the processes of the above method embodiments. The storage medium may be a memory, a magnetic disk, an optical disk, or the like.
Although the present invention is disclosed above, the present invention is not limited thereto. Various changes and modifications may be effected therein by one skilled in the art without departing from the spirit and scope of the invention as defined in the appended claims.
Finally, it should also be noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The embodiments in the present description are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments are referred to each other. The target motion recognition device and the camera system disclosed in the embodiments correspond to the target motion recognition method disclosed in the above embodiments, so the description is simple, and the relevant points can be referred to the description of the method part.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A target action recognition method, applied to a dual-camera system, wherein the dual-camera system comprises a main camera and an auxiliary camera; at least one imaging parameter of the main camera is superior to that of the auxiliary camera, the imaging parameters comprising: view angle, resolution, and aperture; the method comprising the following steps:
acquiring first image data acquired by the main camera and second image data acquired by the auxiliary camera;
performing action recognition according to the second image data to obtain an action recognition result, wherein the action recognition result comprises an action classification label; and
adding the action classification label to the first image data.
2. The target action recognition method of claim 1, wherein adding the action classification label to the first image data comprises:
determining the acquisition time of the image frame corresponding to the action classification label;
determining a target image frame in the first image data according to the acquisition time;
adding the action classification label to the target image frame.
3. The target action recognition method of claim 1, wherein adding the action classification label to the first image data comprises:
determining the acquisition time of the image frame corresponding to the action classification label and a target object;
determining a target image frame in the first image data according to the acquisition time;
performing target identification according to the target image frame, and determining the position of the target object in the target image frame;
adding the action classification label to the position in the target image frame.
4. The target action recognition method of claim 3, further comprising:
and if the target object does not exist in the target image frame, adding the action classification label to the second image data, and saving the second image data added with the action classification label.
5. The target action recognition method of claim 1, further comprising:
performing target recognition according to the first image data, and determining the acquisition time of a first image frame in which at least one target object is located;
determining a second image frame in the second image data corresponding to the acquisition time;
if the target object does not exist in the second image frame, performing action recognition according to the first image frame to obtain an action classification label corresponding to the first image data;
and adding the action classification label corresponding to the first image data to the first image data.
6. The target action recognition method according to any one of claims 1-5, further comprising:
if a target action classification label retrieval operation is received, retrieving the first image data according to the target action classification label;
and outputting the image and/or video including the target action classification label in the first image data.
7. The target action recognition method of claim 4, further comprising:
if a target action classification label retrieval operation is received, retrieving the second image data according to the target action classification label;
and outputting the image and/or video including the target action classification label in the second image data.
8. The target action recognition method of any one of claims 1-5, wherein the action classification label comprises at least one of: a facial motion classification label, a hand motion classification label, a leg motion classification label, and a whole-body motion classification label.
9. A target action recognition device, applied to a dual-camera system, wherein the dual-camera system comprises a main camera and an auxiliary camera; at least one imaging parameter of the main camera is superior to that of the auxiliary camera, the imaging parameters comprising: view angle, resolution, and aperture; the device comprising:
an image acquisition module, configured to acquire first image data acquired by the main camera and second image data acquired by the auxiliary camera;
an action recognition module, configured to perform action recognition according to the second image data to obtain an action recognition result, wherein the action recognition result comprises an action classification label; and
a tag adding module, configured to add the action classification label to the first image data.
10. A camera system, comprising: a main camera, an auxiliary camera, and a processor; at least one imaging parameter of the main camera is superior to that of the auxiliary camera, the imaging parameters comprising: view angle, resolution, and aperture;
the processor is configured to execute the target action recognition method according to any one of claims 1 to 8.
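Outside the claims, and purely for illustration, the flow of claims 1 and 2 — run action recognition on the auxiliary stream, then attach each resulting label to the temporally matching frame of the main stream — could be sketched as below. The frame layout, the `recognize` callback, and the nearest-timestamp matching rule are all assumptions made for this sketch, not part of the claimed method:

```python
def nearest_frame(frames, t):
    """Pick the frame whose acquisition time is closest to t (claim 2's
    'determining a target image frame according to the acquisition time')."""
    return min(frames, key=lambda f: abs(f["t"] - t))

def tag_primary_stream(primary, secondary, recognize):
    """Run action recognition on each auxiliary-camera frame and copy any
    resulting action classification label onto the matching main-camera frame."""
    for frame in secondary:
        label = recognize(frame)  # hypothetical recognizer; returns a label or None
        if label is not None:
            target = nearest_frame(primary, frame["t"])
            target.setdefault("labels", []).append(label)
    return primary

primary = [{"t": 0.0}, {"t": 0.5}, {"t": 1.0}]            # main-camera frames
secondary = [{"t": 0.48, "action": "jump"},               # auxiliary-camera frames
             {"t": 0.9, "action": None}]
tagged = tag_primary_stream(primary, secondary, lambda f: f["action"])
print(tagged[1])  # → {'t': 0.5, 'labels': ['jump']}
```

The point of the split is that the lower-grade auxiliary stream feeds the recognizer while the labels end up on the higher-quality main-camera footage.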
CN202010574823.0A 2020-06-22 2020-06-22 Target action recognition method and device and camera system Pending CN111866375A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010574823.0A CN111866375A (en) 2020-06-22 2020-06-22 Target action recognition method and device and camera system


Publications (1)

Publication Number Publication Date
CN111866375A 2020-10-30

Family

ID=72987056

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010574823.0A Pending CN111866375A (en) 2020-06-22 2020-06-22 Target action recognition method and device and camera system

Country Status (1)

Country Link
CN (1) CN111866375A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2017055354A (en) * 2015-09-11 2017-03-16 大和ハウス工業株式会社 System and method for image display
CN107818180A (en) * 2017-11-27 2018-03-20 北京小米移动软件有限公司 Video correlating method, image display method, device and storage medium
CN109257649A (en) * 2018-11-28 2019-01-22 维沃移动通信有限公司 A kind of multimedia file producting method and terminal device
CN109274926A (en) * 2017-07-18 2019-01-25 杭州海康威视系统技术有限公司 A kind of image processing method, equipment and system
CN110807361A (en) * 2019-09-19 2020-02-18 腾讯科技(深圳)有限公司 Human body recognition method and device, computer equipment and storage medium

Similar Documents

Publication Publication Date Title
US7978936B1 (en) Indicating a correspondence between an image and an object
JP5934653B2 (en) Image classification device, image classification method, program, recording medium, integrated circuit, model creation device
US9665927B2 (en) Method and apparatus of multi-frame super resolution robust to local and global motion
JP5282658B2 (en) Image learning, automatic annotation, search method and apparatus
JP5213105B2 (en) Video network system and video data management method
CN105100748B (en) A kind of video monitoring system and method
US20130329068A1 (en) Image processing apparatus and image processing method
GB2607749A (en) Fine-grained visual recognition in mobile augmented reality
CN102831405B (en) Method and system for outdoor large-scale object identification on basis of distributed and brute-force matching
KR20120109591A (en) Methods and apparatuses for facilitating content-based image retrieval
TWI586160B (en) Real time object scanning using a mobile phone and cloud-based visual search engine
CN103428537A (en) Video processing method and video processing device
CN110944201A (en) Method, device, server and storage medium for video duplicate removal compression
JP6203188B2 (en) Similar image search device
CN106777071B (en) Method and device for acquiring reference information by image recognition
JP2009123150A (en) Object detection apparatus and method, object detection system and program
CN111866375A (en) Target action recognition method and device and camera system
CN111274435A (en) Video backtracking method and device, electronic equipment and readable storage medium
CN111652831B (en) Object fusion method and device, computer-readable storage medium and electronic equipment
CN113723168A (en) Artificial intelligence-based subject identification method, related device and storage medium
Zhang et al. A unified saliency detection framework for visible and infrared images
CN111080564A (en) Image processing method and system
Zhou et al. Feature fusion detector for semantic cognition of remote sensing
CN112906466B (en) Image association method, system and device, and image searching method and system
CN114185630B (en) Screen recording method, device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20201030