CN116614666B - AI-based camera feature extraction system and method

AI-based camera feature extraction system and method

Info

Publication number
CN116614666B
CN116614666B
Authority
CN
China
Prior art keywords
image
continuous frame
target
frame images
detected
Prior art date
Legal status
Active
Application number
CN202310869059.3A
Other languages
Chinese (zh)
Other versions
CN116614666A (en)
Inventor
王路明
席磊磊
康志伟
邓雪莲
Current Assignee
Microgrid Union Technology Chengdu Co ltd
Original Assignee
Microgrid Union Technology Chengdu Co ltd
Priority date
Filing date
Publication date
Application filed by Microgrid Union Technology Chengdu Co ltd
Priority to CN202310869059.3A
Publication of CN116614666A
Application granted
Publication of CN116614666B


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/433 Content storage operation, e.g. storage operation in response to a pause request, caching operations
    • H04N21/4334 Recording operations
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content
    • G06V20/49 Segmenting video sequences, i.e. computational techniques such as parsing or cutting the sequence, low-level clustering or determining units such as shots or scenes
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/20 Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
    • H04N21/21 Server components or server architectures
    • H04N21/218 Source of audio or video content, e.g. local disk arrays
    • H04N21/2187 Live feed
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/433 Content storage operation, e.g. storage operation in response to a pause request, caching operations
    • H04N21/4333 Processing operations in response to a pause request
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/44 Processing of video elementary streams, e.g. splicing a video clip retrieved from local storage with an incoming video stream or rendering scenes according to encoded video stream scene graphs
    • H04N21/44008 Processing of video elementary streams involving operations for analysing video streams, e.g. detecting features or characteristics in the video stream
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04N PICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N21/00 Selective content distribution, e.g. interactive television or video on demand [VOD]
    • H04N21/40 Client devices specifically adapted for the reception of or interaction with content, e.g. set-top-box [STB]; Operations thereof
    • H04N21/43 Processing of content or additional data, e.g. demultiplexing additional data from a digital video stream; Elementary client operations, e.g. monitoring of home network or synchronising decoder's clock; Client middleware
    • H04N21/441 Acquiring end-user identification, e.g. using personal code sent by the remote control or by inserting a card

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computing Systems (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses an AI-based camera feature extraction system and method, belonging to the technical field of camera monitoring equipment. The method comprises the following steps: acquiring a real-time image of a target to be detected; upon detecting a specific gesture of the target, performing a recording action to obtain the continuous frame images following the specific gesture; comparing the acquired continuous frame images with the content of a repository and judging whether a target continuous frame image exists in the repository; and, when a target continuous frame image in the repository is detected to be consistent with at least part of the image content of the acquired continuous frame images, executing the instruction corresponding to that target continuous frame image. The AI-based camera feature extraction method provided by the invention effectively avoids instructions being issued by mistake when the streamer performs an instruction gesture inadvertently, thereby improving the robustness of the system and avoiding potential losses to the streaming party.

Description

AI-based camera feature extraction system and method
Technical Field
The invention belongs to the technical field of camera monitoring equipment, and particularly relates to an AI (artificial intelligence) based camera feature extraction system and method.
Background
With the popularity of the internet, traditional industries have been affected to varying degrees and have begun to move online. For example, webcasting is becoming increasingly popular. During a live webcast, to meet various demands, the streamer wishes to remotely control internet devices and issue different instructions while broadcasting.
In the prior art currently adopted, instructions are issued by recognizing the streamer's posture. The specific approach is to capture a static image of the streamer, recognize it, and compare the recognized static image with images pre-stored in a repository; when a preset image in the repository is identical to the recognized static image, the instruction corresponding to that preset image is issued.
However, this approach relies on static-image comparison. When no instruction needs to be issued, the streamer must deliberately avoid the instruction gestures in order to keep daily actions distinct from them. In practice, to keep the streamer's movements from appearing abrupt, instruction gestures are usually designed to resemble daily actions, so streamers often perform them inadvertently, which leads to instructions being issued by mistake. In mild cases, a wrongly issued instruction increases the number of processes on the internet device and affects the running stability of the system; in severe cases, it causes economic loss that is difficult to recover.
Disclosure of Invention
The invention provides an AI-based camera feature extraction system and method, which can effectively avoid inadvertent instruction gestures occurring during the streamer's daily actions in the course of a live broadcast, thereby preventing instructions from being issued by mistake.
The invention is realized by the following technical scheme:
In one aspect, the invention provides an AI-based camera feature extraction method, comprising the following steps: acquiring a real-time image of a target to be detected; upon detecting a specific gesture of the target, performing a recording action to obtain the continuous frame images following the specific gesture; comparing the acquired continuous frame images with the content of a repository, and judging whether a target continuous frame image exists in the repository; and, when it is detected that a target continuous frame image in the repository is consistent with at least part of the image content of the acquired continuous frame images, executing the instruction corresponding to that target continuous frame image.
In some embodiments, after detecting that a target continuous frame image in the repository is consistent with at least part of the image content of the acquired continuous frame images, and before executing the instruction corresponding to the target continuous frame image, the method further comprises the following steps: cutting the acquired continuous frame images according to the content of the target continuous frame image to obtain processed continuous frame images consistent with the content of the target continuous frame image; acquiring a first image (the first frame) and a second image (the last frame) of the processed continuous frame images; performing grayscale processing on the first image and the second image to obtain binarized images in which each pixel has the value 0 or 1; acquiring the difference between the number of 1-pixels in the first image and the number of 1-pixels in the second image; and, when that difference falls within the pixel-count difference interval corresponding to the target continuous frame image, continuing to execute the instruction corresponding to the target continuous frame image.
In some of these embodiments, acquiring a real-time image of the target to be detected comprises: acquiring a whole image of the target to be detected; finding a marker point in the whole image; and acquiring the image of a specified region centered on the marker point with a preset radius, and taking that image as the real-time image of the target to be detected.
In some embodiments, the marker point is a designated part of the target to be detected or the position where a wearable peripheral is worn.
In some of these embodiments, finding the marker point in the whole image comprises: identifying the target to be detected; and taking a specific part of the target to be detected as the marker point.
In some embodiments, after the recording action, the method further comprises the following steps: stopping the recording action; and repeating the step of acquiring a real-time image of the target to be detected and the subsequent steps.
In some embodiments, after the recording action and before stopping it, the method further comprises: judging whether the recording time has reached a preset time; and, when the preset time is reached, continuing to the subsequent step of stopping the recording action.
In some embodiments, after judging whether a target continuous frame image exists in the repository, the method further comprises: deleting the acquired continuous frame images when it is detected that no target continuous frame image in the repository is consistent with at least part of the image content of the acquired continuous frame images.
In another aspect, this embodiment provides an AI-based camera feature extraction system comprising a memory and a processor, the memory storing a computer program and the processor executing the computer program to implement the AI-based camera feature extraction method of any one of the above embodiments.
Compared with the prior art, the invention has the following advantages:
Compared with the approach of comparing static images, the AI-based camera feature extraction method provided by the invention first confirms the recording action and then compares the continuous frame images acquired during the subsequent recording before executing the corresponding instruction. Instruction issuing is therefore more stable: the situation in which a streamer's inadvertent instruction gesture causes an instruction to be issued by mistake is effectively avoided, the robustness of the system is improved, and potential losses to the streaming party are avoided.
Drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings used in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the invention and should not be regarded as limiting its scope; a person skilled in the art may derive other related drawings from them without inventive effort.
Fig. 1 is a flow chart of an AI-based camera feature extraction method according to some embodiments of the present invention;
Fig. 2 is a flow chart of the steps for acquiring a real-time image of the target to be detected according to some embodiments of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are some, but not all, embodiments of the present invention.
In the description of the present invention, it should be noted that terms such as "center", "upper", "lower", "left", "right", "vertical", "horizontal", "inner" and "outer" indicate orientations or positional relationships based on those shown in the drawings, or those conventionally assumed when the product of the invention is in use. They are used merely to facilitate and simplify the description, and do not indicate or imply that the apparatus or elements referred to must have a specific orientation or be configured and operated in a specific orientation; they should therefore not be construed as limiting the invention.
Furthermore, the terms "horizontal", "vertical" and the like in the description of the present invention do not denote exact horizontality or verticality; a structure may be slightly inclined. "Horizontal" merely means that a direction is closer to horizontal than to vertical, not that the structure must be perfectly level.
In the description of the present invention, it should also be noted that, unless explicitly stated and limited otherwise, the terms "disposed", "mounted" and "connected" should be construed broadly: a connection may, for example, be fixed, detachable or integral; mechanical or electrical; direct, or indirect through an intermediate medium, or internal communication between two elements. The specific meaning of these terms in the present invention will be understood by those of ordinary skill in the art according to the specific circumstances.
The terms "comprising" and "having," and any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or modules is not limited to only those steps or modules but may include other steps or modules not expressly listed or inherent to such process, method, article, or apparatus.
Unless specifically stated, the different steps need not be performed in the order in which they are described.
In one aspect, this embodiment provides an AI-based camera feature extraction method which, referring to Fig. 1, mainly comprises the following steps:
s10, acquiring a real-time image of the object to be detected. In S10, a real-time image of the target to be measured is continuously acquired, so as to facilitate the subsequent determination of the specific gesture of the target to be measured. The acquired real-time image can be the whole image shot by the camera, or can be part of the image content in the whole shot image.
In some examples, the target to be detected may be a person; in that case, instructions for the internet device may be issued through the person's gestures, body posture, and the like.
In other examples, the target to be detected may also be an object, for example a device. The device's motion, whether a periodic preset motion or a real-time motion driven by remote operation, can then serve as the trigger for an internet-device instruction in practical applications.
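As a concrete illustration of step S10, the sketch below shows a minimal continuous-capture loop, assuming an OpenCV-compatible camera. The patent does not prescribe any particular library, and names such as CAMERA_INDEX are illustrative only.

```python
# Minimal sketch of step S10, assuming an OpenCV-compatible camera.
import cv2

CAMERA_INDEX = 0  # illustrative: default local camera


def acquire_frames():
    """Continuously yield real-time frames of the target to be detected."""
    cap = cv2.VideoCapture(CAMERA_INDEX)
    try:
        while True:
            ok, frame = cap.read()
            if not ok:
                break  # camera disconnected or stream ended
            yield frame  # the whole frame; a cropped region could be yielded instead
    finally:
        cap.release()
```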
S20, detecting a specific gesture of the target to be detected and performing a recording action to acquire the continuous frame images following the specific gesture. In S20, when the specific gesture of the target is detected in the acquired real-time image, a recording instruction is triggered and the recording action is performed. The purpose of recording is to acquire the continuous frame images that follow the specific gesture; whether an instruction is issued is then determined in the subsequent steps by comparing these frames with the content of the repository.
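The recording action of S20 can be pictured as a trigger-then-buffer loop, sketched below under stated assumptions: detect_specific_gesture() stands in for whatever gesture detector is deployed, and the preset frame count is discussed in a later section.

```python
# Sketch of step S20: once the specific gesture is detected, buffer the
# frames that follow it. detect_specific_gesture() is a placeholder for any
# gesture detector; it is not defined by the patent text.
def record_after_gesture(frames, detect_specific_gesture, num_frames):
    """Return the continuous frame images captured after the trigger gesture."""
    recording = []
    triggered = False
    for frame in frames:
        if not triggered:
            triggered = detect_specific_gesture(frame)
            continue  # recording starts with the frame after the trigger
        recording.append(frame)
        if len(recording) >= num_frames:  # preset length, see the later section
            break
    return recording
```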
S30, comparing the acquired continuous frame images with the content of the repository, and judging whether a target continuous frame image exists in the repository. In S30, the content stored in the repository also consists of continuous frame images, and each set of continuous frames forms one data group. A "target continuous frame image" means a data group in the repository whose continuous frames match the content of the acquired continuous frame images; for convenience of description, that group is called the target continuous frame image. The number of frames acquired may exceed the number of frames of the target continuous frame image in the repository, usually because the recording lasts longer: since different instructions correspond to target sequences of different lengths, the recording action acquires more frames than the longest target sequence in the repository, so the target sequence is typically shorter than the acquired sequence.
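Because the recorded sequence is longer than any stored target sequence, the comparison of S30 can be illustrated as a sliding-window search, sketched below. frames_match() is a placeholder for the per-frame comparison the deployed model performs; the patent does not specify it.

```python
# Sketch of step S30: slide each stored target sequence over the recorded
# sequence and report a match when every aligned frame pair is judged
# consistent. frames_match() stands in for the per-frame comparison; it is
# an assumption, not the patent's own method.
def find_target_sequence(recorded, repository, frames_match):
    """Return (instruction, target) for the first matching target sequence, else None."""
    for instruction, target in repository.items():
        n = len(target)
        if n > len(recorded):
            continue  # recordings are kept longer than any target sequence
        for start in range(len(recorded) - n + 1):
            window = recorded[start:start + n]
            if all(frames_match(a, b) for a, b in zip(window, target)):
                return instruction, target
    return None
```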
S40, when it is detected that a target continuous frame image in the repository is consistent with at least part of the image content of the acquired continuous frame images, judging that the streamer has issued the corresponding instruction, and executing the instruction corresponding to that target continuous frame image. In S40, each target continuous frame image in the repository corresponds to at least one instruction; when at least part of the recorded continuous frames match a target continuous frame image, the corresponding instruction is triggered and then executed.
In the above embodiment, by first confirming the recording action and then comparing the continuous frame images acquired during the recording before executing the corresponding instruction, instruction issuing is more stable than with static-image comparison: inadvertent instruction gestures by the streamer no longer cause instructions to be issued by mistake, the robustness of the system is improved, and potential losses to the streaming party are avoided.
In some embodiments, in step S40, after detecting that a target continuous frame image in the repository is consistent with at least part of the image content of the acquired continuous frame images, and before executing the corresponding instruction, the method further comprises the following steps:
and T10, cutting the acquired continuous frame images according to the content in the target continuous frame images to obtain processed continuous frame images consistent with the content of the target continuous frame images. In T10, the processed continuous frame image and the target continuous frame image are identical in content, not from the human sense, but from the machine perspective, the machine determination considers that the two are identical. In the prior art, when a real-time image meets a certain requirement through pre-modeling and machine learning, the real-time image is considered to be identical or partially identical to the content of the image pre-stored in a storage library. Therefore, the processed continuous frame image obtained in the step T10 is a multi-frame real-time image meeting certain requirements from the view angle of the machine, based on the same content considered by the machine, the obtained continuous frame image is then cut so that the cut processed continuous frame image is consistent with the content of the target continuous frame image, and the cut multi-frame image is defined as the processed continuous frame image for convenience of subsequent description.
T20, acquiring the first image (the first frame) and the second image (the last frame) of the processed continuous frame images. In T20, the first frame of the processed continuous frame images is defined as the first image and the last frame as the second image, which simplifies the following description.
T30, performing grayscale processing on the first image and the second image. In T30, the first and second images are grayscaled and binarized into images composed of pixels whose value is 0 or 1.
T40, acquiring the difference between the number of 1-pixels in the first image and the number of 1-pixels in the second image.
T50, when the difference between the number of 1-pixels in the first image and in the second image falls within the pixel-count difference interval corresponding to the target continuous frame image, continuing to execute the instruction corresponding to the target continuous frame image. In T50, each target continuous frame image in the repository corresponds to an interval of allowed 1-pixel count differences; when the measured difference lies in that interval, the two sequences are regarded as having the same content.
In the above embodiment, once the machine considers the recorded continuous frame images to correspond to a target continuous frame image in the repository, an auxiliary judgment based on the pixel-count difference is additionally performed to guard against overfitting in the preceding machine-learning comparison; only when the recorded frames also satisfy this criterion are they finally taken to correspond to the target continuous frame image. This arrangement further improves the robustness of the system and effectively mitigates the frequent overfitting seen in current machine learning.
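A minimal sketch of the auxiliary check T10-T50 follows, assuming OpenCV and NumPy. The binarization threshold is an illustrative assumption, since the patent only states that each pixel ends up as 0 or 1.

```python
# Sketch of the auxiliary check T10-T50: binarize the first and last
# processed frames so each pixel is 0 or 1, count the 1-pixels of each, and
# require the count difference to fall inside the interval stored for the
# target sequence. THRESHOLD is an assumed value, not from the patent.
import cv2
import numpy as np

THRESHOLD = 127  # assumed binarization threshold


def count_ones(frame_bgr):
    """Grayscale then binarize a frame and count the pixels equal to 1."""
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    _, binary = cv2.threshold(gray, THRESHOLD, 1, cv2.THRESH_BINARY)
    return int(np.count_nonzero(binary))


def pixel_difference_ok(processed_frames, allowed_range):
    """True if |ones(first) - ones(last)| lies in the target's allowed interval."""
    diff = abs(count_ones(processed_frames[0]) - count_ones(processed_frames[-1]))
    low, high = allowed_range
    return low <= diff <= high
```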
In some embodiments, referring to Fig. 2, S10 may mainly comprise the following steps:
s101, acquiring an overall image of a target to be detected. In T101, the acquired whole image is the content of the whole still image acquired by the camera.
S102, finding a marker point in the whole image. In S102, the marker point may be preset in various ways: a designated part of the streamer may serve as the marker point, a designated area of the whole image captured by the camera may serve as the marker point, or an external peripheral may be used and its position taken as the marker.
S103, acquiring the image of the specified region centered on the marker point with a preset radius, and taking that image as the real-time image of the target to be detected.
In this embodiment, the real-time image of the target is obtained by reducing the whole image captured by the camera, so the images to be compared in the subsequent comparison are smaller and the computing power the machine needs for image comparison is lower, which effectively reduces the up-front investment in equipment and improves economy. The lower computing load also occupies fewer operating resources, which effectively improves the robustness of the system. Moreover, with less content to compare after the whole image has been reduced, the situation where excessive secondary content (content other than the specific gesture) interferes with the primary content (the content containing the specific gesture) during comparison is effectively avoided.
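Steps S101-S103 amount to cropping a region of preset radius around the marker point. The sketch below uses a square crop for simplicity (the patent speaks of a radius), and find_marker_point() is a placeholder for any part or peripheral detector.

```python
# Sketch of steps S101-S103: crop a region of preset radius around the
# marker point and use it as the real-time image. A square crop is used
# here for simplicity; find_marker_point() is a placeholder detector.
def crop_around_marker(whole_image, find_marker_point, radius):
    """Return the sub-image centered on the marker point, clipped to bounds."""
    cx, cy = find_marker_point(whole_image)  # marker point as pixel coordinates
    h, w = whole_image.shape[:2]
    x0, x1 = max(0, cx - radius), min(w, cx + radius)
    y0, y1 = max(0, cy - radius), min(h, cy + radius)
    return whole_image[y0:y1, x0:x1]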
In some embodiments, when the real-time image of the target is taken as the specified region centered on the marker point with a preset radius, the content preset in the repository can likewise be local image content; when pixel-count differences are later compared, fewer pixels need to be compared and the comparison is faster.
In some examples, the marker point may be the part of the target that issues instructions, such as the face or hands of a human body. A peripheral may also be worn, with the wearing position taken as the location of the marker point.
In other examples, finding the marker point in the whole image may comprise the following steps:
s1021, identifying the target to be tested.
S1022, taking the specific part of the object to be measured as a marking point.
In the above embodiment, by identifying the target to be detected, a targeted recognition operation can be performed when the whole image contains too much content: the content to be recognized is reduced before the subsequent comparison. For example, when the live stream includes several people, only some of them are designated as instruction issuers, both to make instruction issuing more accurate and to avoid misoperation caused by other people unknowingly performing instruction gestures; by identifying the target to be detected, actions of other people or objects are effectively prevented from triggering instructions. This also reduces the amount of content the system must recognize and improves the efficiency of the whole comparison process.
In some embodiments, after the recording, the method further includes the following steps:
and K10, stopping recording. In K10, after sending out the instruction of stopping recording, an array comprising continuous frame images is obtained, and then the obtained array comprising continuous frame images is compared with data in a storage library.
K20, repeating the step of acquiring a real-time image of the target to be detected and the subsequent steps. In K20, after the recording action stops, still-image recognition is resumed so that the next live-broadcast control instruction can be issued. These repeated steps may run in parallel with steps S30 and S40: the processor repeatedly recognizes still images to decide whether a new recording action is needed while it is still comparing the recorded continuous frame images with the target continuous frame images pre-stored in the repository, thereby achieving multi-core operation. Of course, in other examples the repeated still-image recognition may instead run after steps S30 and S40, i.e. recognition is repeated only after the previous instruction has been recognized and its execution decided, which demands less of the processor.
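The multi-core variant described above can be sketched with a queue handing each finished recording to a comparison worker while the main loop resumes still-image recognition. The queue-based handoff is an implementation assumption, not part of the patent text; find_match could be a wrapper around the find_target_sequence sketch given earlier.

```python
# Sketch of the parallel variant of K20: a worker thread compares finished
# recordings against the repository while acquisition continues elsewhere.
import queue
import threading


def comparison_worker(recordings, find_match, execute_instruction):
    """Consume recorded frame sequences and execute instructions on a match."""
    while True:
        recorded = recordings.get()
        instruction = find_match(recorded)  # e.g. a wrapper around find_target_sequence
        if instruction is not None:
            execute_instruction(instruction)
        recordings.task_done()


# Usage sketch: the acquisition loop calls recordings.put(recording) after
# each recording stops, then immediately resumes still-image recognition.
# recordings = queue.Queue()
# threading.Thread(target=comparison_worker,
#                  args=(recordings, my_find_match, my_execute),
#                  daemon=True).start()
```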
In some embodiments, after the recording action and before stopping it, the method further comprises the following steps:
judging whether the recording time has reached a preset time;
and, when the preset time is judged to have been reached, continuing to the subsequent step of stopping the recording action.
In the above embodiment, the preset time serves as the end signal of the recording action. Because the duration of a recording is proportional to the number of frames acquired, the preset time may be set to the recording time corresponding to the frame count of the longest target continuous frame image in the repository; this avoids the situation where the recorded continuous frame images fail to be recognized even though the streamer performed the correct recording action, and it also prevents an over-long recording from occupying a system process for too long and increasing the system's burden.
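A small sketch of how the preset recording length could be derived from the repository, per the proportionality between recording time and frame count noted above; the frame rate and margin are illustrative assumptions.

```python
# Sketch: derive the preset recording length from the longest stored target
# sequence, so a correct gesture can never be cut short. FPS and the margin
# are assumptions, not values given by the patent.
FPS = 30  # assumed camera frame rate


def preset_num_frames(repository, margin=5):
    """Record at least as many frames as the longest stored target sequence."""
    longest = max(len(target) for target in repository.values())
    return longest + margin  # small safety margin, an illustrative choice


def preset_recording_seconds(repository):
    """Preset recording time derived from the frame budget."""
    return preset_num_frames(repository) / FPS
```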
In some of these embodiments, after the step of judging whether a target continuous frame image exists in the repository, the method further comprises the following step:
deleting the acquired continuous frame images when it is detected that no target continuous frame image in the repository is consistent with at least part of the image content of the acquired continuous frame images.
In the above embodiment, when no corresponding content can be found in the repository for the recorded continuous frame images, the recording is judged to have no practical meaning: the preceding specific gesture was a false trigger caused by the streamer's daily actions, and no instruction needs to be issued. Deleting the acquired continuous frame images promptly at this point frees the occupied memory and improves the running speed of the internet device.
In another aspect, this embodiment provides an AI-based camera feature extraction system comprising a memory and a processor, the memory storing a computer program and the processor executing the computer program to implement the AI-based camera feature extraction method of any one of the above embodiments.
This embodiment also provides a computer storage medium on which a computer program is stored, the computer program being loaded by a processor to implement the AI-based camera feature extraction method of any of the above embodiments.
The foregoing description covers only preferred embodiments of the present invention and is not intended to limit the invention in any way; any simple modifications and equivalent variations of the above embodiments made according to the technical principles of the present invention fall within the scope of the present invention.

Claims (8)

1. An AI-based camera feature extraction method, characterized by comprising the following steps:
acquiring a real-time image of a target to be detected;
detecting a specific gesture of the target to be detected, and performing a recording action to obtain the continuous frame images following the specific gesture;
comparing the acquired continuous frame images with the content of a repository, and judging whether a target continuous frame image exists in the repository;
when it is detected that a target continuous frame image exists in the repository that is consistent with at least part of the image content of the acquired continuous frame images, executing the instruction corresponding to the target continuous frame image;
wherein, after detecting that a target continuous frame image in the repository is consistent with at least part of the image content of the acquired continuous frame images, and before executing the instruction corresponding to the target continuous frame image, the method further comprises the following steps:
cutting the acquired continuous frame images according to the content of the target continuous frame image to obtain processed continuous frame images consistent with the content of the target continuous frame image;
acquiring a first image that is the first frame and a second image that is the last frame of the processed continuous frame images;
performing grayscale processing on the first image and the second image to obtain a plurality of pixels each having a value of 0 or 1;
acquiring the difference between the number of pixels having a value of 1 in the first image and the number of pixels having a value of 1 in the second image;
and, when the difference between the number of 1-valued pixels in the first image and the number of 1-valued pixels in the second image falls within the pixel-count difference interval corresponding to the target continuous frame image, continuing to execute the instruction corresponding to the target continuous frame image.
2. The AI-based camera feature extraction method according to claim 1, wherein acquiring a real-time image of the target to be detected comprises:
acquiring a whole image of the target to be detected;
finding a marker point in the whole image;
and acquiring the image of a specified region centered on the marker point with a preset radius, and taking that image as the real-time image of the target to be detected.
3. The AI-based camera feature extraction method according to claim 2, wherein the marker point is a designated part of the target to be detected or the position where a wearable peripheral is worn.
4. The AI-based camera feature extraction method according to claim 2, wherein finding a marker point in the whole image comprises:
identifying the target to be detected;
and taking a specific part of the target to be detected as the marker point.
5. The AI-based camera feature extraction method according to claim 1, further comprising, after the recording action, the following steps:
stopping the recording action;
and repeating the acquisition of a real-time image of the target to be detected and the subsequent steps.
6. The AI-based camera feature extraction method according to claim 5, further comprising, after the recording action and before stopping the recording action, the following steps:
judging whether the recording time has reached a preset time;
and, when the preset time is reached, continuing to the subsequent step of stopping the recording action.
7. The AI-based camera feature extraction method according to claim 1, further comprising, after judging whether a target continuous frame image exists in the repository, the following step:
deleting the acquired continuous frame images when it is detected that no target continuous frame image in the repository is consistent with at least part of the image content of the acquired continuous frame images.
8. An AI-based camera feature extraction system comprising a memory and a processor, the memory storing a computer program and the processor executing the computer program to implement the AI-based camera feature extraction method of any one of claims 1-7.
CN202310869059.3A 2023-07-17 2023-07-17 AI-based camera feature extraction system and method Active CN116614666B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310869059.3A CN116614666B (en) 2023-07-17 2023-07-17 AI-based camera feature extraction system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310869059.3A CN116614666B (en) 2023-07-17 2023-07-17 AI-based camera feature extraction system and method

Publications (2)

Publication Number Publication Date
CN116614666A CN116614666A (en) 2023-08-18
CN116614666B true CN116614666B (en) 2023-10-20

Family

ID=87683899

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310869059.3A Active CN116614666B (en) 2023-07-17 2023-07-17 AI-based camera feature extraction system and method

Country Status (1)

Country Link
CN (1) CN116614666B (en)


Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8295611B2 (en) * 2009-08-10 2012-10-23 Pixel Forensics, Inc. Robust video retrieval utilizing audio and video data
KR20230079999A (en) * 2021-11-29 2023-06-07 광주과학기술원 Traffic Hand Signal Detection System and Method thereof

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2010271944A (en) * 2009-05-21 2010-12-02 Chuo Univ Periodic gesture discriminating device, periodic gesture discrimination method, periodic gesture discrimination program, and recording medium
CN103577075A (en) * 2013-11-11 2014-02-12 惠州Tcl移动通信有限公司 Parameter adjusting method and device of electronic equipment
CN106095081A (en) * 2016-05-30 2016-11-09 合肥联宝信息技术有限公司 Man-machine interaction method and device
CN108921101A (en) * 2018-07-04 2018-11-30 百度在线网络技术(北京)有限公司 Processing method, equipment and readable storage medium storing program for executing based on gesture identification control instruction
CN110163861A (en) * 2018-07-11 2019-08-23 腾讯科技(深圳)有限公司 Image processing method, device, storage medium and computer equipment
CN111860082A (en) * 2019-04-30 2020-10-30 阿里巴巴集团控股有限公司 Information processing method, device and system
WO2021012513A1 (en) * 2019-07-19 2021-01-28 平安科技(深圳)有限公司 Gesture operation method and apparatus, and computer device
CN112487844A (en) * 2019-09-11 2021-03-12 华为技术有限公司 Gesture recognition method, electronic device, computer-readable storage medium, and chip
CN112183588A (en) * 2020-09-11 2021-01-05 上海商汤智能科技有限公司 Video processing method and device, electronic equipment and storage medium
WO2022111506A1 (en) * 2020-11-26 2022-06-02 北京灵汐科技有限公司 Video action recognition method and apparatus, electronic device and storage medium
CN114764895A (en) * 2020-12-31 2022-07-19 清华大学 Abnormal behavior detection device and method
WO2022152104A1 (en) * 2021-01-15 2022-07-21 百果园技术(新加坡)有限公司 Action recognition model training method and device, and action recognition method and device
CN115641641A (en) * 2021-07-20 2023-01-24 北京百度网讯科技有限公司 Motion recognition model training method and device and motion recognition method and device
CN115797964A (en) * 2021-09-08 2023-03-14 广州视源电子科技股份有限公司 Behavior recognition method, device, equipment and storage medium
CN114842391A (en) * 2022-05-14 2022-08-02 云知声智能科技股份有限公司 Motion posture identification method and system based on video

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A Gesture Recognition Method Based on Yolov4 Network; J. Ma et al; 2021 IEEE International Conference on Emergency Science and Information Technology (ICESIT); full text *
Gesture recognition based on multi-column deep 3D convolutional neural networks; 易生; 梁华刚; 茹锋; Computer Engineering (08); full text *

Also Published As

Publication number Publication date
CN116614666A (en) 2023-08-18

Similar Documents

Publication Publication Date Title
CN109977770B (en) Automatic tracking shooting method, device, system and storage medium
US9704052B2 (en) Image processing method for character recognition, character recognition apparatus using this method, and program
CN110232369B (en) Face recognition method and electronic equipment
CN110858394A (en) Image quality evaluation method and device, electronic equipment and computer readable storage medium
CN100596163C (en) Image processing apparatus and image processing method
US20100172577A1 (en) Red eye detecting apparatus, red eye detecting method and red eye detecting program
CN111512317A (en) Multi-target real-time tracking method and device and electronic equipment
US20020030359A1 (en) Fingerprint system
US11048950B2 (en) Method and device for processing images of vehicles
CN111985465A (en) Text recognition method, device, equipment and storage medium
JP4842917B2 (en) Automatic processing program for subsequent processing, automatic processing device for subsequent processing, and automatic processing method for subsequent processing
CN116614666B (en) AI-based camera feature extraction system and method
JP5147015B2 (en) Moving body position detection system
CN110969898A (en) Ship height detection method, device and system
CN115880511A (en) Image screening method, device and system and storage medium
US8611599B2 (en) Information processing apparatus, information processing method, and storage medium
CN113140042A (en) Three-dimensional scanning splicing method and device, electronic device and computer equipment
JP4566686B2 (en) Method and apparatus for determining shape of object
CN108133214B (en) Information search method based on picture correction and mobile terminal
JP4744234B2 (en) Component positioning method and apparatus
CN112489004B (en) Intelligent image acquisition method, device, storage medium and terminal
CN111698413A (en) Object image acquisition method and device and electronic equipment
CN114313883B (en) Automatic detection method and system for belt deviation based on image processing technology
EP2945099B1 (en) Character presence determination system and character presence determination method
CN113163167B (en) Image acquisition method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant