CN116909405B - Instruction control method based on artificial intelligence action recognition - Google Patents

Instruction control method based on artificial intelligence action recognition Download PDF

Info

Publication number
CN116909405B
CN116909405B · Application CN202311166670.6A
Authority
CN
China
Prior art keywords
point
noise
action
points
motion
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311166670.6A
Other languages
Chinese (zh)
Other versions
CN116909405A (en)
Inventor
周丽宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Huilang Times Technology Co Ltd
Original Assignee
Beijing Huilang Times Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Huilang Times Technology Co Ltd filed Critical Beijing Huilang Times Technology Co Ltd
Priority to CN202311166670.6A priority Critical patent/CN116909405B/en
Publication of CN116909405A publication Critical patent/CN116909405A/en
Application granted granted Critical
Publication of CN116909405B publication Critical patent/CN116909405B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/011Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/26Segmentation of patterns in the image field; Cutting or merging of image elements to establish the pattern region, e.g. clustering-based techniques; Detection of occlusion
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/30Noise filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/20Image preprocessing
    • G06V10/36Applying a local operator, i.e. means to operate on image points situated in the vicinity of a given point; Non-linear local filtering operations, e.g. median filtering
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/40Scenes; Scene-specific elements in video content
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00Scenes; Scene-specific elements
    • G06V20/50Context or environment of the image
    • G06V20/52Surveillance or monitoring of activities, e.g. for recognising suspicious objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Human Computer Interaction (AREA)
  • Artificial Intelligence (AREA)
  • Computing Systems (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Medical Informatics (AREA)
  • Software Systems (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Nonlinear Science (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses an instruction control method based on artificial intelligence action recognition, in which a data acquisition end acquires basic data and transmits the acquired basic data to a data preprocessing end, and the data preprocessing end analyzes the acquired basic data. The method solves the technical problems that unclear images cause recognition errors and that actions cannot be properly classified and recognized, which leads to erroneous instructions.

Description

Instruction control method based on artificial intelligence action recognition
Technical Field
The application relates to the technical field of artificial intelligence action recognition instructions, in particular to an instruction control method based on artificial intelligence action recognition.
Background
The prior art has realized non-contact control based on limb motion recognition, for example menu-style software control. Common limb motion capture schemes at present include the Kinect somatosensory camera and the RealSense depth camera.
The method according to patent application CN201610202178.3 comprises: acquiring at least one human body baseline within a shooting range; determining at least two detection areas according to the at least one human body baseline; and, after confirming that an instruction-triggering new event exists in at least one detection area, acquiring and sending an operation control instruction corresponding to that event. In the technical scheme provided by that application, at least one human body baseline within the shooting range is obtained, at least two detection areas are determined from it, and after an instruction-triggering new event is confirmed in at least one detection area, the corresponding operation control instruction is acquired and sent; because an ordinary camera is used for image capture, the equipment cost required for limb action recognition is reduced and the accuracy of limb action recognition is effectively improved.
In part of the existing action recognition, quality problems in the acquired video cause recognition errors, so that erroneous commands are generated during recognition; furthermore, when behaviour is recognized, the performed action may fail to be captured at all, so that no recognition takes place, which further affects the generation of the overall command.
Disclosure of Invention
Aiming at the defects of the prior art, the application provides an instruction control method based on artificial intelligence action recognition, which solves the problems that unclear images cause recognition errors and that actions cannot be properly classified and recognized, leading to erroneous instructions.
In order to achieve the above purpose, the application is realized by the following technical scheme: an instruction control method based on artificial intelligence action recognition, which specifically comprises the following steps:
step one: the data acquisition end acquires basic data, wherein the basic data comprise a monitoring video image, and transmits the acquired basic data to a data preprocessing end;
step two: the data preprocessing end analyzes the acquired basic data, acquires the monitoring video image and preprocesses it, wherein the preprocessing comprises noise removal and joint point identification; a preprocessing result is generated and transmitted to an action recognition end;
step three: the action recognition end acquires the preprocessing result and analyzes it, first performing segmentation recognition on the monitoring video image and then recognizing limb actions and head actions respectively, and generates an action recognition result, wherein the action recognition result comprises a limb action result and a head action result, which are transmitted to the instruction generating end;
step four: the instruction generating end acquires the action recognition result, judges and generates a corresponding action instruction according to the action recognition result and the action recognition big data, and transmits the action instruction to the instruction control end;
step five: the instruction control end acquires the action instruction and sends the action instruction to corresponding equipment or a system for control.
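To make the five-end data flow above concrete, the following Python outline is a minimal, self-contained sketch of that flow; all function names, the example instruction table and the placeholder bodies are illustrative assumptions added for this description and are not part of the original disclosure.

    import numpy as np

    def acquire_basic_data(num_frames=30, h=480, w=640):
        # Step one: stand-in for the data acquisition end (random frames here).
        return [np.random.randint(0, 256, (h, w), dtype=np.uint8) for _ in range(num_frames)]

    def preprocess(frames):
        # Step two: noise removal and joint point identification (placeholders).
        denoised = [f.copy() for f in frames]      # real version: remove influencing noise points
        joints = [{1: (0, 0)} for _ in frames]     # dummy joint points j = 1..n
        return denoised, joints

    def recognize_action(frames, joints):
        # Step three: segmentation and limb/head action recognition (placeholder label).
        return "head_up_down"

    def generate_instruction(action_label):
        # Step four: map the action recognition result onto an action instruction.
        instruction_table = {"head_up_down": "CONFIRM", "head_turn": "CANCEL"}
        return instruction_table.get(action_label)

    def control(instruction):
        # Step five: forward the instruction to the controlled equipment or system.
        print("sending instruction:", instruction)

    frames = acquire_basic_data()
    frames, joints = preprocess(frames)
    control(generate_instruction(recognize_action(frames, joints)))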
As a further aspect of the application: the specific processing mode of noise removal in the preprocessing of step two is as follows:
S1: the monitoring video image is acquired and divided by frame, and each divided frame image is marked as i; noise analysis is then performed on the i frame images, and the noise points are classified into bright noise points and dark noise points according to their differences, in the following specific manner:
S11: any single frame image among the i frame images is acquired and marked as a target image; a black background plate is then generated and matched with the target image, and the noise points displayed after matching are classified as bright noise points;
S12: a white background plate is then generated and matched with the target image in the same way, and the noise points displayed after matching are classified as dark noise points; noise analysis is performed on the i frame images in the same manner;
S2: the bright noise points and dark noise points obtained after classification of the target image are acquired and are classified into influencing noise points and normal noise points according to whether they affect the target image; the influencing noise points are then removed after classification, in the following specific manner:
S21: taking a single noise point of the target image as an origin (the noise points include both bright noise points and dark noise points), a circle of radius R is formed; all pixel points within the circle are acquired; the distance between two adjacent pixel points is then obtained and recorded as the point distance Dj; the average of the point distances within the circle, excluding the pixel points carrying noise, is recorded as the point average distance Dp between two adjacent pixel points and is taken as the standard point distance;
S22: the point distance between the two pixel points carrying the noise point is then obtained and recorded as Dz, and Dz is compared with the point average distance Dp; when Dz ≥ Dp, the noise point is judged to have an influence and is classified as an influencing noise point, and the influencing noise point is removed by mean filtering, in which the value of each pixel is replaced by the average of the pixel values in its surrounding neighbourhood; otherwise, when Dz < Dp, the noise point is judged to have no influence and is classified as a normal noise point, and no processing is performed on it. In other words, if a noise point lies between two pixel points and the distance between those two pixel points exceeds the standard point distance, the noise point affects the whole image; otherwise it does not.
As a further aspect of the application: the action recognition result in step three is generated in the following manner:
P1: the monitoring video image is acquired, and the monitoring video is intercepted in combination with the motion trajectory of the joint points, in the following specific manner:
P11: three-dimensional modelling is performed on the joint points of the user in the monitoring video image, the motion trajectory of the joint points is drawn, and the joint point motion trajectories are matched against one another;
P12: when the joint point motion trajectories are the same, the actions they produce are the same and are repeated; taking the time point at which the joint points begin to move as the starting point and the time point at which the joint point motion repeats as the end point, the monitoring video image is intercepted to obtain an intercepted video, and the action in it is recognized;
P2: the position of the joint point in the intercepted video is acquired and the action type of the joint point is judged, wherein the action types comprise limb actions and head actions, and the specific judging manner is as follows:
the position of the joint point is matched with the human body image; when the joint point is recognized to be at the shoulder or above, a head action is judged; when the joint point is recognized to be below the shoulder, a limb action is judged; the head action and the limb action are then analyzed respectively;
P3: the limb action is analyzed as follows: the motion trajectory of the joint points is acquired and matched against behaviour actions to generate a corresponding limb action result, where the behaviour actions are behaviour action results obtained through artificial intelligence comparison and simulation calculation; the joint point motion trajectory is matched with these behaviour actions, and the corresponding limb action result is then obtained through artificial intelligence calculation;
P4: when a head action is determined, the head action is analyzed in the following specific manner:
P41: a rectangular coordinate system is established with the shoulders as the X axis and the central axis of the head as the Y axis; the connection point between the head and the shoulders is taken as the origin, and the eyebrow centre of the head is marked as the starting point;
P42: the point at which the starting point stops moving is then taken as the end point; the origin, the starting point and the end point are connected by straight lines, and the head action is analyzed according to the shape obtained by the connection;
P43: when the origin, the starting point and the end point lie on the same straight line, the head action is an up-and-down movement; when the shape obtained by the connection is a triangle, the head action is a turn.
Advantageous effects
The application provides an instruction control method based on artificial intelligence action recognition. Compared with the prior art, the method has the following beneficial effects:
according to the application, the acquired image is analyzed, the noise in the image is identified, the noise is classified, and the noise influencing the image is removed, so that the overall quality of the image is ensured, errors in subsequent identification are avoided, different classification treatments are performed on actions, matching analysis is performed on actions generated by different parts by combining artificial intelligence, the accuracy of identification is ensured, and the errors of overall identification are reduced.
Drawings
FIG. 1 is a flow chart of the method of the present application;
FIG. 2 is a diagram showing the judgment of the method of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application.
Referring to fig. 1 and 2, the present application provides an instruction control method based on artificial intelligence action recognition, which specifically includes the following steps:
step one: the data acquisition end acquires basic data, wherein the basic data comprises: monitoring the video image and transmitting the acquired basic data to a data preprocessing end.
Step two: the data preprocessing end analyzes the acquired basic data, acquires the monitoring video image and preprocesses the monitoring video image, wherein the preprocessing comprises the following steps: noise removal and joint point identification, and generating a preprocessing result, and simultaneously transmitting the preprocessing result to an action recognition end, wherein the specific processing mode of noise removal in preprocessing is as follows:
S1: the monitoring video image is acquired and divided by frame, and each divided frame image is marked as i; noise analysis is then performed on the i frame images, and the noise points are classified into bright noise points and dark noise points according to their differences, in the following specific manner:
S11: any single frame image among the i frame images is acquired and marked as a target image; a black background plate is then generated and matched with the target image, and the noise points displayed after matching are classified as bright noise points;
S12: a white background plate is then generated and matched with the target image in the same way, and the noise points displayed after matching are classified as dark noise points; noise analysis is performed on the i frame images in the same manner;
To explain with reference to the actual scene: when the black background plate is matched with the image, the dark noise points merge with the black background plate and the bright noise points are displayed; when the white background plate is matched with the image, the dark noise points are displayed; the i frame images are classified in the same way.
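A minimal Python sketch of S11-S12 follows, interpreting "matching against a black or white background plate" as comparing each pixel with pure-black and pure-white references and keeping only local outliers; the threshold value, the 5×5 local-mean test and the random test frame are assumptions added for illustration.

    import numpy as np
    from scipy.ndimage import uniform_filter

    def classify_noise_points(frame, thresh=60):
        # Return boolean masks of candidate bright and dark noise points.
        f = frame.astype(np.float32)
        black_plate = np.zeros_like(f)           # S11: black background plate
        white_plate = np.full_like(f, 255.0)     # S12: white background plate

        bright = (f - black_plate) > (255 - thresh)   # stands out against the black plate
        dark = (white_plate - f) > (255 - thresh)     # stands out against the white plate

        # Keep only isolated outliers relative to the local mean (noise, not scene content).
        local_mean = uniform_filter(f, size=5)
        bright &= (f - local_mean) > thresh
        dark &= (local_mean - f) > thresh
        return bright, dark

    frame = np.random.randint(0, 256, (480, 640), dtype=np.uint8)
    bright_mask, dark_mask = classify_noise_points(frame)
    print(bright_mask.sum(), "bright noise candidates;", dark_mask.sum(), "dark noise candidates")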
S2: the bright noise points and dark noise points obtained after classification of the target image are acquired and are classified into influencing noise points and normal noise points according to whether they affect the target image; the classified influencing noise points are then removed, and the specific manner of classifying influencing noise points and normal noise points is as follows:
S21: taking a single noise point of the target image as an origin (the noise points include both bright noise points and dark noise points), a circle of radius R is formed; all pixel points within the circle are acquired; the distance between two adjacent pixel points is then obtained and recorded as the point distance Dj; the average of the point distances within the circle, excluding the pixel points carrying noise, is recorded as the point average distance Dp between two adjacent pixel points and is taken as the standard point distance;
S22: the point distance between the two pixel points carrying the noise point is then obtained and recorded as Dz, and Dz is compared with the point average distance Dp; when Dz ≥ Dp, the noise point is judged to have an influence and is classified as an influencing noise point, and the influencing noise point is removed by mean filtering, in which the value of each pixel is replaced by the average of the pixel values in its surrounding neighbourhood; otherwise, when Dz < Dp, the noise point is judged to have no influence and is classified as a normal noise point, and no processing is performed on it. In other words, if a noise point lies between two pixel points and the distance between those two pixel points exceeds the standard point distance, the noise point affects the whole image; otherwise it does not.
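A minimal Python sketch of the S21-S22 decision and the mean-filter removal follows; it reads Dj as the spacings between neighbouring pixel points inside the circle of radius R, Dp as their mean (the standard point distance), and Dz as the spacing across the noise-carrying pixels. The example distance values and the 3×3 kernel size are assumptions added for illustration.

    import numpy as np
    from scipy.ndimage import uniform_filter

    def is_influencing(dz, point_distances):
        # S22: compare Dz with the point average distance Dp (the standard point distance).
        dp = float(np.mean(point_distances))
        return dz >= dp

    def remove_influencing_noise(frame, noise_mask):
        # Mean filtering: replace influencing noise pixels with the mean of their 3x3 neighbourhood.
        mean_filtered = uniform_filter(frame.astype(np.float32), size=3)
        cleaned = frame.astype(np.float32)
        cleaned[noise_mask] = mean_filtered[noise_mask]
        return cleaned.astype(frame.dtype)

    dj_in_circle = [1.0, 1.1, 0.9, 1.0]   # assumed spacings between normal pixel points
    dz = 1.6                               # assumed spacing across the noise-carrying pixels
    print("influencing noise point:", is_influencing(dz, dj_in_circle))

    frame = np.random.randint(0, 256, (8, 8), dtype=np.uint8)
    mask = np.zeros_like(frame, dtype=bool)
    mask[3, 4] = True
    print(remove_influencing_noise(frame, mask))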
The joint point identification in the preprocessing is specifically as follows: a human body image is acquired, and the joint points of the human body image are marked and denoted as j, with j = 1, 2, …, n, where the joint points of the human body image represent the connection points of human bones; for example, the bone connection point between the upper arm and the forearm is a joint point.
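The disclosure does not name a particular joint point detector; the sketch below uses MediaPipe Pose purely as an illustrative stand-in for marking the joint points j = 1, …, n on a human body image (the file name in the usage comment is hypothetical).

    import cv2
    import mediapipe as mp

    def identify_joint_points(image_bgr):
        # Return a dict {j: (x, y)} of joint points in pixel coordinates.
        with mp.solutions.pose.Pose(static_image_mode=True) as pose:
            results = pose.process(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB))
        if results.pose_landmarks is None:
            return {}
        h, w = image_bgr.shape[:2]
        return {j + 1: (lm.x * w, lm.y * h)
                for j, lm in enumerate(results.pose_landmarks.landmark)}

    # Usage (assumes a frame of the monitoring video saved to disk):
    # joints = identify_joint_points(cv2.imread("frame_0001.png"))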
Step three: the action recognition end acquires the preprocessing result and analyzes it, first performing segmentation recognition on the monitoring video image and then recognizing limb actions and head actions respectively, and generates an action recognition result, wherein the action recognition result comprises a limb action result and a head action result, which are transmitted to the instruction generating end. The specific manner of generating the action recognition result is as follows:
P1: the monitoring video image is acquired, and the monitoring video is intercepted in combination with the motion trajectory of the joint points, in the following specific manner:
P11: three-dimensional modelling is performed on the joint points of the user in the monitoring video image, the motion trajectory of the joint points is drawn, and the joint point motion trajectories are matched against one another;
P12: when the joint point motion trajectories are the same, the actions they produce are the same and are repeated; taking the time point at which the joint points begin to move as the starting point and the time point at which the joint point motion repeats as the end point, the monitoring video image is intercepted to obtain an intercepted video, and the action in it is recognized.
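A minimal Python sketch of the P11-P12 interception follows; it treats the joint trajectory as a per-frame list of (x, y) positions and clips the video from the start of the motion to the point where the initial trajectory window recurs. The window length and tolerance are assumptions added for illustration.

    import numpy as np

    def intercept_repeated_motion(trajectory, window=10, tol=5.0):
        # Return (start_frame, end_frame) of the first repetition, or None if no repeat is found.
        traj = np.asarray(trajectory, dtype=np.float32)
        first = traj[:window]
        for t in range(window, len(traj) - window + 1):
            if np.linalg.norm(traj[t:t + window] - first, axis=1).mean() < tol:
                return 0, t + window    # clip from the motion start to the repeated motion
        return None

    # Toy trajectory: the same 10-frame motion performed twice.
    motion = [(i, 2 * i) for i in range(10)]
    print("intercepted frames:", intercept_repeated_motion(motion + motion))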
P2: the position of the joint point in the intercepted video is acquired and the action type of the joint point is judged, wherein the action types comprise limb actions and head actions, and the specific judging manner is as follows:
the position of the joint point is matched with the human body image; when the joint point is recognized to be at the shoulder or above, a head action is judged; when the joint point is recognized to be below the shoulder, a limb action is judged; the head action and the limb action are then analyzed respectively.
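A short Python sketch of the P2 judgement follows, assuming image coordinates in which y increases downward so that a joint at or above the shoulder line satisfies y ≤ shoulder_y; the example coordinates are illustrative.

    def judge_action_type(joint_y, shoulder_y):
        # Joints at the shoulders or above -> head action; below the shoulders -> limb action.
        return "head action" if joint_y <= shoulder_y else "limb action"

    print(judge_action_type(joint_y=120, shoulder_y=200))   # head action
    print(judge_action_type(joint_y=350, shoulder_y=200))   # limb action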
P3: the limb action is analyzed as follows: the motion trajectory of the joint points is acquired and matched against behaviour actions to generate a corresponding limb action result, where the behaviour actions are behaviour action results obtained through artificial intelligence comparison and simulation calculation; the joint point motion trajectory is matched with these behaviour actions, and the corresponding limb action result is then obtained through artificial intelligence calculation.
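The disclosure does not specify the artificial intelligence model used for the comparison and simulation; the sketch below stands in for it with nearest-template matching of the joint trajectory by mean Euclidean distance. The template names and trajectories are illustrative assumptions.

    import numpy as np

    def match_limb_action(trajectory, templates):
        # Return the name of the behaviour-action template closest to the observed trajectory.
        traj = np.asarray(trajectory, dtype=np.float32)
        best_name, best_dist = None, np.inf
        for name, tmpl in templates.items():
            tmpl = np.asarray(tmpl, dtype=np.float32)
            n = min(len(traj), len(tmpl))           # crude length alignment
            d = np.linalg.norm(traj[:n] - tmpl[:n], axis=1).mean()
            if d < best_dist:
                best_name, best_dist = name, d
        return best_name

    templates = {
        "raise_arm": [(0, 100 - 5 * i) for i in range(10)],
        "wave": [(10 * (i % 2), 50) for i in range(10)],
    }
    print(match_limb_action([(0, 98 - 5 * i) for i in range(10)], templates))  # raise_arm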
P4: the head action is analyzed in the following specific manner:
P41: a rectangular coordinate system is established with the shoulders as the X axis and the central axis of the head as the Y axis; the connection point between the head and the shoulders is taken as the origin, and the eyebrow centre of the head is marked as the starting point;
P42: the point at which the starting point stops moving is then taken as the end point; the origin, the starting point and the end point are connected by straight lines, and the head action is analyzed according to the shape obtained by the connection;
P43: when the origin, the starting point and the end point lie on the same straight line, the head action is an up-and-down movement; when the shape obtained by the connection is a triangle, the head action is a turn.
To explain with reference to the actual application scene: a rectangular coordinate system is established on the human head, and the head action is recognized from the movement of the eyebrow centre; when the head action is raising or lowering the head, the starting point and the end point lie on the same straight line as the origin; when the head action is a turn, the starting point, the end point and the origin form a triangle.
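A minimal Python sketch of the P41-P43 shape test follows: the origin (head-shoulder junction), the starting point (eyebrow centre) and the end point are checked for collinearity via the signed triangle area; collinear points indicate an up-and-down movement, while a non-degenerate triangle indicates a turn. The tolerance and the example coordinates are assumptions added for illustration.

    def classify_head_action(origin, start, end, tol=1e-3):
        (x0, y0), (x1, y1), (x2, y2) = origin, start, end
        # Twice the signed triangle area; zero means the three points are collinear.
        area2 = (x1 - x0) * (y2 - y0) - (x2 - x0) * (y1 - y0)
        return "up-and-down movement" if abs(area2) < tol else "turn"

    print(classify_head_action((0, 0), (0, 10), (0, 14)))   # raising/lowering the head
    print(classify_head_action((0, 0), (0, 10), (6, 9)))    # turning the head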
Step four: the instruction generating end acquires the action recognition result, judges and generates a corresponding action instruction according to the action recognition result and the action recognition big data, and transmits the action instruction to the instruction control end.
Step five: the instruction control end acquires the action instruction and sends the action instruction to corresponding equipment or a system for control.
The second embodiment of the present application differs from the first embodiment in the way the noise points are classified: all point distances Dj are obtained, the point average distance Dp is calculated, and both are substituted into a dispersion formula to calculate a discrete value (dispersion) I of the point distances Dj; the calculated discrete value I is taken as the standard point distance and compared with Dz; when Dz ≥ I, the noise point is judged to have an influence and is classified as an influencing noise point; otherwise, when Dz < I, the noise point is judged to have no influence and is classified as a normal noise point.
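The dispersion formula itself is not reproduced in the text above; purely for illustration, the sketch below assumes the discrete value I is the standard deviation of the point distances Dj and uses it as the threshold for Dz. Both that choice of formula and the example values are assumptions.

    import numpy as np

    def classify_by_dispersion(dz, dj_values):
        dj = np.asarray(dj_values, dtype=np.float32)
        dispersion_i = float(dj.std())    # assumed form of the discrete value I
        return "influencing noise point" if dz >= dispersion_i else "normal noise point"

    print(classify_by_dispersion(dz=1.6, dj_values=[1.0, 1.1, 0.9, 1.0, 2.5]))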
In the third embodiment of the present application, the difference from the first and second embodiments lies in the way the influencing noise points are removed in step two: in this embodiment, the influencing noise points are removed by median filtering and Gaussian filtering. Median filtering replaces the value of each pixel with the median of the pixel values in its surrounding neighbourhood, where the median is obtained by sorting all the values from small to large and taking the middle one; Gaussian filtering performs a weighted average of each pixel value using a Gaussian function.
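A minimal Python sketch of the third embodiment's removal step follows, using OpenCV's median and Gaussian filters; the 3×3 kernel sizes and the random test frame are assumptions added for illustration.

    import cv2
    import numpy as np

    def remove_noise_median_gaussian(frame):
        median = cv2.medianBlur(frame, ksize=3)                   # median of the 3x3 neighbourhood
        return cv2.GaussianBlur(median, ksize=(3, 3), sigmaX=0)   # Gaussian-weighted average

    frame = np.random.randint(0, 256, (480, 640), dtype=np.uint8)
    cleaned = remove_noise_median_gaussian(frame)
    print(cleaned.shape, cleaned.dtype)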
The fourth embodiment of the present application focuses on implementing the first, second and third embodiments in combination.
Some of the data in the above formulas are numerical values obtained by removing their dimensions, and the contents not described in detail in this specification are well known in the prior art.
The above embodiments are only for illustrating the technical method of the present application and not for limiting the same, and it should be understood by those skilled in the art that the technical method of the present application may be modified or substituted without departing from the spirit and scope of the technical method of the present application.

Claims (5)

1. An instruction control method based on artificial intelligence action recognition, characterized by comprising the following steps:
step one: the data acquisition end acquires basic data, wherein the basic data comprise a monitoring video image, and transmits the acquired basic data to a data preprocessing end;
step two: the data preprocessing end analyzes the acquired basic data, acquires the monitoring video image and preprocesses it, wherein the preprocessing comprises noise removal and joint point identification; a preprocessing result is generated and transmitted to an action recognition end, and the specific processing mode of the noise removal is as follows:
S1: the monitoring video image is acquired and divided by frame, and each divided frame image is marked as i; noise analysis is then performed on the i frame images, and the noise points are classified into bright noise points and dark noise points according to their differences, in the following specific manner:
S11: any single frame image among the i frame images is acquired and marked as a target image; a black background plate is then generated and matched with the target image, and the noise points displayed after matching are classified as bright noise points;
S12: a white background plate is then generated and matched with the target image in the same way, and the noise points displayed after matching are classified as dark noise points; noise analysis is performed on the i frame images in the same manner;
S2: the bright noise points and dark noise points obtained after classification of the target image are acquired and are classified into influencing noise points and normal noise points according to whether they affect the target image; the influencing noise points are then removed after classification, in the following specific manner:
S21: taking a single noise point of the target image as an origin, a circle of radius R is formed; all pixel points within the circle are acquired; the distance between two adjacent pixel points is then obtained and recorded as the point distance Dj; the average of the point distances within the circle is recorded as the point average distance Dp and is taken as the standard point distance;
S22: the point distance between the two pixel points carrying the noise point is then obtained and recorded as Dz, and Dz is compared with the point average distance Dp; when Dz ≥ Dp, the noise point is judged to have an influence and is classified as an influencing noise point, and the influencing noise point is removed by mean filtering; otherwise, when Dz < Dp, the noise point is judged to have no influence and is classified as a normal noise point, and no processing is performed on it;
step three: the action recognition end acquires the preprocessing result and analyzes it, first performing segmentation recognition on the monitoring video image and then recognizing limb actions and head actions respectively, and generates an action recognition result, wherein the action recognition result comprises a limb action result and a head action result, which are transmitted to the instruction generating end;
step four: the instruction generating end acquires the action recognition result, judges and generates a corresponding action instruction according to the action recognition result and the action recognition big data, and transmits the action instruction to the instruction control end;
step five: the instruction control end acquires the action instruction and sends the action instruction to the corresponding equipment or system for control.
2. The instruction control method based on artificial intelligence action recognition according to claim 1, characterized in that the action recognition result in step three is generated in the following manner:
P1: the monitoring video image is acquired, and the monitoring video is intercepted in combination with the motion trajectory of the joint points;
P2: the position of the joint point in the intercepted video is acquired and the action type of the joint point is judged, wherein the action types comprise limb actions and head actions;
P3: the limb action is analyzed as follows: the motion trajectory of the joint points is acquired and matched in combination with behaviour actions to generate a corresponding limb action result;
P4: when a head action is determined, the head action is analyzed.
3. The instruction control method based on artificial intelligence action recognition according to claim 2, characterized in that the specific manner of intercepting the monitoring video in P1 is as follows:
P11: three-dimensional modelling is performed on the joint points of the user in the monitoring video image, the motion trajectory of the joint points is drawn, and the joint point motion trajectories are matched against one another;
P12: when the joint point motion trajectories are the same, the actions they produce are the same and are repeated; taking the time point at which the joint points begin to move as the starting point and the time point at which the joint point motion repeats as the end point, the monitoring video image is intercepted to obtain an intercepted video, and the action in it is recognized.
4. The instruction control method based on artificial intelligence action recognition according to claim 2, characterized in that the specific judging manner in P2 is as follows:
the position of the joint point is matched with the human body image; when the joint point is recognized to be at the shoulder or above, a head action is judged; when the joint point is recognized to be below the shoulder, a limb action is judged; the head action and the limb action are then analyzed respectively.
5. The instruction control method based on artificial intelligence action recognition according to claim 2, characterized in that the specific analysis of the head action in P4 is as follows:
P41: a rectangular coordinate system is established with the shoulders as the X axis and the central axis of the head as the Y axis; the connection point between the head and the shoulders is taken as the origin, and the eyebrow centre of the head is marked as the starting point;
P42: the point at which the starting point stops moving is then taken as the end point; the origin, the starting point and the end point are connected by straight lines, and the head action is analyzed according to the shape obtained by the connection;
P43: when the origin, the starting point and the end point lie on the same straight line, the head action is an up-and-down movement; when the shape obtained by the connection is a triangle, the head action is a turn.
CN202311166670.6A 2023-09-12 2023-09-12 Instruction control method based on artificial intelligence action recognition Active CN116909405B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311166670.6A CN116909405B (en) 2023-09-12 2023-09-12 Instruction control method based on artificial intelligence action recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311166670.6A CN116909405B (en) 2023-09-12 2023-09-12 Instruction control method based on artificial intelligence action recognition

Publications (2)

Publication Number Publication Date
CN116909405A CN116909405A (en) 2023-10-20
CN116909405B true CN116909405B (en) 2023-12-15

Family

ID=88356849

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311166670.6A Active CN116909405B (en) 2023-09-12 2023-09-12 Instruction control method based on artificial intelligence action recognition

Country Status (1)

Country Link
CN (1) CN116909405B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103472920A (en) * 2013-09-13 2013-12-25 通号通信信息集团有限公司 Action-recognition-based medical image control method and system
CN114613015A (en) * 2022-03-22 2022-06-10 康键信息技术(深圳)有限公司 Body-building action image identification method, device, equipment and storage medium
CN114721509A (en) * 2022-03-08 2022-07-08 五邑大学 Human body action recognition-based human-computer interaction method and system
CN114926356A (en) * 2022-05-10 2022-08-19 大连理工大学 LiDAR point cloud unsupervised denoising method aiming at snowfall influence
CN116704440A (en) * 2023-06-15 2023-09-05 安徽优而得数字科技有限公司 Intelligent comprehensive acquisition and analysis system based on big data

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2019133922A1 (en) * 2017-12-29 2019-07-04 Flir Systems, Inc. Point cloud denoising systems and methods

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103472920A (en) * 2013-09-13 2013-12-25 通号通信信息集团有限公司 Action-recognition-based medical image control method and system
CN114721509A (en) * 2022-03-08 2022-07-08 五邑大学 Human body action recognition-based human-computer interaction method and system
CN114613015A (en) * 2022-03-22 2022-06-10 康键信息技术(深圳)有限公司 Body-building action image identification method, device, equipment and storage medium
CN114926356A (en) * 2022-05-10 2022-08-19 大连理工大学 LiDAR point cloud unsupervised denoising method aiming at snowfall influence
CN116704440A (en) * 2023-06-15 2023-09-05 安徽优而得数字科技有限公司 Intelligent comprehensive acquisition and analysis system based on big data

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
An Efficient Adaptive Noise Removal Filter on Range Images for LiDAR Point Clouds; Minh-Hai Le et al.; Electronics; pp. 1-18 *

Also Published As

Publication number Publication date
CN116909405A (en) 2023-10-20

Similar Documents

Publication Publication Date Title
KR101653278B1 (en) Face tracking system using colar-based face detection method
CN110633612B (en) Monitoring method and system for inspection robot
CN109460719A (en) A kind of electric operating safety recognizing method
CN113962274B (en) Abnormity identification method and device, electronic equipment and storage medium
CN110059634B (en) Large-scene face snapshot method
CN105022999A (en) Man code company real-time acquisition system
CN111967319B (en) Living body detection method, device, equipment and storage medium based on infrared and visible light
CN114359333A (en) Moving object extraction method and device, computer equipment and storage medium
CN110688969A (en) Video frame human behavior identification method
CN116909405B (en) Instruction control method based on artificial intelligence action recognition
CN114155557A (en) Positioning method, positioning device, robot and computer-readable storage medium
CN112532927A (en) Intelligent safety management and control system for construction site
CN112561957A (en) State tracking method and device for target object
CN112347830A (en) Factory epidemic prevention management method and system
WO2020095644A1 (en) State-change detection device, state-change detection method, and computer-readable recording medium
CN113283273A (en) Front obstacle real-time detection method and system based on vision technology
CN104063681A (en) Active object image identification method and device
CN113111847A (en) Automatic monitoring method, device and system for process circulation
CN112004056A (en) Intelligent video analysis method with strong anti-interference capability
JP4664805B2 (en) Face edge detection device, face edge detection method, and program
CN117689881B (en) Casting object tracking method based on event camera and CMOS camera
Halder et al. Anomalous Activity Detection from Ego View Camera of Surveillance Robots
CN117333904B (en) Pedestrian tracking method based on multi-feature fusion
CN117253195B (en) IPC safety monitoring method, monitoring system, computer equipment and readable storage medium
CN113673448A (en) Cloud and end integrated face image quality dynamic detection method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant