CN114387671A - Step counting method and device, mobile terminal and computer readable storage medium - Google Patents


Info

Publication number
CN114387671A
Authority
CN
China
Prior art keywords
image
frame
target object
images
gesture recognition
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210040342.0A
Other languages
Chinese (zh)
Inventor
吕根鹏
曾凡涛
刘玉宇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN202210040342.0A priority Critical patent/CN114387671A/en
Publication of CN114387671A publication Critical patent/CN114387671A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Image Analysis (AREA)

Abstract

The application relates to the field of data processing and discloses a step counting method, a step counting device, a mobile terminal, and a computer-readable storage medium. The method comprises: acquiring a multi-frame image of a target object, and performing frame difference processing on every two adjacent frames in the multi-frame image to obtain a frame difference image corresponding to each pair of adjacent frames; then calling a gesture recognition model to recognize the gesture of the target object from each frame difference image, obtaining at least one gesture recognition result of the target object; and determining a step number update strategy for the target object according to the at least one gesture recognition result, and updating the step number of the target object according to that strategy. This method improves the efficiency of counting the steps of the target object. The application also relates to blockchain technology: for example, the step number update strategy and the gesture recognition model can be written to a blockchain for use in other data processing scenarios.

Description

Step counting method and device, mobile terminal and computer readable storage medium
Technical Field
The present application relates to the field of data processing, and in particular, to a step counting method and apparatus, a mobile terminal, and a computer-readable storage medium.
Background
With the combination of deep learning and machine vision, it has become increasingly common to capture images with the camera of a mobile terminal (such as a mobile phone or a vehicle-mounted terminal) and process them with an on-device model to extract the required visual information.
In some applications, such as counting the steps of a target object from images, the conventional approach feeds every frame corresponding to the target object into a model, which performs a series of detection and computation steps on each frame to produce a per-frame processing result. Complex analysis must then be carried out on the processing results of at least three consecutive frames, for example with a dedicated analysis tool that checks whether one of the projection curves of the three consecutive frames has a distinct peak while the other two have distinct troughs, and whether the target object has moved one step is judged from this analysis. Because the model must process every frame of image, the efficiency of this approach is far from ideal. How to improve the efficiency of counting the steps of a target object while reducing the model's processing load is therefore an urgent technical problem.
Disclosure of Invention
The embodiment of the application provides a step counting method, a step counting device, a mobile terminal and a computer readable storage medium, which can improve the efficiency of step counting of a target object.
The embodiment of the application discloses a step counting method on one hand, and the method comprises the following steps:
acquiring a multi-frame image of a target object;
performing frame difference processing on each two adjacent frames of images in the multi-frame images to obtain a frame difference image corresponding to each two adjacent frames of images;
calling a gesture recognition model to recognize the gesture of the target object according to the frame difference image corresponding to each two adjacent frames of images to obtain at least one gesture recognition result of the target object;
determining a step number updating strategy for the target object according to the at least one gesture recognition result;
and updating the step number of the target object according to the step number updating strategy.
An embodiment of the present application discloses a step counting device on one hand, and the device includes:
an acquisition unit configured to acquire a multi-frame image of a target object;
the processing unit is used for carrying out frame difference processing on each two adjacent frames of images in the multi-frame images to obtain a frame difference image corresponding to each two adjacent frames of images;
the processing unit is further configured to invoke a gesture recognition model to recognize the gesture of the target object according to the frame difference image corresponding to each two adjacent frames of images, so as to obtain at least one gesture recognition result of the target object;
a determining unit, configured to determine a step number update policy for the target object according to the at least one gesture recognition result;
the processing unit is further configured to update the step number of the target object according to the step number update policy.
An embodiment of the present application discloses a mobile terminal, which includes:
a processor adapted to implement one or more computer programs; and a computer storage medium storing one or more computer programs adapted to be loaded by the processor to perform the above step counting method.
In one aspect, the present application discloses a computer-readable storage medium storing one or more computer programs adapted to be loaded by a processor and execute the above step counting method.
An aspect of the embodiments of the present application discloses a computer program product or a computer program, where the computer program product includes a computer program, and the computer program is stored in a computer readable storage medium. The processor of the mobile terminal reads the computer program from the computer-readable storage medium, and the processor executes the computer program, so that the mobile terminal executes the step counting method.
In the embodiments of the application, a mobile terminal acquires a multi-frame image of a target object and performs frame difference processing on every two adjacent frames to obtain a frame difference image for each pair of adjacent frames; it then calls a gesture recognition model to recognize the gesture of the target object from each frame difference image, obtaining at least one gesture recognition result; it further determines a step number update strategy for the target object (either increasing the step number or leaving it unchanged) according to the at least one gesture recognition result, and finally updates the step number accordingly. In this method, the mobile terminal judges the gesture recognition result of the target object (states such as rising, falling, or height unchanged) from the frame difference image of every two adjacent frames, and from that judges whether the target object has moved one step, so the step number can be counted in real time. Specifically, a trained gesture recognition model processes the frame difference image directly to obtain the gesture recognition result; there is no need to feed every individual frame into the model, obtain a per-frame processing result, and then run a complex analysis on those results with a separate tool. As a result, image processing takes less time, the gesture recognition result, and hence the step number, of the target object can be determined sooner, and the efficiency of counting the steps of the target object is improved.
Drawings
In order to more clearly illustrate the embodiments of the present application or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description are only some embodiments of the present application, and that those skilled in the art can obtain other drawings from these drawings without creative effort.
FIG. 1 is a schematic flow chart of a method for counting steps disclosed in an embodiment of the present application;
FIG. 2 is a schematic diagram of a motion state icon disclosed in an embodiment of the present application;
FIG. 3 is a schematic flow chart diagram of another step counting method disclosed in the embodiments of the present application;
FIG. 4 is a schematic flow chart illustrating a method for training a gesture recognition model according to an embodiment of the present disclosure;
fig. 5 is a schematic structural diagram of a step counting apparatus disclosed in an embodiment of the present application;
fig. 6 is a schematic structural diagram of a mobile terminal disclosed in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The step counting method disclosed in the embodiments of the application can be executed by any mobile terminal, where the mobile terminal may be, but is not limited to, a smart phone, a smart watch, or a smart vehicle. The mobile terminal includes a camera, mainly used to acquire image information of the target object; optionally, it further includes a display, mainly used to show the step counting result once it has been obtained. In a typical deployment, the camera of the mobile terminal is kept fixed while the target object moves, so that moving images of the target object can be acquired conveniently. Specific scenarios include the following: when the target object exercises on a treadmill, the mobile terminal is fixed on the treadmill to acquire moving images of the target object in real time and count its steps; or, in a marathon, the mobile terminal is fixed on a following camera vehicle and records the athletes' steps in real time so that their physical condition can be monitored. Other scenarios are possible as well; it is only necessary that the camera of the mobile terminal is fixed so that moving images of the target object can be acquired, and the specific scenario is not limited here.
In a possible implementation, the step counting method provided in the embodiments of the application may process a recorded video in addition to real-time images; that is, the video is split into frames and the image frames are then processed to obtain the step count for the video. The display mentioned above may also be used to show the motion state of the target object, or its physical state while exercising, such as calories consumed and exercise time.
Referring to fig. 1, which is a schematic flow chart of a step counting method disclosed in an embodiment of the present application, the method may be executed by a mobile terminal. The embodiment shown in fig. 1 counts steps from a multi-frame image containing at least three frames, and the method may specifically include the following steps:
s101, acquiring multi-frame images of the target object.
In general, the target object is a moving object, such as user A. In a possible embodiment, collected images that do not contain the target object, or that are blurred, are filtered out, so that all remaining frames of the multi-frame image are usable. The multi-frame image may be obtained in real time or extracted from a video.
In a possible implementation, the camera of the mobile terminal acquires the multi-frame image of the target object in real time; in the embodiment of fig. 1, the multi-frame image includes at least three frames, corresponding to three moments in time. Alternatively, the camera of the mobile terminal records a motion video of the target object over a preset duration; the video is then read from the memory of the mobile terminal and split into frames to obtain the video frame image sequence corresponding to the video, and that sequence is screened to obtain the multi-frame image of the target object. A sketch of such video splitting is given below.
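A minimal sketch of splitting a recorded video into a frame sequence, assuming OpenCV is available; the function name and the sampling stride are illustrative, not part of the patent:

```python
import cv2

def video_to_frames(video_path, stride=1):
    """Read a video and return every `stride`-th frame as a list of arrays."""
    capture = cv2.VideoCapture(video_path)
    frames = []
    index = 0
    while True:
        ok, frame = capture.read()
        if not ok:
            break
        if index % stride == 0:
            frames.append(frame)  # one image per kept frame, shape (H, W, 3)
        index += 1
    capture.release()
    return frames
```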
S102, performing frame difference processing on every two adjacent frames of images in the multi-frame images to obtain a frame difference image corresponding to every two adjacent frames of images.
As noted in step S101, the multi-frame image includes at least three frames. Taking three frames as an example, denote them as a first frame image, a second frame image, and a third frame image, where the first frame image is adjacent to the second and the second is adjacent to the third, and the first frame image is earliest in time: the first frame image is at time t-2, the second at time t-1, and the third at time t. Performing frame difference processing on every two adjacent frames then yields two frame difference images. The acquired images may be RGB images (RGB being one of the most widely used color systems, in which colors are obtained by superimposing the three color channels red (R), green (G), and blue (B)) or grayscale images; the image type is not limited here.
In a possible implementation, the frame difference processing of every two adjacent frames may proceed as follows. First, any two adjacent frames, a first image and a second image, are acquired. If the images are RGB images, each can be understood as a matrix: assuming the first image $I_t$ has height H and width W, its RGB representation is a matrix of shape (3, H, W), where 3 is the number of channels (the three RGB channels), H×W is the size of each channel, and every value lies between 0 and 255 and represents a color value. The second image $I_{t-1}$ is likewise a matrix of shape (3, H, W), so the two can be subtracted directly as matrices, and the resulting (3, H, W) matrix is the frame difference image of the first image $I_t$ and the second image $I_{t-1}$.
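A minimal sketch of this frame difference step, assuming NumPy arrays in the (3, H, W) layout described above (the function and variable names are illustrative):

```python
import numpy as np

def frame_difference(image_t: np.ndarray, image_t_minus_1: np.ndarray) -> np.ndarray:
    """Subtract two adjacent (3, H, W) RGB frames to get a frame difference image.

    Casting to a signed type first avoids uint8 wrap-around, since
    channel values in [0, 255] can produce negative differences.
    """
    return image_t.astype(np.int16) - image_t_minus_1.astype(np.int16)
```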
S103, calling a gesture recognition model to recognize the gesture of the target object according to the frame difference image corresponding to each two adjacent frames of images to obtain at least one gesture recognition result of the target object.
As explained above, the multi-frame image contains at least three frames, so at least two gesture recognition results are obtained from the gesture recognition model. For step counting in the present application, the gesture recognition results fall into three categories: "up", "down", and "height unchanged". Within the model, the category is determined against a set threshold; "up" and "down" indicate that the target object has moved, while "height unchanged" indicates that it has not.
In a possible implementation, the gesture recognition model can be understood as a classification model whose output is a set of probability values (the classification result). In this application there are three values, representing the probabilities of "up", "down", and "height unchanged" respectively, and the category with the highest probability is selected as the gesture recognition result of the target object. For example, if the probability of "up" is 0.7, the probability of "down" is 0.1, and the probability of "height unchanged" is 0.2 (the three sum to 1), then 0.7 is the maximum, and the gesture recognition result of the target object is determined to be "up". When the multi-frame image consists of three frames, there are two corresponding frame difference images, and two gesture recognition results are obtained from the gesture recognition model.
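A sketch of this selection step, assuming the model has already produced the three probabilities (the names are illustrative):

```python
import numpy as np

POSES = ("up", "down", "height unchanged")

def select_pose(probabilities) -> str:
    """Return the pose with the highest probability, e.g. [0.7, 0.1, 0.2] -> 'up'."""
    return POSES[int(np.argmax(probabilities))]
```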
And S104, determining a step number updating strategy for the target object according to at least one gesture recognition result.
As explained above, in this embodiment the multi-frame image includes a first frame image, a second frame image adjacent to the first, and a third frame image adjacent to the second, the first frame image being earliest in time. The at least one gesture recognition result therefore includes a first gesture recognition result between the first and second frame images, and a second gesture recognition result between the second and third frame images.
In a possible implementation, determining the step number update strategy from the at least one gesture recognition result may proceed as follows: if the first gesture recognition result is "up" and the second is "down", the step number update strategy is step-number-plus-1; otherwise the step number is unchanged. Alternatively: if the first gesture recognition result is "down" and the second is "up", the step number update strategy is step-number-plus-1; otherwise the step number is unchanged. Either condition ("up then down" or "down then up") may serve as the trigger, but once one is chosen, all subsequent step number updates are determined by that condition.
In a specific implementation, since the gesture recognition results are "up", "down", and "height unchanged", a concrete step number update strategy can be: if the gesture recognition results go from "up" to "down" in succession, the step number is increased by 1; in every other case (down to up, up to up, down to down, unchanged to unchanged, up to unchanged, down to unchanged, unchanged to up, and unchanged to down) the step number stays the same. In other words, the "height unchanged" results can simply be ignored, and whenever an "up" is followed by a "down", the step number is increased by 1, meaning the target object has moved one step.
For example, over time the gesture recognition model produces a series of gesture recognition results from the multi-frame image. Suppose the results for the target object are: unchanged, up, unchanged, down, unchanged, up, down, unchanged. Removing the "height unchanged" labels leaves: up, down, up, down.
If the trigger condition is "up then down" with a step-number-plus-1 strategy, then the first and second filtered results (up, down) add 1 to the step number of the target object and to the pedometer, the third and fourth (up, down) add another 1, and in all other cases the step number is unchanged; the sequence thus contributes 2 steps. If instead the trigger condition is "down then up" with a step-number-plus-1 strategy, then the second and third filtered results (down, up) add 1 to the step number and to the pedometer, and otherwise the step number is unchanged. A sketch of the "up then down" rule follows.
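A minimal sketch of this counting rule over a sequence of recognition results (a hypothetical helper, not part of the patent):

```python
def count_steps(pose_results) -> int:
    """Count one step per consecutive 'up' -> 'down' pair after dropping
    the 'height unchanged' results, as described above."""
    filtered = [p for p in pose_results if p != "height unchanged"]
    return sum(1 for prev, cur in zip(filtered, filtered[1:])
               if prev == "up" and cur == "down")

# The example sequence above yields 2 steps:
assert count_steps(["height unchanged", "up", "height unchanged", "down",
                    "height unchanged", "up", "down", "height unchanged"]) == 2
```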
And S105, updating the step number of the target object according to the step number updating strategy.
In a possible implementation, as described in step S104, the step number is updated according to the step number update strategy: if the strategy is step-number-plus-1, the pedometer of the mobile terminal is incremented by 1. In a real-time scenario, the change in step number may be displayed on the screen of the mobile terminal in real time; if the current step number is 2000 and a step-number-plus-1 update is detected, the step number changes from 2000 to 2001. If the multi-frame image was extracted from a video, the method directly yields the total number of steps the target object moved in that video.
Further, in a possible implementation, the mobile terminal may also track the frequency at which the step number of the target object increases, determine the motion state of the target object from that frequency, match a corresponding motion state icon to the target object, and output the icon through the mobile terminal. The frequency is tied to time: first, the time points of the first and last frames in the multi-frame image are determined and subtracted to obtain the total duration; then the total number of steps added over that period is determined, and the frequency of step increase is total steps / total duration. In this embodiment, the motion state of the target object may include walking types (slow walk, fast walk) and running types (jog, fast run), determined by the frequency of step increase: if the frequency exceeds a set threshold, the motion state is a running type, and a further threshold within the running type can distinguish jogging from fast running; if the frequency is less than or equal to the set threshold, the motion state is a walking type. The thresholds are determined from prior knowledge. The motion state icon is fetched automatically from a database according to the motion state. Fig. 2 is a schematic diagram of motion state icons disclosed in an embodiment of the present application, in which 210 is the "run" icon and 220 the "walk" icon; further, 211 denotes jogging, 212 fast running, 221 slow walking (e.g., strolling), and 222 fast walking (e.g., race walking). A sketch of this classification follows.
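A sketch of the frequency computation and motion-state classification; the concrete threshold values are assumptions, since the patent only says they come from prior knowledge:

```python
def motion_state(total_steps: int, t_first: float, t_last: float,
                 run_threshold: float = 2.5, fast_run_threshold: float = 3.5,
                 fast_walk_threshold: float = 1.5) -> str:
    """Classify the motion state from the step-increase frequency (steps/second)."""
    frequency = total_steps / (t_last - t_first)   # total steps / total duration
    if frequency > run_threshold:                  # running type
        return "fast run" if frequency > fast_run_threshold else "jog"
    return "fast walk" if frequency > fast_walk_threshold else "slow walk"
```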
On top of the above embodiments, in some possible embodiments the mobile terminal may also estimate the energy the target object has consumed so far from its step number. That is, the consumed energy is calculated from prior knowledge and shown on the display of the mobile terminal; as the step number grows, so does the consumed energy.
To summarize this embodiment: from a multi-frame image (at least three frames), at least two frame difference images are determined; the gesture recognition model processes these frame difference images to produce at least two gesture recognition results of the target object; a step number update strategy is determined from those results; and finally the step number of the target object is counted in real time according to that strategy. The step number is counted directly from image information, with the model operating on frame difference information: for three frames of images, the model only needs to run twice. Compared with the conventional approach of processing every single frame, this improves processing efficiency and therefore the efficiency of counting the steps of the target object.
Building on the foregoing, an embodiment of the application discloses another step counting method. Referring to fig. 3, which is a schematic flow chart of this method, the step count of the target object is determined from the gesture recognition result at the previous moment together with the gesture recognition result at the current moment; that is, the multi-frame image may consist of just two frames. This method is likewise executed by the mobile terminal and may specifically include the following steps:
s301, acquiring an image at the t-1 moment and an image at the t moment.
Step S301 introduces explicit time points: the camera of the mobile terminal is used to acquire the image at time t-1 and the image at time t.
S302, determining a frame difference image according to the image at the t-1 moment and the image at the t moment.
And S303, calling a gesture recognition model to perform gesture recognition on the frame difference image to obtain a gesture recognition result of the target object at the time t.
Steps S302 to S303 can refer to steps S102 to S103 shown in fig. 1, and are not described herein again.
S304, acquiring the gesture recognition result of the target object at the t-1 moment.
In a possible implementation manner, the gesture recognition result at the time t-1 corresponding to the target object may be directly read from a local memory or a database of the mobile terminal, that is, at the time t, the mobile terminal may read the gesture recognition result at the time t-1, where the gesture recognition result is one of "rising", "falling", and "unchanged height". The confirmation process of the gesture recognition result at the time t-1 is substantially consistent with the confirmation process of the gesture recognition result at the time t, namely, the gesture recognition result at the time t-1 is determined according to the image at the time t-2 and the image at the time t-1.
S305, if the gesture recognition result at the time t-1 is rising and the gesture recognition result at the time t is falling, determining the step number updating strategy to be the step number plus 1, otherwise, determining the step number updating strategy to be the step number unchanged.
Step S305 is consistent with step S104: if the gesture recognition result at time t-1 is "up" and the result at time t is "down", the step number update strategy is step-number-plus-1; otherwise the step number is unchanged. Alternatively, if the result at time t-1 is "down" and the result at time t is "up", the strategy is step-number-plus-1; otherwise the step number is unchanged. In this embodiment the gesture recognition model only needs to run once, processing a single frame difference image, to obtain a gesture recognition result. A sketch of this loop follows.
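A minimal sketch of the single-inference loop of fig. 3, assuming the previous result is cached locally; the names and the model interface are illustrative:

```python
import numpy as np

def on_new_frame(frame_t_minus_1, frame_t, model, cache):
    """One iteration of steps S301-S306: one model call per time step."""
    diff = frame_t.astype(np.int16) - frame_t_minus_1.astype(np.int16)  # S302
    current_pose = model(diff)                  # S303: pose at time t
    previous_pose = cache.get("pose")           # S304: cached pose at time t-1
    if previous_pose == "up" and current_pose == "down":
        cache["steps"] = cache.get("steps", 0) + 1   # S305: step number plus 1
    cache["pose"] = current_pose                # becomes t-1 for the next frame
    return cache.get("steps", 0)                # S306: updated step number
```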
S306, updating the step number of the target object according to the step number updating strategy.
This step is identical to step S105, and will not be described here.
In this embodiment, the mobile terminal combines historical information with current information to count steps quickly: the gesture recognition model only processes the frame difference image at the current moment, which improves model efficiency, while the gesture recognition result at the previous moment is read from history, which is faster than running the model again.
All of the above step counting methods rely on a gesture recognition model. Accordingly, the embodiments of the present application further provide a training method for the gesture recognition model. Referring to fig. 4, which is a schematic flow chart of training a gesture recognition model disclosed in an embodiment of the present application, the method specifically includes the following steps:
s401, obtaining a plurality of sample images.
In a possible implementation, the sample images may be obtained by processing a video. Since the sample images are used to train the gesture recognition model, they should as far as possible contain the same target object; for different target objects, each object should have its own set of sample images. The sample images also carry a time order, i.e. they are sequential, and the interval between two adjacent sample images should be neither too long nor too short: if it is too short, the target object's posture is the same in both images, and no training benefit is obtained.
In another possible implementation, the plurality of sample images may be a pre-processed image frame sequence retrieved directly, arranged in chronological order. The manner of acquiring the sample images is not limited here.
S402, performing frame difference processing on every two adjacent frame sample images in the plurality of sample images to obtain a frame difference image corresponding to every two adjacent frame sample images.
Since the application needs to determine the pose recognition result of two consecutive frames, and a large amount of training data is needed, a frame difference image is computed for every two adjacent sample images. For example, given 5 sample images, image 1 through image 5, frame difference processing on each adjacent pair yields 4 frame difference images: between images 1 and 2, 2 and 3, 3 and 4, and 4 and 5. By analogy, N sample images yield N-1 frame difference images, as the sketch below illustrates.
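A one-line sketch of this pairing (assuming NumPy arrays, as before):

```python
import numpy as np

def sample_frame_differences(samples):
    """N sample images yield N-1 frame difference images (e.g. 5 -> 4)."""
    return [b.astype(np.int16) - a.astype(np.int16)
            for a, b in zip(samples, samples[1:])]
```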
Step S402 is the same as steps S102 and S302 and is not described again here. Note that the position of step S402 in fig. 4 is not fixed; it may be performed at any point after step S401.
And S403, performing key point detection on the plurality of sample images to obtain a key point set of each sample image in the plurality of sample images.
In one possible implementation, for each of the sample images, key point detection may be performed with a feature detection model, thereby obtaining the key point set of each sample image. In this embodiment, the detected key points may include the target object's left ankle, right ankle, left knee, and right knee; beyond these, key points such as the shoulders or the head may be selected according to the actual situation.
S404, acquiring a position information set corresponding to the key point set of each sample image, wherein the position information set comprises position information of each key point in the key point set.
In a possible implementation, after the key point set of each sample image is obtained, the corresponding position information set can also be obtained, where the position information set contains the position information of each key point in the set. The position information may be the height of each key point; for example, for the left ankle, right ankle, left knee, and right knee, the corresponding heights may be denoted $h_1, h_2, h_3, h_4$. Each sample image has its own $h_1, h_2, h_3, h_4$, so a position information set corresponding to the key point set of each sample image is obtained.
S405, determining the corresponding attitude label of each two adjacent frames of sample images according to the position information set corresponding to the key point set of each sample image.
In a possible implementation, consider any two adjacent sample images, a first sample image and a second sample image adjacent to it (the first can be understood as the sample image at time t-1, the second as the sample image at time t). Determining the pose label of each pair of adjacent sample images from the position information sets then comprises: calculating a first sum over all position information in the position information set of the first sample image; calculating a second sum over all position information in the position information set of the second sample image; calculating the difference between the second sum and the first sum; and determining the pose label of the pair from that difference. Assuming both sample images contain the same number of key points, let the key point heights in the first sample image be $h_{11}, h_{12}, h_{13}, h_{14}$ and those in the second be $h_{21}, h_{22}, h_{23}, h_{24}$. The first sum is then given by formula (1) and the second by formula (2):
$$h_1 = h_{11} + h_{12} + h_{13} + h_{14} \tag{1}$$

$$h_2 = h_{21} + h_{22} + h_{23} + h_{24} \tag{2}$$
Further, the difference between the second sum and the first sum is calculated, and the pose label of the first and second sample images is determined from that difference, as shown in formula (3):
$$Y_{1,2} = \begin{cases} \text{up}, & h_2 - h_1 > D_1 \\ \text{down}, & h_1 - h_2 > D_2 \\ \text{height unchanged}, & \lvert h_2 - h_1 \rvert \le D_3 \end{cases} \tag{3}$$
Here $Y_{1,2}$ is the pose label of the first and second sample images. The thresholds D may be set as a fixed ratio of the target user's height or determined from prior knowledge; the first threshold $D_1$, second threshold $D_2$, and third threshold $D_3$ may be the same value or different values, which is not limited here. Formula (3) reads as follows: if the height of the second sample image exceeds that of the first, i.e. $h_2 > h_1$, by more than the first threshold $D_1$, the pose label of the target object is "up"; if the height of the first sample image exceeds that of the second, i.e. $h_1 > h_2$, by more than the second threshold $D_2$, the pose label is "down"; and if the absolute difference between the two heights does not exceed (is less than or equal to) the third threshold $D_3$, the pose label is "height unchanged". This is the labeling process for one specific pair of adjacent sample images; the process for any other adjacent pair is the same and is not repeated here.
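A sketch of this labeling rule; the threshold defaults are placeholders, since the patent leaves their values to prior knowledge or a fixed ratio of the subject's height:

```python
def pose_label(heights_prev, heights_curr,
               d1: float = 5.0, d2: float = 5.0, d3: float = 2.0) -> str:
    """Label an adjacent pair of sample images following formula (3).

    heights_prev / heights_curr are the key point heights (e.g. ankles
    and knees) of the sample images at t-1 and t.
    """
    h1, h2 = sum(heights_prev), sum(heights_curr)
    if h2 - h1 > d1:
        return "up"
    if h1 - h2 > d2:
        return "down"
    return "height unchanged"   # |h2 - h1| <= d3 (and any band between thresholds)
```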
In another possible implementation, again for any two adjacent sample images consisting of a first sample image and a second sample image adjacent to it, determining the pose label of each pair from the position information sets comprises: calculating a first average over all position information in the position information set of the first sample image, calculating a second average over all position information in the position information set of the second sample image, calculating the difference between the second average and the first average, and determining the pose label of the pair from that difference. Assuming both sample images contain the same number of key points, with heights $h_{11}, h_{12}, h_{13}, h_{14}$ in the first and $h_{21}, h_{22}, h_{23}, h_{24}$ in the second, the first average is given by formula (4) and the second by formula (5):
$$\bar{h}_1 = \frac{1}{4}\left(h_{11} + h_{12} + h_{13} + h_{14}\right) \tag{4}$$

$$\bar{h}_2 = \frac{1}{4}\left(h_{21} + h_{22} + h_{23} + h_{24}\right) \tag{5}$$
Further, the difference between the second average and the first average is calculated, and the pose labels of the first and second sample images are determined from the difference; the explanation parallels formula (3) and is not repeated here.
The above are two ways of determining the pose label provided by the present application; others are possible but not enumerated here.
S406, training an initial classification model by using a frame difference image corresponding to each two adjacent frame sample images and a posture label corresponding to each two adjacent frame sample images, and obtaining the trained classification model as a posture recognition model.
In a specific implementation, the initial classification model is trained with the frame difference image of each pair of adjacent sample images obtained in step S402 and the pose label of each pair obtained above, and the trained classification model serves as the gesture recognition model. The initial classification model may be, for example, MobileNetV2 or ShuffleNet. During training, the frame difference image of each adjacent pair of sample images is fed into the initial classification model, which outputs a predicted pose label. The gap between the predicted pose label and the actual label (the pose label of that pair obtained by the processing above) is measured to obtain the loss (loss value), and the weights of the initial classification model are adjusted according to the loss. After sufficient training, once the initial classification model reaches a convergence condition or the iteration limit, training stops, and the trained classification model is used as the gesture recognition model. A training sketch follows.
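A hedged training sketch in PyTorch; torchvision's MobileNetV2 stands in for the initial classification model mentioned above, while the optimizer, learning rate, and data pipeline are assumptions not fixed by the patent:

```python
import torch
import torch.nn as nn
from torchvision.models import mobilenet_v2

model = mobilenet_v2(num_classes=3)            # up / down / height unchanged
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_epoch(loader):
    """One pass over (frame_diff, pose_label) batches of shapes (B, 3, H, W) and (B,)."""
    model.train()
    for frame_diffs, pose_labels in loader:
        optimizer.zero_grad()
        loss = criterion(model(frame_diffs.float()), pose_labels)
        loss.backward()                        # propagate the loss
        optimizer.step()                       # adjust the model weights
```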
This embodiment has mainly explained the training of the gesture recognition model: first the training data is prepared and pose labels are assigned to the sample images, then the model is trained on the prepared data, and finally a gesture recognition model is obtained that can process a frame difference image and output a gesture recognition result. In actual use, this gesture recognition model determines the gesture recognition result of the target object more quickly, which improves the efficiency of counting its steps.
Based on the above description of the step counting method, an embodiment of the present application discloses a structural schematic diagram of a step counting device, please refer to fig. 5, which is a structural schematic diagram of a step counting device provided in an embodiment of the present application. The step count statistics apparatus 500 shown in fig. 5 may operate as follows:
an obtaining unit 501, configured to obtain a multi-frame image of a target object;
the processing unit 502 is configured to perform frame difference processing on each two adjacent frames of images in the multiple frames of images to obtain a frame difference image corresponding to each two adjacent frames of images;
the processing unit 502 is further configured to invoke a gesture recognition model to recognize the gesture of the target object according to the frame difference image corresponding to each two adjacent frames of images, so as to obtain at least one gesture recognition result of the target object;
a determining unit 503, configured to determine a step number update policy for the target object according to the at least one gesture recognition result;
the processing unit 502 is further configured to update the step number of the target object according to the step number update policy.
In a possible implementation manner, the obtaining unit 501 is further configured to obtain a plurality of sample images;
the processing unit 502 is further configured to perform frame difference processing on each two adjacent sample images in the plurality of sample images to obtain a frame difference image corresponding to each two adjacent sample images; performing key point detection on the plurality of sample images to obtain a key point set of each sample image in the plurality of sample images;
the obtaining unit 501 is further configured to obtain a position information set corresponding to the key point set of each sample image, where the position information set includes position information of each key point in the key point set;
the determining unit 503 is further configured to determine, according to the position information set corresponding to the keypoint set of each sample image, a pose label corresponding to each two adjacent frames of sample images;
the processing unit 502 is further configured to train an initial classification model by using the frame difference image corresponding to each two adjacent frame sample images and the pose label corresponding to each two adjacent frame sample images, and obtain a trained classification model as a pose recognition model.
In a possible implementation manner, the plurality of sample images include a first sample image and a second sample image adjacent to the first sample image, and the determining unit 503 determines, according to the position information set corresponding to the keypoint set of each sample image, a pose label corresponding to each adjacent two frames of sample images, specifically to:
calculating a first sum among all position information in a position information set corresponding to the key point set of the first sample image;
calculating a second sum among all position information in a position information set corresponding to the key point set of the second sample image;
calculating a difference between the second sum and the first sum;
and determining the corresponding attitude tag of each two adjacent frame sample images according to the difference.
In one possible implementation, the plurality of sample images includes a first sample image and a second sample image adjacent to the first sample image; the determining unit 503 determines, according to the position information set corresponding to the key point set of each sample image, the pose label corresponding to each two adjacent sample images, and is specifically configured to:
calculating a first average value among all position information in a position information set corresponding to the key point set of the first sample image;
calculating a second average value among all position information in a position information set corresponding to the key point set of the second sample image;
calculating a difference between the second average and the first average;
and determining the corresponding attitude tag of each two adjacent frame sample images according to the difference.
In a possible implementation manner, the multi-frame image includes a first frame image, a second frame image adjacent to the first frame image, and a third frame image adjacent to the second frame image, the at least one gesture recognition result includes a first gesture recognition result between the first frame image and the second frame image, and a second gesture recognition result between the second frame image and the third frame image, and the determining unit 503 determines the step number update policy for the target object according to the at least one gesture recognition result, and is specifically configured to:
and if the first posture identification result is ascending and the second posture identification result is descending, determining that the step number updating strategy is the step number plus 1, otherwise, determining that the step number updating strategy is the step number unchanged.
In a possible implementation manner, the determining unit 503 is further configured to determine a frequency of increasing the number of steps of the target object;
the processing unit 502 is further configured to determine a motion state of the target object according to the frequency of the step number increase; and matching a corresponding motion state icon for the target object according to the motion state, and outputting the motion state icon through a mobile terminal.
In a possible implementation manner, the multi-frame image includes an image at a time t-1 and an image at a time t, the at least one gesture recognition result includes a gesture recognition result at the time t, and the determining unit 503 determines, according to the at least one gesture recognition result, a step number update policy for the target object, specifically configured to:
acquiring a posture recognition result of the target object at the t-1 moment;
and if the gesture recognition result at the time t-1 is rising and the gesture recognition result at the time t is falling, determining the step number updating strategy to be the step number plus 1, otherwise, determining the step number updating strategy to be the step number unchanged.
According to an embodiment of the present application, the steps involved in the step counting method shown in fig. 1 and fig. 3 may be performed by the units in the step counting device shown in fig. 5. For example, step S101 in the step counting method shown in fig. 1 may be performed by the obtaining unit 501 in the step counting apparatus shown in fig. 5, steps S102, S103, and S105 may be performed by the processing unit 502 in the step counting apparatus shown in fig. 5, and step S104 may be performed by the determining unit 503 in the step counting apparatus shown in fig. 5; as another example, steps S301 and S304 in the step counting method shown in fig. 3 may be performed by the obtaining unit 501 in the step counting device shown in fig. 5, steps S302 and S305 may be performed by the determining unit 503 in the step counting device shown in fig. 5, and steps S303 and S306 may be performed by the processing unit 502 in the step counting device shown in fig. 5.
According to another embodiment of the present application, the units of the step counting apparatus shown in fig. 5 may be combined, individually or together, into one or several other units, or one (or more) of them may be split into functionally smaller units, without affecting the technical effect of the embodiments of the present application. The units are divided along logical function lines; in practice, the function of one unit may be realized by several units, or the functions of several units by one unit. In other embodiments of the present application, the step counting apparatus may also include other units, and in practical applications these functions may be realized with the assistance of, or through the cooperation of, multiple units.
According to another embodiment of the present application, the step counting apparatus shown in fig. 5 may be constructed, and the step counting method of the embodiments of the present application implemented, by running a computer program (including program code) capable of executing the steps of the methods shown in fig. 1 and fig. 3 on a general-purpose computing device, such as a computer, that includes a processing element such as a central processing unit (CPU) and storage elements such as a random access memory (RAM) and a read-only memory (ROM). The computer program may, for example, be recorded on a computer-readable storage medium, and loaded into and executed in the mobile terminal described above via that storage medium.
In this embodiment, the obtaining unit 501 obtains the multi-frame image of the target object, and the processing unit 502 performs frame difference processing on every two adjacent frames to obtain the corresponding frame difference images; the gesture recognition model is then called to recognize the gesture of the target object from each frame difference image, yielding at least one gesture recognition result; the determining unit 503 determines the step number update strategy for the target object from the at least one gesture recognition result, and finally the processing unit 502 updates the step number accordingly. The frame difference image of every two adjacent frames is used to judge the gesture recognition result of the target object, and hence whether it has moved one step, so its step number can be counted in real time. Because a trained gesture recognition model is used, processing time is shortened, and the efficiency of counting the steps of the target object improves.
Based on the method and the device embodiment, the embodiment of the application provides a mobile terminal. Referring to fig. 6, a schematic structural diagram of a mobile terminal provided in an embodiment of the present application is shown. The mobile terminal 600 shown in fig. 6 comprises at least a processor 601, an input interface 602, an output interface 603, a computer storage medium 604, and a memory 605. The processor 601, the input interface 602, the output interface 603, the computer storage medium 604, and the memory 605 may be connected by a bus or other means.
A computer storage medium 604 may be stored in the memory 605 of the mobile terminal 600, the computer storage medium 604 being for storing a computer program comprising program instructions, the processor 601 being for executing the program instructions stored by the computer storage medium 604. The processor 601 (or CPU) is a computing core and a control core of the mobile terminal 600, and is adapted to implement one or more instructions, and in particular, is adapted to load and execute one or more computer instructions to implement corresponding method flows or corresponding functions.
An embodiment of the present application further provides a computer-readable storage medium (memory), which is a memory device in the mobile terminal 600 used to store programs and data. It can be understood that the computer-readable storage medium here may include a built-in storage medium of the mobile terminal 600 and, of course, an extended storage medium supported by the mobile terminal 600. The computer-readable storage medium provides storage space, and the storage space stores the operating system of the mobile terminal 600. One or more instructions suitable for being loaded and executed by the processor 601 are also stored in the storage space; these instructions may be one or more computer programs (including program code). It should be noted that the computer-readable storage medium here may be a high-speed RAM memory or a non-volatile memory, such as at least one magnetic disk memory; optionally, it may also be at least one computer-readable storage medium located remotely from the aforementioned processor.
In one embodiment, one or more instructions may be loaded from the computer storage medium and executed by the processor 601 to perform the corresponding steps of the step counting method illustrated in fig. 1 and fig. 3 and the corresponding steps of the gesture recognition model training method illustrated in fig. 4. In a particular implementation, one or more instructions in the computer storage medium are loaded and executed by the processor 601 to perform the following steps:
acquiring a multi-frame image of a target object;
performing frame difference processing on each two adjacent frames of images in the multi-frame images to obtain a frame difference image corresponding to each two adjacent frames of images;
calling a gesture recognition model to recognize the gesture of the target object according to the frame difference image corresponding to each two adjacent frames of images to obtain at least one gesture recognition result of the target object;
determining a step number updating strategy for the target object according to the at least one gesture recognition result;
and updating the step number of the target object according to the step number updating strategy.
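For concreteness, the following is a minimal sketch of this flow in Python. It is an illustration under stated assumptions rather than the implementation of this application: OpenCV's absdiff supplies the frame difference, pose_model is a hypothetical classifier with an sklearn-style predict(), and the labels 'rising'/'falling' stand in for the gesture recognition results.

```python
# Minimal sketch of the step counting flow; cv2 supplies the frame
# difference, and pose_model is a hypothetical classifier whose labels
# 'rising'/'falling' stand in for the gesture recognition results.
import cv2

def frame_difference(prev_frame, curr_frame):
    """Absolute per-pixel difference between two adjacent frames."""
    prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
    curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
    return cv2.absdiff(curr_gray, prev_gray)

def count_steps(frames, pose_model):
    """Classify each adjacent frame pair and update the step count."""
    steps = 0
    prev_result = None
    for prev_frame, curr_frame in zip(frames, frames[1:]):
        diff = frame_difference(prev_frame, curr_frame)
        result = pose_model.predict(diff.reshape(1, -1))[0]
        # One rising-then-falling pair of results counts as one step.
        if prev_result == "rising" and result == "falling":
            steps += 1
        prev_result = result
    return steps
```

Any classifier over the frame difference image could fill the pose_model role; the counting logic itself only consumes the sequence of recognition results.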
In one possible implementation, the processor 601 is further configured to:
acquiring a plurality of sample images;
performing frame difference processing on every two adjacent frame sample images in the plurality of sample images to obtain a frame difference image corresponding to every two adjacent frame sample images;
performing key point detection on the plurality of sample images to obtain a key point set of each sample image in the plurality of sample images;
acquiring a position information set corresponding to the key point set of each sample image, wherein the position information set comprises position information of each key point in the key point set;
determining a posture label corresponding to each two adjacent frames of sample images according to the position information set corresponding to the key point set of each sample image;
and training an initial classification model by using the frame difference image corresponding to each two adjacent frame sample images and the posture label corresponding to each two adjacent frame sample images to obtain a trained classification model serving as a posture identification model.
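The final training step can be sketched as follows, again only as an illustration: scikit-learn's LogisticRegression over flattened difference images stands in for the unspecified "initial classification model", and the frame difference images and posture labels are assumed to have been prepared as described above.

```python
# Sketch of the final training step; LogisticRegression is an
# illustrative stand-in for the initial classification model.
import numpy as np
from sklearn.linear_model import LogisticRegression

def train_pose_model(diff_images, posture_labels):
    """diff_images: list of (H, W) arrays, one per adjacent sample pair;
    posture_labels: one label ('rising'/'falling') per pair."""
    X = np.stack(diff_images).reshape(len(diff_images), -1)
    model = LogisticRegression(max_iter=1000)
    model.fit(X, posture_labels)
    return model
```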
In a possible implementation manner, the plurality of sample images include a first sample image and a second sample image adjacent to the first sample image, and when determining, according to the position information set corresponding to the key point set of each sample image, the posture label corresponding to each two adjacent frames of sample images, the processor 601 is specifically configured to:
calculating a first sum of all the position information in the position information set corresponding to the key point set of the first sample image;
calculating a second sum of all the position information in the position information set corresponding to the key point set of the second sample image;
calculating a difference between the second sum and the first sum;
and determining the posture label corresponding to the two adjacent frames of sample images according to the difference.
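A sketch of this summed-position rule is given below. It assumes each item of position information is an (x, y) pair in image coordinates and sums only the vertical coordinates, which is one reasonable reading of the rule for a rising/falling decision; since y grows downward in image coordinates, a negative difference means the key points moved upward.

```python
def posture_label_by_sum(first_keypoints, second_keypoints):
    """Label two adjacent sample frames by comparing summed y-positions."""
    first_sum = sum(y for _, y in first_keypoints)    # first sum
    second_sum = sum(y for _, y in second_keypoints)  # second sum
    # In image coordinates y grows downward, so a negative difference
    # means the key points moved up between the two frames.
    return "rising" if second_sum - first_sum < 0 else "falling"
```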
In a possible implementation manner, the plurality of sample images include a first sample image and a second sample image adjacent to the first sample image, and when determining, according to the position information set corresponding to the key point set of each sample image, the posture label corresponding to each two adjacent frames of sample images, the processor 601 is specifically configured to:
calculating a first average of all the position information in the position information set corresponding to the key point set of the first sample image;
calculating a second average of all the position information in the position information set corresponding to the key point set of the second sample image;
calculating a difference between the second average and the first average;
and determining the posture label corresponding to the two adjacent frames of sample images according to the difference.
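The averaged variant differs from the summed one only in normalizing by the number of key points, which keeps the comparison meaningful when the detector returns different numbers of key points for the two frames; a sketch under the same assumptions as above:

```python
def posture_label_by_mean(first_keypoints, second_keypoints):
    """Label two adjacent sample frames by comparing mean y-positions."""
    first_mean = sum(y for _, y in first_keypoints) / len(first_keypoints)
    second_mean = sum(y for _, y in second_keypoints) / len(second_keypoints)
    return "rising" if second_mean - first_mean < 0 else "falling"
```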
In a possible implementation manner, the multi-frame image includes a first frame image, a second frame image adjacent to the first frame image, and a third frame image adjacent to the second frame image, and the at least one gesture recognition result includes a first gesture recognition result between the first frame image and the second frame image and a second gesture recognition result between the second frame image and the third frame image; when determining a step number update policy for the target object according to the at least one gesture recognition result, the processor 601 is specifically configured to:
and if the first gesture recognition result is rising and the second gesture recognition result is falling, determining that the step number update policy is to add 1 to the step number; otherwise, determining that the step number update policy is to keep the step number unchanged.
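Expressed as code, this policy is a single comparison over the two recognition results (the label strings remain the illustrative ones used above):

```python
def step_delta(first_result, second_result):
    """+1 only when a rising result is immediately followed by a falling one."""
    return 1 if (first_result == "rising" and second_result == "falling") else 0
```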
In a possible implementation manner, after the step number of the target object is updated according to the step number update policy, the processor 601 is further configured to:
determining a frequency of step number increase of the target object;
determining the motion state of the target object according to the frequency of the increase of the step number;
and matching a corresponding motion state icon for the target object according to the motion state, and outputting the motion state icon through the mobile terminal.
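A sketch of this mapping is given below; the frequency thresholds (in steps per minute) and the icon names are illustrative assumptions, not values specified in this application.

```python
# Sketch of mapping step frequency to a motion state icon; thresholds
# and icon names are illustrative assumptions.
def motion_state(steps_per_minute):
    if steps_per_minute == 0:
        return "stationary"
    return "walking" if steps_per_minute < 130 else "running"

STATE_ICONS = {
    "stationary": "icon_idle.png",
    "walking": "icon_walk.png",
    "running": "icon_run.png",
}
```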
In a possible implementation manner, the multi-frame image includes an image at time t-1 and an image at time t, and the at least one gesture recognition result includes a gesture recognition result at time t; when determining a step number update policy for the target object according to the at least one gesture recognition result, the processor 601 is specifically configured to:
acquiring a gesture recognition result of the target object at time t-1;
and if the gesture recognition result at time t-1 is rising and the gesture recognition result at time t is falling, determining that the step number update policy is to add 1 to the step number; otherwise, determining that the step number update policy is to keep the step number unchanged.
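This time-indexed formulation lends itself to streaming: only the previous recognition result needs to be retained between frames. A sketch, with the same illustrative labels:

```python
# Streaming sketch of the time-t policy: only the recognition result at
# time t-1 is retained between updates.
class StepCounter:
    def __init__(self):
        self.steps = 0
        self.prev_result = None  # gesture recognition result at time t-1

    def update(self, result):
        """result: gesture recognition result at time t ('rising'/'falling')."""
        if self.prev_result == "rising" and result == "falling":
            self.steps += 1
        self.prev_result = result
        return self.steps
```

Used this way, the counter produces the same counts as the three-frame policy above, but processes recognition results as they arrive.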
In the embodiment of the present application, the processor 601 obtains multiple frames of images of a target object and performs frame difference processing on every two adjacent frames of images in the multiple frames of images to obtain a frame difference image corresponding to every two adjacent frames of images; a gesture recognition model is then called to recognize the gesture of the target object according to the frame difference image corresponding to every two adjacent frames of images, so as to obtain at least one gesture recognition result of the target object; a step number update policy for the target object is determined according to the at least one gesture recognition result, and the step number of the target object is finally updated according to the step number update policy. The gesture recognition result of the target object is judged from the frame difference image corresponding to every two adjacent frames of images, from which it is further judged whether the target object has moved one step, so the number of steps of the target object can be counted in real time. Because a trained gesture recognition model is used, the time consumed is shortened, and the efficiency of counting the steps of the target object is therefore improved.
According to an aspect of the present application, an embodiment of the present application further provides a computer program product or a computer program, which comprises computer instructions stored in a computer-readable storage medium. The processor 601 reads the computer instructions from the computer-readable storage medium and executes them, causing the mobile terminal 600 to perform the step counting method shown in fig. 1 and fig. 3 and the gesture recognition model training method shown in fig. 4.
It should be emphasized that, to further ensure the privacy and security of the data, the data may also be stored in a node of a blockchain. A blockchain is a novel application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks associated with one another by cryptographic methods, where each data block contains information about a batch of network transactions and is used to verify the validity (anti-counterfeiting) of that information and to generate the next block. A blockchain may include a blockchain underlying platform, a platform product service layer, an application service layer, and the like.
It should be noted that, for simplicity of description, the above method embodiments are described as a series of combinations of acts, but those skilled in the art will recognize that the present application is not limited by the order of acts described, as some steps may be performed in other orders or concurrently according to the present application. Furthermore, those skilled in the art will also appreciate that the embodiments described in the specification are preferred embodiments, and that the acts and modules involved are not necessarily required by the present application.
In the embodiments provided in the present application, it should be understood that the disclosed apparatus may be implemented in other manners. For example, the apparatus embodiments described above are merely illustrative; the division of the above modules is merely a logical division, and other divisions may be used in actual implementation. For example, multiple modules or components may be combined or integrated into another system, or some features may be omitted or not executed.
The above description covers only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any changes or substitutions that can readily be conceived by a person skilled in the art within the technical scope disclosed in the present application shall be covered by the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (10)

1. A method for counting steps, the method comprising:
acquiring a multi-frame image of a target object;
performing frame difference processing on each two adjacent frames of images in the multi-frame images to obtain a frame difference image corresponding to each two adjacent frames of images;
calling a gesture recognition model to recognize the gesture of the target object according to the frame difference image corresponding to each two adjacent frames of images to obtain at least one gesture recognition result of the target object;
determining a step number updating strategy for the target object according to the at least one gesture recognition result;
and updating the step number of the target object according to the step number updating strategy.
2. The method of claim 1, further comprising:
acquiring a plurality of sample images;
performing frame difference processing on every two adjacent frame sample images in the plurality of sample images to obtain a frame difference image corresponding to every two adjacent frame sample images;
performing key point detection on the plurality of sample images to obtain a key point set of each sample image in the plurality of sample images;
acquiring a position information set corresponding to the key point set of each sample image, wherein the position information set comprises position information of each key point in the key point set;
determining a posture label corresponding to each two adjacent frames of sample images according to the position information set corresponding to the key point set of each sample image;
and training an initial classification model by using the frame difference image corresponding to each two adjacent frame sample images and the posture label corresponding to each two adjacent frame sample images to obtain a trained classification model serving as a posture identification model.
3. The method according to claim 2, wherein the plurality of sample images include a first sample image and a second sample image adjacent to the first sample image, and wherein the determining the posture label corresponding to each two adjacent frames of sample images according to the position information set corresponding to the key point set of each sample image comprises:
calculating a first sum of all the position information in the position information set corresponding to the key point set of the first sample image;
calculating a second sum of all the position information in the position information set corresponding to the key point set of the second sample image;
calculating a difference between the second sum and the first sum;
and determining the posture label corresponding to the two adjacent frames of sample images according to the difference.
4. The method according to claim 2, wherein the plurality of sample images include a first sample image and a second sample image adjacent to the first sample image, and wherein the determining the posture label corresponding to each two adjacent frames of sample images according to the position information set corresponding to the key point set of each sample image comprises:
calculating a first average of all the position information in the position information set corresponding to the key point set of the first sample image;
calculating a second average of all the position information in the position information set corresponding to the key point set of the second sample image;
calculating a difference between the second average and the first average;
and determining the posture label corresponding to the two adjacent frames of sample images according to the difference.
5. The method according to claim 1, wherein the multi-frame image comprises a first frame image, a second frame image adjacent to the first frame image, and a third frame image adjacent to the second frame image, wherein the at least one gesture recognition result comprises a first gesture recognition result between the first frame image and the second frame image and a second gesture recognition result between the second frame image and the third frame image, and wherein the determining a step number updating strategy for the target object according to the at least one gesture recognition result comprises:
and if the first gesture recognition result is rising and the second gesture recognition result is falling, determining that the step number updating strategy is to add 1 to the step number; otherwise, determining that the step number updating strategy is to keep the step number unchanged.
6. The method according to claim 1, wherein after the updating the step number of the target object according to the step number updating strategy, the method further comprises:
determining a frequency of step number increase of the target object;
determining the motion state of the target object according to the frequency of the increase of the step number;
and matching a corresponding motion state icon for the target object according to the motion state, and outputting the motion state icon through the mobile terminal.
7. The method according to claim 1, wherein the multi-frame image comprises an image at time t-1 and an image at time t, wherein the at least one gesture recognition result comprises a gesture recognition result at time t, and wherein the determining a step number updating strategy for the target object according to the at least one gesture recognition result comprises:
acquiring a gesture recognition result of the target object at time t-1;
and if the gesture recognition result at time t-1 is rising and the gesture recognition result at time t is falling, determining that the step number updating strategy is to add 1 to the step number; otherwise, determining that the step number updating strategy is to keep the step number unchanged.
8. A step counting apparatus, comprising:
an acquisition unit configured to acquire a multi-frame image of a target object;
the processing unit is used for carrying out frame difference processing on each two adjacent frames of images in the multi-frame images to obtain a frame difference image corresponding to each two adjacent frames of images;
the processing unit is further configured to invoke a gesture recognition model to recognize the gesture of the target object according to the frame difference image corresponding to each two adjacent frames of images, so as to obtain at least one gesture recognition result of the target object;
a determining unit, configured to determine a step number update policy for the target object according to the at least one gesture recognition result;
the processing unit is further configured to update the step number of the target object according to the step number update policy.
9. A mobile terminal, characterized in that the mobile terminal comprises:
a processor adapted to implement one or more computer programs; and
a computer storage medium storing one or more computer programs, the one or more computer programs being adapted to be loaded by the processor to perform the step counting method according to any one of claims 1-7.
10. A computer-readable storage medium, characterized in that it stores one or more computer programs adapted to be loaded by a processor to perform the step counting method according to any one of claims 1-7.