CN114080258A - Motion model generation method and related equipment - Google Patents

Motion model generation method and related equipment

Info

Publication number: CN114080258A
Application number: CN202080006118.9A
Authority: CN (China)
Prior art keywords: motion, target, video data, user, motion model
Legal status: Granted; Active
Other languages: Chinese (zh)
Other versions: CN114080258B
Inventors: 赵帅, 李令言, 陈军
Current Assignee: Huawei Technologies Co Ltd
Original Assignee: Huawei Technologies Co Ltd
Application filed by Huawei Technologies Co Ltd; the application has been published as CN114080258A and granted as CN114080258B.

Classifications

    • A: HUMAN NECESSITIES
    • A63: SPORTS; GAMES; AMUSEMENTS
    • A63B: APPARATUS FOR PHYSICAL TRAINING, GYMNASTICS, SWIMMING, CLIMBING, OR FENCING; BALL GAMES; TRAINING EQUIPMENT
    • A63B71/00: Games or sports accessories not covered in groups A63B1/00 - A63B69/00
    • A63B71/06: Indicating or scoring devices for games or players, or for other sports activities
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition

Abstract

A motion model generation method and related equipment are provided. The method comprises the following steps: acquiring description information input by a first user (S403), wherein the description information is used for describing a target motion performed by a target user and the target times of performing the target motion, the first user being the target user or a user other than the target user; determining a first fluctuation condition of a plurality of key points of the target user in first video data (S404), wherein the first video data is video data of a process in which the target user performs the target motion; and determining a motion model of the target motion according to the first fluctuation condition and the target times (S405); wherein the motion model is used for motion counting.

Description

Motion model generation method and related equipment

Technical Field
The invention relates to the technical field of data processing, in particular to a motion model generation method and related equipment.
Background
With the continuous improvement of people's living standards, more and more people pay attention to their health and use their spare time for sports and fitness. When a user exercises at home, in a gym, or outdoors, it is often desirable to record the number of repetitions performed in order to assess progress. With the development and growing popularity of intelligent terminal devices, recording a user's motion through a terminal device (such as a mobile phone, a tablet computer, or a smart television) has become a common approach.
Generally, motion models for common motions, such as squats, push-ups and sit-ups, can be preset in the terminal device; when a video of the user performing one of these common motions is acquired, the video can be recognized through the preset motion models, so that motion counting is realized. However, this counting method is limited to the preset motion types, and counting is inaccurate for motions outside the preset types.
Disclosure of Invention
Embodiments of the present application provide a motion model generation method and related equipment, which can extend the available motion models to those desired by users and improve the efficiency with which motion models are generated.
In a first aspect, an embodiment of the present application provides a motion model generation method, including: acquiring description information input by a first user, wherein the description information is used for describing a target motion performed by a target user and the target times of performing the target motion, the first user being the target user or a user other than the target user; determining a first fluctuation condition of a plurality of key points of the target user in first video data, wherein the first video data is video data of a process in which the target user performs the target motion; and determining a motion model of the target motion according to the first fluctuation condition and the target times; wherein the motion model is used for motion counting.
According to the method provided by the first aspect, the terminal device obtains the first video data and analyzes it to obtain the fluctuation condition of a plurality of key points of the target user in the first video data, and the user inputs to the terminal device the motion type (namely the target motion) and the number of times the motion of this type is performed (namely the target times); the terminal device then generates a motion model of the target motion for the user according to the fluctuation condition and the number of repetitions, and can subsequently perform motion counting based on the motion model. In this way, the user can extend the available motion models to a desired motion, which better meets the user's individual requirements on motion models; moreover, establishing a motion model in this way is more efficient and faster.
In a possible implementation manner, after the motion model of the target motion is determined according to the first fluctuation condition and the target times, the method further includes: performing motion counting, through the motion model, on the target motion performed by a second user in second video data; wherein the second user is the first user, the target user, or a user other than the first user and the target user.
In one possible implementation, the performing motion counting, through the motion model, on the target motion performed by the second user in the second video data includes: if a first similarity between a second image sequence feature of the second video data and the first image sequence feature of the first video data is greater than a first preset threshold, counting the motion performed by the second user in the second video data through the motion model and a second fluctuation condition of a plurality of key points of the second user in the second video data; wherein the first similarity is used to characterize the degree to which the second user performs the target motion in a standard manner.
By this implementation, in the case that the calculated first similarity is greater than the threshold, the motion model may count the motion according to the fluctuation condition of the plurality of key points in the second video data.
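As an illustration of this gating, the following Python sketch compares the image sequence features of the two videos (cosine similarity is used here purely as an example of a similarity measure) and applies the counting model only when the first similarity exceeds the first preset threshold. The function names, the `motion_model.count` interface and the threshold value are assumptions for illustration, not part of the claimed implementation.

```python
import numpy as np

def count_with_model(motion_model, feat_first, feat_second,
                     fluctuations_second, first_threshold=0.8):
    """Count target-motion repetitions in the second video data, gated by similarity.

    feat_first / feat_second: image sequence feature vectors of the first and
        second video data.
    fluctuations_second: (num_keypoints, num_frames) fluctuation curves of the
        plurality of key points of the second user.
    Returns (count, similarity); count is None when the second user's motion is
    too far from the modelled target motion.
    """
    # First similarity between the two image sequence features (cosine similarity
    # is only one possible choice).
    similarity = float(np.dot(feat_first, feat_second) /
                       (np.linalg.norm(feat_first) * np.linalg.norm(feat_second)))
    if similarity <= first_threshold:
        return None, similarity  # similarity can still feed the motion evaluation information

    # Hypothetical model interface: count repetitions from the second fluctuation condition.
    count = motion_model.count(fluctuations_second)
    return count, similarity
```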
In one possible implementation, the method further includes: outputting the motion evaluation information; wherein the motion evaluation information is generated according to the first similarity.
In a possible implementation manner, before the acquiring the description information input by the first user, the method further includes: acquiring the first video data; and if it is determined that a motion model for counting the target motion in the first video data does not exist in the database, triggering the operation of acquiring the description information input by the first user.
In this way, a motion model for counting the motion in the first video data is established only when the database contains no target image sequence feature whose similarity with the first image sequence feature of the first video data is greater than or equal to a second preset threshold, which avoids the waste of resources caused by establishing a motion model that already exists in the database.
In one possible implementation, the determining that a motion model for counting the target motion in the first video data does not exist in the database includes: if the database does not contain a target image sequence feature whose second similarity with the first image sequence feature of the first video data reaches a second preset threshold, determining that a motion model corresponding to such a target image sequence feature does not exist in the database.
In one possible implementation, the database includes at least one of a local database or a database of a cloud server.
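For illustration only, such an existence check could be sketched as follows: the first image sequence feature is compared against every stored feature, and a stored motion model is returned only when the best similarity reaches the second preset threshold. The database layout (feature/model pairs drawn from the local database and/or the database of the cloud server) and the use of cosine similarity are assumptions.

```python
import numpy as np

def find_existing_model(first_feature, database, second_threshold=0.9):
    """Return a stored motion model whose image sequence feature is close enough
    to the first image sequence feature of the first video data, or None.

    database: list of (image_sequence_feature, motion_model) pairs, which may
        come from the local database and/or the database of the cloud server.
    """
    best_model, best_similarity = None, 0.0
    for feature, model in database:
        similarity = float(np.dot(first_feature, feature) /
                           (np.linalg.norm(first_feature) * np.linalg.norm(feature)))
        if similarity > best_similarity:
            best_model, best_similarity = model, similarity
    if best_similarity >= second_threshold:
        return best_model
    return None  # no matching model: trigger acquisition of the first user's description information
```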
In one possible implementation, the determining a first fluctuation condition of a plurality of key points of the target user in the first video data includes: determining a plurality of key points of the target user in the first video data; calculating distances of the coordinates of the plurality of key points in each frame image of the first video data relative to the coordinates of the plurality of key points in a reference frame image of the first video data; and projecting the distances onto a preset coordinate axis to determine the first fluctuation condition of the plurality of key points of the target user in the first video data.
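A minimal Python sketch of this fluctuation-extraction step is given below. It assumes that per-frame keypoint coordinates have already been obtained (for example from a pose estimator), uses the first frame as the reference frame image, and interprets the projection step as projecting each key point's per-frame displacement relative to the reference frame onto the vertical image axis as the preset coordinate axis; all names are illustrative.

```python
import numpy as np

def keypoint_fluctuations(keypoints, ref_frame=0, axis=1):
    """Compute one fluctuation curve per key point for a video clip.

    keypoints: array of shape (num_frames, num_keypoints, 2) holding the (x, y)
        coordinates of each key point in every frame image.
    ref_frame: index of the reference frame image the displacements are measured against.
    axis: index of the preset coordinate axis the displacements are projected onto
        (1 = vertical image axis).

    Returns an array of shape (num_keypoints, num_frames).
    """
    keypoints = np.asarray(keypoints, dtype=np.float32)
    ref = keypoints[ref_frame]            # (num_keypoints, 2) reference coordinates
    displacement = keypoints - ref        # per-frame offset from the reference frame
    projected = displacement[..., axis]   # signed projection onto the chosen axis
    return projected.T                    # (num_keypoints, num_frames) fluctuation curves
```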
In a possible implementation manner, the determining a motion model of the target motion according to the first fluctuation condition and the target times includes: selecting, from the first fluctuation condition, the k key points whose fluctuation amplitudes rank in the top k; counting the motion peaks of the k key points to obtain a count value; and determining the motion model of the target motion from the count value and the target times by means of regression fitting.
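A corresponding sketch of the model-building step, under the same assumptions: the k key points with the largest fluctuation amplitudes are selected, motion peaks are counted on each of their curves, and a regression is fitted so that the peak counts map to the target times annotated by the first user. `keypoint_fluctuations` is the illustrative helper from the previous sketch; the use of scipy and scikit-learn here is an assumption, not something dictated by the application.

```python
import numpy as np
from scipy.signal import find_peaks
from sklearn.linear_model import LinearRegression

def build_motion_model(fluctuations, target_count, k=3):
    """Fit a simple counting model from one clip annotated with the target times.

    fluctuations: (num_keypoints, num_frames) curves from keypoint_fluctuations().
    target_count: number of repetitions the first user says the clip contains.
    k: number of key points with the largest fluctuation amplitude to keep.
    """
    amplitude = fluctuations.max(axis=1) - fluctuations.min(axis=1)
    top_k = np.argsort(amplitude)[-k:]              # k key points with the widest curves

    # Count motion peaks on each selected fluctuation curve.
    peak_counts = []
    for idx in top_k:
        peaks, _ = find_peaks(fluctuations[idx], prominence=0.3 * amplitude[idx])
        peak_counts.append(len(peaks))

    # Regression fit mapping the per-keypoint peak counts to the annotated count.
    # A single annotated clip gives one training sample; in practice several
    # clips or segments would be stacked here.
    regressor = LinearRegression()
    regressor.fit(np.array([peak_counts]), np.array([target_count]))
    return {"keypoints": top_k, "regressor": regressor}
```

With more than one annotated clip or segment, the per-clip peak-count rows would simply be stacked before fitting; a support vector machine regression could be substituted for the linear fit.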
In one possible implementation, the regression fitting includes at least one of linear regression fitting and support vector machine regression.
In a possible implementation manner, after the motion model of the target motion is determined according to the first fluctuation condition and the target times, the method further includes: saving the motion model, or providing the motion model to the device at the peer end of the communication.
In a possible implementation manner, the key points are skeleton key points of the target user, or the key points are predetermined optical flow key points capable of reflecting human body motion. By the method, the key points can be selected according to specific conditions, and the diversity of key point selection is improved.
In a second aspect, an embodiment of the present application provides a motion counting apparatus, including: an input unit, configured to acquire description information input by a first user, wherein the description information is used for describing a target motion performed by the target user and the target times of performing the target motion, the first user being the target user or a user other than the target user; a fluctuation condition extraction unit, configured to determine a first fluctuation condition of a plurality of key points of the target user in first video data, wherein the first video data is video data of a process in which the target user performs the target motion; and a motion model establishing unit, configured to determine a motion model of the target motion according to the first fluctuation condition and the target times; wherein the motion model is used for motion counting.
In one possible implementation manner, the apparatus further includes: a motion counting unit, configured to perform motion counting, through the motion model, on the target motion performed by a second user in second video data after the motion model establishing unit determines the motion model of the target motion according to the first fluctuation condition and the target times; wherein the second user is the first user, the target user, or a user other than the first user and the target user.
In a possible implementation manner, the motion counting unit is specifically configured to: if a first similarity between a second image sequence feature of the second video data and the first image sequence feature of the first video data is greater than a first preset threshold, count the motion performed by the second user in the second video data through the motion model and a second fluctuation condition of a plurality of key points of the second user in the second video data; wherein the first similarity is used to characterize the degree to which the second user performs the target motion in a standard manner.
In one possible implementation manner, the apparatus further includes: an output unit, configured to output motion evaluation information; wherein the motion evaluation information is generated according to the first similarity.
In one possible implementation manner, the apparatus further includes: an acquisition unit, configured to acquire the first video data before the input unit acquires the description information input by the first user, and to trigger the operation of acquiring the description information input by the first user if it is determined that a motion model for counting the target motion in the first video data does not exist in the database.
In a possible implementation manner, the determining that a motion model for counting the target motion in the first video data does not exist in the database specifically includes: if the database does not contain a target image sequence feature whose second similarity with the first image sequence feature of the first video data reaches a second preset threshold, determining that a motion model corresponding to such a target image sequence feature does not exist in the database.
In one possible implementation, the database includes at least one of a local database or a database of a cloud server.
In a possible implementation manner, the fluctuation condition extraction unit is specifically configured to: determining a plurality of keypoints for the target user in the first video data; calculating distances of coordinates of the plurality of key points in each frame image of the first video data relative to coordinates of the plurality of key points in a reference frame image of the first video data; projecting the distances onto a preset coordinate axis to determine a first fluctuation condition of a plurality of key points of the target user in the first video data.
In a possible implementation manner, the motion model establishing unit is specifically configured to: select, from the first fluctuation condition, the k key points whose fluctuation amplitudes rank in the top k; count the motion peaks of the k key points to obtain a count value; and determine the motion model of the target motion from the count value and the target times by means of regression fitting.
In one possible implementation, the regression fit includes at least one of a linear regression fit and a support vector machine regression.
In one possible implementation manner, the apparatus further includes: a storage unit, configured to save the motion model, or provide the motion model to the device at the peer end of the communication, after the motion model establishing unit determines the motion model of the target motion according to the first fluctuation condition and the target times.
In a possible implementation manner, the key points are skeleton key points of the target user, or the key points are predetermined optical flow key points capable of reflecting human body motion.
In a third aspect, an embodiment of the present application provides a motion counting apparatus, including a memory and a processor, where the memory is used to store a computer program, and the processor is configured to call all or part of the computer program stored in the memory, and perform the method provided in the first aspect.
In a fourth aspect, an embodiment of the present application provides a chip system, where the chip system includes at least one processor and an interface circuit, where the interface circuit and the at least one processor are interconnected by a line, and the interface circuit is configured to receive a computer program from outside the chip system; the method provided by the first aspect is implemented when the computer program is executed by the processor.
In a fifth aspect, the present application provides a computer-readable storage medium, in which a computer program is stored, and when the computer program is executed by a processor, the computer program implements the method provided in the first aspect.
In a sixth aspect, the present application provides a computer program product, and when the computer program product runs on a terminal device, the method provided in the first aspect is implemented.
Drawings
Fig. 1A is a schematic system architecture diagram of a motion model generation method according to an embodiment of the present application;
fig. 1B is a schematic structural diagram of a terminal device 100 according to an embodiment of the present application;
fig. 1C is a block diagram of a software structure of a terminal device 100 according to an embodiment of the present application;
FIG. 2 is a schematic structural diagram of a motion counting apparatus according to an embodiment of the present disclosure;
fig. 3 is a schematic structural diagram of a cloud server according to an embodiment of the present disclosure;
FIG. 4 is a schematic flow chart diagram illustrating a method for generating a motion model according to an embodiment of the present disclosure;
FIG. 4A is a schematic illustration of a user performing an instrumented exercise according to embodiments of the present disclosure;
FIG. 4B is a schematic illustration of a user performing an instrumented exercise according to embodiments of the present disclosure;
FIG. 4C is a schematic diagram of key points of a human skeleton (bone) provided by an embodiment of the present application;
fig. 5 is an application scenario diagram of a motion model generation method provided in an embodiment of the present application;
fig. 6 is an application scenario diagram of another motion model generation method provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly and in detail below with reference to the accompanying drawings. In the description of the embodiments of the present application, unless otherwise specified, "/" means "or"; for example, A/B may mean A or B. "And/or" herein merely describes an association relationship between associated objects and indicates that three relationships may exist; for example, A and/or B may mean: A alone, both A and B, or B alone. In addition, in the description of the embodiments of the present application, "a plurality" means two or more.
In the following, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of such features. In the description of the embodiments of the present application, "plurality" means two or more unless otherwise specified.
In order to facilitate understanding of the embodiments of the present application, the system architecture on which the motion model generation method of the embodiments of the present application is based is described below. Referring to fig. 1A, fig. 1A is a schematic diagram of a system architecture of a motion model generation method according to an embodiment of the present application. The system architecture of the motion model generation method in the present application may include the terminal device 100 and the cloud server 101 in fig. 1A, where the cloud server 101 and the terminal device 100 may communicate through a network.
The cloud server 101 may be a single server or a server cluster including a plurality of servers; it is mainly used for storing established motion models, outputting established motion models, and the like; for example, the cloud server sends an established motion model to the terminal device.
The terminal device 100 may be an electronic device such as a communication terminal, a mobile device, a user terminal, a wireless communication device, a portable terminal, a user agent, and the like, and is mainly used for inputting, processing, outputting (such as displaying), and the like of data, where the mobile device may be a mobile phone, a smart watch, a wearable device, a tablet device, or a handheld device with a wireless communication function. The terminal device may be deployed (or installed) with a corresponding application program (APP) or client, so as to support the terminal device to implement functions of data input, processing, and output (such as display). Specifically, the terminal device may input a motion model and perform corresponding operation or recognition based on the motion model, for example, receive the motion model sent by the cloud server, and then perform motion counting based on the motion model. It should be noted that the motion model may also be established by the terminal itself, for example, it may be determined whether the cloud server has a desired motion model, and if not, the motion model is established by the terminal itself; optionally, after the terminal device establishes the motion model by itself, the motion model may be shared with the cloud server for use by other devices.
Some specific application scenarios of the above architecture are illustrated below: the terminal device 100 may acquire, in real time, first video data of the target user performing the target motion through the camera 193, and may also acquire, from a local database of the terminal device 100, previously stored first video data of the target user performing the target motion. After the terminal device 100 acquires the first video data, at least two optional processing modes exist.
In a first manner, after the terminal device 100 acquires the first video data, it performs similarity calculation against the image sequence features of the motion types in the database to determine whether an image sequence feature for identifying the motion type of the first video data exists in the database; if the similarity is greater than a preset threshold, such an image sequence feature exists. The database may include at least one of a database of the cloud server 101 or a local database of the terminal device 100.
In a case where the terminal device 100 determines that an image sequence feature for identifying the type of motion in the first video data does not exist in the database, the terminal device 100 may acquire description information input by the target user, where the description information is used to describe that the motion performed by the target user is the target motion and that the number of times the target motion is performed is the target times. Then, the terminal device 100 may analyze, from the first video data, the fluctuation conditions of a plurality of key points during the process in which the target user performs the target motion. Next, the terminal device may determine a motion model of the target motion according to the fluctuation conditions of the plurality of key points in the first video data and the target times, and may then perform motion counting using the motion model. Optionally, if the target user agrees to upload the motion model of the target motion to the cloud server 101, the terminal device 100 may upload the motion model to the cloud server 101 through the network, so that it is stored in the database of the cloud server 101 for other terminal devices to download and use.
In a case where the terminal device 100 determines that an image sequence feature for identifying the type of motion in the first video data exists in the database, the terminal device 100 may load the motion model corresponding to that image sequence feature in the database to determine the number of motions in the first video data.
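Putting the first processing manner together, one possible orchestration on the terminal device is sketched below. It reuses the illustrative helpers from the earlier sketches (`find_existing_model`, `keypoint_fluctuations`, `build_motion_model`) and treats the description-information input, local storage and cloud upload steps as stand-in stubs.

```python
def process_first_video(first_video_keypoints, first_feature, database, ui, cloud=None):
    """First manner: build a motion model on the terminal when none exists in the database.

    database: list of (image_sequence_feature, motion_model) pairs (local database).
    ui: stand-in object providing ask_description() -> (target_motion_name, target_count),
        representing the step of acquiring the description information.
    cloud: optional stand-in with an upload() method for sharing the model.
    """
    model = find_existing_model(first_feature, database)
    if model is None:
        target_motion, target_count = ui.ask_description()            # acquire description information (S403)
        fluctuations = keypoint_fluctuations(first_video_keypoints)    # first fluctuation condition (S404)
        model = build_motion_model(fluctuations, target_count)         # motion model of the target motion (S405)
        database.append((first_feature, model))                        # store in the local database
        if cloud is not None:
            cloud.upload(target_motion, first_feature, model)          # optional sharing with the cloud server
    return model
```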
In the second manner, the terminal device 100 sends the first video data to the cloud server 101. The cloud server 101 determines whether an image sequence feature for identifying the type of motion in the first video data is present in the database of the cloud server 101.
In a case where the cloud server 101 determines that an image sequence feature for identifying the motion type in the first video data does not exist in the database, the cloud server 101 may obtain description information input by the target user, where the description information is used to describe that the motion performed by the target user is the target motion and that the number of times the target motion is performed is the target times. Then, the cloud server 101 may analyze, from the first video data, the fluctuation conditions of a plurality of key points during the process in which the target user performs the target motion. Next, the cloud server 101 may determine a motion model of the target motion according to the fluctuation conditions of the plurality of key points in the first video data and the target times, and store the motion model in the database of the cloud server 101. When the terminal device 100 or another terminal device needs to use the motion model, the cloud server 101 may send the motion model accordingly.
When the cloud server 101 determines that an image sequence feature for identifying the motion type in the first video data exists in the database, the cloud server 101 may load the motion model corresponding to that image sequence feature in the database to identify the motion type and the number of motions in the first video data. Then, the cloud server 101 may send the first similarity to the terminal device 100, where the first similarity is the maximum similarity among the similarities calculated between the image sequence features stored in the database and the first image sequence feature of the first video data.
It is to be understood that the system architecture of the motion model generation method in fig. 1A is only an exemplary implementation manner in the embodiment of the present application, and the system architecture of the motion model generation method in the embodiment of the present application includes, but is not limited to, the above system architecture of the motion model generation method.
Based on the system architecture diagram of the motion model generation method, the embodiment of the present application provides a terminal device 100 applied to the system architecture of the motion model generation method. Fig. 1B is a schematic structural diagram of a terminal device 100 provided in an embodiment of the present application, the terminal device 100 shown in fig. 1B is only an example, and the terminal device 100 may have more or fewer components than those shown in fig. 1B, may combine two or more components, or may have a different component configuration. The various components shown in the figures may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application specific integrated circuits.
The terminal device 100 may include: the mobile communication device includes a processor 110, an external memory interface 120, an internal memory 121, a Universal Serial Bus (USB) interface 130, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, an earphone interface 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display 194, a Subscriber Identity Module (SIM) card interface 195, and the like. The sensor module 180 may include a gyroscope sensor 180A, an acceleration sensor 180B, a distance sensor 180C, a proximity light sensor 180D, a fingerprint sensor 180E, a touch sensor 180F, an ambient light sensor 180G, and the like.
Processor 110 may include one or more processing units, such as: the processor 110 may include an Application Processor (AP), a modem processor, a Graphics Processing Unit (GPU), an Image Signal Processor (ISP), a controller, a memory, a video codec, a Digital Signal Processor (DSP), a baseband processor, and/or a neural-Network Processing Unit (NPU), etc. The different processing units may be separate devices or may be integrated into one or more processors.
The controller may be a neural center and a command center of the terminal device 100, among others. The controller can generate an operation control signal according to the instruction operation code and the timing signal to complete the control of instruction fetching and instruction execution.
Optionally, a memory may also be provided in the processor 110 for storing instructions and data. In some embodiments, the memory in the processor 110 is a cache memory. The memory may hold instructions or data that have just been used or recycled by the processor 110. If the processor 110 needs to reuse the instruction or data, it can be called directly from the memory. Avoiding repeated accesses reduces the latency of the processor 110, thereby increasing the efficiency of the system.
In some embodiments, processor 110 may include one or more interfaces. The interface may include an integrated circuit (I2C) interface, an integrated circuit built-in audio (I2S) interface, a Pulse Code Modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a Mobile Industry Processor Interface (MIPI), a general-purpose input/output (GPIO) interface, a Subscriber Identity Module (SIM) interface, and/or a Universal Serial Bus (USB) interface, etc.
The I2C interface is a bi-directional synchronous serial bus that includes a serial data line (SDA) and a Serial Clock Line (SCL). In some embodiments, processor 110 may include multiple sets of I2C buses. The processor 110 may be coupled to the touch sensor 180F, charger, flash, camera 193, etc. via different I2C bus interfaces. For example: the processor 110 may be coupled to the touch sensor 180F through an I2C interface, such that the processor 110 and the touch sensor 180F communicate through an I2C bus interface to implement the touch function of the terminal device 100.
The I2S interface may be used for audio communication. In some embodiments, processor 110 may include multiple sets of I2S buses. The processor 110 may be coupled to the audio module 170 via an I2S bus to enable communication between the processor 110 and the audio module 170. In some embodiments, the audio module 170 may communicate audio signals to the wireless communication module 160 via the I2S interface, enabling answering of calls via a bluetooth headset.
The PCM interface may also be used for audio communication, sampling, quantizing and encoding analog signals. In some embodiments, the audio module 170 and the wireless communication module 160 may be coupled by a PCM bus interface. In some embodiments, the audio module 170 may also transmit audio signals to the wireless communication module 160 through the PCM interface, so as to implement a function of answering a call through a bluetooth headset. Both the I2S interface and the PCM interface may be used for audio communication.
The UART interface is a universal serial data bus used for asynchronous communications. The bus may be a bidirectional communication bus. It converts the data to be transmitted between serial communication and parallel communication. In some embodiments, a UART interface is generally used to connect the processor 110 with the wireless communication module 160. For example: the processor 110 communicates with a bluetooth module in the wireless communication module 160 through a UART interface to implement a bluetooth function. In some embodiments, the audio module 170 may transmit the audio signal to the wireless communication module 160 through a UART interface, so as to realize the function of playing music through a bluetooth headset.
MIPI interfaces may be used to connect processor 110 with peripheral devices such as display screen 194, camera 193, and the like. The MIPI interface includes a Camera Serial Interface (CSI), a Display Serial Interface (DSI), and the like. In some embodiments, processor 110 and camera 193 communicate through a CSI interface to implement the capture function of terminal device 100. The processor 110 and the display screen 194 communicate through the DSI interface to implement the display function of the terminal device 100.
The GPIO interface may be configured by software. The GPIO interface may be configured as a control signal and may also be configured as a data signal. In some embodiments, a GPIO interface may be used to connect the processor 110 with the camera 193, the display 194, the wireless communication module 160, the audio module 170, the sensor module 180, and the like. The GPIO interface may also be configured as an I2C interface, an I2S interface, a UART interface, a MIPI interface, and the like.
The USB interface 130 is an interface conforming to the USB standard specification, and may specifically be a Mini USB interface, a Micro USB interface, a USB Type C interface, or the like. The USB interface 130 may be used to connect a charger to charge the terminal device 100, and may also be used to transmit data between the terminal device 100 and a peripheral device. The interface can also be used to connect an earphone and play audio through the earphone, and may further be used to connect other terminal devices, such as AR devices.
It should be understood that the interface connection relationship between the modules illustrated in the embodiment of the present application is only an exemplary illustration, and does not constitute a limitation on the structure of the terminal device 100. In other embodiments of the present application, the terminal device 100 may also adopt different interface connection manners or a combination of multiple interface connection manners in the above embodiments.
The wireless communication function of the terminal device 100 may be implemented by the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, a modem processor, a baseband processor, and the like.
The antennas 1 and 2 are used for transmitting and receiving electromagnetic wave signals. Each antenna in terminal device 100 may be used to cover a single or multiple communication bands. Different antennas can also be multiplexed to improve the utilization of the antennas. For example: the antenna 1 may be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, the antenna may be used in conjunction with a tuning switch.
The mobile communication module 150 may provide a solution including 2G/3G/4G/5G wireless communication applied on the terminal device 100. The mobile communication module 150 may include at least one filter, a switch, a power amplifier, a Low Noise Amplifier (LNA), and the like. The mobile communication module 150 may receive the electromagnetic wave from the antenna 1, filter, amplify, etc. the received electromagnetic wave, and transmit the electromagnetic wave to the modem processor for demodulation. The mobile communication module 150 may also amplify the signal modulated by the modem processor, and convert the signal into electromagnetic wave through the antenna 1 to radiate the electromagnetic wave. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the processor 110. In some embodiments, at least some of the functional modules of the mobile communication module 150 may be disposed in the same device as at least some of the modules of the processor 110.
The modem processor may include a modulator and a demodulator. The modulator is used for modulating a low-frequency baseband signal to be transmitted into a medium-high frequency signal. The demodulator is used for demodulating the received electromagnetic wave signal into a low-frequency baseband signal. The demodulator then passes the demodulated low frequency baseband signal to a baseband processor for processing. The low frequency baseband signal is processed by the baseband processor and then transferred to the application processor. The application processor outputs a sound signal through an audio device (not limited to the speaker 170A, the receiver 170B, etc.) or displays an image or video through the display screen 194. In some embodiments, the modem processor may be a stand-alone device. In other embodiments, the modem processor may be provided in the same device as the mobile communication module 150 or other functional modules, independent of the processor 110.
The wireless communication module 160 may provide a solution for wireless communication applied to the terminal device 100, including Wireless Local Area Networks (WLANs) (e.g., wireless fidelity (Wi-Fi) networks), bluetooth (bluetooth, BT), Global Navigation Satellite System (GNSS), Frequency Modulation (FM), Near Field Communication (NFC), Infrared (IR), and the like. The wireless communication module 160 may be one or more devices integrating at least one communication processing module. The wireless communication module 160 receives electromagnetic waves via the antenna 2, performs frequency modulation and filtering processing on electromagnetic wave signals, and transmits the processed signals to the processor 110. The wireless communication module 160 may also receive a signal to be transmitted from the processor 110, perform frequency modulation and amplification on the signal, and convert the signal into electromagnetic waves through the antenna 2 to radiate the electromagnetic waves.
In some embodiments, the antenna 1 of the terminal device 100 is coupled to the mobile communication module 150, and the antenna 2 is coupled to the wireless communication module 160, so that the terminal device 100 can communicate with a network and other devices, for example, the cloud server, through a wireless communication technology or a wired communication technology. The wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technologies, etc. The GNSS may include a global positioning system (GPS), a global navigation satellite system (GLONASS), a beidou navigation satellite system (BDS), a quasi-zenith satellite system (QZSS), and/or a satellite based augmentation system (SBAS).
In some embodiments, a Bluetooth (BT) module and a WLAN module included in the wireless communication module 160 may transmit signals to detect or scan devices near the terminal device 100, so that the terminal device 100 may discover the nearby devices using wireless communication technologies such as bluetooth or WLAN, establish a wireless communication connection with the nearby devices, and receive data (such as video data) shared by the nearby devices through the connection. Among other things, a Bluetooth (BT) module may provide solutions that include one or more of classic Bluetooth (Bluetooth 2.1) or Bluetooth Low Energy (BLE) Bluetooth communication. The WLAN module may provide solutions that include one or more of Wi-Fi direct, Wi-Fi LAN, or Wi-Fi softAP WLAN communications.
In some embodiments, the solution of wireless communication provided by the mobile communication module 150 may enable the terminal device 100 to communicate with a device (e.g., a cloud server) in the network, and the solution of WLAN wireless communication provided by the wireless communication module 160 may also enable the terminal device 100 to communicate with a device (e.g., a transit node) in the network and to communicate with the cloud server through the device (e.g., the transit node) in the network. Thus, the cloud server can discover the terminal device 100, and transmit data to the terminal device 100 or the terminal device 100 can transmit data to the cloud device.
The terminal device 100 implements a display function by the GPU, the display screen 194, and the application processor. The GPU is a microprocessor for image processing, and is connected to the display screen 194 and an application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. The processor 110 may include one or more GPUs that execute program instructions to generate or alter display information.
The display screen 194 is used to display images, video, and the like. The display screen 194 includes a display panel. The display panel may adopt a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a Mini-LED, a Micro-LED, a Micro-OLED, a quantum dot light-emitting diode (QLED), or the like. In some embodiments, the terminal device 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.
The terminal device 100 may implement a shooting function by the ISP, the camera 193, the video codec, the GPU, the display screen 194, the application processor, and the like, for example, based on which the above-described first video data can be obtained.
The ISP is used to process the data fed back by the camera 193. For example, when a photo is taken, the shutter is opened, light is transmitted to the camera photosensitive element through the lens, the optical signal is converted into an electrical signal, and the camera photosensitive element transmits the electrical signal to the ISP for processing and converting into an image visible to naked eyes. The ISP can also carry out algorithm optimization on the noise, brightness and skin color of the image. The ISP can also optimize parameters such as exposure, color temperature and the like of a shooting scene. In some embodiments, the ISP may be provided in camera 193.
The camera 193 is used to capture still images or video. The object generates an optical image through the lens and projects the optical image to the photosensitive element. The photosensitive element may be a Charge Coupled Device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor. The light sensing element converts the optical signal into an electrical signal, which is then passed to the ISP where it is converted into a digital image signal. And the ISP outputs the digital image signal to the DSP for processing. The DSP converts the digital image signal into image signal in standard RGB, YUV and other formats. In some embodiments, the terminal device 100 may include 1 or N cameras 193, N being a positive integer greater than 1.
The digital signal processor is used for processing digital signals, and can process digital image signals and other digital signals. For example, when the terminal device 100 selects a frequency point, the digital signal processor is used to perform fourier transform or the like on the frequency point energy.
Video codecs are used to compress or decompress digital video. The terminal device 100 may support one or more video codecs. In this way, the terminal device 100 can play or record video in a plurality of encoding formats, such as: moving Picture Experts Group (MPEG) 1, MPEG2, MPEG3, MPEG4, and the like.
The NPU is a neural-network (NN) computing processor that processes input information quickly by using a biological neural network structure, for example, by using a transfer mode between neurons of a human brain, and can also learn by itself continuously. The NPU can implement applications such as intelligent recognition of the terminal device 100, for example: image recognition, face recognition, speech recognition, text understanding, and the like.
The external memory interface 120 may be used to connect an external memory card, such as a Micro SD card, to extend the storage capability of the terminal device 100. The external memory card communicates with the processor 110 through the external memory interface 120 to implement a data storage function. For example, files such as music, video, etc. are saved in an external memory card.
The internal memory 121 may be used to store computer-executable program code, which includes instructions. The processor 110 executes various functional applications of the terminal device 100 and data processing by executing instructions stored in the internal memory 121. The internal memory 121 may include a program storage area and a data storage area. The storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required by at least one function, and the like. The storage data area may store data (such as audio data, a phonebook, etc.) created during use of the terminal device 100, and the like. In addition, the internal memory 121 may include a high-speed random access memory, and may further include a nonvolatile memory, such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (UFS), and the like.
The gyro sensor 180A may be used to determine the motion attitude of the terminal device 100. In some embodiments, the angular velocities of the terminal device 100 about three axes (i.e., the x, y, and z axes) may be determined by the gyro sensor 180A. The gyro sensor 180A may also be used for image stabilization during shooting. Illustratively, when the shutter is pressed, the gyro sensor 180A detects the shake angle of the terminal device 100, calculates the distance to be compensated for by the lens module according to the shake angle, and allows the lens to counteract the shake of the terminal device 100 through a reverse movement, thereby achieving image stabilization. The gyro sensor 180A may also be used in navigation and somatosensory gaming scenarios.
The acceleration sensor 180B can detect the magnitude of acceleration of the terminal device 100 in various directions (generally, three axes). The magnitude and direction of gravity can be detected when the terminal device 100 is stationary. The acceleration sensor can also be used to recognize the posture of the terminal device, and is applied to landscape/portrait switching, pedometer, and similar applications.
The distance sensor 180C is used to measure distance. The terminal device 100 may measure distance by infrared or laser. In some embodiments, in a shooting scenario, the terminal device 100 may measure distance using the distance sensor 180C to achieve fast focusing.
The proximity light sensor 180D may include, for example, a light emitting diode (LED) and a light detector, such as a photodiode. The light emitting diode may be an infrared light emitting diode. The terminal device 100 emits infrared light to the outside through the light emitting diode. The terminal device 100 detects infrared light reflected from a nearby object using the photodiode. When sufficient reflected light is detected, it can be determined that there is an object near the terminal device 100. When insufficient reflected light is detected, the terminal device 100 can determine that there is no object near it. The terminal device 100 can use the proximity light sensor 180D to detect that the user is holding the terminal device 100 close to the ear, so as to automatically turn off the screen and save power. The proximity light sensor 180D may also be used in holster mode and pocket mode to automatically unlock and lock the screen.
The ambient light sensor 180G is used to sense the ambient light level. The terminal device 100 may adaptively adjust the brightness of the display screen 194 according to the perceived ambient light level. The ambient light sensor 180G may also be used to automatically adjust the white balance when taking a picture. The ambient light sensor 180G may also cooperate with the proximity light sensor 180D to detect whether the terminal device 100 is in a pocket to prevent accidental touches.
The fingerprint sensor 180E is used to collect a fingerprint. The terminal device 100 can utilize the collected fingerprint characteristics to realize fingerprint unlocking, access to an application lock, fingerprint photographing, fingerprint incoming call answering and the like.
The touch sensor 180F is also referred to as a "touch panel". The touch sensor 180F may be disposed on the display screen 194, and the touch sensor 180F and the display screen 194 form a touch screen, which is also called a "touch screen". The touch sensor 180F is used to detect a touch operation applied thereto or therearound. The touch sensor can communicate the detected touch operation to the application processor to determine the touch event type. Visual output associated with the touch operation may be provided through the display screen 194. In other embodiments, the touch sensor 180F may be disposed on the surface of the terminal device 100, different from the position of the display screen 194.
The keys 190 include a power-on key, a volume key, and the like. The keys 190 may be mechanical keys. Or may be touch keys. The terminal device 100 may receive a key input, and generate a key signal input related to user setting and function control of the terminal device 100.
The motor 191 may generate a vibration cue. The motor 191 may be used for incoming call vibration cues, as well as for touch vibration feedback. For example, touch operations applied to different applications (e.g., photographing, audio playing, etc.) may correspond to different vibration feedback effects. The motor 191 may also respond to different vibration feedback effects for touch operations applied to different areas of the display screen 194. Different application scenes (such as time reminding, receiving information, alarm clock, game and the like) can also correspond to different vibration feedback effects. The touch vibration feedback effect may also support customization.
Indicator 192 may be an indicator light that may be used to indicate a state of charge, a change in charge, or a message, missed call, notification, etc.
The SIM card interface 195 is used to connect a SIM card. The SIM card can be brought into and out of contact with the terminal device 100 by being inserted into the SIM card interface 195 or being pulled out of the SIM card interface 195. The terminal device 100 may support 1 or N SIM card interfaces, where N is a positive integer greater than 1. The SIM card interface 195 may support a Nano SIM card, a Micro SIM card, a SIM card, etc. The same SIM card interface 195 can be inserted with multiple cards at the same time. The types of the plurality of cards may be the same or different. The SIM card interface 195 may also be compatible with different types of SIM cards. The SIM card interface 195 may also be compatible with external memory cards. The terminal device 100 interacts with the network through the SIM card to implement functions such as communication and data communication. In some embodiments, the terminal device 100 employs eSIM, namely: an embedded SIM card. The eSIM card may be embedded in the terminal device 100 and cannot be separated from the terminal device 100.
In some embodiments, the terminal device 100 may acquire the description information input by the target user through the touch screen formed by the touch sensor 180F and the display screen 194; the terminal device 100 may further obtain the first video data of the target user performing the target motion through the camera 193, or obtain it through the wireless communication module 160 or the mobile communication module 150, and the first video data may be stored in the internal memory 121 or in an external memory connected through the external memory interface 120; in addition, the fluctuation condition of the plurality of key points in the first video data may be determined by the processor 110; the processor 110 may then determine the motion model of the target motion according to the fluctuation condition of the plurality of key points and the target times, and the motion model may be used by the terminal device 100 or other devices to count the number of motions.
The software system of the terminal device 100 may adopt a layered architecture, an event-driven architecture, a micro-kernel architecture, a micro-service architecture, a cloud architecture, or the like. The embodiments of the present application take an Android system with a layered architecture as an example to describe the software structure of the terminal device 100. Fig. 1C is a block diagram of the software structure of the terminal device 100 shown in fig. 1B according to an embodiment of the present application. As can be seen in fig. 1C, the layered architecture divides the software into several layers, each with a clear role and division of labor. The layers communicate with one another through software interfaces; from top to bottom they are the application layer, the application framework layer, the Android runtime and system library, and the kernel layer.
The application layer may include a series of application packages. As shown in fig. 1C, the application layer may include applications for motion counting, cameras, galleries, maps, wireless local area networks, bluetooth, music, video, etc.
The application framework layer provides an Application Programming Interface (API) and a programming framework for the application program of the application layer. The application framework layer includes a number of predefined functions. As shown in FIG. 1C, the application framework layers may include a window manager, content provider, view system, phone manager, resource manager, notification manager, and the like.
The window manager is used for managing window programs. The window manager can obtain the size of the display screen, judge whether a status bar exists, lock the screen, intercept the screen and the like.
The content provider is used to store and retrieve data and make it accessible to applications. The data may include video, images, audio, calls made and received, browsing history and bookmarks, phone books, etc.
The view system includes visual controls such as controls to display text, controls to display pictures, and the like. The view system may be used to build applications. The display interface may include one or more views. For example, the display interface including the short message notification icon may include a view for displaying text and a view for displaying pictures.
The resource manager provides various resources for the application, such as localized strings, icons, pictures, layout files, video files, and the like.
The notification manager enables the application to display notification information in the status bar, can be used to convey notification-type messages, can disappear automatically after a short dwell, and does not require user interaction. Such as a notification manager used to inform download completion, message alerts, etc. The notification manager may also be a notification that appears in the form of a chart or scroll bar text at the top status bar of the system, such as a notification of a background running application, or a notification that appears on the screen in the form of a dialog window. For example, text information is prompted in the status bar, a prompt tone is given, the terminal device vibrates, an indicator light flickers, and the like.
The Android runtime comprises a core library and a virtual machine, and is responsible for scheduling and managing the Android system. The core library comprises two parts: one part is the function libraries that the Java language needs to call, and the other part is the core library of Android.
The application layer and the application framework layer run in the virtual machine. The virtual machine executes the Java files of the application layer and the application framework layer as binary files, and is used for performing functions such as object life cycle management, stack management, thread management, security and exception management, and garbage collection.
The system library may include a plurality of functional modules. For example: surface managers (surface managers), Media Libraries (Media Libraries), three-dimensional graphics processing Libraries (e.g., OpenGL ES), 2D graphics engines (e.g., SGL), and the like.
The surface manager is used to manage the display subsystem and provide fusion of 2D and 3D layers for multiple applications.
The media library supports playback and recording of a variety of commonly used audio and video formats, as well as still image files and the like. The media library may support a variety of audio and video encoding formats, such as MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, and the like.
The three-dimensional graphic processing library is used for realizing three-dimensional graphic drawing, image rendering, synthesis, layer processing and the like. The 2D graphics engine is a drawing engine for 2D drawing.
The kernel layer is a layer between hardware and software. The inner core layer at least comprises a display driver, a camera driver, an audio driver and a sensor driver.
The following exemplifies the workflow of the software and hardware of the terminal device 100 in connection with a scenario in which image information of a user's motion is captured for motion counting.
When the touch sensor 180F receives a touch operation, a corresponding hardware interrupt is issued to the kernel layer. The kernel layer processes the touch operation into an original input event (including information such as touch coordinates and a time stamp of the touch operation). The original input event is stored in the kernel layer; the application framework layer acquires the original input event from the kernel layer and identifies the control corresponding to the input event. Taking the case where the control corresponding to the touch operation belongs to the motion counting application as an example: the motion counting application calls an interface of the application framework layer to start the motion counting application, and then starts the camera driver by calling the kernel layer, so that the camera 193 captures first video data of the target user in the target motion process; or the kernel layer is called to start the display driver to obtain first video data stored in the gallery. The processor 110 then processes the video data to obtain a first image sequence feature, and processes the first image sequence feature together with the image sequence features stored in the database to generate a processing result. If the processing result is that the database has no motion model for identifying the motion type and counting the motion in the video data, the notification manager may enable the motion counting application to display the processing result in the status bar; the processing result may prompt the target user to input description information, where the description information is used to describe that the motion performed by the target user is the target motion and that the number of times the target motion is performed is the target number of times. After the framework layer acquires the description information input by the target user from the kernel layer, the motion model of the target motion is determined by the processor according to the fluctuation situations of a plurality of key points in the first video data and the target times.
Fig. 2 is a schematic structural diagram of a motion counting apparatus according to an embodiment of the present application, applicable to the terminal device shown in fig. 1B. The motion counting apparatus 200 may correspond to the terminal device or to one or more devices in the terminal device; for example, the apparatus 200 may be the processor 110 or a software program running on the processor 110. Illustratively, the apparatus 200 may specifically correspond to the motion counting application in the application layer in fig. 1C. It should be understood that the motion counting apparatus 200 shown in fig. 2 may have more or fewer components than shown, may combine two or more components, or may have a different configuration of components. The various components shown in the figure may be implemented in hardware, software, or a combination of hardware and software, including one or more signal processing and/or application-specific integrated circuits. The motion counting apparatus 200 may include: an acquiring unit 201, an image sequence feature extraction unit 202, a similarity judging unit 203, an input unit 204, a fluctuation situation extraction unit 205, a motion model establishing unit 206, a motion counting unit 207, a local database 208A, a cloud server database 208B, an output unit 209, and a saving unit 210. Wherein:
the acquiring unit 201 is configured to acquire first video data, where the first video data is video data of a target user in a target motion process. The image sequence feature extraction unit 202 is configured to extract a first image sequence feature of the first video data, and may save the first image sequence feature to at least one of the local database 208A or the database 208B of the cloud server. In the embodiment of the present application, an image sequence is a series of images sequentially and continuously acquired from video data, and the series of images may include a plurality of frames of images.
Optionally, in this embodiment of the present application, the video data includes an image sequence; the image sequence of the video data is input into a convolutional neural network, and the output vector of the last pooling layer (Avg-Pool) of the network is taken as the image sequence feature, where the image sequence feature can reflect one or more motion characteristics of a user in the video data. The convolutional neural network may be one or more of the following: a Two-Stream Inflated 3D convolutional network (Two-Stream Inflated 3D ConvNet, I3D), a SlowFast network (SlowFastNet), and so on.
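For illustration only, the following Python sketch shows how such a sequence feature could be taken from the last average-pooling output of a 3D convolutional backbone; the backbone callable, the frame-sampling strategy and the feature normalization are assumptions of the sketch rather than details specified in this embodiment:

import numpy as np

def extract_sequence_feature(frames, backbone, num_samples=32):
    # frames: ndarray of shape (T, H, W, 3) holding the decoded video frames.
    # backbone: a callable standing in for a 3D CNN such as I3D or SlowFastNet;
    # it is assumed here to return the pre-classification (Avg-Pool) vector.
    t = frames.shape[0]
    idx = np.linspace(0, t - 1, num_samples).astype(int)  # sample frames evenly
    clip = frames[idx]
    feature = np.asarray(backbone(clip), dtype=float)
    return feature / np.linalg.norm(feature)  # normalize for later cosine comparison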
The similarity judging unit 203 is configured to calculate a second similarity between the first image sequence feature of the first video data and the image sequence features stored in the database; if the second similarity is smaller than a second preset threshold, it indicates that no motion model for counting the motion in the first video data exists in the database. Optionally, the first similarity and the second similarity are calculated by distance similarity, a K-nearest neighbor (KNN) algorithm, or the like.
An input unit 204, configured to obtain description information input by a first user when the similarity determination unit 203 determines that the second similarity is smaller than a second preset threshold, where the description information is used to describe a target motion performed by the target user and a target number of times the target motion is performed; the first user may be a target user who performs target motion, or may be another user outside the target user, which is not limited in this embodiment of the present application.
A fluctuation-situation extraction unit 205, configured to determine a first fluctuation situation of the plurality of key points of the target user in the first video data.
A motion model establishing unit 206, configured to determine a motion model of the target motion according to the first fluctuation condition and the target number, and store the motion model in at least one of a local database 208A or a database 208B of a cloud server; wherein the motion model is used for motion counting.
Optionally, the motion model establishing unit 206 is specifically configured to: select, from the first fluctuation situation, k key points that exhibit variation and whose fluctuation amplitudes rank in the top k; perform peak counting on the k key points to obtain count values; and determine the motion model of the target motion from the count values and the target times by means of regression fitting.
After the motion model establishing unit 206 determines the motion model of the target motion, a motion counting unit 207 is configured to count, through the motion model, the target motion performed by a second user in second video data. The second user may be the target user who performs the target motion in the first video data, the first user who performs the input operation, or another user other than the first user and the target user.
Specifically, if the similarity judging unit 203 determines that a first similarity between a second image sequence feature in the second video data and the first image sequence feature in the first video data is greater than a first preset threshold, the motion counting unit 207 is instructed to perform motion counting on the motion performed by the second user in the second video data through the motion model and a second fluctuation situation of a plurality of key points of the second user in the second video data. The first similarity is used to characterize the standard degree to which the second user performs the target motion.
An output unit 209, configured to output motion evaluation information, where the motion evaluation information is generated according to the first similarity.
Optionally, the database includes at least one of the local database 208A or the database 208B of the cloud server. The local database 208A may be the internal memory 121 shown in fig. 1B or an external memory card connected to the external memory interface 120. The database 208B of the cloud server includes a storage resource pool on the cloud server for storing data.
A saving unit 210, configured to, after the motion model establishing unit 206 determines a motion model of the target motion according to the fluctuation conditions of the plurality of key points in the first video data and the target number, save the motion model or provide the motion model to a device at the opposite end of communication.
It is to be understood that the structure of the motion model generation device shown in fig. 2 is only an exemplary implementation manner in the embodiment of the present application, and the structure of the motion counting device in the embodiment of the present application includes, but is not limited to, the above structure. The above units may be implemented in hardware, software or a combination of both. The software may be stored in the memory and executed by the processor, and the hardware may include, but is not limited to, various types of processing circuits or processors, which may be specifically referred to the description of the following embodiments and will not be described herein.
Referring to fig. 3, fig. 3 is a schematic structural diagram of a cloud server according to an embodiment of the present disclosure, where the cloud server 30 includes at least one processor 301, at least one memory 302, and at least one communication interface 303. In addition, the device may also include common components such as an antenna, which will not be described in detail herein.
The processor 301 may be a general purpose Central Processing Unit (CPU), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of programs according to the above schemes.
Communication interface 303 is used for communicating with other devices or communication Networks, such as ethernet, Radio Access Network (RAN), core network, Wireless Local Area Networks (WLAN), etc.
The Memory 302 may be, but is not limited to, a Read-Only Memory (ROM) or other type of static storage device that can store static information and instructions, a Random Access Memory (RAM) or other type of dynamic storage device that can store information and instructions, an Electrically Erasable Programmable Read-Only Memory (EEPROM), a Compact Disc Read-Only Memory (CD-ROM) or other optical Disc storage, optical Disc storage (including Compact Disc, laser Disc, optical Disc, digital versatile Disc, blu-ray Disc, etc.), magnetic disk storage media or other magnetic storage devices, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory may be self-contained and coupled to the processor via a bus. The memory may also be integral to the processor.
The memory 302 is used for storing computer programs for executing the above schemes, and is controlled by the processor 301 to execute. The processor 301 is configured to execute the computer program stored in the memory 302, so as to implement a corresponding function of the cloud server, where the specific function to be implemented may refer to the description of the cloud server, such as storing the established motion model, outputting the established motion model, and the like, for example, the cloud server sends the established motion model to the terminal device.
Based on the system architecture of the motion model generation method provided in fig. 1A, the embodiment of the present application provides a flow diagram of the motion model generation method, please refer to fig. 4. The motion model is used for motion counting, and the motion model generation method may include, but is not limited to, the following steps:
Step S401, the terminal device obtains first video data of a target user in a target motion process. Specifically, as users' lifestyles change, the types of exercise that users perform are increasingly varied and novel. Fig. 4A is a schematic diagram of a user performing an exercise with equipment according to an embodiment of the present application, and fig. 4B is a schematic diagram of a user performing an exercise according to an embodiment of the present application; it can be seen from fig. 4A and fig. 4B that the types of exercises performed by users are various.
When a user wants to record the process of performing a certain motion, video data of the motion can be acquired through the camera of the terminal device. Alternatively, when the user wants to determine how standard his or her own motion is by recording the standard motion posture of another person (such as a fitness coach), the camera of the terminal device can be used to capture the process of the other person (such as the fitness coach) performing the motion; or the terminal device can acquire, over a network, video data of another person (such as a fitness coach) performing the motion.
In the embodiment of the present application, for convenience of subsequent description, the executed motion is referred to as target motion, a user executing the motion is referred to as a target user, and video data in a process of the target user executing the target motion is referred to as first video data.
In step S402, the terminal device determines whether a motion model for performing motion counting on the target motion in the first video data exists in the database. Specifically, motion models for certain types of motion and the image sequence features corresponding to video data captured when users perform these motions may be stored in the database. In the embodiment of the present application, the motion model is obtained based on the video data of the motion process and the motion times of the corresponding motion type, so the motion model can reflect, to a certain extent, the relationship between the video data and the motion times; after a piece of video data is input into the motion model, the number of times the corresponding motion is performed in that piece of video data can be predicted. For example, the motion model can reflect a mapping relationship C = f(C_1, C_2, ..., C_k) formed by independent variables and a dependent variable; in the present embodiment, the independent variables (C_1, C_2, ..., C_k) may be the fluctuation situations of a plurality of key points in the video data, and the dependent variable C may be the number of motions.
Optionally, the terminal device may extract a first image sequence feature of the first video data, then perform similarity calculation on the first image sequence feature and an image sequence feature stored in the database, and determine whether a motion model for performing motion counting on target motion in the first video data exists in the database according to a second similarity obtained through the calculation.
In general, if the image sequence features in the two captured video data are relatively close, for example, the second similarity is greater than a second preset threshold, the motion types recorded in the two captured video data are considered to be the same.
If the database does not contain a target image sequence feature whose second similarity with the first image sequence feature of the first video data is greater than or equal to the second preset threshold, it may be determined that no motion model for performing motion counting on the target motion in the first video data exists in the database; that is, the motion in the first video data is a user-defined motion of the target user. Therefore, the related operation of acquiring the description information input by the target user may be triggered, as specifically described in steps S403 to S408.
If the database has a target image sequence feature with a second similarity greater than or equal to a second preset threshold with respect to the first image sequence feature of the first video data, it may be determined that a motion model for counting a motion of a target in the first video data exists in the database, and therefore, the motion frequency of the target in the first video data is identified by the motion model of the target motion existing in the database.
It should be noted that the second preset threshold is a preset threshold for reference contrast, and optionally, the second preset threshold is obtained by counting a large number of similarity values, and can reflect to some extent whether the two image sequence features used for calculating the similarity are image sequences of video data of the same motion type, for example, if the statistical result indicates that in 95% of experimental data, the similarity of the image sequences of the video data of the same motion type is not lower than 0.8, then 0.8 may be set as the second preset threshold. Of course, the second preset threshold may also be obtained by other means, such as training of a correlation model, and so on.
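As an illustrative sketch only, a threshold of this kind could be derived from collected similarity statistics as follows; the percentile-based choice simply mirrors the 95% example above and is not a procedure prescribed by this embodiment:

import numpy as np

def estimate_similarity_threshold(same_type_similarities, coverage=0.95):
    # Choose a threshold such that `coverage` of same-motion-type pairs score
    # at or above it (e.g. 95% of pairs >= 0.8 gives a threshold of 0.8).
    sims = np.asarray(same_type_similarities, dtype=float)
    return float(np.quantile(sims, 1.0 - coverage))

# Example with made-up similarity measurements between clips of the same motion type.
print(estimate_similarity_threshold([0.92, 0.88, 0.81, 0.95, 0.85, 0.79]))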
The database in the embodiment of the application includes at least one of a local database or a database of a cloud server. Optionally, if the database includes both a local database and a database of the cloud server, when calculating the similarity, similarity calculation may first be performed between the first image sequence feature of the first video data and the image sequence features stored in the local database; when the calculated similarity is smaller than the second preset threshold, similarity calculation is then performed between the first image sequence feature and the image sequence features stored in the database of the cloud server. If the calculated similarity is still smaller than the second preset threshold, it indicates that no motion model for performing motion counting on the target motion in the first video data exists in the database.
If any one of the local database or the database of the cloud server has a target image sequence feature, the similarity of which with the first image sequence feature of the first video data reaches a second preset threshold, it indicates that a motion model for counting the motion of the target of the first video data exists in the database, so that the motion model of the target motion existing in the database can be loaded to identify the motion times of the target motion in the first video data.
Optionally, if the similarity between the first image sequence feature of the first video data and the image sequence feature stored in the local database is greater than a second preset threshold, the terminal device may load a motion model of the target motion in the local database to identify a motion count of the target motion in the first video data.
Optionally, under the condition that the similarity between the first image sequence feature of the first video data and the image sequence feature stored in the local database is smaller than a second preset threshold, if the calculated similarity between the first image sequence feature of the first video data and the image sequence feature stored in the database of the cloud server is larger than the second preset threshold, the terminal device may load a motion model of the target motion in the database of the cloud server to identify the target motion count in the first video data.
For example: the terminal device may extract the first image sequence feature of the first video data by using a convolutional neural network for video motion recognition, where the convolutional neural network may be one or more of the following: I3D, SlowFastNet, and the like. The terminal device inputs the image sequence of the first video data into the convolutional neural network, and the output vector v of the last pooling layer (Avg-Pool) of the network is taken as the first image sequence feature of the first video data. Next, the terminal device may calculate the similarity between the first image sequence feature and the image sequence features stored in the database by one or more of distance similarity, a KNN nearest-neighbor classifier, and correlation coefficient calculation. Suppose that the database stores the image sequence feature corresponding to a motion model i, denoted v_i, where i is an integer greater than or equal to 0; for example, v_0 may represent the image sequence feature of the push-up motion, v_1 may represent the image sequence feature of the sit-up motion, and so on. With the first image sequence feature of the first video data denoted v, the similarity between the first image sequence feature v and each image sequence feature v_i stored in the database may be calculated according to a cosine distance similarity algorithm:
cos_i = (v · v_i) / (‖v‖ · ‖v_i‖)
The value of the similarity cos_i ranges between 0 and 1. It can be understood that the closer the value of cos_i is to 1, the more likely it is that the database stores an image sequence feature similar to the first image sequence feature of the first video data; the closer the value of cos_i is to 0, the less likely this is. Suppose the second preset threshold is 0.8 and i ranges up to 7, and the calculated similarities cos_i are respectively: cos_1 = 0.53, cos_2 = 0.14, cos_3 = 0.95, cos_4 = 0.86, cos_5 = 0.34, cos_6 = 0.78, cos_7 = 0.87. Since three of the calculated similarities, cos_3 = 0.95, cos_4 = 0.86 and cos_7 = 0.87, are greater than the second preset threshold 0.8, the motion type represented by the image sequence feature corresponding to the largest of the three similarity values, cos_3 = 0.95 (or any one of the three), may be selected as the motion type of the target motion, and the motion model of that motion type is used for carrying out motion counting on the target motion in the first video data.
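For illustration only, the database lookup described above can be sketched in Python as follows; the dictionary layout of the database and the toy feature vectors are assumptions, while the cosine formula and the 0.8 threshold follow the example above:

import numpy as np

def cosine_similarity(v, vi):
    return float(np.dot(v, vi) / (np.linalg.norm(v) * np.linalg.norm(vi)))

def match_motion_model(v, database, threshold=0.8):
    # database: {motion_id: stored image sequence feature}; returns the best
    # matching motion id, or None if no stored feature reaches the threshold.
    best_id, best_sim = None, 0.0
    for motion_id, vi in database.items():
        sim = cosine_similarity(v, vi)
        if sim > best_sim:
            best_id, best_sim = motion_id, sim
    return (best_id, best_sim) if best_sim >= threshold else (None, best_sim)

# Toy usage with random vectors standing in for real sequence features.
rng = np.random.default_rng(0)
db = {"push_up": rng.normal(size=8), "sit_up": rng.normal(size=8)}
print(match_motion_model(db["push_up"] + 0.01, db))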
In step S403, the terminal device obtains the description information input by the first user. Specifically, the terminal device may prompt the first user to input description information about the target motion; after the first user performs the input, the terminal device receives the description information accordingly, where the input by the first user may be a key input, a touch input, a voice control input, a gesture input, or the like. It should be noted that the first user is the target user or another user other than the target user; that is, the first user performing the input operation may be the target user performing the target motion in the first video data, or may be another user other than the target user. The description information is used to describe that the motion performed by the target user is the target motion and that the number of times the target motion is performed is the target number of times. For example, if the performed target motion is the opening and closing jumping motion and the number of times the target motion is performed is 3, information such as "opening and closing jumping motion" and "3 times" may be input.
In step S404, the terminal device determines a first fluctuation situation of a plurality of key points of the target user in the first video data.
Specifically, the key points can embody the motion characteristics of the target user. The key points may be human skeleton (skeleton) key points of the target user, or may be predetermined optical flow key points that can embody the motion of the human body, where the optical flow key points are key points selected by an optical flow method. For example, as can be seen from fig. 4C, the human skeleton key points include, but are not limited to, the 16 key points P1, P2, P3, ..., P16. Since the first video data records the process of the target user performing the target motion, the terminal device can determine the fluctuation situations of a plurality of key points in the first video data according to the first video data.
Optionally, determining the fluctuation situations of the plurality of key points in the first video data may specifically be as follows. First, a plurality of key points in the first video data are determined; for example, the terminal device may determine the plurality of key points according to the human skeleton (skeleton) key points of the target user or according to predetermined optical flow key points that can embody human motion. Generally, video refers to various technologies for capturing, recording, processing, storing, transmitting and reproducing a series of still images as electrical signals; in the embodiment of the present application, the first video data includes a plurality of frames of images (i.e., a series of still images), so the key points of each frame of image in the first video data can be determined. Next, taking the coordinates of the plurality of key points in a reference frame image of the first video data as reference frame coordinates, the distances of the coordinates of the plurality of key points in each frame image of the first video data relative to the reference frame coordinates are calculated, and the calculated distances are then projected onto a preset coordinate axis to determine the fluctuation situations of the plurality of key points in the first video data, where a fluctuation situation may be the waveform of the variation over time of the distances of the plurality of key points in the first video data.
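A minimal Python sketch of this step, under the assumption that the first frame serves as the reference frame and that the displacement is projected onto the vertical image axis, might look as follows; neither choice is mandated by this embodiment:

import numpy as np

def keypoint_waveforms(keypoints, axis=1):
    # keypoints: ndarray of shape (T, K, 2) with the (x, y) coordinates of K key
    # points over T frames. Returns an array of shape (K, T): for each key point,
    # the offset of its coordinate relative to the reference (first) frame,
    # projected onto the chosen coordinate axis (axis=1 is the vertical axis).
    keypoints = np.asarray(keypoints, dtype=float)
    reference = keypoints[0]              # reference-frame coordinates
    displacement = keypoints - reference  # per-frame offset from the reference frame
    return displacement[:, :, axis].T     # one waveform (over time) per key point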
Step S405, the terminal device determines the motion model of the target motion according to the first fluctuation situation and the target number of times. Specifically, not every key point necessarily has a fluctuation amplitude in the process of the target user performing the target motion. For a key point without fluctuation amplitude, it may be considered that the body part represented by that key point is kept still during the motion of the target user, so a key point without fluctuation amplitude is considered unable to reflect the motion situation of the target user. Therefore, key points with fluctuation amplitude are specifically selected for analysis in the subsequent processing. In addition, since a motion process is generally repetitive, the terminal device needs to select, from the plurality of key points in the first video data, key points whose fluctuation amplitudes vary periodically, and then determine the motion model of the target motion in combination with the target number of times that was input.
Optionally, the terminal device needs to select, from the plurality of key points in the first video data, k key points that exhibit variation and whose fluctuation amplitudes rank in the top k; then perform peak counting on the motion waveforms of the k key points to obtain count values; and determine the motion model of the target motion from the count values and the target number of times by means of regression fitting.
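For illustration only, the selection of the k most strongly fluctuating key points and the peak counting step could be sketched as follows; the simple local-maximum detector and the prominence threshold are assumptions of the sketch:

import numpy as np

def select_top_k(waveforms, k):
    # waveforms: (K, T) array of key point waveforms. Returns the indices of the
    # k key points with the largest fluctuation amplitude (max minus min over time).
    waveforms = np.asarray(waveforms, dtype=float)
    amplitude = waveforms.max(axis=1) - waveforms.min(axis=1)
    return np.argsort(amplitude)[::-1][:k]

def count_peaks(waveform, min_prominence=0.5):
    # Count local maxima that rise sufficiently above the waveform's minimum;
    # this stands in for the peak counting applied to each selected key point.
    w = np.asarray(waveform, dtype=float)
    peaks = 0
    for i in range(1, len(w) - 1):
        if w[i] > w[i - 1] and w[i] >= w[i + 1] and w[i] - w.min() > min_prominence:
            peaks += 1
    return peaks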
Generally, fitting means connecting a series of points on a plane with a smooth curve; because there are countless possibilities for this curve, there are various fitting methods. The fitting may be implemented in various ways; in the embodiment of the present application, the fitting is implemented by a regression analysis method, but the embodiment of the present application does not limit the method used to implement the fitting. In general, regression analysis determines the mapping relationship between variables by specifying the dependent variable and the independent variables, uses the mapping relationship as a regression model, solves the parameters in the regression model (including the proportionality coefficients of the independent variables) according to measured data, and then further predicts the dependent variable through the determined regression model. Regression analysis methods include, but are not limited to, linear regression, support vector machine regression and other regression models. In the embodiment of the application, the target number of times is used as the dependent variable in the regression analysis, the count values of the k key points selected from the first video data are used as the independent variables in the regression analysis, and the mapping relationship between the target number of times and the count values obtained through the regression analysis is the motion model of the target motion. For example, if the target number of times acquired by the terminal device is C, and the count values of the k selected key points are C_1, C_2, ..., C_k respectively, the mapping relationship between the target number of times and the count values obtained by means of regression fitting can be expressed as: C = f(C_1, C_2, ..., C_k).
By way of example, a classical linear regression model can be represented as:
C = f(C_1, C_2, ..., C_k) = a_1·C_1 + a_2·C_2 + ... + a_k·C_k + b   (formula 1)
At least k count values are typically required to obtain a unique set of coefficient values (a_1, a_2, ..., a_k, b), and the count values (C_1, C_2, ..., C_k) of the k key points are usually positively correlated with the target number of times C. Because the target number of times C and the count values (C_1, C_2, ..., C_k) of the k key points are known, the proportionality coefficients (a_1, a_2, ..., a_k, b) can be calculated, for example with each key point contributing equally:
a_i = C / (k·C_i), i = 1, 2, ..., k,
and the constant term b may take 0 or another value. The motion model of the target motion may then be specifically expressed as:
C = a_1·C'_1 + a_2·C'_2 + ... + a_k·C'_k + b,
where (C'_1, C'_2, ..., C'_k) are the count values of the waveform peaks of the k key points in the video data whose motion count needs to be determined using the above motion model.
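A minimal sketch of this fitting, under the equal-contribution reading of formula 1 given above (a_i = C/(k·C_i) with b = 0), is shown below; it is one consistent interpretation rather than the only possible regression fit:

def fit_linear_model(target_count, peak_counts):
    # Fit a_i so that sum(a_i * C_i) equals the target count C, with each key
    # point contributing C/k; the constant term b is taken as 0.
    k = len(peak_counts)
    return [target_count / (k * c) for c in peak_counts]

def predict_count(coefficients, peak_counts):
    return sum(a * c for a, c in zip(coefficients, peak_counts))

# Example: target count 3 with six key point peak counts.
a = fit_linear_model(3, [3, 3, 4, 5, 3, 3])
print([round(x, 4) for x in a])                       # [0.1667, 0.1667, 0.125, 0.1, 0.1667, 0.1667]
print(round(predict_count(a, [3, 3, 4, 5, 3, 3]), 6))  # 3.0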
In the linear regression model, in addition to the scaling coefficients, constant terms may be introduced to represent the deviation caused by the target user performing a preparatory action at the start or a finishing action at the end of the motion, and the linear regression model can then be expressed as:
C = f(C_1, C_2, ..., C_k) = a_1·(C_1 + b_1) + a_2·(C_2 + b_2) + ... + a_k·(C_k + b_k)   (formula 2)
Therefore, the proportionality coefficients can be obtained from formula 2 as:
a_i = C / (C_i + b_i), i = 1, 2, ..., k,
where b_i may be set according to the actual situation such that C / (C_i + b_i) is an integer, and the absolute value |b_i| is the minimum value for which C / (C_i + b_i) is an integer. In addition, considering that the key points with a large motion amplitude are the main reference basis for counting, the scaling coefficients need to be weighted according to amplitude, and the weight is expressed as:
w_i = P_i / (P_1 + P_2 + ... + P_k),
where P_i is the average fluctuation amplitude of the i-th key point, obtained from the peaks and valleys of its waveform, for example as the average over j of (peak_j − valley_j), peak_j denoting the j-th peak of C_i and valley_j the j-th valley of C_i. Formula 2 can then be expressed as:
C = w_1·a_1·(C_1 + b_1) + w_2·a_2·(C_2 + b_2) + ... + w_k·a_k·(C_k + b_k).
The motion model of the target motion can therefore be further expressed as:
C = w_1·a_1·(C'_1 + b_1) + w_2·a_2·(C'_2 + b_2) + ... + w_k·a_k·(C'_k + b_k),
where (C'_1, C'_2, ..., C'_k) are the count values of the waveform peaks of the k key points in the video data whose motion count needs to be determined using the above motion model.
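For illustration only, the weighted variant reconstructed above can be sketched as follows; the search for the smallest-magnitude integer offset b_i, the choice a_i = C/(C_i + b_i) and the normalization of the weights to sum to 1 are the assumptions used in that reconstruction:

def fit_weighted_model(target_count, peak_counts, amplitudes):
    # Returns (weight, coefficient, offset) per key point so that
    # sum(w_i * a_i * (C_i + b_i)) equals the target count on the training clip.
    total_amp = sum(amplitudes)
    params = []
    for c_i, p_i in zip(peak_counts, amplitudes):
        magnitude = 0
        while True:  # try offsets 0, -1, +1, -2, +2, ... until C/(C_i+b_i) is an integer
            for candidate in (-magnitude, magnitude):
                base = c_i + candidate
                if base > 0 and target_count % base == 0:
                    b = candidate
                    break
            else:
                magnitude += 1
                continue
            break
        a = target_count // (c_i + b)      # integer coefficient C / (C_i + b_i)
        w = p_i / total_amp                # amplitude-based weight, weights sum to 1
        params.append((w, a, b))
    return params

def weighted_count(params, peak_counts):
    return sum(w * a * (c + b) for (w, a, b), c in zip(params, peak_counts))

# Example numbers matching the stepping rope-skipping scenario described later.
params = fit_weighted_model(10, [10, 11, 11, 12], [2.5, 2.7, 4.9, 3.8])
print(round(weighted_count(params, [10, 11, 11, 12]), 6))   # 10.0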
In step S406, the terminal device counts, through the motion model, the target motion performed by a second user in second video data. Specifically, after the second user acquires, through the terminal device, second video data of the second user performing a motion, whether the motion performed by the second user in the second video data is the target motion may be identified through the previously obtained first image sequence feature; if it is the target motion, the number of times the second user performs the motion in the second video data may be further identified through the generated motion model of the target motion. The second user who performs the target motion in the second video data may be the target user who performs the target motion in the first video data, the first user who performs the input operation, or another user other than the first user and the target user.
Optionally, the terminal device may extract a second image sequence feature in the second video data, wherein the method for extracting the second image sequence feature in the second video data may refer to extracting the relevant description of the first image sequence feature in the first video data in step S402. If the calculated first similarity between the second image sequence feature and the first image sequence feature in the first video data is greater than the first preset threshold, the terminal device may determine that the motion types in the second video data are the same as the motion types in the first video data, that is, the motion types are target motions, and then further identify the number of times of executing the target motions in the second video data according to a motion model corresponding to the target motions and fluctuation conditions of a plurality of key points in the second video data. The method for calculating the first similarity between the second image sequence feature and the first image sequence feature may refer to the correlation description for calculating the second similarity between the first image sequence feature and the image sequence features stored in the database in S402.
The first preset threshold is a preset threshold for reference contrast, and optionally, the first preset threshold is obtained by counting a large number of similarity values, and can reflect to some extent whether the two image sequence features used for calculating the similarity are image sequences of video data of the same motion type, for example, if the statistical result shows that in 95% of experimental data, the similarity of the image sequences of the video data of the same motion type is not lower than 0.8, 0.8 may be set as the first preset threshold. Of course, the first preset threshold may also be obtained by other means, such as training a correlation model, and so on.
In step S407, the terminal device outputs motion evaluation information. Specifically, if the first similarity is greater than the first preset threshold, the terminal device may output the motion evaluation information (for example, display it through the display screen or broadcast it by voice through the audio module), and accordingly the user may obtain the motion evaluation information output by the terminal device. The motion evaluation information is generated according to the first similarity. Optionally, the motion evaluation information may be a numerical value corresponding to the first similarity, an action comment, or the like. For example, if the first similarity is 0.8, the similarity 0.8 may correspond to a score of "80", and the exercise comment corresponding to the similarity 0.8 may be "the movement is fairly standard". The motion evaluation information may be visual information; for example, "the movement is fairly standard" may be displayed on the display screen. Alternatively, the motion evaluation information may be other, non-visual information, such as a prompt voice containing "the movement is fairly standard" played through the speaker, which is not limited in this embodiment. The first similarity is the similarity between the first image sequence feature of the first video data and the second image sequence feature of the second video data. It can be understood that, if the motion model is calculated from video data of a fitness coach's standard motion process, the motion evaluation information output by the terminal device can help the target user judge how standard his or her own motion is: a larger value corresponding to the first similarity indicates that the motion is closer to the fitness coach's standard, and a smaller value indicates that the motion deviates from it, so outputting the motion evaluation information can help the target user improve the action.
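For illustration only, the mapping from the first similarity to the motion evaluation information could be sketched as below; the score scaling and the comment thresholds are assumptions, not values prescribed by this embodiment:

def motion_evaluation(first_similarity):
    # Map a similarity in [0, 1] to a 0-100 score and a human-readable comment.
    score = round(first_similarity * 100)
    if score >= 90:
        comment = "the movement is very standard"
    elif score >= 80:
        comment = "the movement is fairly standard"
    else:
        comment = "the movement deviates from the reference, keep practising"
    return score, comment

print(motion_evaluation(0.8))   # (80, 'the movement is fairly standard')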
Step S408, the terminal device uploads the first image sequence feature and the motion model to the database of the cloud server. Optionally, if the target user using the terminal device agrees to upload the first image sequence feature and the motion model of the target motion to the cloud server, the terminal device may provide the motion model of the target motion and the first image sequence feature to a communication peer device (for example, the cloud server), so that the cloud server may send the motion model of the target motion and the first image sequence feature to other terminal devices for those terminal devices to perform motion type identification and motion counting.
Optionally, after the terminal device obtains the first video data of the target user performing the target motion, the first video data may be sent to the cloud server, and the cloud server determines whether a motion model for performing motion counting on the target motion in the first video data exists in the database, with specific reference to the related description of the operation performed by the terminal device in step S402. The cloud server may further determine the fluctuation situations of the plurality of key points in the first video data, with specific reference to the related description of the operation performed by the terminal device in step S404. After the terminal device obtains the description information input by the target user, the description information may be sent to the cloud server. The cloud server can then determine the motion model of the target motion according to the fluctuation situations of the plurality of key points in the first video data and the target times; after the motion model is determined, it can be sent to the terminal device, and the terminal device identifies the motion times in the second video data through the motion model.
In the above method, the terminal device acquires first video data and analyzes it to obtain the fluctuation situations of a plurality of key points in the first video data, and the user inputs to the terminal device the motion type (namely, the target motion) and the number of times this type of motion is performed (namely, the target times); the terminal device then generates a motion model of the target motion for the user according to the fluctuation situations and the motion times, and can subsequently perform motion counting based on the motion model. In this way, the user can not only expand the available motion models, but also better meet his or her personalized requirements on motion models; moreover, establishing a motion model in this way is more efficient and faster.
An application scenario of the motion model generation method provided by the embodiment of the present application is introduced below in combination with application scenario one. Fig. 5 is a view of an application scenario of the motion model generation method provided in an embodiment of the present application. Fig. 5 takes as an example first video data recording a user performing the opening and closing jumping motion, with a target number of 3; that is, the embodiment shown in fig. 5 may be regarded as a specific implementation of the method embodiment shown in fig. 4. Referring to fig. 5, the user acquires, through the camera 193 of the terminal device 100, first video data of himself or herself performing the opening and closing jumping motion. Assuming that motion models of push-ups, sit-ups, squats and other motion types, and their image sequence features, are built into the local database, a convolutional neural network for video motion recognition (such as I3D or SlowFastNet) is used to calculate the first image sequence feature of the first video data, and similarity calculation is performed with the image sequence features of the built-in motion models in the local database; methods such as distance similarity and a KNN nearest-neighbor classifier may be used for the similarity calculation. If the similarity between the first image sequence feature of the opening and closing jumping motion and the image sequence features stored in the local database is smaller than the second preset threshold, it indicates that no motion model for counting the motion in the first video data exists in the local database.
At this time, the terminal device prompts the user whether to approve searching from the database of the cloud server, because the database of the cloud server contains more image sequence features of motion types and corresponding motion models, such as image sequence features of motion types of chest expanding motion, leg lifting, longitudinal jumping and the like and corresponding motion models besides push-up, sit-up and deep squatting. When the target user selects the "yes" button, the terminal device may perform similarity calculation on the image sequence features of the first video data and the image sequence features stored in the cloud server, and if the calculated similarity is still smaller than a second preset threshold, it indicates that there is no motion model for performing motion counting on the motion in the first video data in the database of the cloud server. At this time, the terminal device may determine the motion in the first video data as the customized motion of the target user.
When the terminal device judges that the motion is a user-defined motion of the target user, the terminal device may prompt the user to input the motion type of the motion (namely, the target motion) and the number of times the motion is performed (namely, the target number of times). Assume that the motion type input by the user is "opening and closing jumping motion" and the number of times the motion is performed is "3". Next, the terminal device needs to determine a motion model of the opening and closing jumping motion type according to the first video data recording the opening and closing jumping motion (i.e., the target motion) and the number of times input by the user (i.e., the target number of times). Assuming that the extracted key points are the human skeleton (skeleton) key points shown in fig. 4C, the fluctuation situations of the plurality of key points in the process of the target user performing the opening and closing jumping motion, obtained by the terminal device from the analysis of the first video data, are shown at 5a in fig. 5. It can be seen from 5a in fig. 5 that there are 4 key points of the leg portions, P6, P7, P11 and P12, and 2 key points of the foot portions, P15 and P16, so these 6 key points can be used as candidate key points, and peak counting is then performed on the waveforms of these 6 key points to obtain their count values. As can be seen from 5a in fig. 5, the count value C_1 corresponding to P6 is 3, the count value C_2 corresponding to P7 is 3, the count value C_3 corresponding to P11 is 4, the count value C_4 corresponding to P12 is 5, the count value C_5 corresponding to P15 is 3, and the count value C_6 corresponding to P16 is 3.
The classical linear regression model C = f(C_1, C_2, ..., C_6) = a_1·C_1 + a_2·C_2 + ... + a_6·C_6 + b is used as the motion model of the opening and closing jumping motion. Assume that the mapping relation between the count values (C_1, C_2, ..., C_6) and the number of times C the motion is performed is a direct proportional relation and that the coefficients of the 6 key points are independent of each other, so the constant term b may be 0. Then, according to the number of times C the motion is performed and the count values (C_1, C_2, ..., C_6) of the 6 key points, the proportionality coefficients (a_1, a_2, ..., a_6) that make the linear regression model hold are determined; with each key point contributing equally, a_i = C/(6·C_i), and the obtained coefficients are respectively:
a_1 = 1/6, a_2 = 1/6, a_3 = 1/8, a_4 = 1/10, a_5 = 1/6, a_6 = 1/6.
Therefore, the resulting motion model of the opening and closing jumping motion can be expressed as:
C = (1/6)·C_1 + (1/6)·C_2 + (1/8)·C_3 + (1/10)·C_4 + (1/6)·C_5 + (1/6)·C_6.
the terminal equipment can store the image sequence characteristics of the opening and closing jumping motion and the motion model obtained through calculation into a local database. Optionally, the terminal device may inquire whether the user agrees to upload the image sequence features and the motion model of the opening and closing jumping motion to the database of the cloud server, and if the user selects the "yes" button, it indicates that the user agrees to upload the image sequence features and the motion model to the database of the cloud server. Therefore, the terminal device sends the motion model of the opening and closing jumping motion to the cloud server, and then the cloud server can store the image sequence characteristics and the motion model of the opening and closing jumping motion so as to facilitate the downloading and use of other devices.
If the target user acquires (for example, shoots or downloads) a section of second video data recording the opening and closing jumping motion, similarity calculation is carried out between the image sequence feature of the second video data and the image sequence feature of the opening and closing jumping motion stored in the local database. Because the image sequence feature of the opening and closing jumping motion is stored in the local database, the calculated similarity is greater than the second preset threshold, so the motion model of the opening and closing jumping motion can be loaded; by extracting the variation waveforms of the leg and foot key points and analyzing them with the motion model of the opening and closing jumping motion, the number of times the target user performs the opening and closing jumping motion in the second video data, that is, the motion count, is obtained.
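For illustration only, the counting step of this scenario can be checked numerically as follows; the peak counts assumed for the second video clip are hypothetical:

# Coefficients for the six jumping-jack key points (P6, P7, P11, P12, P15, P16),
# assuming each key point contributes equally to the target count of 3.
coefficients = [3 / (6 * c) for c in (3, 3, 4, 5, 3, 3)]   # 1/6, 1/6, 1/8, 1/10, 1/6, 1/6

# Hypothetical peak counts of the same six key points in a new clip.
new_counts = [6, 6, 8, 10, 6, 6]
estimate = sum(a * c for a, c in zip(coefficients, new_counts))
print(round(estimate))   # 6 repetitions of the opening and closing jumping motion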
An application scenario of another motion model generation method provided in the embodiment of the present application is introduced below in combination with application scenario two. Fig. 6 is a view of an application scenario of another motion model generation method provided in an embodiment of the present application. Fig. 6 takes as an example first video data recording a user performing a stepping rope-skipping motion, with a target number of 10; that is, the embodiment shown in fig. 6 may be regarded as a more specific implementation of the method embodiment shown in fig. 4. Referring to fig. 6, the user acquires, through the camera 193 of the terminal device 100, first video data of himself or herself performing the stepping rope-skipping motion. Assuming that motion models of motion types such as normal rope skipping and cross rope skipping, and their image sequence features, are built into the local database, a convolutional neural network for video motion recognition (such as I3D or a dual-stream convolutional network) is used to calculate the first image sequence feature of the first video data, and similarity calculation is performed with the image sequence features of the built-in motion models in the local database; methods such as distance similarity and correlation coefficient may be used for the similarity calculation. If the similarity between the first image sequence feature of the stepping rope-skipping motion and the image sequence features stored in the local database is smaller than the second preset threshold, it indicates that no motion model for counting the motion in the first video data exists in the local database. At this time, the terminal device prompts the user whether to agree to search the database of the cloud server; if the user selects the "No" button, the user does not agree to search through the cloud server, and the motion in the first video data is determined to be a user-defined motion of the target user according to the similarity calculated between the image sequence feature of the first video data and the image sequence features stored in the local database.
At this time, the terminal device may prompt the user to input the motion type of the motion (i.e., the target motion) and the number of times the motion is performed (i.e., the target number of times); assume that the motion type input by the user is "stepping rope-skipping motion" and the number of times the motion is performed is "10". Next, the terminal device needs to determine a motion model of the stepping rope-skipping motion type according to the first video data recording the stepping rope-skipping motion (i.e., the target motion) and the number of times input by the user (i.e., the target number of times). Assuming that the extracted key points are the human skeleton (skeleton) key points shown in fig. 4C, the terminal device obtains, according to the first video data, the fluctuation situations of a plurality of key points in the process of the target user performing the stepping rope-skipping motion, and four key points with large fluctuation amplitude, namely the leg key points P11 and P12 and the foot key points P15 and P16, can be selected as candidate key points from the fluctuation situations of the plurality of key points. Peak counting is then performed on the waveforms of the 4 key points P11, P12, P15 and P16 to obtain their count values: the count value C_1 of the P11 key point is 10, the count value C_2 of the P12 key point is 11, the count value C_3 of the P15 key point is 11, and the count value C_4 of the P16 key point is 12. The peaks and valleys of the 4 key points are obtained, and the average fluctuation amplitude of each key point is calculated as the average difference between its peak values and valley values; the average fluctuation amplitude P_1 of the P11 key point is 2.5, the average fluctuation amplitude P_2 of the P12 key point is 2.7, the average fluctuation amplitude P_3 of the P15 key point is 4.9, and the average fluctuation amplitude P_4 of the P16 key point is 3.8.
The linear regression model C = f(C_1, C_2, ..., C_k) = w_1·a_1·(C_1 + b_1) + w_2·a_2·(C_2 + b_2) + ... + w_k·a_k·(C_k + b_k), with k = 4, is used as the motion model of the stepping rope-skipping motion. According to the number of times C the motion is performed and the count values (C_1, C_2, C_3, C_4) of the 4 key points, the proportionality coefficients for establishing the linear regression model are determined as a_i = C / (C_i + b_i). It needs to be satisfied that C / (C_1 + b_1) is an integer, and the absolute value |b_1| is the minimum value for which C / (C_1 + b_1) is an integer, so b_1 = 0; in the same way, b_2 = −1, b_3 = −1 and b_4 = −2 can be obtained, and accordingly a_1 = a_2 = a_3 = a_4 = 1.
Then, the weight values of the four key points are calculated by the formula w_i = P_i / (P_1 + P_2 + P_3 + P_4), giving a weight value of the P11 key point of w_1 = 2.5/13.9 ≈ 0.18, a weight value of the P12 key point of w_2 = 2.7/13.9 ≈ 0.19, a weight value of the P15 key point of w_3 = 4.9/13.9 ≈ 0.35, and a weight value of the P16 key point of w_4 = 3.8/13.9 ≈ 0.27.
The motion model finally determined for the stepping rope-skipping motion is specifically expressed as:
C ≈ 0.18·C'_1 + 0.19·(C'_2 − 1) + 0.35·(C'_3 − 1) + 0.27·(C'_4 − 2),
where C'_1, C'_2, C'_3 and C'_4 are the waveform peak count values of the key points P11, P12, P15 and P16 in the video data whose motion count needs to be determined using the model.
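For illustration only, the numerical values of this scenario can be reproduced with a few lines of Python; the normalization of the weights to a sum of 1 follows the reconstruction of the formulas above:

amplitudes = {"P11": 2.5, "P12": 2.7, "P15": 4.9, "P16": 3.8}
peak_counts = {"P11": 10, "P12": 11, "P15": 11, "P16": 12}
offsets = {"P11": 0, "P12": -1, "P15": -1, "P16": -2}
target = 10

total_amp = sum(amplitudes.values())                           # 13.9
weights = {k: round(v / total_amp, 2) for k, v in amplitudes.items()}
print(weights)   # {'P11': 0.18, 'P12': 0.19, 'P15': 0.35, 'P16': 0.27}

# Each corrected count C_i + b_i equals the target, so the weighted sum reproduces it.
check = sum((amplitudes[k] / total_amp) * 1 * (peak_counts[k] + offsets[k]) for k in amplitudes)
print(round(check, 6))   # 10.0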
the terminal equipment can store the calculated image sequence characteristics of the stepping and rope skipping movement and the movement model into a local database. Optionally, the terminal device may inquire whether the user agrees to upload the image sequence characteristics and the motion model of the stepping and rope skipping motion to the cloud server, and if the user selects the "no" button, it indicates that the user does not agree to upload the image sequence characteristics and the motion model to the database of the cloud server.
In the foregoing embodiments, the descriptions of the respective embodiments have respective emphasis, and for parts that are not described in detail in a certain embodiment, reference may be made to related descriptions of other embodiments.
The embodiment of the present application further provides a chip system. The chip system includes at least one processor, a memory and an interface circuit, where the memory, the interface circuit and the at least one processor are interconnected by lines, and the at least one memory stores a computer program; when the computer program is executed by the processor, the method flow shown in fig. 4 is implemented.
An embodiment of the present application further provides a computer-readable storage medium, in which a computer program is stored; when the computer program runs on a network device, the method flow shown in fig. 4 is implemented. An embodiment of the present application further provides a computer program product; when the computer program product runs on a terminal device, the method flow shown in fig. 4 is implemented. The computer program product may include a plurality of code instructions implemented as a plurality of software units, which may specifically correspond to the implementation in fig. 2 and to the motion counting application in the application layer of fig. 1C; reference may be made to the descriptions of the foregoing embodiments.
As used in the above embodiments, the term "when …" may be interpreted to mean "if …" or "after …" or "in response to a determination of …" or "in response to a detection of …", depending on the context. Similarly, depending on the context, the phrase "at the time of determination …" or "if (a stated condition or event) is detected" may be interpreted to mean "if the determination …" or "in response to the determination …" or "upon detection (a stated condition or event)" or "in response to detection (a stated condition or event)".
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When loaded and executed on a computer, cause the processes or functions described in accordance with the embodiments of the application to occur, in whole or in part. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored in a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, fiber optic, digital subscriber line) or wirelessly (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk), among others.
A person of ordinary skill in the art may understand that all or some of the processes in the methods of the foregoing embodiments may be implemented by a computer program instructing relevant hardware. The program may be stored in a computer-readable storage medium, and when the program is executed, the processes of the foregoing method embodiments may be performed. The foregoing storage medium includes any medium capable of storing program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.

Claims (22)

  1. A motion model generation method, comprising:
    acquiring description information input by a first user, wherein the description information is used for describing a target motion performed by a target user and a target number of times of performing the target motion; the first user is the target user or a user other than the target user;
    determining a first fluctuation condition of a plurality of key points of the target user in first video data, wherein the first video data is video data of a process in which the target user performs the target motion;
    determining a motion model of the target motion according to the first fluctuation condition and the target number of times; wherein the motion model is used for motion counting.
  2. The method of claim 1, wherein after the determining a motion model of the target motion according to the first fluctuation condition and the target number of times, the method further comprises:
    performing motion counting, by using the motion model, on the target motion performed by a second user in second video data; wherein the second user is the first user, the target user, or a user other than the first user and the target user.
  3. The method of claim 2, wherein the performing motion counting, by using the motion model, on the target motion performed by the second user in the second video data comprises:
    if a first similarity between a second image sequence feature of the second video data and a first image sequence feature of the first video data is greater than a first preset threshold, performing motion counting on the motion performed by the second user in the second video data by using the motion model and a second fluctuation condition of a plurality of key points of the second user in the second video data; wherein the first similarity is used to characterize a degree to which the target motion performed by the second user is standard.
  4. The method of claim 3, further comprising:
    outputting motion evaluation information; wherein the motion evaluation information is generated according to the first similarity.
  5. The method of any of claims 1-4, wherein before the acquiring description information input by the first user, the method further comprises:
    acquiring the first video data;
    if a motion model for performing motion counting on the target motion in the first video data does not exist in a database, triggering the operation of acquiring the description information input by the first user.
  6. The method of claim 5, wherein the determining that no motion model for performing motion counting on the target motion in the first video data exists in the database comprises: if the database does not have a target image sequence feature whose second similarity with the first image sequence feature of the first video data reaches a second preset threshold, determining that no motion model corresponding to the target image sequence feature exists in the database.
  7. The method of claim 5 or 6, wherein the database comprises at least one of a local database or a database of a cloud server.
  8. The method according to any one of claims 1-7, wherein the determining a motion model of the target motion according to the first fluctuation condition and the target number of times comprises:
    selecting, from the first fluctuation condition, k key points that vary and whose fluctuation amplitudes rank in the top k;
    counting motion peak values of the k key points to obtain a count value;
    determining the motion model of the target motion according to the count value and the target number of times by means of regression fitting.
  9. The method according to any one of claims 1-8, wherein after the determining a motion model of the target motion according to the first fluctuation condition and the target number of times, the method further comprises:
    saving the motion model, or providing the motion model to a communication peer device.
  10. A motion model generation apparatus, comprising:
    an input unit, configured to acquire description information input by a first user, wherein the description information is used for describing a target motion performed by a target user and a target number of times of performing the target motion; the first user is the target user or a user other than the target user;
    a fluctuation condition extraction unit, configured to determine a first fluctuation condition of a plurality of key points of the target user in first video data, wherein the first video data is video data of a process in which the target user performs the target motion;
    a motion model establishing unit, configured to determine a motion model of the target motion according to the first fluctuation condition and the target number of times; wherein the motion model is used for motion counting.
  11. The apparatus of claim 10, further comprising:
    a motion counting unit, configured to perform motion counting on the target motion performed by a second user in second video data through the motion model after the motion model establishing unit determines the motion model of the target motion according to the first fluctuation condition and the target number of times; wherein the second user is the first user, the target user, or a user other than the first user and the target user.
  12. The apparatus according to claim 11, wherein the motion counting unit is specifically configured to:
    perform motion counting on the motion performed by the second user in the second video data by using the motion model and a second fluctuation condition of a plurality of key points of the second user in the second video data if a first similarity between a second image sequence feature of the second video data and a first image sequence feature of the first video data is greater than a first preset threshold; wherein the first similarity is used to characterize a degree to which the target motion performed by the second user is standard.
  13. The apparatus of claim 12, further comprising:
    an output unit, configured to output motion evaluation information; wherein the motion evaluation information is generated according to the first similarity.
  14. The apparatus of any one of claims 10-13, further comprising:
    an acquisition unit, configured to: acquire the first video data before the input unit acquires the description information input by the first user; and
    if a motion model for performing motion counting on the target motion in the first video data does not exist in a database, trigger the operation of acquiring the description information input by the first user.
  15. The apparatus according to claim 14, wherein the determining that no motion model for performing motion counting on the target motion in the first video data exists in the database is specifically:
    if the database does not have a target image sequence feature whose second similarity with the first image sequence feature of the first video data reaches a second preset threshold, determining that no motion model corresponding to the target image sequence feature exists in the database.
  16. The apparatus of claim 14 or 15, wherein the database comprises at least one of a local database or a database of a cloud server.
  17. The apparatus according to any of the claims 10-16, wherein the motion model establishing unit is specifically configured to:
    select, from the first fluctuation condition, k key points that vary and whose fluctuation amplitudes rank in the top k;
    count motion peak values of the k key points to obtain a count value;
    determine the motion model of the target motion according to the count value and the target number of times by means of regression fitting.
  18. The apparatus of any one of claims 10-17, further comprising:
    a storage unit, configured to save the motion model or provide the motion model to a communication peer device after the motion model establishing unit determines the motion model of the target motion according to the fluctuation condition of the plurality of key points in the first video data and the target number of times.
  19. A motion model generation device comprising a memory for storing a computer program and a processor configured to invoke all or part of the computer program stored by the memory to perform the method of any of claims 1 to 9.
  20. A chip system, comprising at least one processor and interface circuitry, the interface circuitry and the at least one processor interconnected by a line, the interface circuitry configured to receive a computer program from outside the chip system; the computer program, when executed by the processor, implements the method of any of claims 1-9.
  21. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the method of any one of the preceding claims 1 to 9.
  22. A computer program product, characterized in that the method of any of claims 1-9 is implemented when the computer program product is run on a terminal device.
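For readability, the following is a minimal Python sketch of the model-generation and counting flow recited in claims 1, 3, and 8. It is only an illustration under simplifying assumptions, not the patented implementation: key points are given as per-frame coordinate arrays, the fluctuation of a key point is taken as its vertical coordinate over time, peak counting is a simple local-maximum test, and the regression fit is reduced to a single scale factor. All function names and parameters are invented for this sketch and are not defined in the patent.

```python
import numpy as np

def fluctuation(keypoints):
    """keypoints: array of shape (frames, num_keypoints, 2).
    Returns one 1-D fluctuation signal per key point (vertical coordinate over time)."""
    return keypoints[:, :, 1].T  # shape (num_keypoints, frames)

def count_peaks(signal):
    """Count local maxima above the midpoint of the signal's range (a crude peak count)."""
    thr = signal.min() + 0.5 * (signal.max() - signal.min())
    return sum(
        1 for i in range(1, len(signal) - 1)
        if signal[i] > thr and signal[i] >= signal[i - 1] and signal[i] > signal[i + 1]
    )

def build_motion_model(keypoints, target_times, k=3):
    """Claim-8-style sketch: pick the k key points with the largest fluctuation
    amplitude, count their motion peaks, and fit the count to the user-declared
    target number of times (here reduced to a one-parameter fit)."""
    signals = fluctuation(keypoints)
    amplitudes = signals.max(axis=1) - signals.min(axis=1)
    top_k = np.argsort(amplitudes)[-k:]  # key points whose amplitude ranks in the top k
    count_value = np.mean([count_peaks(signals[i]) for i in top_k])
    scale = target_times / count_value if count_value else 1.0
    return {"keypoint_ids": top_k.tolist(), "scale": scale}

def count_motion(model, keypoints):
    """Apply the model to a new video's key-point sequence to obtain a motion count."""
    signals = fluctuation(keypoints)
    raw = np.mean([count_peaks(signals[i]) for i in model["keypoint_ids"]])
    return round(raw * model["scale"])
```

The sketch omits the similarity gating of claims 3 and 12: in the claims, counting for a second user proceeds only when the first similarity between the second video's image sequence feature and the first video's image sequence feature exceeds a first preset threshold, and that similarity is also used to generate the motion evaluation information of claim 4.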
CN202080006118.9A 2020-06-17 2020-06-17 Motion model generation method and related equipment Active CN114080258B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/096638 WO2021253296A1 (en) 2020-06-17 2020-06-17 Exercise model generation method and related device

Publications (2)

Publication Number Publication Date
CN114080258A true CN114080258A (en) 2022-02-22
CN114080258B CN114080258B (en) 2022-08-09

Family

ID=79268804

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202080006118.9A Active CN114080258B (en) 2020-06-17 2020-06-17 Motion model generation method and related equipment

Country Status (2)

Country Link
CN (1) CN114080258B (en)
WO (1) WO2021253296A1 (en)

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20150220159A1 (en) * 2014-02-04 2015-08-06 Pointgrab Ltd. System and method for control of a device based on user identification
CN106974656A (en) * 2016-01-15 2017-07-25 赵敏岗 For gathering motion state and carrying out the system and implementation method of real time contrast's correction
CN107715435A (en) * 2017-11-09 2018-02-23 重庆勤鸟圈科技有限公司 User movement data gathering system
CN108096807A (en) * 2017-12-11 2018-06-01 丁贤根 A kind of exercise data monitoring method and system

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2009095404A (en) * 2007-10-15 2009-05-07 Xing Inc Moving image display system, moving image display method, and computer program
CN102333574A (en) * 2009-01-29 2012-01-25 耐克国际有限公司 A kind of locomotivity rating system
CN108805109A (en) * 2018-08-07 2018-11-13 深圳市云康创新网络科技有限公司 A kind of exercise data capture display system
CN109260673A (en) * 2018-11-27 2019-01-25 北京羽扇智信息科技有限公司 A kind of movement method of counting, device, equipment and storage medium
CN111202955A (en) * 2019-12-11 2020-05-29 华为技术有限公司 Motion data processing method and electronic equipment

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116743913A (en) * 2022-09-02 2023-09-12 荣耀终端有限公司 Audio processing method and device
CN116743913B (en) * 2022-09-02 2024-03-19 荣耀终端有限公司 Audio processing method and device

Also Published As

Publication number Publication date
WO2021253296A1 (en) 2021-12-23
CN114080258B (en) 2022-08-09

Similar Documents

Publication Publication Date Title
WO2021244457A1 (en) Video generation method and related apparatus
CN111061912A (en) Method for processing video file and electronic equipment
WO2021036568A1 (en) Fitness-assisted method and electronic apparatus
WO2022095788A1 (en) Panning photography method for target user, electronic device, and storage medium
WO2021258814A1 (en) Video synthesis method and apparatus, electronic device, and storage medium
WO2022073417A1 (en) Fusion scene perception machine translation method, storage medium, and electronic device
WO2022037479A1 (en) Photographing method and photographing system
CN114242037A (en) Virtual character generation method and device
CN114080258B (en) Motion model generation method and related equipment
US20230402150A1 (en) Adaptive Action Evaluation Method, Electronic Device, and Storage Medium
WO2022214004A1 (en) Target user determination method, electronic device and computer-readable storage medium
WO2022007757A1 (en) Cross-device voiceprint registration method, electronic device and storage medium
WO2021036562A1 (en) Prompting method for fitness training, and electronic device
CN113996046B (en) Warming-up judgment method and device and electronic equipment
CN115734032A (en) Video editing method, electronic device and storage medium
CN115273216A (en) Target motion mode identification method and related equipment
CN114812381A (en) Electronic equipment positioning method and electronic equipment
CN114445522A (en) Brush effect graph generation method, image editing method, device and storage medium
CN113452896B (en) Image display method and electronic equipment
WO2021233018A1 (en) Method and apparatus for measuring muscle fatigue degree after exercise, and electronic device
CN115437601A (en) Image sorting method, electronic device, program product, and medium
CN115223236A (en) Device control method and electronic device
CN115329299A (en) Screen unlocking method and electronic equipment
CN115619628A (en) Image processing method and terminal device
CN115033094A (en) Method for evaluating motion state of user and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant