CN114758415A - Model control method, device, equipment and storage medium - Google Patents
Model control method, device, equipment and storage medium
- Publication number
- CN114758415A (application number CN202210346046.3A)
- Authority
- CN
- China
- Prior art keywords
- target
- target object
- parameter
- objects
- task
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F13/00—Video games, i.e. games using an electronically generated display having two or more dimensions
- A63F13/55—Controlling game characters or game objects based on the game progress
-
- A—HUMAN NECESSITIES
- A63—SPORTS; GAMES; AMUSEMENTS
- A63F—CARD, BOARD, OR ROULETTE GAMES; INDOOR GAMES USING SMALL MOVING PLAYING BODIES; VIDEO GAMES; GAMES NOT OTHERWISE PROVIDED FOR
- A63F2300/00—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game
- A63F2300/30—Features of games using an electronically generated display having two or more dimensions, e.g. on a television screen, showing representations related to the game characterized by output arrangements for receiving control signals generated by the game device
Abstract
The embodiment of the application discloses a model control method, a device, equipment and a storage medium, wherein the method comprises the following steps: acquiring an image frame set to be processed, wherein image frames in the image frame set have a sequential time sequence relationship and comprise at least one first target object participating in a task; determining a matching relationship between the at least one first target object and at least one controlled model that performs the task; identifying at least two image frames of the image frame set to be processed to obtain a first identification result, wherein the first identification result at least comprises a first motion parameter of each first target object; and controlling the controlled model matched with each first target object based on the matching relation, and completing the task according to a second motion parameter matched with the first motion parameter.
Description
Technical Field
The present application relates to, but not limited to, the field of computer vision technologies, and in particular, to a model control method, apparatus, device, and storage medium.
Background
Artificial Intelligence (AI) refers to the theories, methods, techniques and application systems that use digital computers, or machines controlled by digital computers, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge and use that knowledge to obtain optimal results. With the development of AI technology, AI has been widely applied in fields such as education, smart home, medical treatment and automobiles. However, users pay increasing attention to the experience and entertainment that AI brings to human-computer interaction, and existing human-computer interaction modes cannot meet these requirements.
Disclosure of Invention
In view of this, embodiments of the present application provide at least a model control method, apparatus, device and storage medium.
The technical scheme of the embodiment of the application is realized as follows:
in one aspect, an embodiment of the present application provides a model control method, where the method includes: acquiring an image frame set to be processed, wherein image frames in the image frame set have a sequential time sequence relationship and comprise at least one first target object participating in a task; determining a matching relationship between the at least one first target object and at least one controlled model that performs the task; identifying at least two image frames in the image frame set to be processed to obtain a first identification result, wherein the first identification result at least comprises a first motion parameter of each first target object; and controlling the controlled model matched with each first target object based on the matching relation, and completing the task according to a second motion parameter matched with the first motion parameter.
In some embodiments, the controlled model includes a virtual object in a display device, the controlling the controlled model matched with each of the first target objects based on the matching relationship, and the completing the task according to a second motion parameter matched with the first motion parameter includes: determining a second motion parameter of the virtual object matched with each first target object based on the matching relation and the first motion parameter of each first target object; and controlling the virtual object to complete the task according to the second motion parameter.
In this way, where the controlled model comprises a virtual object in a display device, the virtual object in the display device is controlled to complete the task according to a second motion parameter derived from the first motion parameter of the real first target object.
In some embodiments, the identifying at least two image frames of the image frame set to be processed, obtaining a first identification result, includes: identifying at least two image frames of the image frame set to be processed to obtain a second identification result, wherein the second identification result comprises a detection result of a candidate object; acquiring the number of objects actually participating in the task; screening the number of the first target objects in the candidate objects based on the detection result of the candidate objects; and determining a first motion parameter of each first target object based on the screened second identification result with the number of the first target objects.
Therefore, the objects that actually participate in the task are screened out of the image frames, only the first motion parameters of those objects are obtained subsequently, and the controlled model is controlled to complete the task based on these first motion parameters; spectators who do not participate in the task are thereby filtered out, which improves control accuracy.
In some embodiments, the detection result includes position information of a detection box of the candidate object, and screening the number of first target objects from the candidate objects based on the detection results of the candidate objects includes: determining the size of the detection box of each candidate object; and screening the number of first target objects from the candidate objects based on the sizes of the detection boxes of the candidate objects.
In this way, by acquiring the size of the detection box of each candidate object and then screening the first target objects from the candidate objects based on those sizes, objects with higher definition can be selected as the first target objects, and spectators or passers-by who do not participate in the task can be screened out, which helps improve control accuracy.
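By way of a non-authoritative illustration (not part of the original disclosure), the size-based screening described above could be sketched as follows; the function name, the (x1, y1, x2, y2) box format and the input structure are assumptions:

```python
def screen_by_box_size(candidates, num_participants):
    """Keep the num_participants candidates with the largest detection boxes.

    candidates: list of dicts such as {"id": ..., "box": (x1, y1, x2, y2)}.
    """
    def box_area(box):
        x1, y1, x2, y2 = box
        return max(0, x2 - x1) * max(0, y2 - y1)

    # Larger boxes usually correspond to clearer, closer participants.
    ranked = sorted(candidates, key=lambda c: box_area(c["box"]), reverse=True)
    return ranked[:num_participants]
```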
In some embodiments, the detection result includes position information of skeletal key points of the candidate object, and screening the number of first target objects from the candidate objects based on the detection results of the candidate objects includes: determining the integrity of the skeletal key points of each candidate object; and selecting the number of first target objects from the candidate objects based on the integrity of their skeletal key points.
In the embodiment of the present application, the integrity of the skeletal key points of each candidate object is obtained, and the first target objects are then screened out from the candidate objects based on that integrity, so that candidate objects whose bodies are only partially within the image frame can be screened out, which improves control accuracy.
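A minimal sketch of the integrity-based screening, again illustrative only; the (x, y, confidence) keypoint format, the confidence threshold and the total keypoint count are assumed values:

```python
def screen_by_keypoint_integrity(candidates, num_participants, required=17):
    """Keep the candidates whose skeletons are most complete.

    candidates: list of dicts such as
        {"id": ..., "keypoints": [(x, y, confidence), ...]}.
    required: total number of skeleton points the detector can output.
    """
    def integrity(keypoints, conf_thresh=0.3):
        valid = sum(1 for (_, _, c) in keypoints if c >= conf_thresh)
        return valid / required  # fraction of the body visible in the frame

    ranked = sorted(candidates, key=lambda c: integrity(c["keypoints"]), reverse=True)
    return ranked[:num_participants]
```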
In some embodiments, the determining the first motion parameter of each first target object based on the screened-out second recognition result with the number of first target objects includes: determining whether the action of at least one first target object is a target action based on the screened second recognition result with the number of first target objects; in response to the action of the at least one first target object being a target action, determining a first speed parameter and/or a first direction parameter of the target action; the method further comprises the following steps: and determining a second speed parameter and/or a second direction parameter of the controlled model matched with each first target object based on the matching relation and the first speed parameter and/or the first direction parameter of each first target object.
In this way, when the action of the first target object is determined to be the target action, the first speed parameter and/or the first direction parameter of the target action are further determined and used subsequently to control the controlled model matched with the first target object to complete the task. This encourages the user to make their actions meet the requirements of the target action as closely as possible in order to control the controlled model to complete the task, which increases the difficulty and interest of the interaction process and stimulates the user's competitive spirit.
In some embodiments, the determining, based on the screened-out second recognition result with the number of first target objects, whether an action of at least one of the first target objects is a target action includes: determining location information of a keypoint associated with said at least one first target object based on said screened out second recognition result with said number of first target objects; determining track information of the key points based on the sequential time relation of the image frames where the second identification results are located and the position information of the key points; and determining the action of the at least one first target object as a target action in response to the track information of the key points meeting a preset condition.
Thus, by determining location information for a keypoint associated with the first target object; then determining the track information of the key points based on the sequential time sequence relation of the image frames and the position information of the key points; and under the condition that the track information of the key points meets the preset conditions, determining the action of the first target object as a target action, thereby realizing the determination of the target action.
In some embodiments, in the case that at least two first target objects match one controlled model, the first recognition result further includes the action amplitudes of the at least two first target objects, and controlling, based on the matching relationship, the controlled model matched with each of the first target objects to complete the task according to the second motion parameter matched with the first motion parameter includes: determining whether the action amplitudes of the at least two first target objects are consistent; and in response to the action amplitudes of the at least two first target objects being inconsistent, not outputting a control instruction to the controlled model matched with the at least two first target objects.
Therefore, when the action amplitudes of the at least two first target objects are inconsistent, no control instruction is output to the controlled model matched with the at least two first target objects, so that team cooperation ability is evaluated; when multiple users form a team, they are motivated to keep their actions consistent and complete the task together, which increases the interest and challenge of the interaction process.
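As an illustrative sketch of the consistency check (the tolerance value and data layout are assumptions, not part of the disclosure):

```python
def maybe_issue_control(amplitudes, tolerance=0.15):
    """Return True only if all teammates' action amplitudes are consistent.

    amplitudes: list of per-player amplitude values for the current action.
    tolerance: maximum allowed relative spread (an assumed threshold).
    """
    if not amplitudes:
        return False
    spread = max(amplitudes) - min(amplitudes)
    mean = sum(amplitudes) / len(amplitudes)
    consistent = mean > 0 and (spread / mean) <= tolerance
    # No control instruction is sent to the shared controlled model
    # unless the teammates move with consistent amplitude.
    return consistent
```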
In some embodiments, in the case that at least two first target objects match one controlled model, said controlling the controlled model matching each of said first target objects based on said matching relationship, and completing said task according to a second motion parameter matching said first motion parameter, includes: determining a second target object among the at least two first target objects based on a first motion parameter of each of the at least two first target objects; and controlling the controlled model matched with the at least two first target objects based on the matching relation, and completing the task according to the second motion parameters matched with the first motion parameters of the second target object.
In this way, the second target object is determined through the first motion parameter of the first target object, and the controlled model is controlled based on the second motion parameter of the second target object, so that the time for the controlled model to complete the task can be controlled.
In some embodiments, the method further comprises: acquiring a first time length for the controlled model to complete the task; determining a second historical record set of the controlled model from the first historical record set based on the matching relation; ranking the controlled models based on the second set of history records and the first duration.
Therefore, ranking of the controlled models is achieved by acquiring the second history record set and the first duration taken by the controlled model to complete the task under the corresponding matching relationship, so that users can know their own level, which enhances their competitive psychology and stimulates their enthusiasm and fighting spirit.
In some embodiments, in the case where the number of controlled models is at least two, the method further comprises: acquiring a second time length for each controlled model to complete the task; ranking the at least two controlled models based on each of the second durations.
Therefore, where the number of controlled models is at least two, the at least two controlled models are ranked by acquiring the second duration taken by each controlled model to complete the task, which enhances the users' competitive psychology and stimulates their enthusiasm and fighting spirit.
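A minimal sketch of ranking controlled models by completion time, purely illustrative; the data layout is an assumption:

```python
def rank_controlled_models(completion_times):
    """Rank controlled models by the time they took to finish the task.

    completion_times: dict mapping model_id -> completion duration in seconds.
    Returns a list of (rank, model_id, duration), fastest first.
    """
    ordered = sorted(completion_times.items(), key=lambda kv: kv[1])
    return [(rank + 1, model_id, duration)
            for rank, (model_id, duration) in enumerate(ordered)]

# Example: rank_controlled_models({"boat_A": 92.4, "boat_B": 88.7})
# -> [(1, "boat_B", 88.7), (2, "boat_A", 92.4)]
```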
In a second aspect, an embodiment of the present application provides a model control apparatus, including: a first acquisition module, configured to acquire an image frame set to be processed, where the image frames in the image frame set have a sequential time-series relationship and include at least one first target object participating in a task; a first determination module, configured to determine a matching relationship between the at least one first target object and at least one controlled model that performs the task; a recognition module, configured to recognize at least two image frames of the image frame set to be processed to obtain a first recognition result, where the first recognition result includes at least a first motion parameter of each first target object; and a control module, configured to control, based on the matching relationship, the controlled model matched with each first target object to complete the task according to a second motion parameter matched with the first motion parameter.
In a third aspect, an embodiment of the present application provides an electronic device, including a memory and a processor, where the memory stores a computer program that is executable on the processor, and the processor implements some or all of the steps of the above method when executing the program.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements some or all of the steps of the above method.
In the embodiment of the application, an image frame set to be processed is obtained, wherein the image frames in the image frame set have a sequential time sequence relationship and comprise at least one first target object participating in a task; then determining a matching relationship between the at least one first target object and the at least one controlled model for executing the task; then at least two image frames of the image frame set to be processed are identified to obtain an identification result, wherein the identification result at least comprises a first motion parameter of each first target object; and finally, based on the matching relation, controlling the controlled model matched with each first target object, and completing the task according to the second motion parameters matched with the first motion parameters. Therefore, the controlled model can execute the specified task based on the first motion parameter of the first target object, so that the interaction between the first target object and the controlled model is realized, and the interestingness in the interaction process is increased.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the technical aspects of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and, together with the description, serve to explain the principles of the application.
Fig. 1A is a schematic diagram of a model control system according to an embodiment of the present application;
Fig. 1B is a schematic diagram of another model control system according to an embodiment of the present application;
Fig. 1C is a schematic implementation flow diagram of a model control method according to an embodiment of the present application;
Fig. 2A is a schematic implementation flow diagram of determining a first motion parameter of a first target object according to an embodiment of the present application;
Fig. 2B is a schematic diagram of another implementation flow of determining a first motion parameter of a first target object according to an embodiment of the present application;
Fig. 3A is a schematic implementation flow diagram of determining a first target object according to an embodiment of the present application;
Fig. 3B is a schematic diagram of another implementation flow of determining a first target object according to an embodiment of the present application;
Fig. 4A is a schematic implementation flow diagram of determining a controlled model ranking according to an embodiment of the present application;
Fig. 4B is a schematic diagram of another implementation flow of determining a controlled model ranking according to an embodiment of the present application;
Fig. 5A, Fig. 5B and Fig. 5E are schematic implementation flow diagrams of a model control method according to an embodiment of the present application;
Fig. 5C is a schematic diagram of a display interface of a display device according to an embodiment of the present application;
Fig. 5D is a schematic diagram of a display interface in which a participant controls a dragon boat to move forward in single-player mode according to an embodiment of the present application;
Fig. 5F is a schematic diagram of a display interface in which participants control a dragon boat to move forward in two-player mode according to an embodiment of the present application;
Fig. 6 is a schematic structural diagram of a model control apparatus according to an embodiment of the present application;
Fig. 7 is a hardware entity diagram of an electronic device according to an embodiment of the present application.
Detailed Description
In order to make the purpose, technical solutions and advantages of the present application clearer, the technical solutions of the present application are further elaborated below with reference to the drawings and embodiments. The described embodiments should not be construed as limiting the present application, and all other embodiments obtained by a person of ordinary skill in the art without creative effort fall within the protection scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is understood that "some embodiments" may be the same subset or different subsets of all possible embodiments, and may be combined with each other without conflict.
Reference to the terms "first/second/third" merely distinguishes similar objects and does not denote a particular ordering with respect to the objects, it being understood that "first/second/third" may, where permissible, be interchanged in a particular order or sequence so that embodiments of the application described herein may be practiced in other than the order shown or described herein.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the present application.
Fig. 1A is a schematic diagram of an alternative architecture of a model control system 10 according to an embodiment of the present application, and referring to fig. 1A, a computer processing device 300 is connected to an image capturing module 100 through a network 200. The network 200 may be a wide area network or a local area network, or a combination of both. The computer processing device 300 and the image capture module 100 may be physically separate or integrated. The image acquisition module 100 may send or store the acquired to-be-processed image frame set to the computer processing device 300 through the network 200, where the image frames in the image frame set have a sequential relationship, and the image frames include at least one first target object participating in a task. The computer processing device 300 acquires a set of image frames to be processed; determining a matching relationship between the at least one first target object and at least one controlled model that performs the task; identifying at least two image frames of the image frame set to be processed to obtain a first identification result; based on the matching relationship, the controlled model 600 matched with each first target object is controlled through the network 200, the task is completed according to the second motion parameter matched with the first motion parameter, and the task completion result of the controlled model 600 is displayed.
Fig. 1B is an alternative architecture diagram of another model control system 10 according to an embodiment of the present disclosure, and referring to fig. 1B, a terminal device 500 is connected to an image capturing module 100 through a network 200, and the terminal device 500 and the image capturing module 100 are connected to a server 400 through the network 200. The image acquisition module 100 may send or store the acquired to-be-processed image frame set to the server 400 through the network 200, where the image frames in the image frame set have a sequential relationship and include at least one first target object participating in a task; the server 400 acquires an image frame set to be processed; determining a matching relationship between the at least one first target object and at least one controlled model that performs the task; identifying at least two image frames of the image frame set to be processed to obtain a first identification result; based on the matching relationship, the controlled model 600 matched with each first target object is controlled through the network 200, the task is completed according to the second motion parameter matched with the first motion parameter, and the server 400 sends the task completion result of the controlled model 600 to the terminal device 500 through the network 200 for displaying.
In some embodiments, the server 400 may be an independent physical server, may also be a server cluster or a distributed system formed by a plurality of physical servers, and may also be a cloud server providing basic cloud computing services such as a cloud service, a cloud database, cloud computing, a cloud function, cloud storage, a Network service, cloud communication, a middleware service, a domain name service, a security service, a Content Delivery Network (CDN), a big data and artificial intelligence platform, and the like. The image acquisition module, the terminal device and the server can be directly or indirectly connected in a wired or wireless communication mode, and the embodiment of the application is not limited.
Fig. 1C is a schematic flowchart of a model control method provided in an embodiment of the present application, and is applied to an electronic device, as shown in fig. 1C, the method includes the following steps S101 to S104:
s101: acquiring an image frame set to be processed, wherein image frames in the image frame set have a sequential time relationship and comprise at least one first target object participating in a task;
here, the electronic device may be the computer processing device 300 in fig. 1A, such as a mobile phone, a notebook computer, a tablet computer, a web-enabled device, a multimedia device, a streaming media device, a mobile internet device, a robot, etc.; or may be the server 400 in fig. 1B. The functions implemented by the method can be implemented by calling program code by a processor in an electronic device, and the program code can be stored in a computer storage medium. The processor may be used to perform model control and the memory may be used to store data required and data generated during the performance of the model control.
The image frame set comprises video or real-time frame images, which may be two-dimensional (2D) images or three-dimensional (3D) images, where the 2D images may include Red Green Blue (RGB) images collected by monocular or multi-view cameras, and the like. In some implementations, the image frame set may consist of images acquired in real time by an image acquisition module, such as a camera module, provided on the electronic device; in other implementations, the image frame set may consist of images requiring model control that are transmitted to the electronic device by other devices through instant messaging; in some implementations, the image frame set may also consist of collected images that the electronic device obtains by calling a local album through a server in response to a task processing instruction, which is not limited in the embodiments of the present application. The image frame sets acquired at different times or in different periods may be the same or different; the number of image frames in an image frame set may be determined according to actual requirements, for example, 3 frames, 5 frames, 10 frames, 15 frames, and the like; the image frame set may also be periodically updated, and the number of image frames acquired in each period may be the same or different.
A task refers to something the controlled model needs to complete based on the control instruction, e.g., moving from a starting point to an ending point, or making a turn around a pillar. A controlled model refers to a model that performs a task in response to the model control method provided in the embodiments of the present application, and includes virtual controlled models (e.g., virtual objects in a motion-sensing game display device) and physical controlled models; the embodiments of the present application do not limit the type of the task. The first target object refers to an object actually participating in the task, that is, an object used to control the controlled model to complete the task. The first target object may be a dynamic object, such as a moving person, animal or robot; the number of first target objects may be one, two or more, and the type and number of first target objects are not limited in the embodiments of the present application.
When a human face or human body appears in the image frames to be processed in the embodiments of the present application, a product applying the technical solution of the present application obtains the subject's consent before the image frames are acquired; for example, authorization may be obtained through pop-up window information, a clear notice may be displayed to inform the subject that face/body images will be collected, or the subject may be asked to upload the image frame set by themselves.
In some embodiments, after the image frame set to be processed is acquired, logic processing such as encoding, decoding, frame extraction and the like may be performed on the image frame set to be processed, so that a frame rate of the processed image frame set may meet an algorithm requirement for subsequently processing the image frame set.
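For illustration only, frame extraction to match an algorithm's expected frame rate might look like the following sketch; the function name and parameters are assumptions:

```python
def subsample_frames(frames, source_fps, target_fps):
    """Drop frames so the sequence approximately matches the algorithm's frame rate.

    frames: list of decoded image frames in temporal order.
    """
    if target_fps >= source_fps:
        return list(frames)
    step = source_fps / target_fps
    kept, next_index = [], 0.0
    for i, frame in enumerate(frames):
        if i >= next_index:
            kept.append(frame)
            next_index += step
    return kept
```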
S102: determining a matching relationship between the at least one first target object and at least one controlled model that performs the task;
here, the matching relationship refers to a correspondence relationship between the first target objects and the controlled models, that is, all the first target objects control several controlled models in common, several first target objects control one controlled model, and each first target object controls which controlled model. For example, the number of all the first target objects is 2, and 1 controlled model is controlled in total, namely 2 first target objects control one controlled model; for another example, the number of all the first target objects is 2, and 2 controlled models are controlled in total, that is, each first target object controls one controlled model, that is, the first target objects and the controlled models correspond to each other one by one.
In some embodiments, in implementing step S102, a matching type may be preset, where the matching type is used to characterize the number of first target objects, the number of controlled models, and how many first target objects control one controlled model. For example, the matching type may be 1-to-1, that is, the number of first target objects is one, the number of controlled models is one, and one first target object controls one controlled model; the matching type may be 2-to-2, that is, the number of first target objects is two, the number of controlled models is two, and each first target object controls one controlled model; for another example, the matching type may be 2-to-1, that is, the number of first target objects is two, the number of controlled models is 1, and two first target objects control one controlled model. The matching type is then obtained through manual input, and the controlled model matched with each first target object is determined based on the position of the first target object; for example, when the number of first target objects is two and the number of controlled models is two, the first target object on the left corresponds to the controlled model on the left, and the first target object on the right corresponds to the controlled model on the right. In some embodiments, after the matching type is obtained, the controlled model matching a first target object may further be selected by manual input; for example, if the matching relationship is 2-to-2, the first target objects are A and B, and the controlled models are a and b, the first target object A may manually select controlled model a as its match, and the first target object B may manually select controlled model b as its match.
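A hedged sketch of the position-based assignment mentioned above (left player to left model); the data structures are assumptions and not part of the original disclosure:

```python
def assign_models_by_position(target_objects, controlled_models):
    """Match players to controlled models from left to right.

    target_objects: list of dicts such as {"id": ..., "box": (x1, y1, x2, y2)}.
    controlled_models: list of model ids ordered from left to right on screen.
    Returns a dict mapping player id -> model id.
    """
    def center_x(box):
        x1, _, x2, _ = box
        return (x1 + x2) / 2

    ordered_players = sorted(target_objects, key=lambda t: center_x(t["box"]))
    return {player["id"]: model
            for player, model in zip(ordered_players, controlled_models)}
```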
In some embodiments, the step S102 may also be implemented by acquiring an image of the controlled model through an image acquisition module, and then identifying the image of the controlled model to obtain the number of the controlled models; then, determining a matching relationship between the first target object and the controlled model based on the number of the controlled models and the number of the first target objects, for example, if the number of the identified controlled models is 1, and the number of the first target objects is 3, then 3 first target objects match 1 controlled model; for another example, if the number of the identified controlled models is 2, and the number of the first target objects is 2, then 2 first target objects match 2 controlled models, that is, each first target object controls one controlled model, and the method for determining the matching relationship between the first target object and the controlled model is not limited in the embodiment of the present application.
S103: identifying at least two image frames of the image frame set to be processed to obtain a first identification result, wherein the first identification result at least comprises a first motion parameter of each first target object;
In some embodiments, the first target objects are all of the dynamic objects identified in each of the at least two image frames; that is, each of the at least two image frames is recognized to obtain all dynamic objects in each frame. In implementation, they can be obtained through a human skeleton key point detection algorithm, an object detection algorithm, or the like; for example, when the number of dynamic objects in each frame is consistent with the number of first target objects participating in the task, the first target objects are all of the dynamic objects identified in each frame.
In some embodiments, not all of the dynamic objects identified in each of the at least two image frames are first target objects participating in the task; that is, the dynamic objects identified in each of the at least two image frames include both objects participating in the task and spectators. Correspondingly, as shown in fig. 2A, the implementation of step S103 may include the following steps S1031 to S1034:
step S1031: identifying at least two image frames of the image frame set to be processed to obtain a second identification result, wherein the second identification result comprises a detection result of a candidate object;
here, the candidate objects are all dynamic objects that can be identified in each of the at least two image frames, and the detection result is a parameter for characterizing the position of the candidate object obtained after the image frame is identified, such as position information of a detection frame, position information of a bone key point, and the like.
In some embodiments, the implementation of step S1031 may utilize a deep learning model such as a neural network to detect at least two image frames in the image frame set to be processed through a human skeleton key point detection algorithm, so as to obtain position information of the valid human skeleton key points in each of the at least two image frames, and then determine the candidate objects in each of the at least two image frames based on the valid human skeleton key points; for example, a detected person is determined to be a candidate object when its number of valid human skeleton key points is greater than or equal to a preset value, so as to obtain the detection result corresponding to the candidate object (i.e., the position information of the skeletal key points corresponding to the candidate object).
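An illustrative sketch of the valid-keypoint threshold just described; the pose-result layout, confidence threshold and minimum count are assumptions:

```python
def find_candidates(pose_results, min_valid_points=8, conf_thresh=0.3):
    """Treat a detected person as a candidate object when enough skeleton
    points are valid, as described above.

    pose_results: list of dicts such as {"id": ..., "keypoints": [(x, y, c), ...]}.
    """
    candidates = []
    for person in pose_results:
        valid = sum(1 for (_, _, c) in person["keypoints"] if c >= conf_thresh)
        if valid >= min_valid_points:
            candidates.append(person)
    return candidates
```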
In some embodiments, step S1031 may also be implemented by using a deep learning model such as a neural network and detecting at least two image frames in the image frame set to be processed through an object detection algorithm (e.g., RCNN, Fast-RCNN, SPP-Net, Faster-RCNN, etc.) to obtain the detection box of each candidate object in each of the at least two image frames, that is, the detection result of the candidate object.
Step S1032: acquiring the number of objects actually participating in the task;
here, the implementation of step S1032 may acquire the number of objects actually participating in the task by means of manual input, or set the number of objects actually participating in the task in advance according to the task type.
Step S1033: screening the number of the first target objects in the candidate objects based on the detection result of the candidate objects;
here, in the case where the detection result of the candidate object is the position information of the detection frame of the candidate object, the candidate object with a larger size may be selected as the first target object based on the size of the detection frame, so that the image of the first target object is clearer and the detection of the subsequent keypoint is facilitated. When the detection result of the candidate object is the position information of the bone key points of the candidate object, for example, the candidate object with higher integrity can be selected as the first target object based on the integrity of the bone key points, so that most of the body of the first target object is in the image picture, and the risk that key points needing to be detected cannot be detected in the image in the following process is reduced; for another example, a candidate object with a higher integrity may be selected as the first target object based on the integrity of a specific human skeleton key point; the specific human skeleton key point is a human skeleton key point used for determining the first recognition result, for example, when the first target object performs a rowing motion, the first recognition result of the first target object may be determined by a wrist point of the first target object, and the specific human skeleton key point is a wrist point; for another example, when the first target object is cycling, the first recognition result of the first target object can be determined through the ankle point of the first target object, and the specific human skeleton key point is the ankle point; therefore, the specific human skeleton key points of the first target object can be prevented from being shielded by other objects or objects, so that the specific human skeleton key points are in the image picture, and the first recognition result of the first target object can be conveniently determined. The method for screening the first target object from the candidate objects is not limited in the embodiment of the present application. It should be noted that, since the first target object is screened out based on the detection result of the candidate object, and the detection result of the candidate object may be different in each image frame, the first target object screened out in the candidate object may also be different, that is, the first target object may not be the same person.
Step S1034: and determining a first motion parameter of each first target object based on the screened second identification result with the number of the first target objects.
Wherein the first motion parameter is used to characterize the speed and/or direction of the first target object, in some embodiments, the first motion parameter may include a first speed parameter and/or a first direction parameter, and the first speed parameter is used to characterize the speed of the motion, for example, angular speed, linear speed, frequency, rotational speed (when the first target object is a machine), and the like; the first direction parameter is used to characterize the direction of the motion, such as north-south-east, clockwise, counterclockwise, etc., the first motion parameter may be: for example, the speed is 3 meters per second and clockwise (i.e. the first motion parameter includes the first speed parameter and the first direction parameter), for example, the speed is 3 meters per second (i.e. the first motion parameter includes only the first speed parameter), and for example, the speed is clockwise (i.e. the first motion parameter includes only the first direction parameter).
In some embodiments, the first motion parameter may be determined by identifying location information of a keypoint associated with the first target object. The position information of the key points may be position information of key points of a human body, including but not limited to: position information of skeleton keypoints and/or position information of contour keypoints. The position information of the skeleton key points is the position information of the key points of the skeleton, such as wrists, elbows, fingers, ankles and the like; the position information of the contour key points is position information of key points on the outer surface of the limb, for example, the vertices of the hand contour. In some embodiments, the location information of the key points may further include location information of a certain point on the body of the first target object, which is fixed with respect to the body position, for example, a hair clip on the head, a brooch on the body, etc., and the location information may include coordinates, such as coordinates in an image.
In the case that the second recognition result includes the position information of the detection frame of the first target object, the image corresponding to the detection frame of the first target object may be subjected to human skeleton key point detection, so as to obtain the position information of the key point associated with the first target object.
In the case where the second recognition result includes position information of skeletal key points of the first target object, the position information of key points associated with the first target object may be determined by a relative positional relationship between the skeletal key points. During implementation, all the key points of the skeleton can be connected to obtain the whole skeleton, and then based on the relative distribution positions of all bones and joints in the skeleton, which key point is the key point associated with the first target object is determined, so that the position information of the key point is obtained.
Because the image frames in which the second recognition results of the first target object are located occupy different positions in the sequence (the image frame in which a second recognition result is located refers to the image frame from which that second recognition result is obtained; for example, if the first image frame is recognized to obtain a second recognition result, the image frame of that second recognition result is the first image frame), the body posture or motion state of the first target object may differ between image frames at different positions in the sequence, and the position information, i.e., the coordinates, of a key point may change with different body postures or motion states. Therefore, the variation of the position information of a key point may be determined according to the time-series relationship of the image frames corresponding to the second recognition results of the first target object and the position information of the key point, and the speed and direction of the motion of the first target object may be represented by this variation. For example, if the key point is the nose, the walking distance of the first target object can be obtained from the position information of the nose in two frames of images, and dividing the walking distance by the time difference between the two frames gives the speed; in addition, the direction of the first target object can be obtained from the angle between the line connecting the nose positions in the two frames and a fixed direction, such as due east, so that a first speed parameter and a first direction parameter of the first target object are obtained. The embodiment of the present application does not limit the calculation method of the first motion parameter.
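For illustration, the displacement-based speed and direction estimate described above might be sketched as follows; the function name and coordinate convention are assumptions, and the nose keypoint follows the example in the text:

```python
import math

def speed_and_heading(p_prev, p_curr, dt_seconds):
    """Estimate a first speed parameter and first direction parameter
    from one keypoint (e.g. the nose) observed in two image frames.

    p_prev, p_curr: (x, y) keypoint positions in the two frames.
    dt_seconds: time elapsed between the two frames.
    """
    dx = p_curr[0] - p_prev[0]
    dy = p_curr[1] - p_prev[1]
    distance = math.hypot(dx, dy)
    speed = distance / dt_seconds                 # first speed parameter
    heading = math.degrees(math.atan2(dy, dx))    # angle against a fixed axis
    return speed, heading
```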
In some embodiments, the implementation of step S1034 may detect all frames in the image frame set to be processed, and determine a first motion parameter of the first target object; in some embodiments, the implementation of step S1034 may also determine the first motion parameter of the target motion when it is determined that the motion of the first target object is the target motion, and output the first motion parameter to be zero when the motion of the first target object is not the target motion, where the first motion parameter may include the first speed parameter and/or the first direction parameter, and correspondingly, as shown in fig. 2B, the implementation of step S1034 may include the following steps S1034a to S1034B:
step S1034 a: determining whether the action of at least one first target object is a target action based on the screened second recognition result with the number of first target objects;
here, the target action may be set according to a task requirement, for example, if the task is a rowing task and the action of the first target object is a rowing action, the controlled model is controlled to complete the task, and the target action is a rowing action; for example, if the task is a cycling task and the controlled model needs to be controlled to complete the task when the action of the first target object is a cycling action, the target action is a cycling action.
In the case that at least two first target objects match one controlled model, it may be determined whether only the actions of a part of the first target objects are target actions, for example, 3 first target objects match one controlled model, and then it may be determined whether the actions of 1 or 2 of the first target objects are target actions, that is, the determination of the target actions may not be performed on all the first target objects.
In some embodiments, the implementation of step S1034a may determine whether the action of the first target object is the target action through a pre-trained action detection model, such as a Temporal-Segment-Networks (TSN), Temporal Relationship Networks (TRN), Temporal Pyramid Networks (TPN), and other action detection models.
In some embodiments, the implementation of step S1034a may also include the following steps S13a1 to S13a 3:
step S13a 1: determining location information of a keypoint associated with said at least one first target object based on said screened out second recognition result with said number of first target objects;
here, the method for determining the location information of the key point associated with the at least one first target object is the same as above, and is not described herein again.
In some embodiments, at least two first target objects are included in each image frame, and in order to facilitate determining the location information of the keypoints of each first target object, a certain distance may be maintained between each first target object, so as to reduce the risk of matching the location information of the keypoints to the wrong first target object.
Step S13a 2: determining track information of the key points based on the chronological sequence relation of the image frames where the second identification results are located and the position information of the key points;
as above, since the sequence of the image frames in which the second recognition result is located is different, the body postures or motion states of the objects recognized in the image frames with different sequences may be different, and the position information, i.e., coordinates, of the key points in different body postures or different motion states may change, the position change condition of the key points may be determined according to the sequence relationship of the image frames and the position information of the key points, i.e., the trajectory information of the key points is determined.
Step S13a 3: and determining that the action of the at least one first target object is a target action in response to the track information of the key points meeting a preset condition.
Here, the preset condition may be determined according to the expected motion trajectory of the target motion, for example, if the expected motion trajectory of the target motion is a circle, the preset condition may be a condition that satisfies a circle parameter; for another example, if the expected motion trajectory of the target motion is an ellipse, the preset condition may be a condition that satisfies an ellipse parameter. In implementation, the motion trajectory of the key point associated with the first target object may be used as the motion trajectory of the target motion, and then, based on the motion trajectory of the key point associated with the first target object, it is determined whether the motion trajectory of the key point meets a preset condition, so as to determine whether the motion of the first target object is the target motion.
In some embodiments, the keypoints associated with the first target object may be human keypoints, which in the case of the target action being a rowing action comprise wrist points. Since the expected movement locus of the wrist point is elliptical when the boat is rowing, the preset condition may be a condition that satisfies an elliptical parameter range. When the trajectory information of the wrist point satisfies a preset condition, that is, a condition of an ellipse parameter range, it may be determined that the motion of the first target object is a rowing motion.
In some embodiments, the step S13a3 may be implemented to fit the trajectory information of the wrist point to an ellipse, and compare whether the trajectory information of the wrist point satisfies a preset condition, such as whether parameters, such as the shape and the area of the fitted ellipse, are within a preset parameter range, so as to determine whether the motion of the first target object is a rowing motion.
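A rough, illustrative sketch of checking whether a wrist trajectory is approximately elliptical; it approximates the axis lengths with principal components rather than a full ellipse fit, and all thresholds are assumed values:

```python
import numpy as np

def looks_like_rowing(wrist_trajectory, min_axis=20.0, max_axis=400.0):
    """Crude check that a wrist trajectory is roughly elliptical and of a
    plausible size, used here as the 'preset condition' for a rowing action.

    wrist_trajectory: array-like of (x, y) wrist positions in pixel coordinates.
    min_axis / max_axis: assumed bounds on the ellipse axis lengths, in pixels.
    """
    points = np.asarray(wrist_trajectory, dtype=float)
    if len(points) < 10:
        return False
    centered = points - points.mean(axis=0)
    # Principal components give approximate major/minor axis lengths.
    cov = np.cov(centered, rowvar=False)
    eigvals = np.sort(np.linalg.eigvalsh(cov))[::-1]
    major, minor = 2.0 * np.sqrt(eigvals)  # ~1-sigma axis lengths
    if not (min_axis <= minor and major <= max_axis):
        return False
    # An ellipse (rather than a back-and-forth line) should have a
    # non-degenerate minor axis relative to the major axis.
    return (minor / major) > 0.2
```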
In some embodiments, the key points associated with the first target object may be human key points, including ankle points in the case of the target action being a cycling action. When the bicycle is ridden, the expected movement track of the ankle point is circular, so the preset condition can also be a condition meeting the circular parameter range. When the trajectory information of the ankle point meets a preset condition, namely a condition of a circular parameter range, it may be determined that the motion of the first target object is a cycling motion.
In the embodiment of the application, the position information of the key points associated with the first target object is determined, and then the track information of the key points is determined based on the chronological sequence relation of the image frames and the position information of the key points; and under the condition that the track information of the key points meets the preset conditions, determining the action of at least one first target object as a target action, thereby realizing the determination of the target action.
Step S1034 b: in response to the action of the at least one first target object being a target action, determining a first speed parameter and/or a first direction parameter of the target action;
Illustratively, the first speed parameter may be an angular speed and the first direction parameter may be clockwise or counterclockwise. Step S1034b may be implemented by connecting the key point in each image frame with a fixed point (e.g., the center of the fitted ellipse or circle) to obtain a first line segment, determining a first angle between the first line segment and a straight line passing through the fixed point (e.g., the minor or major axis of the ellipse, a given radius of the circle, etc.), and finally obtaining the first speed parameter and the first direction parameter based on the first angles corresponding to every two image frames. In some embodiments, the first speed parameter and the first direction parameter may be derived by determining the difference between the first angles of every two image frames.
If the difference is positive, the first direction parameter of the target action is counterclockwise; if the difference is negative, the first direction parameter of the target action is clockwise. Dividing the difference by the time difference corresponding to the image frames gives the first speed parameter. For example, if the first angle of the 1st frame is 18 degrees (°) and the first angle of the 3rd frame is 20°, the difference between the first angles of the 3rd frame and the 1st frame is 2°; since the difference is positive, the first direction parameter of the target action is counterclockwise. If one frame corresponds to 33 milliseconds (ms), the time difference between the 3rd frame and the 1st frame is 66 ms, and the first speed parameter is 2/66 = 1/33 °/ms.
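The sign convention and the worked example above can be reproduced with the following illustrative sketch (the function name and units are assumptions):

```python
def direction_and_angular_speed(angle_prev_deg, angle_curr_deg, dt_ms):
    """Derive the first direction and first speed parameter from the first
    angles measured in two image frames, following the sign convention above.
    """
    diff = angle_curr_deg - angle_prev_deg
    direction = "counterclockwise" if diff >= 0 else "clockwise"
    angular_speed = abs(diff) / dt_ms  # degrees per millisecond
    return direction, angular_speed

# Worked example from the text: 18 deg in frame 1, 20 deg in frame 3,
# 33 ms per frame, so dt = 66 ms and the speed is 2/66 = 1/33 deg/ms.
print(direction_and_angular_speed(18.0, 20.0, 66.0))
# ('counterclockwise', 0.0303...)
```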
In some embodiments, in order to improve the accuracy of the first speed parameter and the first direction parameter, a plurality of first speed parameters and first direction parameters may be determined and then filtered to obtain the first speed parameter and the first direction parameter of the target action. Because the first direction parameter is either clockwise or counterclockwise, while the object of filtering processing such as mean filtering needs to be numerical, the first direction parameter may first be converted into numerical form before filtering: for example, when the first direction parameter is the clockwise direction, the value may be set to 1, and when the first direction parameter is the counterclockwise direction, the value may be set to -1, which makes it convenient to filter a plurality of first direction parameters. When the value after filtering is greater than or equal to a preset threshold (for example, a preset threshold of 0), the first direction parameter is determined to be the clockwise direction; when the value after filtering is smaller than the preset threshold, the first direction parameter is determined to be the counterclockwise direction.
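A possible form of this filtering step is sketched below; the ±1 encoding and the threshold of 0 follow the description above, while mean filtering is only one of the options mentioned.

```python
# Average several speed estimates and several +/-1 encoded direction estimates,
# then decode the averaged direction with a preset threshold (0 by default).
def filter_parameters(speeds, directions, threshold=0.0):
    avg_speed = sum(speeds) / len(speeds)
    encoded = [1 if d == "clockwise" else -1 for d in directions]
    avg_direction = sum(encoded) / len(encoded)
    final_direction = "clockwise" if avg_direction >= threshold else "counterclockwise"
    return avg_speed, final_direction
```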
In the embodiment of the application, when the action of the first target object is determined to be the target action, the first speed parameter and/or the first direction parameter of the target action are determined and used subsequently to control the controlled model matched with the first target object to complete the task. This encourages users to make their own actions meet the requirements of the target action as closely as possible so as to control the controlled model to complete the task, which increases the difficulty and interest of the interaction process and stimulates the users' competitive spirit.
In a case that the first motion parameter includes a first speed parameter and/or a first direction parameter, after the first speed parameter and/or the first direction parameter of the target action are determined in step S1034b in response to the action of the first target object being the target action, the method further includes:
step S1034 c: and determining a second speed parameter and/or a second direction parameter of the controlled model matched with each first target object based on the matching relation and the first speed parameter and/or the first direction parameter of each first target object.
Here, in the case where a first target object matches a controlled model, the second direction parameter may be the same as or different from the first direction parameter, for example, the second direction parameter may be the reverse of the first direction parameter; the second speed parameter may be the same as or different from the first speed parameter, e.g., the second speed parameter may be in a direct proportional relationship with the first speed parameter.
In a case that at least two first target objects match one controlled model, the second direction parameter may be a sum of the first direction parameters of the at least two first target objects, and the second speed parameter may be a sum of the first speed parameters of the at least two first target objects; alternatively, different weights may be set for different first target objects, and the first speed parameters or first direction parameters of the first target objects are multiplied by their weights and then added to obtain the second speed parameter and the second direction parameter. For convenience of calculation, the first direction parameter may be converted into a number, in the same manner as described above, which is not repeated here. The matching relationship between the second speed and direction parameters and the first speed and direction parameters is not limited in the embodiments of the present application.
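One way such a weighted combination could look is sketched below; the weights, the ±1 encoding, and the sign convention are assumptions for illustration, since the embodiment does not limit the matching rule.

```python
# Combine the first speed/direction parameters of several first target objects
# that share one controlled model into a single second speed/direction parameter.
def combine_parameters(first_speeds, first_directions, weights=None):
    weights = weights or [1.0] * len(first_speeds)
    second_speed = sum(w * s for w, s in zip(weights, first_speeds))
    encoded = [1 if d == "clockwise" else -1 for d in first_directions]
    direction_sum = sum(w * e for w, e in zip(weights, encoded))
    second_direction = "clockwise" if direction_sum >= 0 else "counterclockwise"
    return second_speed, second_direction
```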
In some embodiments, in order to enable the first target object in the image frame to be the same person all the time, step 103 may also be implemented to determine the first target object in the previous frames, then track the determined first target object in the following image frame by using a target tracking algorithm to obtain a detection frame of the first target object, and finally determine the first motion parameter of the first target object based on the detection frame of the first target object. The embodiment of the present application does not limit whether the first target object matched with the controlled model is the same person or different persons.
S104: and controlling the controlled model matched with each first target object based on the matching relation, and completing the task according to a second motion parameter matched with the first motion parameter.
Here, the second motion parameter may be the same as the first motion parameter or may be different from the first motion parameter. For example, where the first motion parameter comprises a first speed parameter, the second motion parameter may be directly proportional to the first speed parameter, e.g., the first speed parameter is 3m/s, then the second motion parameter may be 6 m/s; for another example, in a case that the first motion parameter includes a first direction parameter, the second motion parameter may be opposite to the first direction parameter, and if the first direction parameter is a clockwise direction, the second motion parameter may be a counterclockwise direction.
In some embodiments, the implementation of step S104 may control the controlled model to complete the task based on all the first motion parameters determined in step S103; the controlled model may also be controlled to complete the task based on the first motion parameter of the target motion when it is determined that the first target object is the target motion, which is not limited in the embodiment of the present application.
In some embodiments, the controlled model includes a virtual object in a display device, and the implementation of step S104 may include:
s1041: determining a second motion parameter of the virtual object matched with each first target object based on the matching relation and the first motion parameter of each first target object;
here, the first motion parameter may include a first speed parameter and/or a first direction parameter, and the second motion parameter may include a second speed parameter and/or a second direction parameter. The form of the virtual object in the display device may include a form of a two-dimensional image and a form of a three-dimensional image. In the case where the form of the virtual object is a three-dimensional image form, step S1041 may be performed in step S1034 c; in the case where the form of the virtual object is a two-dimensional image form, since the first motion parameter is a motion parameter of the first target object, as described above, the first velocity parameter may include angular velocity, linear velocity, frequency, rotational speed (when the first target object is a machine), and the like; the first direction parameters may include southeast, northwest, clockwise, counterclockwise, etc.; and in the case that the virtual object is a two-dimensional image, the second velocity parameter of the virtual object may generally include a linear velocity, and the second direction parameter may generally include four directions, i.e., front, back, left and right directions, i.e., the type of the parameter in the first motion parameter is greater than the type of the parameter in the second motion parameter.
Therefore, when the parameter types in the first motion parameter are the same as those in the second motion parameter (for example, the first direction parameter takes the values front, back, left, right and the second direction parameter also takes the values front, back, left, right), the implementation of step S1041 may refer to step S1034c. If the parameter types differ (for example, the first speed parameter is an angular velocity while the second speed parameter is a linear velocity), step S1041 may convert the first motion parameter into the second motion parameter through a preset rule. For example, if the first speed parameter is 6 degrees per second and it is preset that 1 degree per second is equivalent to 1 meter per second, the second speed parameter is 6 meters per second; for another example, if the first direction parameter is the counterclockwise direction and it is preset that counterclockwise is equivalent to backward, the second direction parameter is backward. The embodiment of the present application does not limit the rule for converting the first motion parameter into the second motion parameter.
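The conversion rule can be pictured as a small lookup-and-scale step, as in the hedged sketch below; the 1 °/s to 1 m/s factor and the direction mapping simply reproduce the examples above and are not fixed by the embodiment.

```python
# Convert a first motion parameter (angular speed, rotation direction) into a
# second motion parameter (linear speed, front/back direction) via preset rules.
DIRECTION_RULE = {"counterclockwise": "backward", "clockwise": "forward"}  # assumed rule
DEG_PER_S_TO_M_PER_S = 1.0                                                 # assumed factor

def to_second_motion(first_speed_deg_s=None, first_direction=None):
    second = {}
    if first_speed_deg_s is not None:
        second["speed_m_s"] = first_speed_deg_s * DEG_PER_S_TO_M_PER_S
    if first_direction is not None:
        second["direction"] = DIRECTION_RULE.get(first_direction)
    return second

print(to_second_motion(6.0, "counterclockwise"))   # {'speed_m_s': 6.0, 'direction': 'backward'}
```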
S1042: and controlling the virtual object to complete the task according to the second motion parameter.
In this way, in a case that the controlled model includes a virtual object in the display device, the virtual object is controlled, according to the second motion parameter derived from the first motion parameter of the real first target object, to complete the task.
In the embodiment of the application, an image frame set to be processed is obtained, wherein the image frames in the image frame set have a sequential time relationship and comprise at least one first target object participating in a task; then determining a matching relationship between the at least one first target object and the at least one controlled model for executing the task; then at least two image frames of the image frame set to be processed are identified to obtain an identification result, wherein the identification result at least comprises a first motion parameter of each first target object; and finally, based on the matching relation, controlling the controlled model matched with each first target object, and completing the task according to the second motion parameters matched with the first motion parameters. Therefore, the controlled model can execute the specified task based on the first motion parameter of the first target object, so that the interaction between the first target object and the controlled model is realized, and the interestingness in the interaction process is increased.
In some embodiments, the detection result includes position information of a detection frame of a candidate object. As shown in fig. 3A, the implementation of step S1033 "screening out the number of first target objects from the candidate objects based on the detection results of the candidate objects" may then include:
step S1133 a: determining the size of a detection frame of the candidate object;
here, the implementation of step S1133a may include the following two cases:
in the first case: the position information of the detection frame is composed of the center point coordinates of the detection frame and the width and height of the detection frame, such as (x, y, w, h), wherein (x, y) is the center point coordinates, w is the width of the detection frame, h is the height of the detection frame, and the size of the detection frame is w h.
In the second case: the position information of the detection frame consists of the coordinates of the upper-left corner point and the lower-right corner point of the detection frame, for example (x1, y1, x2, y2), where (x1, y1) is the coordinate of the upper-left corner point and (x2, y2) is the coordinate of the lower-right corner point; the size of the detection frame is then (x2 - x1) * (y2 - y1).
The method for determining the size of the candidate object detection frame is not limited in the embodiment of the application.
Step S1133 b: screening the candidate objects for the first target object having the number based on a size of a detection box of the candidate objects.
Here, step S1133b may be implemented by sorting the sizes of the detection frames of the candidate objects from large to small and selecting the top-ranked candidates, up to the number of objects, as the first target objects. For example, if the number of objects actually participating in the task is 2 and the number of candidate objects determined in step S1031 is 5, the sizes of the detection frames of the 5 candidate objects are sorted, and the two candidates with the largest detection frames are selected as the first target objects.
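A minimal sketch of this size-based screening, covering both detection-frame formats described above, is given below; the function names and the candidate representation are illustrative assumptions.

```python
# Compute the detection-frame size for either box format, then keep the largest ones.
def size_from_center_format(box):
    x, y, w, h = box                   # center coordinates plus width and height
    return w * h

def size_from_corner_format(box):
    x1, y1, x2, y2 = box               # top-left and bottom-right corner coordinates
    return (x2 - x1) * (y2 - y1)

def select_by_box_size(candidates, num, size_fn=size_from_corner_format):
    """candidates: list of (object_id, box); keep the num candidates with the largest boxes."""
    ranked = sorted(candidates, key=lambda c: size_fn(c[1]), reverse=True)
    return ranked[:num]
```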
In general, the larger the detection frame, the clearer the candidate object image that can be obtained for subsequent key point detection, so the detection result is more accurate. In addition, the objects participating in the task are usually closer to the camera while the audience is farther away, so the images of the audience are smaller and the images of the participating objects are larger; selecting the candidates with larger detection frames therefore screens out the audience not participating in the task.
In the embodiment of the application, the size of the detection frame of the candidate object is obtained, and then the first target object is screened out from the candidate object based on the size of the detection frame of the candidate object, so that the object with higher definition can be selected as the first target object, and audiences or players not participating in the task are screened out.
In some embodiments, the detection result includes location information of the bone keypoints of a candidate object. As shown in fig. 3B, the implementation of step S1033 "screening out the number of first target objects from the candidate objects based on the detection results of the candidate objects" may then include:
step S1233 a: determining the integrity of skeletal keypoints of the candidate object;
Here, the integrity of the bone key points refers to the ratio of the number of detected bone key points of the candidate object to the total number of bone key points. For example, if the total number of bone keypoints is 15 and the number of detected bone keypoints of the candidate object is 12, the integrity of the bone keypoints of the candidate object is 12/15 = 4/5.
Step S1233 b: selecting the first target object having the number in the candidate objects based on the integrity of the skeletal keypoints of the candidate objects.
Here, step S1233b may be implemented by sorting the integrity of the bone key points of the candidate objects from large to small and selecting the top-ranked candidates, up to the number of objects, as the first target objects. For example, if the number of objects actually participating in the task is 2 and the number of candidate objects determined in step S1031 is 5, the integrity of the bone key points of the 5 candidate objects is sorted, and the two candidates with the highest integrity are selected as the first target objects.
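The integrity-based screening can be sketched in the same spirit; the total of 15 keypoints matches the example above, while the confidence threshold and the data layout are assumptions.

```python
# Integrity = detected keypoints / total keypoints; keep the most complete candidates.
def keypoint_integrity(keypoints, total=15, conf_threshold=0.5):
    """keypoints: list of (x, y, confidence) for one candidate object."""
    detected = sum(1 for _, _, conf in keypoints if conf >= conf_threshold)
    return detected / total

def select_by_integrity(candidates, num, total=15):
    """candidates: list of (object_id, keypoints); keep the num most complete candidates."""
    ranked = sorted(candidates, key=lambda c: keypoint_integrity(c[1], total), reverse=True)
    return ranked[:num]
```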
In general, the higher the integrity of the bone key points of a candidate object, the larger the proportion of the candidate object's whole body that appears in the image frame. Candidates whose body is partly outside the image frame or occluded by others can therefore be screened out, which reduces the probability that the key points used in subsequent steps to judge whether an object's motion is the target motion are missing from the image frame, a situation in which it would be impossible to judge whether the motion of the first target object is the target motion.
In the embodiment of the application, the integrity of the bone key points of the candidate objects is obtained, and the first target objects are then screened out from the candidate objects based on this integrity, so that candidates whose body is partly outside the image frame can be screened out.
In some embodiments, at least two first target objects match one controlled model, and the recognition result further includes motion amplitudes of the at least two first target objects. Correspondingly, step S103 "controlling the controlled model matched with each first target object based on the matching relationship, and completing the task according to the second motion parameter matched with the first motion parameter" includes:
s103 a: determining whether the motion amplitudes of the at least two first target objects are consistent;
here, the motion amplitude refers to a movement value of a distance or an angle of a body or a certain portion of the first target object in each frame. In some embodiments, the magnitude of the motion amplitude may be characterized by location information of the first target object keypoints.
In some embodiments, step S103a may be implemented by determining whether the motion amplitudes of the at least two first target objects are consistent through the deviation between the key point position information of the first target objects in each frame. In implementation, a first deviation range between the key point position information of the first target objects in each frame (for example, the difference between the maximum and minimum of the height/width/distance in the key point position information of each first target object) may be preset; the deviation between the key point position information of the first target objects in each frame is then determined, and when the deviation is within the first deviation range, the amplitudes of the target actions of the first target objects are judged to be consistent. When at least one deviation is not within the first deviation range, the amplitudes of the target actions of the at least two first target objects are judged to be inconsistent.
In some embodiments, the implementation of step S103a may also determine whether the motion amplitudes of the at least two first target objects are consistent through the timestamp information when the key point of each first target object is located at the highest point. In implementation, a second deviation range between the timestamp information when the key point of each first target object is located at the highest position may be preset, and when the deviation is within the second deviation range, it is determined that the target actions of each first target object are consistent in amplitude; and when at least one deviation is not in the second deviation range, judging that the amplitudes of the target actions of the at least two first target objects are inconsistent.
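Both consistency checks can be reduced to comparing a deviation against a preset range, as in the hedged sketch below; the deviation ranges are illustrative placeholders.

```python
# First check: deviation between the objects' keypoint positions in one frame.
def positions_consistent(keypoint_heights, first_deviation_range=30.0):
    """keypoint_heights: keypoint heights of the first target objects in the same frame."""
    return max(keypoint_heights) - min(keypoint_heights) <= first_deviation_range

# Second check: deviation between the timestamps at which each object's keypoint peaks.
def peak_times_consistent(peak_timestamps_ms, second_deviation_range_ms=200.0):
    return max(peak_timestamps_ms) - min(peak_timestamps_ms) <= second_deviation_range_ms
```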
S103 b: and in response to the action amplitudes of the at least two first target objects being inconsistent, not outputting control instructions to the controlled model matched with the at least two first target objects.
When the action amplitudes of the at least two first target objects are inconsistent, no control instruction is output to the controlled model matched with the at least two first target objects. This evaluates team cooperation ability and, when a plurality of users form a group, motivates the users to keep their actions consistent and work together to complete the task, increasing the interest and challenge of the interaction process.
In some embodiments, at least two first target objects match a controlled model, and correspondingly, the step S104 of controlling the controlled model matching with each first target object based on the matching relation, and completing the task according to the second motion parameter matching with the first motion parameter includes:
step S1041: determining a second target object among the at least two first target objects based on a first motion parameter of each of the at least two first target objects;
here, the implementation of step S1041 may be to select, as the second target object, a first target object with a largest first motion parameter from among the at least two first target objects; the first target object with the smallest first motion parameter may also be selected as the second target object, and the method for determining the second target object based on the first motion parameter is not limited in the embodiment of the present application.
Step S1042: and controlling the controlled model matched with the at least two first target objects based on the matching relation, and completing the task according to the second motion parameters matched with the first motion parameters of the second target object.
Here, taking as an example the case where the second target object is the first target object with the largest first motion parameter among the at least two first target objects, step S1042 is implemented by controlling the controlled model to complete the task with the second motion parameter matched with the first motion parameter of the second target object, that is, with the second motion parameter matched with the larger first motion parameter, thereby reducing the time for completing the task.
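A minimal sketch of this selection, under the assumption that the largest first motion parameter wins and that the matched second motion parameter is simply proportional, is:

```python
# Pick the first target object with the largest first motion parameter as the
# second target object, then drive the shared controlled model from its parameter.
def pick_second_target(objects):
    """objects: list of (object_id, first_motion_parameter)."""
    return max(objects, key=lambda o: o[1])

second_id, first_param = pick_second_target([("player_a", 4.2), ("player_b", 5.1)])
second_param = first_param * 2.0     # assumed proportional matching rule, not prescribed
```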
In some embodiments, if the evaluation criterion of the game is that the one whose completion time is closer to a target time wins, the first target object whose completion time is closer to the target time may be selected as the second target object, thereby controlling the time to complete the task. The method for selecting the second target object is not limited in the embodiment of the present application.
In some embodiments, after step S1042, the duration of time that each second target object controls the controlled model may be counted, and the win or lose of the first target object is determined based on the duration of time that each second target object controls the controlled model, for example, the first target object with the longest/shortest duration of controlling the controlled model wins.
In the embodiment of the application, the second target object is determined through the first motion parameter of the first target object, and the controlled model is controlled based on the second motion parameter of the second target object, so that the duration of the task completed by the controlled model can be controlled.
In some embodiments, as shown in fig. 4A, after step S104 "controlling the controlled model matched with each of the first target objects based on the matching relationship to complete the task according to the second motion parameter matched with the first motion parameter", steps S105a to S107a are further included:
step S105 a: acquiring a first time length for the controlled model to complete the task;
In some embodiments, the controlled model may be a physically existing model, and step S105a may be implemented by acquiring a set of image frames including the controlled model, obtaining the position information of the controlled model by identifying this set of image frames, determining the timestamp information of the frames in which the controlled model is located at the starting point and at the ending point respectively, and determining the first duration for the controlled model to complete the task based on the timestamp information at the starting point and at the ending point. In implementation, the timestamp information of the starting point may be subtracted from the timestamp information of the ending point to obtain the first duration of completing the task. For example, if the timestamp information of the ending point is the 100th frame, the timestamp information of the starting point is the 5th frame, and one frame lasts 33 milliseconds (ms), the first duration is 95 × 33 = 3135 ms.
In some embodiments, the implementation of step S105a may further install sensors at the start point and the end point, and in case of a change in the position of the controlled model, may trigger a data change of the sensors, and the sensors may determine whether the controlled model starts from the start point and reaches the end point according to the changed data, and further transmit the changed data to the electronic device, and the electronic device determines the first duration of the task completed by the controlled model.
In some embodiments, the controlled model may be a virtual object in a display device, and the implementation of step S105a may be implemented by software, determining the position information of the controlled model and the time when the controlled model is located at the start point and the end point through a timer of the computer, and subtracting the time when the controlled model is located at the start point from the time when the controlled model is located at the end point to obtain the first duration for completing the task.
The method for determining the first duration is not limited in the embodiment of the present application.
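For the frame-timestamp variant, the first duration reduces to a difference of frame indices times the frame period, reproducing the 95 × 33 = 3135 ms example; the helper below is illustrative only.

```python
# First duration = (end frame - start frame) * duration of one frame.
def task_duration_ms(start_frame, end_frame, frame_ms=33.0):
    return (end_frame - start_frame) * frame_ms

print(task_duration_ms(5, 100))      # 3135.0
```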
Step S106 a: determining a second historical record set of the controlled model from the first historical record set based on the matching relation;
here, the first history record set is a time length record set of all the controlled models completing the task within a preset time period, that is, the time length record set of all the controlled models completing the task under all the matching relationships (e.g., 2 to 2, 2 to 1, 1 to 1, etc.) is included. The second historical record set is a time length record set for the controlled model to complete the task under the corresponding matching relation in the preset time period. For example, when the matching relationship is 2 to 1, the second history record set is a time length record set for the controlled model to complete the task under the condition that all two first target objects control one controlled model in a preset time period; for another example, when the matching relationship is 1 to 1, the second history record set is a time length record set for the controlled model to complete the task under the condition that all the first target objects control one controlled model in the preset time period. The controlled model may match the same first target object in one match, i.e. the first target object is unchanged in one match, or may match a different first target object, i.e. the first target object is changed in one match.
In the case that the controlled model matches the same first target object in one match, a second history set of the first target object may also be determined from the first history set; that is, the second history set only records the durations in which this first target object controlled the controlled model to complete the task under the corresponding matching relationship within the preset time period, and it may be used to rank this first target object against its own history within the preset time period. For example, when the matching relationship is 1 to 1 and a given first target object controls the controlled model to complete the task 10 times in one month, the second history set is the set of durations of these 10 task completions, so that this first target object can be ranked.
Step S107 a: ranking the controlled model matches based on the second set of history records and the first duration.
Here, the implementation of step S107a may sort all the durations and the first duration in the second set of history records from large to small, resulting in the controlled model ranking.
In some embodiments, the implementation of step S107a may set a longest time limit, ranking only users that are less than the longest time limit, thereby reducing the amount of computation.
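A possible shape of this ranking step is sketched below; the ten-minute longest-time limit is a placeholder, and the large-to-small sorting order follows the description above.

```python
# Merge the first duration into the second history set, drop entries above the
# longest-time limit to reduce computation, sort, and return the 1-based rank.
def rank_durations(history_ms, first_duration_ms, max_time_ms=10 * 60 * 1000):
    if first_duration_ms > max_time_ms:
        return None                              # above the limit: not ranked at all
    merged = [d for d in history_ms + [first_duration_ms] if d <= max_time_ms]
    merged.sort(reverse=True)                    # from large to small, as described above
    return merged.index(first_duration_ms) + 1
```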
In the embodiment of the application, the controlled models are ranked by acquiring the second history set and the first duration of the task completed by the controlled model under the corresponding matching relationship, so that users can know their own level, which strengthens their competitive psychology and stimulates their desire to win.
In some embodiments, the number of controlled models is at least two, and as shown in fig. 4B, after step S104 "controlling the controlled model matched with each of the first target objects based on the matching relationship and completing the task according to the second motion parameter matched with the first motion parameter", steps S105b to S106b are further included:
step S105 b: acquiring a second time length for each controlled model to complete the task;
here, step S105b may be referred to as step S105 a.
Step S106 b: ranking the at least two controlled models based on each of the second durations.
Here, step S106b may be implemented by sorting the second durations from large to small to obtain the ranking of the at least two controlled models. The ranking within the group is used to determine the win or loss within the group, and at the same time each first target object in the group is motivated during the race to strive for a better result.
In some embodiments, the shorter of the durations of the at least two controlled models may also be stored in the first history set as the final duration of the group under the corresponding matching relationship, for updating the ranking of the controlled models under that matching relationship. For example, in the 2-to-2 matching relationship, if the elapsed times of the two controlled models are 5 minutes and 4 minutes respectively, 4 minutes may be used as the final duration of the group under the 2-to-2 matching relationship for subsequent ranking.
In some embodiments, when there are at least two controlled models, the duration in which a single controlled model completes the task may also be stored in the first history set for updating the ranking of the controlled models under the corresponding matching relationship. For example, when there are two controlled models, one controlled by a single first target object and the other controlled by two first target objects, the duration of the model controlled by one first target object may be used to update the ranking under the 1-to-1 matching relationship, and the duration of the model controlled by two first target objects may be used to update the ranking under the 2-to-1 matching relationship.
In the embodiment of the application, when the number of controlled models is at least two, ranking of the at least two controlled models is achieved by acquiring the second duration in which each controlled model completes the task, which strengthens the users' competitive psychology and stimulates their desire to win.
The following describes an application of the model control method provided in this embodiment in an actual scene, where a controlled model is taken as a dragon boat, and when the method is implemented, the dragon boat may be a real dragon boat model or a virtual dragon boat object, and a target action is a rowing action, as shown in fig. 5A, for example, the method includes:
Step S501: inputting a video stream (namely the image frame set to be processed) in real time through a conventional camera, such as the built-in camera of the all-in-one machine or an external camera connected to the device;
step S502: coding and decoding the video stream through software, and performing logic processing such as frame extraction and the like;
here, by performing logic processing such as encoding and decoding, frame extraction, and the like on the video stream, the frame rate of the processed video stream can meet the algorithm requirement for subsequent processing of the video stream.
Step S503: detecting a human body based on a single picture;
here, the step S503 may be performed by using a human bone key point detection algorithm for human body detection.
Step S504: judging whether an effective human body key point is detected or not, and judging whether an effective human body (namely the first target object) exists or not under the condition that the effective human body key point is detected;
here, in the case that the confidence of the detected human body keypoint is greater than the preset value, the implementation of step S504 determines that the human body keypoint is a valid human body keypoint.
The effective human body may be determined based on the number of effective human body key points; for example, when the number of effective human body key points is greater than a preset value, the object corresponding to these key points is determined to be an effective human body, and a human body image can be obtained after the key points are connected. When the number of objects corresponding to the detected effective human body key points exceeds the required number of effective human bodies, the effective human bodies may be screened out from these objects based on a certain rule, such as the integrity of the effective human body key points.
Step S505: and under the condition that the effective human body exists, drawing a human body connecting line, and if the effective human body does not exist, skipping to continuously process the next frame of image.
Step S506: carrying out ellipse fitting on key points of the wrist of the human body (namely key points associated with the first target object) to judge whether the rowing motion track (namely the target motion) is met;
here, in the case that the key point of the wrist of the human body is not recognized in the determined effective human body, the motion trajectory of the effective human body does not conform to the rowing motion trajectory.
Step S507: under the condition that the ellipse track fitted on the key points of the wrist of the human body is determined to accord with the rowing action track, controlling the dragon boat to advance and reach the end point (namely completing the task); and under the condition that the rowing motion trajectory is not met, the control instruction is not output, and the dragon boat does not move.
In some embodiments, the above model control method includes a single-person mode and a double-person mode (i.e., the matching relationship). The single-person mode means that only one effective human body exists in the image picture of the video stream, and the double-person mode means that 2 or more effective human bodies exist in the picture of the video stream. When there are more than 2 effective human bodies in the video stream, 2 (i.e. the number of the objects actually participating in the task) effective human bodies need to be selected from the more than 2 effective human bodies.
In the single-person mode, as illustrated in fig. 5B, the model control method includes:
s601: identifying an image comprising a single body;
In the case that the dragon boat is a virtual object in the display device, before step S601, a preset standing posture may be displayed in the display interface of the display device. As shown in fig. 5C, which is a schematic diagram of the display interface of the display device according to the embodiment of the present application, when the "sideways to the right" button in the display interface is selected, the participant is guided to turn sideways to the right, and when the "sideways to the left" button is selected, the participant is guided to turn sideways to the left, thereby guiding the participant to perform a rowing action.
S602: after identifying key point information of the wrist of the human body, judging whether the motion of the human body accords with the rowing motion;
s603: in line with the rowing action, the dragon boat advances forward until the end point is reached (i.e. the task is completed).
In some embodiments, after the key point information of the human wrist is identified, ellipse fitting may be performed according to the wrist key point information, and whether the human motion conforms to the rowing motion may be determined according to the ellipse parameters of the fit. When the human motion conforms to the rowing motion, the parameters of the human motion (the first motion parameters) may be determined according to the trajectory information of the wrist key points, so as to determine the matched second motion parameters. The preset standing posture displayed in the display interface may also be used to determine the advancing direction of the dragon boat. For example, when the "sideways to the right" button in the display interface is selected and the human motion conforms to the rowing motion, the first motion parameter includes a direction parameter indicating clockwise rotation of the human wrist, and the matched second motion parameter includes a direction parameter controlling the dragon boat to advance; when the "sideways to the left" button is selected and the human motion conforms to the rowing motion, the first motion parameter includes a direction parameter indicating counterclockwise rotation of the human wrist, and the matched second motion parameter includes a direction parameter controlling the dragon boat to advance, thereby realizing adjustment of the dragon boat's direction of motion.
In the case of a virtual object in the display device, as shown in fig. 5D, a display interface diagram of the participant controlling the progress of the dragon boat 501 in the single-person mode is shown, wherein the timekeeping zone 503 is used for recording the time when the participant reaches the end point, and the dragon boat game results can be ranked according to the time when the participant reaches the end point.
In some embodiments, a real-time action map (or skeleton connection diagram) of the participant may be displayed simultaneously in the display interface; for example, the white area in fig. 5D may be used to display the participant's real-time action map to help the participant correct his or her standing posture. In some embodiments, the display interface may include a dial for displaying the speed of the dragon boat in real time, making the speed more intuitive and helping to guide the participant in adjusting it. In some embodiments, the edge of the track in the display interface further includes a cheering squad 502, whose actions and output sound can change with the progress of the game; for example, when a competitor reaches the end point, the cheering squad can jump and play victory music, and when a competitor catches up with the other side, the cheering squad can applaud and shout encouragement.
S604: and ranking according to the time length of reaching the terminal, and automatically jumping to enter a personal ranking list page (namely ranking the controlled model).
Here, the ranking list shows the highest-ranked user scores within a certain time, which motivates users to keep trying to refresh their own ranking.
In a double mode, if only 2 persons exist, selecting the human body images of the two persons as comparison source data; if the number of people exceeds 2, two human body images (wherein each human body controls a dragon boat) are selected according to a certain strategy (such as the size of a human body detection frame or the integrity of a bone key point), and correspondingly, as shown in fig. 5E, the model control method comprises the following steps:
s701: identifying an image including a plurality of human bodies;
s702: selecting two human bodies according to a certain strategy;
s703: drawing a connecting line of the two human bodies;
s704: after identifying the human body key point information of the two human bodies, judging whether the actions of the two human bodies accord with the rowing action;
s705: under the condition of conforming to the rowing action, the dragon boat advances forwards until reaching the terminal (namely completing the task);
in the case that the dragon boat is a virtual object in the display device, as shown in fig. 5F, a schematic diagram of a display interface for controlling the progress of the dragon boat 501 by the participants in the double mode is shown, wherein each participant controls one dragon boat, the time zone 503 is used for recording the time when each participant reaches the endpoint, the win or lose of the participant is determined according to the time when each participant reaches the endpoint, and other setting conditions of the display interface can refer to the single mode.
S706: the dragon boat which reaches the terminal point first wins, plays the winning sound effect and enters the double ranking list (the duration which is less in use in the two dragon boats is used as the score of the competition group for ranking). The leaderboard displays the highest ranking user score within a certain time, and encourages multiple people to play together.
The scheme provided by the embodiment of the application can be a dragon boat game all-in-one machine in an education or game scene in some embodiments, the all-in-one machine can be a real robot, and can also be a computer device with built-in dragon boat game software, and a dragon boat virtual object can be displayed on a display interface.
The dragon boat game all-in-one machine is a product that deeply integrates artificial intelligence vision technology. It plays a good role in helping teenagers such as students understand artificial intelligence, and lets them interact with machines without age limitation or threshold. While playing, for example while turning sideways to row, they can come to understand artificial intelligence techniques, such as how the human face is captured and how human body features are identified, thereby arousing the interest of young people in learning artificial intelligence knowledge.
Based on the foregoing embodiments, the present application provides a model control apparatus, which includes units and modules included in the units, and can be implemented by a processor in a computer device; of course, the implementation can also be realized through a specific logic circuit; in the implementation process, the Processor may be a Central Processing Unit (CPU), a Microprocessor Unit (MPU), a Digital Signal Processor (DSP), a Field Programmable Gate Array (FPGA), or the like.
Fig. 6 is a schematic structural diagram of a model control apparatus according to an embodiment of the present application, and as shown in fig. 6, a model control apparatus 600 includes: a first obtaining module 610, a first determining module 620, an identifying module 630, and a control module 640, wherein:
the system comprises a first obtaining module 610, a task processing module and a task processing module, wherein the first obtaining module is used for obtaining an image frame set to be processed, image frames in the image frame set have a sequential time sequence relation, and the image frames comprise at least one first target object participating in a task;
a first determining module 620, configured to determine a matching relationship between the at least one first target object and at least one controlled model for performing the task;
an identifying module 630, configured to identify at least two image frames of the image frame set to be processed, to obtain a first identification result, where the first identification result at least includes a first motion parameter of each first target object;
and the control module 640 is configured to control the controlled model matched with each first target object based on the matching relationship, and complete the task according to the second motion parameter matched with the first motion parameter.
In some embodiments, the controlled model includes a virtual object in a display device, and the control module 640 includes: a first determining sub-module, configured to determine, based on the matching relationship and the first motion parameter of each first target object, a second motion parameter of the virtual object matching each first target object; and the first control submodule is used for controlling the virtual object to complete the task according to the second motion parameter.
In some embodiments, the identification module comprises: the identification submodule is used for identifying at least two image frames of the image frame set to be processed to obtain a second identification result, wherein the second identification result comprises a detection result of a candidate object; the acquisition submodule is used for acquiring the number of objects actually participating in the task; a screening sub-module, configured to screen the number of the first target objects from the candidate objects based on a detection result of the candidate objects; and the second determining submodule is used for determining the first motion parameter of each first target object based on the screened second identification result with the number of the first target objects.
In some embodiments, the detection result includes position information of a detection frame of the candidate object, and the filtering sub-module includes: a first determination unit configured to determine a size of a detection frame of the candidate object; a first screening unit, configured to screen the number of the first target objects from the candidate objects based on a size of a detection frame of the candidate objects.
In some embodiments, the detection result includes location information of a bone key point of the candidate object, and the filtering sub-module includes: a second determining unit, configured to determine the integrity of the bone key points of the candidate object; a second screening unit, configured to screen the number of the first target objects from the candidate objects based on the integrity of the bone key points of the candidate objects.
In some embodiments, the first motion parameter comprises a first speed parameter and/or a first direction parameter, and the first determining sub-module comprises: a third determining unit, configured to determine whether the action of at least one of the first target objects is a target action based on the screened second recognition results having the number of first target objects; a fourth determining unit, configured to determine, in response to the action of the at least one first target object being a target action, a first speed parameter and/or a first direction parameter of the target action; the second determining sub-module is further configured to determine a second speed parameter and/or a second direction parameter of the controlled model matched with each first target object based on the matching relationship and the first speed parameter and/or the first direction parameter of each first target object.
In some embodiments, the third determining unit includes: a first determining subunit, configured to determine, based on the screened second identification result with the number of first target objects, location information of a keypoint associated with the at least one first target object; the second determining subunit is configured to determine, based on the chronological relationship of the image frame where the second recognition result is located and the position information of the key point, trajectory information of the key point; and the third determining subunit is configured to determine, in response to that the trajectory information of the key point satisfies a preset condition, that the motion of the at least one first target object is a target motion.
In some embodiments, in the case that at least two first target objects match one controlled model, the recognition result further includes motion magnitudes of the at least two first target objects, and the control module includes: a third determining submodule, configured to determine whether motion amplitudes of the at least two first target objects are consistent; and the second control sub-module is used for responding to the inconsistent motion amplitudes of the at least two first target objects and not outputting control instructions to the controlled model matched with the at least two first target objects.
In some embodiments, in the case where at least two first target objects match one controlled model, the control module includes: a fourth determination sub-module for determining a second target object among the at least two first target objects based on the first motion parameter of each of the at least two first target objects; and the third control sub-module is used for controlling the controlled model matched with the at least two first target objects based on the matching relation and completing the task according to the second motion parameters matched with the first motion parameters of the second target object.
In some embodiments, the apparatus further comprises: the second acquisition module is used for acquiring a first time length for the controlled model to complete the task; a second determination module for determining a second history set of the controlled model from the first history set based on the matching relationship; a first ranking module to rank the controlled model based on the second set of history records and the first duration.
In some embodiments, in the case where the number of the controlled models is at least two, the apparatus further includes: the third acquisition module is used for acquiring a second time length for each controlled model to complete the task; and the second ranking module is used for ranking the at least two controlled models based on each second duration.
The above description of the apparatus embodiments, similar to the above description of the method embodiments, has similar beneficial effects as the method embodiments. In some embodiments, functions of or modules included in the apparatuses provided in the embodiments of the present disclosure may be used to perform the methods described in the above method embodiments, and for technical details not disclosed in the embodiments of the apparatuses of the present disclosure, please refer to the description of the embodiments of the method of the present disclosure for understanding.
If the technical scheme of the application relates to personal information, a product applying the technical scheme of the application clearly informs personal information processing rules before processing the personal information, and obtains personal independent consent. If the technical scheme of the application relates to sensitive personal information, a product applying the technical scheme of the application obtains individual consent before processing the sensitive personal information, and simultaneously meets the requirement of 'express consent'. For example, at a personal information collection device such as a camera, a clear and significant identifier is set to inform that the personal information collection range is entered, the personal information is collected, and if the person voluntarily enters the collection range, the person is regarded as agreeing to collect the personal information; or on the device for processing the personal information, under the condition of informing the personal information processing rule by using obvious identification/information, obtaining personal authorization in the modes of pop-up window information or asking the person to upload personal information thereof and the like; the personal information processing rule may include information such as a personal information processor, a personal information processing purpose, a processing method, and a type of personal information to be processed.
It should be noted that, in the embodiment of the present application, if the model control method is implemented in the form of a software functional module and is sold or used as a standalone product, the model control method may also be stored in a computer-readable storage medium. Based on such understanding, the technical solutions of the embodiments of the present application may be essentially implemented or a part contributing to the related art may be embodied in the form of a software product stored in a storage medium, and including several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present application. And the aforementioned storage medium includes: various media capable of storing program codes, such as a usb disk, a removable hard disk, a Read Only Memory (ROM), a magnetic disk, or an optical disk. Thus, embodiments of the present application are not limited to any particular hardware, software, or firmware, or any combination of hardware, software, and firmware.
An embodiment of the present application provides an electronic device, which includes a memory and a processor, where the memory stores a computer program that can be executed on the processor, and the processor implements some or all of the steps of the above method when executing the program.
The embodiment of the present application provides a computer-readable storage medium, on which a computer program is stored, and the computer program, when executed by a processor, implements some or all of the steps of the above method. The computer readable storage medium may be transitory or non-transitory.
The present application provides a computer program, which includes a computer readable code, and in a case where the computer readable code runs in a computer device, a processor in the computer device executes a program for implementing some or all of the steps in the method.
Embodiments of the present application provide a computer program product, which includes a non-transitory computer-readable storage medium storing a computer program, and when the computer program is read and executed by a computer, the computer program implements some or all of the steps of the above method. The computer program product may be embodied in hardware, software or a combination thereof. In some embodiments, the computer program product is embodied in a computer storage medium, and in other embodiments, the computer program product is embodied in a Software product, such as a Software Development Kit (SDK), or the like.
Here, it should be noted that: the foregoing description of the various embodiments is intended to highlight various differences between the embodiments, which are the same or similar and all of which are referenced. The above description of the apparatus, storage medium, computer program and computer program product embodiments is similar to the description of the method embodiments above, with similar advantageous effects as the method embodiments. For technical details not disclosed in the embodiments of the apparatus, the storage medium, the computer program and the computer program product of the present application, reference is made to the description of the embodiments of the method of the present application for understanding.
It should be noted that fig. 7 is a schematic hardware entity diagram of an electronic device in an embodiment of the present application, and as shown in fig. 7, the hardware entity of the electronic device 700 includes: a processor 701, a communication interface 702, and a memory 703, wherein:
the processor 701 generally controls the overall operation of the electronic device 700.
The communication interface 702 may enable the electronic device to communicate with other terminals or servers via a network.
The Memory 703 is configured to store instructions and applications executable by the processor 701, and may also buffer data (e.g., image data, audio data, voice communication data, and video communication data) to be processed or already processed by the processor 701 and modules in the electronic device 700, and may be implemented by a FLASH Memory (FLASH) or a Random Access Memory (RAM). Data may be transferred between the processor 701, the communication interface 702, and the memory 703 via the bus 704.
It should be appreciated that reference throughout this specification to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one embodiment of the present application. Thus, the appearances of the phrases "in one embodiment" or "in an embodiment" in various places throughout this specification do not necessarily all refer to the same embodiment. Furthermore, the particular features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. It should be understood that, in the various embodiments of the present application, the sequence numbers of the above steps/processes do not imply an order of execution; the execution order of the steps/processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application. The serial numbers of the embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed, or elements inherent to such a process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus and method may be implemented in other ways. The device embodiments described above are merely illustrative; for example, the division into units is only a logical functional division, and other divisions are possible in actual implementation, for example: multiple units or components may be combined or integrated into another system, or some features may be omitted or not implemented. In addition, the coupling, direct coupling, or communication connection between the components shown or discussed may be via some interfaces, and the indirect coupling or communication connection between devices or units may be electrical, mechanical, or in other forms.
The units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present application may all be integrated into one processing unit, or each unit may serve as a separate unit, or two or more units may be integrated into one unit; the integrated unit may be implemented in the form of hardware, or in the form of hardware plus a software functional unit.
Those of ordinary skill in the art will understand that all or part of the steps for implementing the above method embodiments may be completed by hardware related to program instructions; the program may be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The aforementioned storage medium includes various media that can store program code, such as a removable storage device, a Read Only Memory (ROM), a magnetic disk, or an optical disk.
Alternatively, if the integrated unit described above is implemented in the form of a software functional module and is sold or used as a standalone product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solutions of the present application, in essence, or the part contributing to the related art, may be embodied in the form of a software product that is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a removable storage device, a ROM, a magnetic disk, or an optical disk.
The above description covers only the embodiments of the present application, but the scope of protection of the present application is not limited thereto; any changes or substitutions that a person skilled in the art can readily conceive of within the technical scope disclosed in the present application shall fall within the scope of protection of the present application.
Claims (14)
1. A model control method, comprising:
acquiring an image frame set to be processed, wherein image frames in the image frame set have a sequential temporal relationship and comprise at least one first target object participating in a task;
determining a matching relationship between the at least one first target object and at least one controlled model that performs the task;
identifying at least two image frames of the image frame set to be processed to obtain a first identification result, wherein the first identification result comprises at least a first motion parameter of each first target object;
and controlling, based on the matching relationship, the controlled model matched with each first target object to complete the task according to a second motion parameter matched with the first motion parameter.
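The following minimal Python sketch is provided purely as a non-limiting editorial illustration and is not part of the claims. It shows one way such a control loop could be organized; the names `ControlledModel`, `recognize_motion`, `map_motion`, and `control_models`, the linear mapping between first and second motion parameters, and the stubbed identification step are all assumptions introduced for this example.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class ControlledModel:
    """Hypothetical stand-in for a controlled model (e.g. a virtual rider)."""
    name: str
    speed: float = 0.0

    def apply(self, second_motion_parameter: float) -> None:
        # Drive the model with the mapped (second) motion parameter.
        self.speed = second_motion_parameter


def recognize_motion(frames: List[dict], object_id: int) -> float:
    """Stub for the identification step: returns a first motion parameter
    (e.g. a pedalling cadence) for one first target object."""
    # A real system would run detection / key point models over the frames.
    return float(len(frames))


def map_motion(first_motion_parameter: float) -> float:
    """Map a person's first motion parameter to the second motion parameter
    of the matched controlled model (a linear mapping is assumed here)."""
    return 0.5 * first_motion_parameter


def control_models(frames: List[dict],
                   matching: Dict[int, ControlledModel]) -> None:
    """One pass of the sketched control loop: identify each first target
    object's motion and drive the controlled model it is matched with."""
    for object_id, model in matching.items():
        first_param = recognize_motion(frames, object_id)
        model.apply(map_motion(first_param))


if __name__ == "__main__":
    models = {0: ControlledModel("rider_A"), 1: ControlledModel("rider_B")}
    control_models(frames=[{}, {}, {}], matching=models)
    print({m.name: m.speed for m in models.values()})  # {'rider_A': 1.5, 'rider_B': 1.5}
```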
2. The method of claim 1, wherein the controlled model comprises a virtual object in a display device, and the controlling, based on the matching relationship, the controlled model matched with each first target object to complete the task according to a second motion parameter matched with the first motion parameter comprises:
determining a second motion parameter of the virtual object matched with each first target object based on the matching relationship and the first motion parameter of each first target object;
and controlling the virtual object to complete the task according to the second motion parameter.
3. The method according to claim 1 or 2, wherein the identifying at least two image frames of the image frame set to be processed to obtain a first identification result comprises:
identifying at least two image frames of the image frame set to be processed to obtain a second identification result, wherein the second identification result comprises a detection result of a candidate object;
acquiring the number of objects actually participating in the task;
screening, from the candidate objects, the number of first target objects based on the detection results of the candidate objects;
and determining a first motion parameter of each first target object based on the second identification results of the screened-out number of first target objects.
4. The method of claim 3, wherein the detection result comprises position information of a detection frame of the candidate object, and the screening, from the candidate objects, the number of first target objects based on the detection results of the candidate objects comprises:
determining the size of a detection frame of the candidate object;
screening, from the candidate objects, the number of first target objects based on the sizes of the detection frames of the candidate objects.
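As a hedged, non-limiting illustration of the size-based screening recited above (not part of the claims), the sketch below keeps the detection frames with the largest areas, on the assumption that the actual participants stand closest to the camera; the helper names `box_area` and `screen_by_box_size` are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)

def box_area(box: Box) -> float:
    x1, y1, x2, y2 = box
    return max(0.0, x2 - x1) * max(0.0, y2 - y1)

def screen_by_box_size(candidate_boxes: List[Box], num_participants: int) -> List[int]:
    """Return the indices of the `num_participants` candidates whose
    detection frames are largest (assumed to be the actual players)."""
    order = sorted(range(len(candidate_boxes)),
                   key=lambda i: box_area(candidate_boxes[i]),
                   reverse=True)
    return order[:num_participants]

# Example: three detected people, but only two registered participants.
boxes = [(0, 0, 50, 120), (200, 10, 260, 150), (400, 40, 420, 80)]
print(screen_by_box_size(boxes, num_participants=2))  # -> [1, 0]
```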
5. The method of claim 3, wherein the detection result comprises position information of skeletal key points of the candidate object, and the screening, from the candidate objects, the number of first target objects based on the detection results of the candidate objects comprises:
determining the integrity of the skeletal key points of the candidate object;
screening, from the candidate objects, the number of first target objects based on the integrity of the skeletal key points of the candidate objects.
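As a non-limiting illustration of the integrity-based screening recited above (not part of the claims), the sketch below measures integrity as the fraction of skeletal key points whose confidence exceeds a threshold, which is an assumed definition, and keeps the candidates with the most complete skeletons; the helper names are hypothetical.

```python
from typing import Dict, List

def keypoint_integrity(keypoints: Dict[str, float], threshold: float = 0.3) -> float:
    """Fraction of skeletal key points whose confidence exceeds `threshold`.
    This confidence-based definition of 'integrity' is an assumption."""
    if not keypoints:
        return 0.0
    visible = sum(1 for conf in keypoints.values() if conf >= threshold)
    return visible / len(keypoints)

def screen_by_integrity(candidates: List[Dict[str, float]], num_participants: int) -> List[int]:
    """Keep the `num_participants` candidates with the most complete skeletons."""
    order = sorted(range(len(candidates)),
                   key=lambda i: keypoint_integrity(candidates[i]),
                   reverse=True)
    return order[:num_participants]

# Example: a mostly occluded bystander is screened out.
people = [
    {"wrist": 0.9, "elbow": 0.8, "ankle": 0.7},   # fully visible player
    {"wrist": 0.1, "elbow": 0.2, "ankle": 0.05},  # occluded bystander
]
print(screen_by_integrity(people, num_participants=1))  # -> [0]
```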
6. The method according to any one of claims 3 to 5, wherein the first motion parameter comprises a first speed parameter and/or a first direction parameter, and the determining a first motion parameter of each first target object based on the second identification results of the screened-out number of first target objects comprises:
determining whether an action of at least one first target object is a target action based on the second identification results of the screened-out number of first target objects;
in response to the action of the at least one first target object being a target action, determining a first speed parameter and/or a first direction parameter of the target action;
the method further comprises the following steps:
and determining a second speed parameter and/or a second direction parameter of the controlled model matched with each first target object based on the matching relationship and the first speed parameter and/or the first direction parameter of each first target object.
7. The method of claim 6, wherein the determining whether an action of at least one first target object is a target action based on the second identification results of the screened-out number of first target objects comprises:
determining position information of a key point associated with the at least one first target object based on the second identification results of the screened-out number of first target objects;
determining trajectory information of the key points based on the sequential temporal relationship of the image frames in which the second identification results are located and on the position information of the key points;
and determining that the action of the at least one first target object is a target action in response to the trajectory information of the key points meeting a preset condition.
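As a non-limiting illustration of the trajectory condition recited above (not part of the claims), the sketch below tracks one key point across time-ordered frames and treats "path length above a threshold" as the preset condition; both the threshold value and the choice of condition are assumptions for this example.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]

def trajectory_length(points: List[Point]) -> float:
    """Total path length of a key point across time-ordered frames."""
    return sum(math.dist(points[i], points[i + 1]) for i in range(len(points) - 1))

def is_target_action(keypoint_track: List[Point], min_length: float = 40.0) -> bool:
    """Decide whether the tracked key point performs the target action.
    The preset condition used here -- path length above `min_length` --
    is only an assumed example of such a condition."""
    return trajectory_length(keypoint_track) >= min_length

# Example: an ankle key point tracked over five frames of a pedalling-like motion.
track = [(100, 200), (110, 215), (125, 225), (140, 215), (150, 200)]
print(trajectory_length(track), is_target_action(track))
```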
8. The method according to any one of claims 1 to 7, wherein, in a case where at least two first target objects match one controlled model, the first identification result further comprises action amplitudes of the at least two first target objects, and the controlling, based on the matching relationship, the controlled model matched with each first target object to complete the task according to a second motion parameter matched with the first motion parameter comprises:
determining whether the action amplitudes of the at least two first target objects are consistent;
and in response to the action amplitudes of the at least two first target objects being inconsistent, not outputting a control instruction to the controlled model matched with the at least two first target objects.
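As a non-limiting illustration of the consistency check recited above (not part of the claims), the sketch below only outputs a control value for the shared controlled model when the players' action amplitudes stay within an assumed tolerance of each other; otherwise no instruction is produced. The function names and the averaging rule are hypothetical.

```python
from typing import List, Optional

def consistent(amplitudes: List[float], tolerance: float = 0.2) -> bool:
    """Treat the players' action amplitudes as consistent when the spread
    between the largest and smallest value stays within `tolerance`
    (the tolerance value itself is an assumption)."""
    return (max(amplitudes) - min(amplitudes)) <= tolerance

def control_instruction(amplitudes: List[float]) -> Optional[float]:
    """Return a control value for the shared controlled model, or None
    (i.e. no instruction is output) when the amplitudes are inconsistent."""
    if not consistent(amplitudes):
        return None
    return sum(amplitudes) / len(amplitudes)

print(control_instruction([0.8, 0.75]))  # consistent -> averaged control value
print(control_instruction([0.8, 0.2]))   # inconsistent -> None, model not driven
```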
9. The method according to any one of claims 1 to 7, wherein, in a case where at least two first target objects match one controlled model, the controlling, based on the matching relationship, the controlled model matched with each first target object to complete the task according to a second motion parameter matched with the first motion parameter comprises:
determining a second target object among the at least two first target objects based on a first motion parameter of each of the at least two first target objects;
and controlling, based on the matching relationship, the controlled model matched with the at least two first target objects to complete the task according to a second motion parameter matched with the first motion parameter of the second target object.
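As a non-limiting illustration of the selection recited above (not part of the claims), the sketch below chooses, among the players matched with one controlled model, the one with the largest first motion parameter and drives the model from that player's parameter; the "fastest player wins" rule and the linear mapping are assumptions.

```python
from typing import Dict

def select_second_target(first_motion_params: Dict[str, float]) -> str:
    """Pick the second target object among the matched first target objects.
    Choosing the one with the largest first motion parameter (e.g. the
    fastest player) is an assumed selection rule."""
    return max(first_motion_params, key=first_motion_params.get)

def drive_shared_model(first_motion_params: Dict[str, float]) -> float:
    """Complete the task using the second motion parameter mapped from the
    selected object's first motion parameter (a linear mapping is assumed)."""
    chosen = select_second_target(first_motion_params)
    return 0.5 * first_motion_params[chosen]

print(drive_shared_model({"player_1": 2.4, "player_2": 3.1}))  # driven by player_2
```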
10. The method of any one of claims 1 to 8, further comprising:
acquiring a first duration for the controlled model to complete the task;
determining a second history record set of the controlled model from the first history record set based on the matching relationship;
and ranking the controlled model based on the second history record set and the first duration.
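As a non-limiting illustration of the history-based ranking recited above (not part of the claims), the sketch below ranks the current completion duration of one controlled model against the durations held in its second history record set, with shorter durations ranked higher; the ranking rule is an assumed example and the function name is hypothetical.

```python
from typing import List

def rank_against_history(first_duration: float, history_durations: List[float]) -> int:
    """Rank the current completion duration of one controlled model against
    the durations in its second history record set (shorter is better).
    Returns a 1-based rank; the ranking rule is an assumed example."""
    all_durations = sorted(history_durations + [first_duration])
    return all_durations.index(first_duration) + 1

# Example: the model finished in 52.3 s; its history holds three earlier runs.
print(rank_against_history(52.3, [61.0, 49.8, 55.4]))  # -> 2 (second-best run)
```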
11. The method according to any one of claims 1 to 8, wherein, in a case where the number of controlled models is at least two, the method further comprises:
acquiring a second duration for each controlled model to complete the task;
and ranking the at least two controlled models based on the respective second durations.
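As a non-limiting illustration of the duration-based ranking recited above (not part of the claims), the sketch below orders at least two controlled models by how quickly each completed the task; ties keep their original order, and the function name is hypothetical.

```python
from typing import Dict, List

def rank_models(second_durations: Dict[str, float]) -> List[str]:
    """Order the controlled models by how quickly they completed the task
    (shortest second duration first); ties keep insertion order."""
    return sorted(second_durations, key=second_durations.get)

print(rank_models({"model_A": 58.2, "model_B": 47.6, "model_C": 47.6}))
# -> ['model_B', 'model_C', 'model_A']
```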
12. A model control apparatus, characterized by comprising:
a first acquisition module for acquiring an image frame set to be processed, wherein image frames in the image frame set have a sequential temporal relationship and comprise at least one first target object participating in a task;
a first determination module for determining a matching relationship between the at least one first target object and at least one controlled model that performs the task;
an identification module for identifying at least two image frames of the image frame set to be processed to obtain a first identification result, wherein the first identification result comprises at least a first motion parameter of each first target object;
and a control module for controlling, based on the matching relationship, the controlled model matched with each first target object to complete the task according to a second motion parameter matched with the first motion parameter.
13. An electronic device comprising a memory and a processor, the memory storing a computer program operable on the processor, wherein the processor implements the steps of the method of any one of claims 1 to 11 when executing the program.
14. A computer-readable storage medium, on which a computer program is stored which, when executed by a processor, carries out the steps of the method of any one of claims 1 to 11.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210346046.3A CN114758415A (en) | 2022-03-31 | 2022-03-31 | Model control method, device, equipment and storage medium |
PCT/CN2022/108566 WO2023184804A1 (en) | 2022-03-31 | 2022-07-28 | Model control method and apparatus, and device, storage medium and computer program product |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210346046.3A CN114758415A (en) | 2022-03-31 | 2022-03-31 | Model control method, device, equipment and storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114758415A (en) | 2022-07-15
Family
ID=82330145
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210346046.3A Pending CN114758415A (en) | 2022-03-31 | 2022-03-31 | Model control method, device, equipment and storage medium |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN114758415A (en) |
WO (1) | WO2023184804A1 (en) |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113111839A (en) * | 2021-04-25 | 2021-07-13 | 上海商汤智能科技有限公司 | Behavior recognition method and device, equipment and storage medium |
CN113111838B (en) * | 2021-04-25 | 2024-09-17 | 上海商汤智能科技有限公司 | Behavior recognition method and device, equipment and storage medium |
CN114758415A (en) * | 2022-03-31 | 2022-07-15 | 深圳市商汤科技有限公司 | Model control method, device, equipment and storage medium |
- 2022-03-31 CN CN202210346046.3A patent/CN114758415A/en active Pending
- 2022-07-28 WO PCT/CN2022/108566 patent/WO2023184804A1/en unknown
Patent Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20210041957A1 (en) * | 2019-02-01 | 2021-02-11 | Beijing Sensetime Technology Development Co., Ltd. | Control of virtual objects based on gesture changes of users |
CN111368636A (en) * | 2020-02-07 | 2020-07-03 | 深圳奇迹智慧网络有限公司 | Object classification method and device, computer equipment and storage medium |
CN111354013A (en) * | 2020-03-13 | 2020-06-30 | 北京字节跳动网络技术有限公司 | Target detection method and device, equipment and storage medium |
CN114187656A (en) * | 2021-11-30 | 2022-03-15 | 上海商汤智能科技有限公司 | Action detection method, device, equipment and storage medium |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2023097967A1 (en) * | 2021-11-30 | 2023-06-08 | 上海商汤智能科技有限公司 | Action detection method and apparatus, device, storage medium, and computer program product |
WO2023184804A1 (en) * | 2022-03-31 | 2023-10-05 | 上海商汤智能科技有限公司 | Model control method and apparatus, and device, storage medium and computer program product |
Also Published As
Publication number | Publication date |
---|---|
WO2023184804A1 (en) | 2023-10-05 |
Similar Documents
Publication | Title |
---|---|
CN111095150B (en) | Robot as personal trainer | |
US10643492B2 (en) | Remote multiplayer interactive physical gaming with mobile computing devices | |
CN106502388B (en) | Interactive motion method and head-mounted intelligent equipment | |
CN109692003B (en) | Training system is corrected to children gesture of running | |
CN112118784B (en) | Social interaction application for detecting neurophysiologic status | |
WO2023184804A1 (en) | Model control method and apparatus, and device, storage medium and computer program product | |
CN112464918B (en) | Body-building action correcting method and device, computer equipment and storage medium | |
KR20230059828A (en) | Multiplexed communications via smart mirrors and video streaming with display | |
CN107341351A (en) | Intelligent body-building method, apparatus and system | |
US10661148B2 (en) | Dual motion sensor bands for real time gesture tracking and interactive gaming | |
KR101962578B1 (en) | A fitness exercise service providing system using VR | |
CN102207771A (en) | Intention deduction of users participating in motion capture system | |
Wiehr et al. | betaCube: Enhancing training for climbing by a self-calibrating camera-projection unit | |
WO2023097967A1 (en) | Action detection method and apparatus, device, storage medium, and computer program product | |
US20230116624A1 (en) | Methods and systems for assisted fitness | |
US20230047787A1 (en) | Controlling progress of audio-video content based on sensor data of multiple users, composite neuro-physiological state and/or content engagement power | |
US11890505B2 (en) | Systems and methods for gestural detection and control in immersive and interactive flume swimming pools | |
CN113076002A (en) | Interconnected body-building competitive system and method based on multi-part action recognition | |
CN114022512A (en) | Exercise assisting method, apparatus and medium | |
CN115131879A (en) | Action evaluation method and device | |
WO2023077660A1 (en) | Tai chi training method and system based on mixed reality, device, and storage medium | |
CN110314344A (en) | Move based reminding method, apparatus and system | |
GB2575299A (en) | Method and system for directing and monitoring exercise | |
Huang et al. | Designing an exergaming system for exercise bikes using kinect sensors and Google Earth | |
Wagh et al. | Virtual Yoga System Using Kinect Sensor |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
REG | Reference to a national code | Ref country code: HK; Ref legal event code: DE; Ref document number: 40069213; Country of ref document: HK |