CN113327306B - Exclusive animation generation method and system based on hand shadow realization - Google Patents

Exclusive animation generation method and system based on hand shadow realization

Info

Publication number: CN113327306B (application CN202110581138.5A; also published as CN113327306A)
Authority: CN (China)
Prior art keywords: hand, user, hand shadow, shadow, animation
Legal status: Active (granted)
Other languages: Chinese (zh)
Inventors: 苏鸿丽, 郑灵翔, 许添硕, 王悰扬
Current assignee: Xiamen University
Original assignee: Xiamen University
Application filed by: Xiamen University
Filing and priority date: 2021-05-26
Publication of CN113327306A: 2021-08-31
Publication of CN113327306B (grant): 2022-06-21

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06T: IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T13/00: Animation
    • G06T5/00: Image enhancement or restoration
    • G06T5/40: Image enhancement or restoration by the use of histogram techniques
    • G06T5/90
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20: Movements or behaviour, e.g. gesture recognition
    • G06V40/28: Recognition of hand or arm movements, e.g. recognition of deaf sign language

Abstract

A hand-shadow-based exclusive animation generation method and generation system relate to animation generation. 1) Acquiring training data; 2) preprocessing the data; 3) labeling the data set; 4) building a target detection network model; 5) training the model; 6) freezing (curing) the model; 7) making a material library; 8) starting the hand shadow recognition system; 9) making various animal hand shadows in front of the device; 10) the camera captures a picture for recognition to obtain the action data information of the hand shadow; 11) generating a hand shadow at the corresponding position according to the hand shadow action data information; 12) adding voice information to the animation; 13) repeating steps 10) to 12) until a termination gesture is received; 14) saving the pictures and corresponding audio of steps 11) and 12) into a video file, and generating a corresponding two-dimensional code from the name of the video file for the user to scan and view. A scene is performed with both hands, the hand positions are recognized in real time, and an animation is created, cultivating the user's creativity, imagination and attention.

Description

Exclusive animation generation method and generation system based on hand shadow
Technical Field
The invention relates to the technical field of animation generation, and in particular to an exclusive animation generation method and generation system based on hand shadows, realized with object detection technology and a deep neural network, which enable a user to freely create animations.
Background
Animation is deeply loved by audiences because of the freedom of its actions and the exaggeration of its images, and demand for animation keeps growing with the rapid development of the market economy. The film and animation industry is an important cultural industry that is intensive in capital, technology, knowledge and labor; it is a new industry with great development potential in the 21st century and a sunrise industry, characterized by a wide consumer base, large market demand, long product life cycles, high cost, high investment, high added value and a high degree of internationalization.
Even as the cultural and animation industries are being vigorously developed, original animation creation, including cartoons that can become products, still requires a large number of professionals and heavy investments of time and money, and original animation talent needs long training before engaging in professional creation, which in turn restricts the development of the animation industry and its peripheral industries. Existing animation patents mostly concern character rigs predefined on the computer and data-driven figures intended for professionals to improve work efficiency; they are not products in themselves, their use value is narrow, their audience is small, and they cannot be pushed to the market. Meanwhile, existing cartoon products mainly give students passive, receptive enlightenment education and lack a learning mode that lets students actively explore knowledge. Cartoons with real educational meaning are scarce, while what matters more to children is hands-on ability, creativity and imagination.
Chinese patent CN202010638556.9 discloses a method and system for tracking human body posture and generating animation, which obtains a single-frame image sequence by preprocessing the video data of human body posture; respectively decoding each frame of image in the single-frame image sequence based on a deep learning neural network model to obtain human body posture depth information and human body key point information of each frame of image; processing the human body posture depth information and the human body key point information of each frame of image according to the joint data of a preset animation model to obtain human body action data of a single frame of image sequence; and based on a rendering engine, driving an animation model by adopting human body motion data of a single-frame image sequence to generate an animation.
Chinese patent CN201910428261.6 discloses a method and system for quickly generating two-dimensional animation based on shadow play preview. The method includes the following steps: (1) designing a character role or scene of the shadow play and creating its configuration file; (2) creating a shadow-block configuration file; (3) making the shadow puppet, marking and numbering the mark points; (4) creating a node configuration file; (5) creating a node displacement data configuration file; (6) creating a transition data configuration file; (7) creating a node special-effect data configuration file; (8) previewing the shadow play; (9) generating and producing the animation. The method can generate the two-dimensional animation synchronously while shadow play actors perform the preview, can produce real shadow-play animation effects, makes the operation faster and more streamlined and the replacement of materials more convenient, and solves the technical problems of the prior art that the shadow-play animation production process is slow and reusability is poor.
Disclosure of Invention
The invention aims to overcome the defects of existing animated films, such as uniform content, poor operability, high cost and low efficiency, by providing an exclusive animation generation method and generation system realized based on hand shadows, which are mainly intended for free animation creation based on the varied character forms that children produce with hand shadows.
The exclusive animation generation method based on the hand shadow comprises the following steps:
1) acquiring training data: shooting various gestures in advance against a white background, at multiple angles, multiple sizes, with multiple targets and at multiple positions, and establishing data sets of various hand shadow actions;
2) data preprocessing: preprocessing the hand shadow action data set established in step 1), deleting invalid data, and enhancing the valid data with the CLAHE (contrast-limited adaptive histogram equalization) algorithm;
in step 2), the specific method for preprocessing the data may be:
(1) taking the average of the three RGB components as the pixel value, converting the color image into a gray-scale image;
(2) performing data enhancement with horizontal flipping, random cropping and the CLAHE (contrast-limited adaptive histogram equalization) algorithm;
the specific method for enhancing the valid data with the CLAHE algorithm may be as follows (a minimal sketch of this preprocessing is given after this list):
(1) expanding the image boundary so that each image in the hand shadow action data set processed by the above steps is divided exactly into a number of sub-blocks; with the area of each sub-block denoted tileSizeTotal and the sub-block coefficient lutScale = 255.0/tileSizeTotal, the preset limit is processed as limit = MAX(1, limit × tileSizeTotal/256);
(2) calculating the histogram of each sub-block, namely counting the occurrences of each gray value from 0 to 255 in each sub-block;
(3) clipping each gray level of each sub-block histogram with the preset limit value, and counting the number of pixels of the whole histogram that exceed the limit;
(4) calculating the LUT cumulative histogram tileLut of each sub-block, where tileLut[i] = sum[i] × lutScale, sum[i] is the cumulative histogram, and lutScale ensures that tileLut takes values in [0, 255];
(5) traversing each point of the original image, considering the tileLut of the 4 sub-blocks nearest to the point (the sub-block containing the point and its right, lower and lower-right neighbours), obtaining 4 values with the original gray value as index, and performing bilinear interpolation to obtain the gray value of the point after transformation;
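By way of illustration, the grayscale conversion, the simple augmentations and the CLAHE enhancement may be sketched as follows with OpenCV; the clip limit and tile grid size are illustrative values and not parameters fixed by the invention:

# Minimal preprocessing sketch, assuming OpenCV is used; clipLimit and
# tileGridSize below are illustrative values, not values from the patent.
import cv2
import numpy as np

def preprocess(image_bgr: np.ndarray) -> np.ndarray:
    # (1) Take the plain mean of the three colour components as the pixel value.
    gray = image_bgr.mean(axis=2).astype(np.uint8)
    # (2) Contrast-limited adaptive histogram equalisation: the image is split
    # into tiles, each tile's histogram is clipped and equalised, and the tiles
    # are blended by bilinear interpolation.
    clahe = cv2.createCLAHE(clipLimit=2.0, tileGridSize=(8, 8))
    return clahe.apply(gray)

def augment(gray: np.ndarray) -> list:
    # Augmentations named in the text: horizontal flip and a random crop.
    h, w = gray.shape
    y = np.random.randint(0, h // 10 + 1)
    x = np.random.randint(0, w // 10 + 1)
    return [cv2.flip(gray, 1), gray[y:y + int(0.9 * h), x:x + int(0.9 * w)]]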
3) labeling the data set: carrying out data annotation by using an image annotation tool, and dividing an obtained data set into a training set and a test set;
4) building a target detection network model;
in step 4), the specific method for building the target detection network model may be: building a target detection model that extracts feature maps of different scales for detection, with prior boxes of different scales and aspect ratios adapted to each feature map; several groups of aspect ratios are used so that hand shadows of different shapes can be detected; the target detection network model runs on an embedded development board as the hand shadow recognition neural network model.
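The prior-box design may be sketched as follows; the feature-map sizes, scales and aspect ratios below are illustrative assumptions in the spirit of SSD, not values specified by the invention:

# Sketch of SSD-style prior (default) boxes for feature maps of several scales;
# the sizes, scales and aspect ratios are illustrative assumptions.
import itertools
import numpy as np

def prior_boxes(fmap_sizes=(38, 19, 10, 5, 3, 1),
                scales=(0.1, 0.2, 0.37, 0.54, 0.71, 0.88),
                ratios=(1.0, 2.0, 0.5, 3.0, 1.0 / 3.0)):
    boxes = []
    for size, scale in zip(fmap_sizes, scales):
        for i, j in itertools.product(range(size), repeat=2):
            cx, cy = (j + 0.5) / size, (i + 0.5) / size   # box centre in [0, 1]
            for r in ratios:                              # one box per aspect ratio
                w, h = scale * np.sqrt(r), scale / np.sqrt(r)
                boxes.append([cx, cy, w, h])
    return np.clip(np.array(boxes), 0.0, 1.0)

# Fine feature maps with small scales catch small hand shadows; coarse, deep
# maps with large scales catch large ones.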
5) Training the model and selecting the model weights with the highest precision on the test set;
in step 5), the model of step 4) may be trained on a server with the data set of step 3), using early stopping (Early Stop), learning-rate decay and several sets of random seeds as training strategies, and finally selecting the model weights with the highest precision on the test set.
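This training strategy may be sketched with tf.keras as follows; build_detector, the data sets and the compiled "precision" metric name are assumptions for illustration, not parts of the patented system:

# Training sketch: early stopping, learning-rate decay and several random seeds.
import tensorflow as tf

def train_with_seeds(build_detector, train_ds, test_ds, seeds=(0, 1, 2)):
    best_weights, best_precision = None, -1.0
    for seed in seeds:
        tf.random.set_seed(seed)              # one training run per random seed
        model = build_detector()              # assumed compiled with a "precision" metric
        callbacks = [
            tf.keras.callbacks.EarlyStopping(monitor="val_loss", patience=10,
                                             restore_best_weights=True),
            tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss",
                                                 factor=0.5, patience=3),
        ]
        model.fit(train_ds, validation_data=test_ds, epochs=200, callbacks=callbacks)
        precision = model.evaluate(test_ds, return_dict=True)["precision"]
        if precision > best_precision:        # keep the weights scoring best on the test set
            best_precision, best_weights = precision, model.get_weights()
    return best_weights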
6) Freezing (curing) the model: writing a script that calls the model according to the application scenario and setting the model to test mode, so that gradient information is not stored and model inference is accelerated;
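A minimal inference sketch of such a "cured" model, assuming a TensorFlow/Keras model saved to disk (the path is a placeholder), might be:

import tensorflow as tf

# Load the trained model once; the path is a placeholder for illustration.
model = tf.keras.models.load_model("hand_shadow_detector")

@tf.function
def infer(image_batch):
    # training=False keeps batch-norm/dropout in test mode; nothing is recorded
    # for back-propagation, which speeds up inference on the embedded board.
    return model(image_batch, training=False)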
7) creating a hand shadow optimization material and an authoring scene material, uploading the hand shadow optimization material and the authoring scene material to a server, and waiting for calling;
8) starting a hand shadow recognition animation generation system, starting a device camera, and making a corresponding gesture by a user through an interactive interface prompt displayed by user terminal display equipment so as to call scene materials stored on a server and display the scene materials on the user terminal display equipment;
9) starting a camera at the equipment end, and making various animal hand shadows in front of an illuminating device at the equipment end by a user;
10) the camera captures a picture of the animal hand shadow made by the user; the embedded development board recognition module on the terminal device recognizes the hand shadow action, obtains data information such as the name and position of the hand shadow action, and transmits it to the server;
in step 10), the specific method by which the camera takes the picture and the embedded development board recognition module on the terminal device recognizes the hand shadow action and obtains data information such as its name and position may be: the camera captures the animal hand shadow made by the user in real time; the embedded development board recognition module on the terminal device analyzes the information in the real-time picture, extracts the image of the user's hand, and recognizes it with the trained target detection network model; once the user's hand shadow action is obtained, if it matches a class in the training set it is a recognizable hand shadow action and the corresponding logic operation is performed in the program according to its meaning; if it cannot be recognized, the user is prompted to make the hand shadow action again; the logic operation is specifically:
(1) analyzing hand shadow actions appearing in the identification area;
(2) analyzing the position of each hand shadow action;
(3) identifying hand shadow actions of each hand;
(4) the hand shadow action identification is effective;
(5) changing the picture according to the hand shadow action; generating an animation or changing a scene.
11) Calling the hand shadow optimization material stored on the server according to the data information obtained in step 10), generating the hand shadow at the corresponding position and displaying it on the display device of the user terminal, the hand shadow moving together with the gesture position;
in step 11), the corresponding position means that, taking the picture captured by the camera as the reference, the position of the hand in the captured picture corresponds to the position of the animal silhouette on the display screen of the user terminal; if the picture scene on the screen needs to be constrained, the position constraint is applied first and the result is then mapped to the corresponding picture.
12) Adding voice information for the animation by the user;
in step 12), the specific method for adding the voice information to the animation by the user may be:
directly using the user's voice as the voice-over of the creation: after the audio acquisition module (microphone) receives the user's voice information, it is stored and synthesized with the picture, and finally an animation segment with the user's voice-over is output; or using voice recognition to assist the user's creation: after receiving the user's voice, the audio acquisition module (microphone) performs speech recognition, analyzes what the user said, extracts key information and changes the picture accordingly, making up for the creative limitations caused by the limited range of hand shadow movements; for example, after detecting that the user says "tree", trees can be automatically generated in the environment.
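Both modes may be sketched as follows; recognise_speech, scene.attach_narration and scene.add_element are hypothetical names used only for illustration:

# Sketch of the two voice modes; every name on the scene object and the
# speech-to-text helper are hypothetical, introduced only for this example.
KEYWORD_TO_ELEMENT = {"tree": "tree", "sun": "sun", "river": "river"}

def handle_voice(audio_chunk, scene, recognise_speech):
    scene.attach_narration(audio_chunk)       # mode 1: keep the raw voice-over
    text = recognise_speech(audio_chunk)      # mode 2: speech-to-text (assumed helper)
    for keyword, element in KEYWORD_TO_ELEMENT.items():
        if keyword in text:                   # a keyword triggers a picture change,
            scene.add_element(element)        # e.g. "tree" adds trees to the scene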
13) Repeating steps 10) to 11) until a termination gesture is received;
14) saving the pictures and corresponding audio obtained in steps 11) and 12) into a video file, and generating a corresponding two-dimensional code from the name of the video file so that the user can scan it and view the animation he or she created.
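Step 14) may be sketched as follows with OpenCV and the qrcode package; merging the audio track (e.g. with ffmpeg) is omitted, and the file paths and URL are placeholders:

# Sketch: write rendered frames to a video file and build a QR code from its name.
import cv2
import qrcode

def save_animation(frames, name, fps=25):
    h, w = frames[0].shape[:2]
    writer = cv2.VideoWriter(f"{name}.mp4",
                             cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))
    for frame in frames:
        writer.write(frame)                   # one BGR frame per animation picture
    writer.release()
    # QR code pointing at the saved video; the URL prefix is a placeholder.
    qrcode.make(f"https://example.com/videos/{name}.mp4").save(f"{name}_qr.png")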
The exclusive animation generation system based on the hand shadow comprises an equipment end, a user terminal and a server;
the device side includes: embedded development board, lighting device and power module. The embedded development board includes: the device comprises an identification module, a WIFI module, a camera module, an audio acquisition module and a USB interface;
the identification module is used for detecting and tracking hand shadow actions of a user, identifying and analyzing hand shadow action pictures, and submitting data information such as names, positions and the like related to the hand shadow actions obtained through identification and analysis to a server;
the WIFI module is used for establishing the connection between the device side and the user terminal; the user terminal searches for the device side to connect to according to the WIFI_SSID, so that the creation picture is transmitted to and presented on the user terminal through the wireless connection;
the camera module is used for shooting a hand shadow action image of a user in real time and acquiring an image signal;
the audio acquisition module is used for acquiring and receiving voice signals of a user and comprises a microphone; the audio acquisition module is connected with the embedded development board and is connected with the user terminal through a network;
the USB interface is used for animation transmission and external charging of equipment, and a user can perform batch management on created animations through the USB interface;
the lighting device is used for providing enough light sources for shooting the hand shadow actions of the user;
the power module provides power for each module of the equipment, and the power module can adopt a battery or a direct current power supply;
the user terminal comprises a mobile phone, a tablet computer or a personal computer and the like, can display the generated animation and interacts with the server or the equipment terminal; and the user terminal is connected with the equipment terminal through WIFI.
The server is a GPU server.
Compared with the prior art, the invention has the following outstanding advantages:
by means of the method and the system, a scene can be performed with both hands: the hand position is recognized in real time and serves as the character's position, the movement of each part of the character corresponds to the joints of the hand, and animation creation on the scene is carried out through intelligent recognition, thereby further developing the user's creativity, imagination and attention.
Drawings
Fig. 1 is a system block diagram of a dedicated animation generation system implemented based on a hand shadow according to an embodiment of the present invention.
FIG. 2 is a schematic diagram of a gesture data set established by the present invention.
Detailed Description
The following examples further illustrate the invention in conjunction with the drawings.
As shown in fig. 1, an embodiment of a dedicated animation generation system implemented based on a hand shadow includes a device side, a user terminal, and a server;
the device side includes: embedded development board, lighting device and power module.
The embedded development board includes: the device comprises an identification module, a WIFI module, a camera module, an audio acquisition module and a USB interface;
the identification module is used for detecting and tracking hand shadow actions of a user, identifying and analyzing hand shadow action pictures, and submitting data information such as names, positions and the like related to the hand shadow actions obtained through identification and analysis to a server;
the WIFI module is used for establishing the connection between the device side and the user terminal; the user terminal searches for the device side to connect to according to the WIFI_SSID, so that the creation picture is transmitted to and presented on the user terminal through the wireless connection;
the camera module is used for shooting a hand shadow action image of a user in real time and acquiring an image signal;
the audio acquisition module is used for acquiring and receiving voice signals of a user and comprises a microphone; the audio acquisition module is connected with the embedded development board and is connected with the user terminal through a network;
the USB interface is used for animation transmission and external charging of equipment, and a user can perform batch management on created animations through the USB interface;
the lighting device is used for providing enough light source for shooting the hand shadow action of the user;
the power module provides power for each module of the equipment, and the power module can adopt a battery or a direct current power supply;
the user terminal comprises a mobile phone, a tablet computer or a personal computer and the like; and the user terminal is connected with the equipment end through the WIFI module.
The server is a GPU server.
When the device is used, the user turns on the device side, the device connects to the user terminal, and the creation process is displayed on the display screen. With a fist gesture in front of the lighting device at the device side, the user first selects a creation scene, i.e., forest, desert or grassland. Once the scene is selected, it is displayed by the user terminal display device. The user then makes animal hand shadows in front of the device light, including bird, butterfly, deer, elephant, rabbit, dog and tree; the system identifies the corresponding hand shadow actions and transmits information such as their names and positions to the server. The server calls the matching stored animal forms, the optimized image of the animal then appears on the display screen, and the user uses the hand shadows to compose stories and make the animation, while the system saves the screen recording and the actual sound of the creation process on the server. The user terminal (mobile phone) can connect to the device, save the historical creations stored on the server to the phone, and conveniently view the created works at any time.
Referring to fig. 1, the embodiment of the exclusive animation generation method based on the hand shadow implementation of the present invention includes the following steps:
1) acquiring training data: as shown in fig. 2, various gestures are shot against a white background at multiple angles, multiple sizes, with multiple targets and at multiple positions, including bird, butterfly, deer, elephant, rabbit, dog and tree, and data sets of the various hand shadow actions are established; the size of each picture can be larger than 250 x 250, and the hand gestures can adopt the existing common hand shadow gestures for representing animals.
2) Data preprocessing: preprocessing the hand shadow action data set obtained in step 1), deleting invalid data, and enhancing the valid data with the CLAHE (contrast-limited adaptive histogram equalization) algorithm;
in step 2), the specific method for the data preprocessing may be:
(1) taking the average of the three RGB components as the pixel value, converting the color image into a gray-scale image;
(2) performing data enhancement by means of horizontal flipping, random cropping and the like, together with the CLAHE algorithm;
the specific method for enhancing the valid data with the CLAHE (contrast-limited adaptive histogram equalization) algorithm may be as follows:
(1) expanding the image boundary so that the image is divided exactly into a number of sub-blocks; assuming the area of each sub-block is tileSizeTotal and the sub-block coefficient lutScale = 255.0/tileSizeTotal, the preset limit is processed as limit = MAX(1, limit × tileSizeTotal/256);
(2) calling the function calcHist to calculate the histogram of each sub-block;
(3) clipping each gray level of each sub-block histogram with the preset limit value, and counting the number of pixels of the whole histogram that exceed the limit;
(4) calculating the LUT cumulative histogram tileLut of each sub-block, where tileLut[i] = sum[i] × lutScale, sum[i] is the cumulative histogram, and lutScale ensures that tileLut takes values in [0, 255];
(5) traversing each point of the original image, considering the tileLut of the 4 sub-blocks nearest to the point (the sub-block containing the point and its right, lower and lower-right neighbours), obtaining 4 values with the original gray value as index, and performing bilinear interpolation to obtain the gray value of the point after transformation;
3) labeling the data set: performing data annotation with the open-source annotation software LabelImg (https://github.com/tzutalin/labelImg), and dividing the data set processed in step 2) into a training set and a test set;
in step 3), the data set is divided into a training set and a test set, with 70% used as the training set and 30% as the test set.
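LabelImg writes Pascal VOC XML annotations by default; a sketch that reads such an annotation and performs the 70/30 split might be (directory layout and file names are assumptions):

# Sketch: parse a Pascal VOC XML file written by LabelImg and split 70/30.
import glob
import random
import xml.etree.ElementTree as ET

def read_voc(xml_path):
    root = ET.parse(xml_path).getroot()
    boxes = []
    for obj in root.iter("object"):           # one entry per labelled hand shadow
        name = obj.find("name").text          # e.g. "bird", "deer", "rabbit"
        bb = obj.find("bndbox")
        boxes.append((name, [int(bb.find(k).text)
                             for k in ("xmin", "ymin", "xmax", "ymax")]))
    return root.find("filename").text, boxes

annotations = sorted(glob.glob("annotations/*.xml"))   # assumed directory layout
random.shuffle(annotations)
split = int(0.7 * len(annotations))
train_set, test_set = annotations[:split], annotations[split:]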
4) Building a target detection network model;
in step 4), the specific method for building the target detection network model may be: a target detection model SSD (Wei Liu et al., SSD: Single Shot MultiBox Detector, 2016) is built with TensorFlow; the SSD extracts feature maps of different scales for detection and uses prior boxes of different scales and aspect ratios. Prior boxes of different sizes better detect targets of different sizes, and several groups of aspect ratios allow hand shadows of different shapes to be detected more richly. In addition, the pre-trained weights of the SSD model on the VOC2007 data set (https://github.com/balancap/SSD-Tensorflow) are obtained and used as the initial weights for transfer learning.
5) Training the model and selecting the model weights with the highest precision on the test set;
in step 5), the model of step 4) may be trained with the data set of step 3) on a server (two 1080 Ti GPUs), using early stopping (Early Stop), learning-rate decay and several sets of random seeds as training strategies, and finally selecting the model weights with the highest precision on the test set.
6) Freezing (curing) the model: writing a script that calls the model according to the application scenario (CPU), setting the model to test mode (information such as gradients is not stored), and thereby accelerating model inference;
7) making the hand shadow optimization materials and the creation scene materials, including forest, desert and grassland, and uploading them to the server to wait for calling;
8) starting the hand shadow recognition system and the device-side camera; the user terminal display device shows a palm image so that the user can interact through gesture control. The interactive interface first prompts the user to select a creation scene, such as forest, desert or grassland; the user moves the palm image onto the desired scene, e.g. the forest, and confirms it with a fist gesture, so that the forest-themed creation scene stored on the server is called and displayed on the user terminal display device;
9) the user makes an animal hand shadow corresponding to the bird (bird) in front of the lighting device at the equipment end.
10) The camera captures pictures; the embedded development board recognition module on the terminal device recognizes the hand shadow action, obtains data information such as the hand shadow action name (bird) and its position, and transmits it to the server;
in step 10), the specific method by which the camera takes the picture and the embedded development board recognition module on the terminal device recognizes the hand shadow action made by the user and obtains data information such as its name and position may be: the camera captures the real-time picture; the recognition module analyzes the information in it, and after the image of the user's hand is extracted, it is recognized with the trained gesture recognition model; once the user's hand shadow action is obtained, if it matches a class in the training library it is a recognizable hand shadow action and the corresponding logic operation is performed in the program according to the meaning of the gesture; if it cannot be recognized, the user is prompted to make the hand shadow action again; the logic operation (a sketch of which is given after the list) specifically includes:
(1) analyzing hand shadow actions appearing in the identification area;
(2) analyzing the position of each hand shadow action;
(3) identifying hand shadow actions of each hand;
(4) the hand shadow action identification is effective;
(5) changing the picture according to the hand shadow action (generating an animation or changing a scene).
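This logic operation may be sketched as follows; the confidence threshold and the canvas interface are illustrative assumptions:

# Sketch: keep confident detections, then either move the matching animal
# material or change the scenery; `canvas` and its methods are hypothetical.
SCENE_ACTIONS = {"tree"}          # classes that change the scenery
ANIMAL_ACTIONS = {"bird", "butterfly", "deer", "elephant", "rabbit", "dog"}

def apply_detections(detections, canvas, threshold=0.5):
    for name, box, score in detections:       # (class, [x1, y1, x2, y2], confidence)
        if score < threshold:                 # (4) only valid recognitions count
            continue
        if name in ANIMAL_ACTIONS:            # (5a) place/move the animal material
            canvas.place(name, box)
        elif name in SCENE_ACTIONS:           # (5b) change the scene instead
            canvas.add_scenery(name, box)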
11) Generating the optimized material corresponding to the bird hand shadow at the corresponding position according to the data information obtained in step 10), the hand shadow moving together with the gesture position;
in step 11), the corresponding position means that, taking the picture captured by the camera as the reference, the position of the hand in the captured picture corresponds to the position of the animal silhouette on the display screen of the user terminal; if the picture scene on the screen is constrained, the position constraint is applied first and the result is then mapped to the corresponding picture; for example, a bird flies in the sky, so it does not appear near the ground but within a higher range, which is then adjusted according to the position of the hand.
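The position mapping may be sketched as follows; the scene-dependent vertical bands are illustrative assumptions:

# Sketch: scale the box centre from camera coordinates to screen coordinates,
# then clamp it to a scene-dependent vertical band (e.g. birds stay high up).
VERTICAL_BAND = {"bird": (0.0, 0.5), "butterfly": (0.0, 0.7)}  # fraction of screen height

def to_screen(box, cam_size, screen_size, name):
    (x1, y1, x2, y2), (cw, ch), (sw, sh) = box, cam_size, screen_size
    sx = (x1 + x2) / 2 / cw * sw                  # horizontal position follows the hand
    sy = (y1 + y2) / 2 / ch * sh
    lo, hi = VERTICAL_BAND.get(name, (0.0, 1.0))  # scene constraint first, then the hand
    return sx, min(max(sy, lo * sh), hi * sh)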
12) Adding voice information to the animation: the user freely adds voice information according to his or her own animation, for example: "A bird flies carefree in a big forest; then suddenly one day ……"
In step 12), the specific method for adding the voice information to the animation by the user may be:
directly using the user's voice as the voice-over of the creation: after the audio acquisition module (microphone) receives the user's voice information, it is stored and synthesized with the picture, and finally an animation segment with the user's voice-over is output; or using voice recognition to assist the user's creation: after receiving the user's voice, the audio acquisition module (microphone) performs speech recognition, analyzes what the user said to extract key information, and changes the picture accordingly, making up for the creative limitations caused by the limited range of hand shadow movements; for example, after detecting that the user says "tree", trees can be automatically generated in the environment.
13) Repeating the steps 10) to 12) until a termination gesture is received;
14) saving the pictures and corresponding audio obtained in steps 11) and 12) into a video file, and generating a corresponding two-dimensional code from the name of the video file so that the user can scan it and view the animation he or she created.
The method and system use common hand shadows to recognize and produce simple animations about animals, with the movement of the hands corresponding to the movement of the animated actions. The system is simple to build and is mainly aimed at children, who freely create animations based on the varied character forms made with hand shadows. The cost is low and the efficiency is high: animation creation can be realized simply and quickly, the animation and the corresponding audio are saved as video files, and the user can store and scan the created animation.

Claims (8)

1. A dedicated animation generation method based on hand shadows is characterized by comprising the following steps:
1) acquiring training data: shooting various gestures at multiple angles, multiple sizes, multiple targets and multiple positions in a white background in advance, and establishing various hand shadow action data sets;
2) data preprocessing: performing data preprocessing on the hand shadow action data set established in the step 1), deleting invalid data, and enhancing the valid data by using a clahe algorithm;
the data preprocessing comprises the steps of firstly taking the average value of three components of RGB as the component value of a pixel, and converting a color image into a gray image; then, performing data enhancement by using horizontal turning, random clipping and a clahe algorithm;
the specific method for enhancing the effective data by using the clahe algorithm comprises the following steps:
(1) expanding the image boundary, so that each image processed in the hand shadow motion data set is segmented exactly into a plurality of sub-blocks, assuming that the area of each sub-block is tileSizeTotal and the sub-block coefficient lutScale = 255.0/tileSizeTotal, and processing the preset limit: limit = MAX(1, limit × tileSizeTotal/256);
(2) calculating a histogram of each sub-block, namely counting each gray value of 0-255 and its occurrence frequency in each sub-block;
(3) limiting each gray level of each sub-block histogram by using the preset limit value, and counting the number of pixels of the whole histogram exceeding the limit;
(4) calculating the LUT cumulative histogram tileLut of each sub-block, wherein tileLut[i] = sum[i] × lutScale, sum[i] is the cumulative histogram, and lutScale ensures that tileLut takes values in [0, 255];
(5) traversing each point of the original image, considering the tileLut of the 4 sub-blocks nearest to the point (the sub-block where the point is located and its right, lower and lower-right neighbours), obtaining 4 values by taking the original gray value as an index, and then performing bilinear interpolation to obtain the gray value after the point is transformed;
3) labeling the data set: carrying out data annotation by using an image annotation tool, and dividing an obtained data set into a training set and a test set;
4) building a target detection network model;
5) training a model, and selecting a model weight with highest precision on a test set;
6) curing the model: compiling a script calling model according to an application scene, setting the model as a test mode, and not storing gradient information so as to accelerate the model reasoning speed;
7) creating a hand shadow optimization material and an authoring scene material, uploading the hand shadow optimization material and the authoring scene material to a server, and waiting for calling;
8) starting a hand shadow recognition animation generation system, starting a device camera, and making a corresponding gesture by a user through an interactive interface prompt displayed by user terminal display equipment so as to call scene materials stored on a server and display the scene materials on the user terminal display equipment;
9) starting a camera at the equipment end, and making various animal hand shadows in front of an illuminating device at the equipment end by a user;
10) the method comprises the following steps that a camera shoots and acquires an animal hand shadow picture made by a user, an embedded development board recognition module on terminal equipment recognizes hand shadow actions, and name and position data information of the hand shadow actions is obtained and transmitted to a server;
11) calling a hand shadow optimization material stored on a server according to the data information obtained in the step 10), generating a hand shadow at a corresponding position, and displaying the hand shadow on display equipment of the user terminal, wherein the movement of the hand shadow is the same as the gesture position;
12) adding voice information for the animation by the user;
13) repeating steps 10) to 11) until a termination gesture is received;
14) saving the pictures from the step 11) to the step 12) and the corresponding audio into video files, and generating corresponding two-dimensional codes according to the names of the video files to be provided for the user to scan and view the animation created by the user.
2. An exclusive animation generation method based on a hand shadow implementation according to claim 1, wherein in step 4), the specific method for building the target detection network model is as follows: constructing a target detection model, extracting feature maps of different scales for detection, detecting the feature maps of different scales by adopting prior frames of different scales and length-width ratios which are adaptive to the feature maps, and detecting hand shadows of different shapes by adopting a plurality of groups of prior frames of length-width ratios; the target detection network model is composed of an embedded development board, and a hand shadow recognition neural network model is operated.
3. The method as claimed in claim 1, wherein in step 5), the training model is the model in step 4) trained by the server using the data set in step 3), and the model weight with the highest precision on the test set is finally selected by using Early Stop, learning rate attenuation and training strategy of multiple sets of random seeds.
4. The method for generating a dedicated animation based on hand shadows according to claim 1, wherein in step 10), the camera captures an image of the animal hand shadow made by the user, the embedded development board recognition module on the terminal device recognizes the hand shadow, and the specific method for obtaining the name and position data information of the hand shadow is as follows: the camera captures the animal hand shadow picture made by the user in real time; the embedded development board recognition module on the terminal device analyzes the information in the real-time picture, and after the picture of the user's hand is extracted, it is recognized with the trained target detection network model; after the hand shadow action of the user is obtained, if it matches a class in the training data set it is a recognizable hand shadow action and the corresponding logic operation is performed in the program according to the meaning of the hand shadow action, and if it cannot be recognized, the user is prompted to make the hand shadow action again; the logic operation is specifically:
(1) analyzing hand shadow actions appearing in the identification area;
(2) analyzing the position of each hand shadow action;
(3) identifying hand shadow actions of each hand;
(4) the hand shadow action identification is effective;
(5) changing the picture according to the hand shadow action; generating an animation or changing a scene.
5. The dedicated animation generation method implemented based on hand shadows as claimed in claim 1, wherein in step 11), the corresponding position means that, taking the picture captured by the camera as the reference, the position of the hand in the captured picture corresponds to the position of the animal silhouette on the display screen of the user terminal, and if the picture scene on the display screen of the user terminal needs to be restricted, the position restriction is applied first and the result is then mapped to the corresponding picture.
6. The method for generating dedicated animation based on hand shadows according to claim 1, wherein in step 12), the specific method for adding voice information to the animation by the user is as follows:
directly using the voice information of the user as the voice-over of the creation: after receiving the voice information of the user, the audio acquisition module stores it and synthesizes it with the picture, and finally outputs an animation segment with the user's voice-over; or performing voice recognition to assist the user's creation: after receiving the voice information of the user, the audio acquisition module performs voice recognition, analyzes the content spoken by the user, extracts key information, and changes the picture accordingly.
7. A dedicated animation generation system realized based on a hand shadow and used for executing the animation generation method as claimed in any one of claims 1 to 6, wherein the system comprises a device side, a user terminal and a server;
the device side includes: the system comprises an embedded development board, a lighting device and a power supply module;
the embedded development board includes: the device comprises an identification module, a WIFI module, a camera module, an audio acquisition module and a USB interface;
the identification module is used for detecting and tracking hand shadow actions of a user, identifying and analyzing hand shadow action pictures, and transmitting names and position data information related to the hand shadow actions obtained through identification and analysis to the server;
the WIFI module is used for establishing the connection between the device side and the user terminal, and the user terminal searches for the device side to connect to according to the WIFI_SSID, so that the creation picture is transmitted to and presented on the user terminal through the wireless connection;
the camera module is used for shooting a hand shadow action image of a user in real time and acquiring an image signal;
the audio acquisition module is used for acquiring and receiving voice signals of a user, and is connected with the embedded development board and connected with the user terminal through a network;
the USB interface is used for animation transmission and external charging of equipment, and a user conducts batch management on created animations through the USB interface;
the lighting device is used for providing enough light source for shooting the hand shadow action of the user;
the power supply module provides power supply for each module of the equipment;
and the user terminal is connected with the equipment terminal through WIFI.
8. The dedicated animation generation system based on the hand shadow implementation of claim 7, wherein the audio acquisition module comprises a microphone; the power module is a battery or a direct current power supply; the user terminal comprises a mobile phone, a tablet computer or a personal computer and is used for displaying the generated animation and interacting with the server or the equipment terminal.
CN202110581138.5A 2021-05-26 2021-05-26 Exclusive animation generation method and system based on hand shadow realization Active CN113327306B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110581138.5A CN113327306B (en) 2021-05-26 2021-05-26 Exclusive animation generation method and system based on hand shadow realization

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110581138.5A CN113327306B (en) 2021-05-26 2021-05-26 Exclusive animation generation method and system based on hand shadow realization

Publications (2)

Publication Number Publication Date
CN113327306A CN113327306A (en) 2021-08-31
CN113327306B true CN113327306B (en) 2022-06-21

Family

ID=77421347

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110581138.5A Active CN113327306B (en) 2021-05-26 2021-05-26 Exclusive animation generation method and system based on hand shadow realization

Country Status (1)

Country Link
CN (1) CN113327306B (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101816007A (en) * 2007-09-12 2010-08-25 小馆香椎子 Moving image data checking system, moving image database creating method, and registering system and program for registering moving image data in moving image database
CN105447896A (en) * 2015-11-14 2016-03-30 华中师范大学 Animation creation system for young children
AU2016427777A1 (en) * 2016-10-31 2019-05-16 Victoria Link Limited A rendering process and system
CN110288684A (en) * 2019-05-22 2019-09-27 广西一九岂非影视传媒有限公司 A kind of method and system quickly generating 2 D animation based on shadow show preview

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
Research on video-based extraction of tree motion information and animation generation technology; 董宇 (Dong Yu) et al.; 《计算机时代》 (Computer Era); 2008-11-02 (No. 11); full text *

Also Published As

Publication number Publication date
CN113327306A (en) 2021-08-31

Similar Documents

Publication Publication Date Title
WO2021043053A1 (en) Animation image driving method based on artificial intelligence, and related device
CN109815776B (en) Action prompting method and device, storage medium and electronic device
CN102157007A (en) Performance-driven method and device for producing face animation
CN110162164A (en) A kind of learning interaction method, apparatus and storage medium based on augmented reality
CN110652726B (en) Game auxiliary system based on image recognition and audio recognition
CN110399888B (en) Weiqi judging system based on MLP neural network and computer vision
WO2021012491A1 (en) Multimedia information display method, device, computer apparatus, and storage medium
WO2022075859A1 (en) Facial model mapping with a neural network trained on varying levels of detail of facial scans
CN114363689B (en) Live broadcast control method and device, storage medium and electronic equipment
CN108156385A (en) Image acquiring method and image acquiring device
CN109343695A (en) Exchange method and system based on visual human's behavioral standard
CN111507325B (en) Industrial visual OCR recognition system and method based on deep learning
CN109472838A (en) A kind of sketch generation method and device
CN117055724A (en) Generating type teaching resource system in virtual teaching scene and working method thereof
CN110298925B (en) Augmented reality image processing method, device, computing equipment and storage medium
Tu (Retracted) Computer hand-painting of intelligent multimedia images in interior design major
CN110245253A (en) A kind of Semantic interaction method and system based on environmental information
CN104933278B (en) A kind of multi-modal interaction method and system for disfluency rehabilitation training
CN116630495B (en) Virtual digital human model planning system based on AIGC algorithm
CN113327306B (en) Exclusive animation generation method and system based on hand shadow realization
CN115953521A (en) Remote digital human rendering method, device and system
CN113838158B (en) Image and video reconstruction method and device, terminal equipment and storage medium
Xu The research on applying artificial intelligence technology to virtual YouTuber
CN113222178B (en) Model training method, user interface generation method, device and storage medium
CN112598742A (en) Stage interaction system based on image and radar data

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant