CN118092647A - Three-dimensional model processing method and device based on dynamic gesture recognition - Google Patents

Three-dimensional model processing method and device based on dynamic gesture recognition

Info

Publication number
CN118092647A
Authority
CN
China
Prior art keywords
dimensional model
user
target
hand
gesture
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410209841.7A
Other languages
Chinese (zh)
Inventor
崔海涛
李星
曹晋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Goolton Technology Co ltd
Original Assignee
Goolton Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Goolton Technology Co ltd filed Critical Goolton Technology Co ltd
Priority to CN202410209841.7A priority Critical patent/CN118092647A/en
Publication of CN118092647A publication Critical patent/CN118092647A/en
Pending legal-status Critical Current

Landscapes

  • User Interface Of Digital Computer (AREA)

Abstract

The application provides a three-dimensional model processing method and device based on dynamic gesture recognition, and relates to the technical field of data processing. In the method, a first hand image of a user, captured by an AR (augmented reality) glasses camera, is acquired; a second hand image of the user, captured by a preset camera positioned opposite the AR glasses camera, is acquired; scene reconstruction is performed on the first hand image and the second hand image to obtain the user's dynamic gesture; an initial three-dimensional model of an entity to be processed is acquired; and the operation of the dynamic gesture on the initial three-dimensional model is identified to obtain a target three-dimensional model of the entity to be processed, the target three-dimensional model being the three-dimensional model formed after the user operates on the initial three-dimensional model. By implementing the technical scheme provided by the application, the AR glasses perform dynamic gesture recognition on the user's interactive operation on the three-dimensional model, the three-dimensional model resulting from the user's interactive operation is generated automatically, and the processing efficiency of the three-dimensional model is improved.

Description

Three-dimensional model processing method and device based on dynamic gesture recognition
Technical Field
The application relates to the technical field of data processing, in particular to a three-dimensional model processing method and device based on dynamic gesture recognition.
Background
In conventional three-dimensional model construction, users typically need to use specialized modeling software such as Blender or Maya. While powerful, these tools have steep learning curves and complex operations. Non-professionals using such software often run into difficulties that greatly limit their operational experience and creativity.
With the development of augmented reality technology, devices such as AR glasses offer new possibilities for interactive manipulation of three-dimensional models. When a user wears AR glasses, entities in the real scene can be modeled with the glasses, and the resulting three-dimensional model can be viewed through them. How to handle the user's interaction with the three-dimensional model through the AR glasses, however, remains a challenge.
Therefore, a three-dimensional model processing method and device based on dynamic gesture recognition are urgently needed.
Disclosure of Invention
The application provides a three-dimensional model processing method and device based on dynamic gesture recognition, in which AR glasses are used to perform dynamic gesture recognition on a user's interactive operation on a three-dimensional model, the three-dimensional model resulting from the user's interactive operation is generated automatically, and the processing efficiency of the three-dimensional model is improved.
In a first aspect of the present application, there is provided a three-dimensional model processing method based on dynamic gesture recognition, the method comprising: acquiring a first hand image of a user captured by an AR (augmented reality) glasses camera; acquiring a second hand image of the user captured by a preset camera, wherein the preset camera is positioned opposite the AR glasses camera; performing scene reconstruction on the first hand image and the second hand image to obtain a dynamic gesture of the user; acquiring an initial three-dimensional model of an entity to be processed; and identifying the operation of the dynamic gesture on the initial three-dimensional model to obtain a target three-dimensional model of the entity to be processed, wherein the target three-dimensional model is a three-dimensional model formed after the user operates on the initial three-dimensional model.
By adopting this technical scheme, the user can interact with the virtual environment in a more intuitive and natural way because hand movements are captured by the AR glasses camera. This not only improves the user experience, but also provides the user with a more intuitive way of operating. By combining the AR glasses camera with a preset camera, gestures can be captured from two different angles, which increases the accuracy and reliability of recognition. This multi-angle recognition can better handle complex gestures and avoid recognition errors caused by hand occlusion. The approach allows three-dimensional models to be constructed and modified in real time in a dynamic environment, which greatly improves the efficiency of construction and modification compared with conventional methods. Because it is based on dynamic gesture recognition, the approach is highly flexible: the user can adjust the size, shape or position of the model through gestures according to their own requirements or creative intent, so the method can accommodate a variety of scenarios and requirements. For non-professional users, the approach lowers the threshold for three-dimensional model construction; without specialized modeling skills or tools, a user can easily create and modify a three-dimensional model by gestures alone. Therefore, the AR glasses perform dynamic gesture recognition on the user's interactive operation on the three-dimensional model, the three-dimensional model resulting from the user's interactive operation is generated automatically, and the processing efficiency of the three-dimensional model is improved.
Optionally, the reconstructing the scene of the first hand image and the second hand image to obtain the dynamic gesture of the user specifically includes: performing key point identification on the first hand image by adopting a preset detection algorithm to obtain a first key point coordinate; performing key point identification on the second hand image by adopting the preset detection algorithm to obtain a second key point coordinate; fusing the first key point coordinates and the second key point coordinates to obtain a target key point coordinate set; performing continuous target detection on the first hand image and the second hand image to obtain hand motion data of the user; and combining the target key point coordinates and the hand motion data to determine the dynamic gesture of the user.
By adopting the technical scheme, the key points in the first hand image and the second hand image can be automatically identified by using the preset detection algorithm, and the automatic detection method can greatly improve the accuracy and efficiency of identification and avoid the need of manual marking or preset key points. The first key point coordinates and the second key point coordinates are fused, so that more complete and accurate hand gesture information can be obtained. The method can process the coordinate difference of the hand in different angles or different views, thereby improving the robustness of gesture recognition. And performing continuous target detection on the first hand image and the second hand image to obtain the motion data of the hands of the user. The continuous detection method can capture dynamic changes of gestures, so that the meaning or the action of the gestures can be accurately identified. By combining the target key point coordinates and the hand motion data, the dynamic gestures of the user can be determined more accurately. The method comprehensively considers the position and motion information of the hand, thereby improving the accuracy and reliability of gesture recognition.
Optionally, the acquiring an initial three-dimensional model of the entity to be processed specifically includes: obtaining modeling data of an entity to be processed, wherein the modeling data comprises shape data, size data and texture data; and carrying out three-dimensional modeling on the entity to be processed according to the shape data, the size data and the texture data to obtain the initial three-dimensional model.
By adopting the technical scheme, the integrity and the accuracy of the data can be ensured by directly acquiring the modeling data from the entity to be processed. The method avoids errors possibly introduced by manual measurement or estimation, thereby improving the accuracy of the three-dimensional model. Texture data is critical to enhance the realism of a three-dimensional model. By acquiring texture data from the entity, more realistic details and textures can be presented on the surface of the model, and the visual effect of the model is enhanced. This approach enables the three-dimensional model to be better adapted to the specific properties and characteristics of the entity to be processed. Since the modeling data is directly derived from the entity, the constructed three-dimensional model will better reflect the actual morphology and details of the entity. If the shape, size or texture of the entity to be processed changes, the updated model can be obtained only by re-acquiring the latest modeling data and re-performing three-dimensional modeling. This approach provides great flexibility and allows for quick response to changes in entity properties. This approach is more efficient than the traditional approach of three-dimensional modeling from scratch. By directly modeling by using the actual data of the entity, tedious measuring and manual modeling processes are avoided, and a great deal of time and labor cost are saved.
Optionally, identifying the operation of the dynamic gesture on the initial three-dimensional model to obtain a target three-dimensional model of the entity to be processed specifically includes: acquiring a first static gesture, wherein the dynamic gesture consists of a plurality of static gestures, and the first static gesture is any one of the plurality of static gestures; inputting the first static gesture into a preset model for matching to obtain a corresponding target static gesture, wherein the target static gesture is a static gesture stored in advance in the preset model for the user's hand motion and finger key point coordinates; obtaining a model processing operation corresponding to the target static gesture; and processing the target three-dimensional model of the entity to be processed according to the model processing operation.
By adopting the technical scheme, first, the first static gesture is extracted from the dynamic gestures. Since the dynamic gesture is made up of multiple static gestures, this helps to break down the complex dynamic gesture into simpler, easier to handle static gestures. Next, the first static gesture is input into a preset model for matching. A number of static gestures related to the hand motion of the user and the coordinates of the finger keypoints are stored in a pre-set model. Through matching, a target static gesture corresponding to the first static gesture can be found, and accurate matching is performed according to hand characteristics and motion tracks of a user, so that accuracy and reliability of recognition are improved. Once the target static gesture is found, the three-dimensional model processing operation corresponding to the gesture may be further acquired. And finally, processing the initial three-dimensional model according to the acquired model processing operation to obtain the target three-dimensional model. This step allows the user to intuitively manipulate the three-dimensional model via gestures, thereby enabling modification and adjustment of the model. Since this approach is based on gesture recognition, the user can interact with the virtual environment in a natural way. The user can realize the control of the three-dimensional model through simple gestures without learning a specific input device or a complex command. This greatly improves the user experience and makes the operation of the three-dimensional model more intuitive and easy to understand.
Optionally, the method further comprises: monitoring, in real time, the operation of the dynamic gesture on the initial three-dimensional model; and displaying the dynamic gesture and the target three-dimensional model through the AR glasses lens.
By adopting this technical scheme, the operation of the dynamic gesture on the initial three-dimensional model can be monitored in real time, so that instant feedback can be provided to the user. The user can see how their gestures affect the model and can adjust accordingly based on the feedback. This real-time feedback mechanism helps improve the accuracy and efficiency of operation, so the user can learn and master more quickly how to manipulate the three-dimensional model through gestures. Displaying the dynamic gesture and the target three-dimensional model through the AR glasses lens provides an immersive experience for the user. The AR glasses provide an augmented reality environment in which the user can see the virtual three-dimensional model and the gesture operations in the real world. This display mode increases the user's participation and immersion, making the operation more intuitive and engaging. Because the dynamic gestures and the target three-dimensional model are displayed through the AR glasses, the user can interact with the virtual model more naturally: the user can directly observe and manipulate the three-dimensional model in the AR environment without complex input devices or commands. This interaction mode lets the user explore and modify the model more freely, improving the flexibility and enjoyment of operation. With the AR glasses as the medium for display and interaction, the user does not need to rely on a traditional computer or input device; wearing the AR glasses is enough to operate, which provides a convenient and efficient mode of use.
Optionally, the hand motion data includes hand motion angle, hand motion speed, finger bending degree, and hand motion direction.
By adopting the technical scheme, the hand motion angle describes the position and the posture of the hand joint in space. By measuring and tracking the angular changes of the hand joints, the motion trajectories and movements of the hand can be accurately described. The hand movement speed describes how fast the hand movement changes. By measuring the movement speed of the hand joints, the rhythm and dynamic change of the gestures can be judged. This is useful for recognizing fast or slow gestures and transitions between gestures, helping to more fully understand the user's hand movement intent. The degree of finger flexion describes the degree of flexion of the finger joint. By tracking the bending state of the finger joints, the gesture and motion of the finger can be known, which is very useful for recognizing finger gestures and fine motions. The hand movement direction describes the movement path and direction change of the hand. By tracking the movement track of the hand, the movement direction and the target of the hand can be known. Thus, these components in the hand movement data provide a comprehensive description of hand movement, helping to more accurately recognize and understand gestures. By comprehensively considering the data, the hand intention and operation of the user can be accurately judged, so that a more natural and visual interaction experience is provided.
Optionally, the target three-dimensional model includes a plurality of units, and the model processing operations include a unit addition operation, a unit modification and/or replacement operation, and a unit deletion operation.
By adopting this technical scheme, the target three-dimensional model is composed of a plurality of units. This structure gives the model greater flexibility and scalability. By changing or replacing units, the user can easily modify and adjust the model without having to construct the entire model from scratch, which greatly improves working efficiency and the customizability of the model. The model processing operations include a unit addition operation, meaning the user can add new units to the model to expand it or add specific parts. In addition to the addition operation, the model processing also includes unit modification and replacement operations, meaning the user can modify the properties of an existing unit or replace it with another type of unit. The model processing operations also include a unit deletion operation, meaning the user can delete unnecessary or redundant units to simplify the model or optimize the design. This operation helps the user remove redundant elements, improves the clarity and maintainability of the model, and meets the user's personalized requirements.
In a second aspect of the present application, a three-dimensional model processing device based on dynamic gesture recognition is provided, where the three-dimensional model processing device includes an acquisition module and a processing module; the acquisition module is configured to acquire a first hand image of a user captured by an AR glasses camera; the acquisition module is further configured to acquire a second hand image of the user captured by a preset camera, where the preset camera is positioned opposite the AR glasses camera; the processing module is configured to perform scene reconstruction on the first hand image and the second hand image to obtain a dynamic gesture of the user; the acquisition module is further configured to acquire an initial three-dimensional model of an entity to be processed; and the processing module is further configured to identify the operation of the dynamic gesture on the initial three-dimensional model to obtain a target three-dimensional model of the entity to be processed, where the target three-dimensional model is a three-dimensional model formed after the user operates on the initial three-dimensional model.
In a third aspect of the application, there is provided an electronic device comprising a processor, a memory for storing instructions, and a user interface and a network interface that are both used for communicating with other devices, the processor being configured to execute the instructions stored in the memory to cause the electronic device to perform the method described above.
In a fourth aspect of the application there is provided a computer readable storage medium storing instructions which, when executed, perform a method as described above.
In summary, one or more technical solutions provided in the embodiments of the present application at least have the following technical effects or advantages:
1. By capturing hand movements with the AR glasses camera, the user can interact with the virtual environment in a more intuitive and natural way. This not only improves the user experience, but also provides the user with a more intuitive way of operating. By combining the AR glasses camera with a preset camera, gestures can be captured from two different angles, which increases the accuracy and reliability of recognition. This multi-angle recognition can better handle complex gestures and avoid recognition errors caused by hand occlusion. The approach allows three-dimensional models to be constructed and modified in real time in a dynamic environment, which greatly improves the efficiency of construction and modification compared with conventional methods. Because it is based on dynamic gesture recognition, the approach is highly flexible: the user can adjust the size, shape or position of the model through gestures according to their own requirements or creative intent, so the method can accommodate a variety of scenarios and requirements. For non-professional users, the approach lowers the threshold for three-dimensional model construction; without specialized modeling skills or tools, a user can easily create and modify a three-dimensional model by gestures alone. Therefore, the AR glasses perform dynamic gesture recognition on the user's interactive operation on the three-dimensional model, the three-dimensional model resulting from the user's interactive operation is generated automatically, and the processing efficiency of the three-dimensional model is improved;
2. By using a preset detection algorithm, key points in the first hand image and the second hand image can be automatically identified, and the automatic detection method can greatly improve the accuracy and efficiency of identification and avoid the need of manual marking or preset key points. The first key point coordinates and the second key point coordinates are fused, so that more complete and accurate hand gesture information can be obtained. The method can process the coordinate difference of the hand in different angles or different views, thereby improving the robustness of gesture recognition. And performing continuous target detection on the first hand image and the second hand image to obtain the motion data of the hands of the user. The continuous detection method can capture dynamic changes of gestures, so that the meaning or the action of the gestures can be accurately identified. By combining the target key point coordinates and the hand motion data, the dynamic gestures of the user can be determined more accurately. The method comprehensively considers the position and motion information of the hand, thereby improving the accuracy and reliability of gesture recognition;
3. First, a first static gesture is extracted from dynamic gestures. Since the dynamic gesture is made up of multiple static gestures, this helps to break down the complex dynamic gesture into simpler, easier to handle static gestures. Next, the first static gesture is input into a preset model for matching. A number of static gestures related to the hand motion of the user and the coordinates of the finger keypoints are stored in a pre-set model. Through matching, a target static gesture corresponding to the first static gesture can be found, and accurate matching is performed according to hand characteristics and motion tracks of a user, so that accuracy and reliability of recognition are improved. Once the target static gesture is found, the three-dimensional model processing operation corresponding to the gesture may be further acquired. And finally, processing the initial three-dimensional model according to the acquired model processing operation to obtain the target three-dimensional model. This step allows the user to intuitively manipulate the three-dimensional model via gestures, thereby enabling modification and adjustment of the model. Since this approach is based on gesture recognition, the user can interact with the virtual environment in a natural way. The user can realize the control of the three-dimensional model through simple gestures without learning a specific input device or a complex command. This greatly improves the user experience and makes the operation of the three-dimensional model more intuitive and easy to understand.
Drawings
Fig. 1 is a schematic flow chart of a three-dimensional model processing method based on dynamic gesture recognition according to an embodiment of the present application.
Fig. 2 is a schematic block diagram of a three-dimensional model processing device based on dynamic gesture recognition according to an embodiment of the present application.
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Reference numerals illustrate: 21. an acquisition module; 22. a processing module; 31. a processor; 32. a communication bus; 33. a user interface; 34. a network interface; 35. a memory.
Detailed Description
In order that those skilled in the art will better understand the technical solutions in the present specification, the technical solutions in the embodiments of the present specification will be clearly and completely described below with reference to the drawings in the embodiments of the present specification, and it is apparent that the described embodiments are only some embodiments of the present application, not all embodiments.
In describing embodiments of the present application, words such as "for example" or "such as" are used to mean serving as an example, illustration, or description. Any embodiment or design described herein with "for example" or "such as" should not be construed as preferred or advantageous over other embodiments or designs. Rather, the use of such words is intended to present related concepts in a concrete fashion.
In the description of embodiments of the application, the term "plurality" means two or more. For example, a plurality of systems means two or more systems, and a plurality of screen terminals means two or more screen terminals. Furthermore, the terms "first," "second," and the like, are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating an indicated technical feature. Thus, a feature defining "a first" or "a second" may explicitly or implicitly include one or more such feature. The terms "comprising," "including," "having," and variations thereof mean "including but not limited to," unless expressly specified otherwise.
In traditional three-dimensional model construction, designers and artists often rely on specialized modeling software such as Blender or Maya. These tools provide rich functionality and a highly free authoring space, but their user interfaces and functional layouts tend to be complex, and their learning curves are quite steep. For those who are not trained, mastering such software can be a difficult task, and they may encounter many obstacles in the authoring process. This clearly limits their creativity and affects their operational experience.
With the rapid development of technology, the augmented reality technology brings innovative changes to the interactive operation of the three-dimensional model. AR glasses and the like allow users to interact with the three-dimensional model in a completely new way. When users wear AR glasses, they can capture entities directly in the real world and build three-dimensional models in real time in virtual space through the AR glasses. This interaction provides an intuitive, natural way for the user to create and modify the three-dimensional model.
However, how to effectively deal with the user's interaction with the three-dimensional model through AR glasses remains a challenging problem.
In order to solve the above technical problems, the present application provides a three-dimensional model processing method based on dynamic gesture recognition, and referring to fig. 1, fig. 1 is a flow chart of a three-dimensional model processing method based on dynamic gesture recognition according to an embodiment of the present application. The three-dimensional model processing method is applied to an AR glasses controller and comprises the following steps of S110 to S150:
S110, acquiring a first hand image of a user captured by an AR (augmented reality) glasses camera.
Specifically, the AR glasses controller is a hardware or software module for controlling various functions and operations of the AR glasses. It is responsible for handling user inputs, sending instructions to the display system of the glasses, and communicating with other devices or systems. The camera built into AR glasses is used to capture images and video. This camera may take images of the surrounding environment or close-up images of the user's hand or other body part. The first hand image refers to an image of the hand of the user photographed by the AR glasses camera. This image can be obtained directly from the camera without any processing or modification.
For example, suppose a user is wearing AR glasses for a virtual assembly task. The AR glasses are provided with a built-in camera that can capture and display the surrounding environment and the user's hand movements in real time. While the user works with the AR glasses, the AR glasses controller captures and processes the images taken by the camera, especially the hand images. In this way, the position of the virtual assembly part in the real world can be seen, and it can be adjusted or assembled in accordance with the user's hand movements.
S120, acquiring a second hand image of the user captured by a preset camera, wherein the preset camera is positioned opposite the AR glasses camera.
Specifically, the preset camera refers to a preset or designated camera used to shoot a specific target or scene. In this scenario, the preset camera is fixed in position to capture images of the user's hand. In the embodiments of the present application, "opposite" refers to relative position or orientation; the AR glasses camera and the preset camera are arranged facing each other, so that one photographs in a certain direction and the other photographs in the opposite direction. The second hand image refers to a hand image of the user captured by the preset camera. Unlike the first hand image, the second hand image is acquired from the direction opposite to the AR glasses camera.
For example, assume that a user is wearing AR glasses to perform a virtual fitting or simulation exercise. The AR glasses have a built-in camera facing the user's hands, and a preset fixed camera is located opposite that camera. While the user operates, the AR glasses controller can acquire images from both cameras simultaneously: a first hand image from the built-in camera of the AR glasses, and a second hand image from the preset camera. These two images may be used to compare, calibrate, or determine the specific position and motion of the user's hand. With this arrangement, the AR glasses are able to capture the user's hand movements from multiple angles, providing more accurate and comprehensive information and facilitating the recognition of dynamic gestures.
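As a purely illustrative sketch (not part of the claimed embodiment), the dual-view capture described above could be prototyped in Python with two ordinary cameras; the device indices, and the idea of polling both cameras once per frame, are assumptions made only for the example.

import cv2  # OpenCV video capture

# Assumed device indices: 0 = AR-glasses built-in camera, 1 = opposite preset camera.
glasses_cam = cv2.VideoCapture(0)
preset_cam = cv2.VideoCapture(1)

def capture_hand_images():
    """Grab one frame from each camera: the first and second hand images."""
    ok1, first_hand_image = glasses_cam.read()
    ok2, second_hand_image = preset_cam.read()
    if not (ok1 and ok2):
        raise RuntimeError("a camera returned no frame")
    return first_hand_image, second_hand_image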
S130, reconstructing a scene of the first hand image and the second hand image to obtain a dynamic gesture of the user.
In particular, scene reconstruction refers to the process of converting a plurality of images or video frames into a three-dimensional model or scene through image processing and computer vision techniques. In this process, the AR glasses controller analyzes information such as objects, textures, and illumination in the images, and reconstructs a three-dimensional scene from this information. Dynamic gestures refer to gestures of a user in a dynamic process, such as continuous motion or posture changes of the hand. Through scene reconstruction, the dynamic changes of the hand can be captured from multiple angles, and the corresponding dynamic gestures can be generated. In the embodiment of the application, a three-dimensional reconstruction algorithm for multi-view images is preferentially adopted as the scene reconstruction algorithm. The algorithm uses images from multiple viewing angles and performs three-dimensional reconstruction of the scene through techniques such as feature point detection, matching, and sparse seed point cloud generation. On this basis, measures such as conditional correction and optimization of the initial values of growth points are added, which improves the accuracy of reconstruction and reduces reconstruction errors caused by noise, holes, and the like.
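For the two-view case described above, the geometric core of the keypoint part of the reconstruction can be sketched with standard triangulation. This is a minimal sketch, assuming the 3x4 projection matrices of the AR-glasses camera and the preset camera are already known from a prior stereo calibration; it is not the specific reconstruction algorithm of the embodiment.

import numpy as np
import cv2

def triangulate_keypoints(P1, P2, pts_glasses, pts_preset):
    """Recover 3D hand keypoints from matched 2D keypoints in the two views.

    P1, P2      : 3x4 projection matrices from stereo calibration (assumed known).
    pts_glasses : Nx2 keypoint pixels detected in the first hand image.
    pts_preset  : Nx2 keypoint pixels detected in the second hand image.
    """
    pts1 = np.asarray(pts_glasses, dtype=np.float64).T   # 2xN
    pts2 = np.asarray(pts_preset, dtype=np.float64).T    # 2xN
    X_h = cv2.triangulatePoints(P1, P2, pts1, pts2)      # 4xN homogeneous points
    return (X_h[:3] / X_h[3]).T                          # Nx3 Euclidean keypoints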
In one possible implementation manner, the scene reconstruction is performed on the first hand image and the second hand image to obtain a dynamic gesture of the user, which specifically includes: carrying out key point identification on the first hand image by adopting a preset detection algorithm to obtain a first key point coordinate; carrying out key point identification on the second hand image by adopting a preset detection algorithm to obtain a second key point coordinate; fusing the first key point coordinates and the second key point coordinates to obtain a target key point coordinate set; performing continuous target detection on the first hand image and the second hand image to obtain hand movement data of a user; and determining the dynamic gesture of the user by combining the target key point coordinates and the hand motion data.
In particular, a key point refers to a point in an image having a distinct feature, such as a node of a hand. Keypoint identification is a technique in computer vision for identifying and locating these feature points in an image. The preset detection algorithm refers to a predefined and trained algorithm for identifying key points in the image. The algorithm is based on machine learning or deep learning technology, and can accurately identify key points of a specific object after training. In this step, the AR glasses controller fuses the coordinates of the keypoints identified in the first hand image and the second hand image to obtain a target set of coordinates of the keypoints. This involves coordinate conversion and calibration to ensure that the keypoint data acquired from two different angles can accurately correspond. Continuous object detection refers to object detection of continuous image frames or videos to obtain more complete hand motion data. Through continuous target detection, the AR glasses controller can capture continuous motion and dynamic changes of the hand. Finally, the AR glasses controller combines the target keypoint coordinates and the hand motion data to determine the user's dynamic gesture. This involves analysis and interpretation of hand movements to recognize specific gestures or actions.
For example, assume that a user is wearing AR glasses for hand motion tracking and dynamic gesture recognition. The AR glasses capture hand images of the user simultaneously through the built-in camera and the preset camera. First, the AR glasses controller performs key point recognition on the first hand image and the second hand image using a preset detection algorithm; for hand images, the key points may include the finger joints, the palm, and so on. Next, the controller fuses the key point coordinates in the first hand image and the second hand image to obtain a target set of key point coordinates. This step involves coordinate conversion and calibration to ensure accurate correspondence of the key point data across the different viewing angles. The controller then performs continuous target detection on the first hand image and the second hand image to obtain more complete hand movement data. Finally, the controller combines the target key point coordinates and the hand motion data to determine the user's dynamic gesture. For example, by analyzing the motion trajectory and speed changes of the hand, a specific gesture or motion, such as a left-right movement or a finger touch, can be recognized.
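A minimal sketch of the fusion step follows, assuming the keypoints from the two views have already been brought into a common coordinate frame (for example by the triangulation and calibration step above) and are ordered by joint index; the simple weighting scheme is an assumption, not the claimed fusion method.

import numpy as np

def fuse_keypoints(kp_first, kp_second, w_first=0.5):
    """Fuse per-joint keypoint coordinates from the two views into one target set.

    kp_first, kp_second : Nx3 arrays of keypoints from the first and second views,
                          expressed in the same coordinate frame and joint order.
    w_first             : weight given to the first (AR-glasses) view.
    """
    kp_first = np.asarray(kp_first, dtype=float)
    kp_second = np.asarray(kp_second, dtype=float)
    return w_first * kp_first + (1.0 - w_first) * kp_second  # simple weighted average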
In one possible embodiment, the hand motion data includes hand motion angle, hand motion speed, degree of finger bending, and hand motion direction.
In particular, hand motion angle refers to the angle at which each knuckle of the hand is positioned in a particular gesture or motion. These angles may be used to determine the pose and motion state of the hand, such as when grabbing an object or performing a particular action. The hand movement speed refers to the speed or velocity of hand movement. By measuring the speed of hand movement, the cadence and dynamics of the user's gestures, such as fast swipes or slow rotations, etc., can be known. The degree of finger bending refers to the degree of bending of the finger joints. By monitoring the degree of bending of the finger, the fineness and strength of the user's gesture, such as the state when pinching or relaxing the finger, can be determined. The hand movement direction refers to the direction or path of hand movement. By analyzing the direction of motion of the hand, the direction change and path of the user's gesture can be known, such as sliding on a plane or circling in the air, etc.
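The hand motion quantities listed above can be derived from consecutive fused keypoint sets. The sketch below is illustrative only; the joint indices used for the finger-bending example follow a common 21-keypoint hand convention and are an assumption.

import numpy as np

def joint_angle(a, b, c):
    """Hand motion angle: angle in degrees at joint b formed by points a-b-c."""
    v1, v2 = a - b, c - b
    cosang = np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2) + 1e-9)
    return float(np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0))))

def hand_motion_data(kp_prev, kp_curr, dt):
    """Derive speed, direction and an example bending angle from two consecutive frames.

    kp_prev, kp_curr : Nx3 fused keypoint coordinates of consecutive frames.
    dt               : time between the frames, in seconds.
    """
    displacement = kp_curr.mean(axis=0) - kp_prev.mean(axis=0)
    speed = float(np.linalg.norm(displacement) / dt)                   # hand movement speed
    direction = displacement / (np.linalg.norm(displacement) + 1e-9)   # hand movement direction
    # Finger bending: angle at the index-finger middle joint (indices 5-6-7 assumed).
    bending = joint_angle(kp_curr[5], kp_curr[6], kp_curr[7])
    return {"speed": speed, "direction": direction, "index_bend_deg": bending}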
S140, acquiring an initial three-dimensional model of the entity to be processed.
In particular, an entity to be processed refers to an object or entity, such as an object or scene, that needs to be processed or manipulated. In an augmented reality environment, the entity to be processed is typically a virtual model or image that is presented by requiring AR glasses. An initial three-dimensional model refers to a preliminary three-dimensional model or shape of the entity to be processed. This model is created based on the physical size, shape, and appearance of the entity for simulation, rendering, or processing in an augmented reality environment.
In one possible implementation manner, obtaining an initial three-dimensional model of the entity to be processed specifically includes: obtaining modeling data of an entity to be processed, wherein the modeling data comprises shape data, size data and texture data; and carrying out three-dimensional modeling on the entity to be processed according to the shape data, the size data and the texture data to obtain an initial three-dimensional model.
In particular, modeling data refers to the data used to create a three-dimensional model, including shape data, size data, and texture data. Shape data describes the basic shape and structure of the object, size data provides the actual size and scale of the object, and texture data provides surface details and color information of the object. Shape data includes, for example, the basic parameters of a cuboid, sphere, or cylinder; these data may be obtained by measurement or from an existing three-dimensional model. Size data represents the actual size and dimensions of the object, derived from actual measurement or reference data of the entity; in three-dimensional modeling, accurate dimensional data is crucial to ensure that the model matches the real object. Texture data describes the surface details and colors of the object, such as texture, pattern, and color, and is typically acquired through image acquisition or from existing image sources. Three-dimensional modeling refers to the process of creating a three-dimensional model from the modeling data; this process is implemented by three-dimensional modeling software or techniques, such as 3D modeling or reverse engineering.
For example, suppose a user is wearing AR glasses for virtual office-supply display and configuration. The user can see a virtual mouse through the AR glasses and configure and adjust it according to their own preferences. In order to properly render this mouse in an augmented reality environment, the AR glasses controller needs to acquire an initial three-dimensional model of the mouse. First, the AR glasses controller acquires modeling data of the mouse, including shape data, size data, and texture data. These data may be collected and stored in advance in a database or acquired by scanning an actual mouse. Once the data are acquired, the controller may use 3D modeling software or techniques to perform three-dimensional modeling. Finally, the AR glasses controller loads and presents the initial three-dimensional model in the augmented reality environment. The user can see the shape, size, and texture details of the mouse and adjust and configure it according to their own preferences.
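As a rough illustration of how the modeling data (shape, size and texture data) might be organised and turned into an initial three-dimensional model, the sketch below builds only a textured box; the field names and the box-only shape handling are assumptions, not the embodiment's modelling pipeline.

from dataclasses import dataclass
import numpy as np

@dataclass
class ModelingData:
    shape: str          # shape data, e.g. "box"
    size: tuple         # size data, e.g. (width, height, depth)
    texture_path: str   # texture data, e.g. path to an image of the entity's surface

def build_initial_model(data: ModelingData):
    """Build a small placeholder mesh for the entity to be processed (box only)."""
    if data.shape != "box":
        raise NotImplementedError("only the box shape is sketched here")
    w, h, d = data.size
    vertices = np.array([[x, y, z] for x in (0, w) for y in (0, h) for z in (0, d)], float)
    faces = [(0, 1, 3, 2), (4, 5, 7, 6), (0, 1, 5, 4),
             (2, 3, 7, 6), (0, 2, 6, 4), (1, 3, 7, 5)]   # the six quad faces of the box
    return {"vertices": vertices, "faces": faces, "texture": data.texture_path}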
S150, identifying the operation of the dynamic gesture on the initial three-dimensional model to obtain a target three-dimensional model of the entity to be processed, wherein the target three-dimensional model is a three-dimensional model formed after the user operates on the initial three-dimensional model.
In particular, dynamic gesture operation recognition refers to recognition of dynamic gesture operations, such as gesture swipes, drags, zooms, etc., performed by a user on an initial three-dimensional model. This recognition is accomplished through computer vision and gesture recognition techniques, enabling the user to manipulate the three-dimensional model through gestures. The target three-dimensional model refers to a three-dimensional model formed after a user operates on the initial three-dimensional model. The target three-dimensional model reflects the modification and operation of the initial model by the user and is the result of the interactive operation of the user.
Thus, by capturing hand movements with the AR glasses camera, a user can interact with the virtual environment in a more intuitive and natural manner. This not only improves the user experience, but also provides the user with a more intuitive way of operating. By combining the AR glasses camera with a preset camera, gestures can be captured from two different angles, which increases the accuracy and reliability of recognition. This multi-angle recognition can better handle complex gestures and avoid recognition errors caused by hand occlusion. The approach allows three-dimensional models to be constructed and modified in real time in a dynamic environment, which greatly improves the efficiency of construction and modification compared with conventional methods. Because it is based on dynamic gesture recognition, the approach is highly flexible: the user can adjust the size, shape or position of the model through gestures according to their own requirements or creative intent, so the method can accommodate a variety of scenarios and requirements. For non-professional users, the approach lowers the threshold for three-dimensional model construction; without specialized modeling skills or tools, a user can easily create and modify a three-dimensional model by gestures alone. Therefore, the AR glasses perform dynamic gesture recognition on the user's interactive operation on the three-dimensional model, the three-dimensional model resulting from the user's interactive operation is generated automatically, and the processing efficiency of the three-dimensional model is improved.
In one possible implementation manner, identifying the operation of the dynamic gesture on the initial three-dimensional model to obtain the target three-dimensional model of the entity to be processed specifically includes: acquiring a first static gesture, wherein the dynamic gesture consists of a plurality of static gestures, and the first static gesture is any one of the plurality of static gestures; inputting the first static gesture into a preset model for matching to obtain a corresponding target static gesture, wherein the target static gesture is a static gesture stored in advance in the preset model for the user's hand motion and finger key point coordinates; obtaining a model processing operation corresponding to the target static gesture; and processing the target three-dimensional model of the entity to be processed according to the model processing operation.
In particular, dynamic gesture recognition is the recognition of the user's continuous hand motions and actions, which consist of a series of static gestures. A static gesture refers to a particular hand posture or position, while a dynamic gesture is a continuous change across a plurality of static gestures. The first static gesture is any one of the static gestures making up the dynamic gesture. By recognizing and analyzing these static gestures, the user's intent and operation can be understood. The AR glasses controller inputs the first static gesture into a preset model for matching, in order to find the target static gesture corresponding to it. The preset model stores various static gestures and their corresponding interpretations or operations; for example, pinching the fingers shrinks the three-dimensional model, while spreading the fingers enlarges it. The AR glasses controller then obtains the corresponding model processing operation according to the target static gesture; such operations are modifications or adjustments of the three-dimensional model, such as movement, rotation, and scaling. Finally, the AR glasses controller processes and updates the target three-dimensional model according to the model processing operation. This involves making corresponding modifications to the model in accordance with the user's operations, so as to reflect the user's intent.
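A minimal sketch of the matching idea follows: stored static-gesture templates are compared with the observed, normalised keypoints, and the nearest template selects a model processing operation. The template store, the pinch/spread entries (taken from the example above) and the remaining gesture-to-operation entries are assumptions, not the preset model itself.

import numpy as np

# Assumed to be filled beforehand: gesture name -> flattened, wrist-normalised keypoints.
GESTURE_TEMPLATES = {}

# Gesture -> model processing operation; pinch/spread follow the example above,
# the other entries are illustrative assumptions.
GESTURE_TO_OPERATION = {
    "pinch":  ("scale", 0.9),
    "spread": ("scale", 1.1),
    "swipe_left": ("rotate_y", -15.0),
    "fist": ("delete_unit", None),
}

def normalise(kp):
    """Translate to the wrist (assumed to be keypoint 0) and scale to unit size."""
    kp = np.asarray(kp, float)
    kp = kp - kp[0]
    return (kp / (np.linalg.norm(kp) + 1e-9)).ravel()

def match_static_gesture(kp):
    """Return the nearest stored static gesture and its associated operation."""
    query = normalise(kp)
    name = min(GESTURE_TEMPLATES,
               key=lambda g: np.linalg.norm(GESTURE_TEMPLATES[g] - query))
    return name, GESTURE_TO_OPERATION.get(name)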
In one possible embodiment, the method further comprises: monitoring, in real time, the operation of the dynamic gesture on the initial three-dimensional model; and displaying the dynamic gesture and the target three-dimensional model through the AR glasses lens.
Specifically, real-time monitoring and tracking of the user's gestures by the AR glasses controller includes identifying, analyzing, and interpreting dynamic gestures so as to respond rapidly to gesture operations. Through the AR glasses lenses, the system is able to present the target three-dimensional model to the user in real time. This means the user can see the modified and updated three-dimensional model in the AR environment, and thereby better understand the effect of their own operations on the model. Not only is the target three-dimensional model displayed, but the user's dynamic gestures are also displayed through the AR glasses lenses. Such presentation helps the user better understand the interaction between gestures and the model, and how gesture operations influence the model. Thus, the user can directly observe and manipulate the three-dimensional model in the AR environment without complicated input devices or commands. This interaction mode lets the user explore and modify the model more freely, improving the flexibility and enjoyment of operation.
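Putting the pieces together, the real-time monitoring and display step could be sketched as a simple loop. This is only an illustration: capture_hand_images, fuse_keypoints and match_static_gesture are the sketches given earlier, while detect_keypoints, apply_operation and render_to_ar_lens are assumed hooks standing in for the keypoint detector, the model-processing step and the AR-lens display, respectively.

import time

def interaction_loop(model, frame_rate_hz=30.0):
    """Monitor gestures in real time and apply matched operations to the model."""
    period = 1.0 / frame_rate_hz
    while True:
        first, second = capture_hand_images()                  # two-view capture (sketch above)
        kp = fuse_keypoints(detect_keypoints(first),
                            detect_keypoints(second))          # detect_keypoints: assumed hook
        gesture, operation = match_static_gesture(kp)
        if operation is not None:
            model = apply_operation(model, operation)          # assumed model-processing hook
        render_to_ar_lens(model, gesture)                      # assumed AR display hook
        time.sleep(period)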
In one possible embodiment, the target three-dimensional model includes a plurality of units, and the model processing operations include a unit addition operation, a unit modification and/or replacement operation, and a unit deletion operation.
Specifically, the target three-dimensional model is composed of a plurality of units. A unit may be a separate object, component, or assembly, or part of a larger structure. These units may be predefined or dynamically generated based on the user's operations. Model processing operations are operations performed on the target three-dimensional model to modify, update, or adjust its form, structure, and properties. The unit addition operation is an operation of adding a new unit or component to the target three-dimensional model; for example, a new room or facility is added to a building model. The unit modification and/or replacement operation is an operation of modifying the attributes of an existing unit or replacing it with another unit; for example, the size, shape, or material of a room in the building model is modified, or the room is replaced with another type of room. The unit deletion operation is an operation of deleting a specific unit from the target three-dimensional model; for example, an unnecessary room or structure is deleted from the building model.
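The unit-level operations described above can be pictured with a very small container class; the dictionary-of-units representation is an assumption made only for illustration.

class TargetModel:
    """Target three-dimensional model held as a collection of named units."""

    def __init__(self):
        self.units = {}   # unit id -> unit data (mesh, properties, ...)

    def add_unit(self, unit_id, unit):
        """Unit addition operation: extend the model with a new unit."""
        self.units[unit_id] = unit

    def modify_or_replace_unit(self, unit_id, **changes):
        """Unit modification/replacement operation: update or swap parts of an existing unit."""
        self.units[unit_id].update(changes)

    def delete_unit(self, unit_id):
        """Unit deletion operation: remove an unnecessary or redundant unit."""
        self.units.pop(unit_id, None)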
The application further provides a three-dimensional model processing device based on dynamic gesture recognition; referring to fig. 2, fig. 2 is a schematic block diagram of the three-dimensional model processing device based on dynamic gesture recognition according to an embodiment of the application. The three-dimensional model processing device is an AR (augmented reality) glasses controller, wherein the AR glasses controller comprises an acquisition module 21 and a processing module 22. The acquisition module 21 acquires a first hand image of a user captured by an AR glasses camera; the acquisition module 21 acquires a second hand image of the user captured by a preset camera, wherein the preset camera is positioned opposite the AR glasses camera; the processing module 22 performs scene reconstruction on the first hand image and the second hand image to obtain a dynamic gesture of the user; the acquisition module 21 acquires an initial three-dimensional model of an entity to be processed; and the processing module 22 identifies the operation of the dynamic gesture on the initial three-dimensional model to obtain a target three-dimensional model of the entity to be processed, where the target three-dimensional model is a three-dimensional model formed after the user operates on the initial three-dimensional model.
In one possible implementation, the processing module 22 performs scene reconstruction on the first hand image and the second hand image to obtain a dynamic gesture of the user, which specifically includes: the processing module 22 performs key point identification on the first hand image by adopting a preset detection algorithm to obtain a first key point coordinate; the processing module 22 performs key point identification on the second hand image by adopting a preset detection algorithm to obtain second key point coordinates; the processing module 22 fuses the first key point coordinates and the second key point coordinates to obtain a target key point coordinate set; the processing module 22 performs continuous target detection on the first hand image and the second hand image to obtain hand movement data of the user; the processing module 22 combines the target keypoint coordinates and the hand motion data to determine the dynamic gesture of the user.
In one possible implementation manner, the acquiring module 21 acquires an initial three-dimensional model of the entity to be processed, specifically including: the acquisition module 21 acquires modeling data of an entity to be processed, the modeling data including shape data, size data, and texture data; the processing module 22 performs three-dimensional modeling on the entity to be processed according to the shape data, the size data and the texture data, and obtains an initial three-dimensional model.
In one possible implementation, the processing module 22 identifies an operation of the dynamic gesture on the initial three-dimensional model to obtain a target three-dimensional model of the entity to be processed, specifically including: the acquisition module 21 acquires a first static gesture, wherein the dynamic gesture is composed of a plurality of static gestures, and the first static gesture is any one of the plurality of static gestures; the processing module 22 inputs the first static gesture into a preset model to be matched, so as to obtain a corresponding target static gesture, wherein the target static gesture is a static gesture which is stored in the preset model in advance aiming at hand motion and finger key point coordinates of a user; the acquisition module 21 acquires a model processing operation corresponding to the target static gesture; the processing module 22 processes the target three-dimensional model of the entity to be processed according to the model processing operation.
In one possible implementation, the processing module 22 monitors, in real time, the operation of the dynamic gesture on the initial three-dimensional model; the processing module 22 displays the dynamic gesture and the target three-dimensional model through the AR glasses lens.
In one possible embodiment, the hand motion data includes hand motion angle, hand motion speed, degree of finger bending, and hand motion direction.
In one possible embodiment, the target three-dimensional model includes a plurality of units, and the model processing operations include a unit addition operation, a unit modification and/or replacement operation, and a unit deletion operation.
It should be noted that: in the device provided in the above embodiment, when implementing the functions thereof, only the division of the above functional modules is used as an example, in practical application, the above functional allocation may be implemented by different functional modules according to needs, that is, the internal structure of the device is divided into different functional modules, so as to implement all or part of the functions described above. In addition, the embodiments of the apparatus and the method provided in the foregoing embodiments belong to the same concept, and specific implementation processes of the embodiments of the method are detailed in the method embodiments, which are not repeated herein.
The application further provides an electronic device, and referring to fig. 3, fig. 3 is a schematic structural diagram of the electronic device according to an embodiment of the application. The electronic device may include: at least one processor 31, at least one network interface 34, a user interface 33, a memory 35, at least one communication bus 32.
Wherein the communication bus 32 is used to enable connected communication between these components.
The user interface 33 may include a Display screen (Display) and a Camera (Camera), and the optional user interface 33 may further include a standard wired interface and a standard wireless interface.
The network interface 34 may optionally include a standard wired interface, a wireless interface (e.g., WI-FI interface), among others.
Wherein the processor 31 may comprise one or more processing cores. The processor 31 connects various parts within the overall server using various interfaces and lines, and performs the various functions of the server and processes data by executing or invoking instructions, programs, code sets, or instruction sets stored in the memory 35 and by calling data stored in the memory 35. Alternatively, the processor 31 may be implemented in at least one hardware form of Digital Signal Processing (DSP), Field-Programmable Gate Array (FPGA), or Programmable Logic Array (PLA). The processor 31 may integrate one or a combination of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a modem, and the like. The CPU mainly handles the operating system, user interface, application programs, and the like; the GPU is used for rendering and drawing the content to be displayed by the display screen; and the modem is used to handle wireless communications. It will be appreciated that the modem may not be integrated into the processor 31 and may instead be implemented by a single chip.
The Memory 35 may include a random access Memory (Random Access Memory, RAM) or a Read-Only Memory (Read-Only Memory). Optionally, the memory 35 includes a non-transitory computer readable medium (non-transitory computer-readable storage medium). Memory 35 may be used to store instructions, programs, code sets, or instruction sets. The memory 35 may include a stored program area and a stored data area, wherein the stored program area may store instructions for implementing an operating system, instructions for at least one function (such as a touch function, a sound playing function, an image playing function, etc.), instructions for implementing the above-described respective method embodiments, etc.; the storage data area may store data or the like involved in the above respective method embodiments. The memory 35 may alternatively be at least one memory device located remotely from the aforementioned processor 31. As shown in fig. 3, an operating system, a network communication module, a user interface module, and an application program of a three-dimensional model processing method based on dynamic gesture recognition may be included in the memory 35 as a computer storage medium.
In the electronic device shown in fig. 3, the user interface 33 is mainly used for providing an input interface for a user and acquiring the data input by the user, and the processor 31 may be configured to invoke the application program, stored in the memory 35, of the three-dimensional model processing method based on dynamic gesture recognition, which, when executed by one or more processors, causes the electronic device to perform the method as in one or more of the embodiments described above.
It should be noted that, for simplicity of description, the foregoing method embodiments are all described as a series of acts, but those skilled in the art should understand that the present application is not limited by the order of acts described, as some steps may be performed in other orders or concurrently in accordance with the present application. Further, those skilled in the art will also appreciate that the embodiments described in the specification are preferred embodiments, and that the acts and modules involved are not necessarily required by the present application.
The application also provides a computer-readable storage medium storing instructions which, when executed by one or more processors, cause an electronic device to perform the method as described in one or more of the embodiments above.
In the foregoing embodiments, the description of each embodiment has its own emphasis; for parts of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
In the several embodiments provided by the present application, it should be understood that the disclosed apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division of the units is merely a division by logical function, and there may be other divisions in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some service interfaces, devices, or units, and may be electrical or in other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, each functional unit in the embodiments of the present application may be integrated in one processing unit, or each unit may exist alone physically, or two or more units may be integrated in one unit. The integrated units may be implemented in hardware or in software functional units.
If the integrated units are implemented in the form of software functional units and sold or used as stand-alone products, they may be stored in a computer-readable memory. Based on this understanding, the technical solution of the present application, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product stored in a memory, which includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) to perform all or part of the steps of the methods of the various embodiments of the present application. The aforementioned memory includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a magnetic disk, or an optical disk.
The foregoing is merely an exemplary embodiment of the present disclosure and is not intended to limit the scope of the present disclosure; equivalent changes and modifications made according to the teachings of this disclosure fall within its scope. Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure. This application is intended to cover any variations, uses, or adaptations of the disclosure that follow its general principles and include such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. The specification and examples are to be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the claims.

Claims (10)

1. A method for processing a three-dimensional model based on dynamic gesture recognition, the method comprising:
acquiring a first hand image of a user captured by an AR (augmented reality) glasses camera;
acquiring a second hand image of the user captured by a preset camera, wherein the preset camera is located opposite the AR glasses camera;
performing scene reconstruction on the first hand image and the second hand image to obtain a dynamic gesture of the user;
acquiring an initial three-dimensional model of an entity to be processed;
and identifying an operation of the dynamic gesture on the initial three-dimensional model to obtain a target three-dimensional model of the entity to be processed, wherein the target three-dimensional model is the three-dimensional model formed after the user operates on the initial three-dimensional model.
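By way of a non-limiting illustration of the flow recited in claim 1, the following is a minimal Python sketch of the end-to-end pipeline; every function name, data shape, and the dictionary-based model representation is an assumption made for demonstration and is not part of the claimed method or any specific library.

```python
# Minimal sketch of the claim 1 pipeline; every helper below is a hypothetical
# stand-in, not an API defined by this application or by any library.
import numpy as np

def reconstruct_dynamic_gesture(first_image: np.ndarray, second_image: np.ndarray) -> dict:
    # Stand-in for scene reconstruction from the AR glasses view and the opposite view.
    return {"keypoints": np.zeros((21, 3)), "motion": {"direction": "left", "speed": 0.0}}

def identify_operation(gesture: dict) -> str:
    # Stand-in for recognising which model processing operation the gesture expresses.
    return "rotate"

def apply_operation(initial_model: dict, operation: str) -> dict:
    # Stand-in for editing the initial model to produce the target model.
    target = dict(initial_model)
    target.setdefault("applied_operations", []).append(operation)
    return target

first = np.zeros((480, 640, 3), dtype=np.uint8)    # first hand image (AR glasses camera)
second = np.zeros((480, 640, 3), dtype=np.uint8)   # second hand image (preset camera)
initial_model = {"name": "entity_to_process", "cells": {}}
gesture = reconstruct_dynamic_gesture(first, second)
target_model = apply_operation(initial_model, identify_operation(gesture))
print(target_model)
```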
2. The method for processing the three-dimensional model based on dynamic gesture recognition according to claim 1, wherein the performing scene reconstruction on the first hand image and the second hand image to obtain the dynamic gesture of the user specifically comprises:
performing key point identification on the first hand image by adopting a preset detection algorithm to obtain first key point coordinates;
performing key point identification on the second hand image by adopting the preset detection algorithm to obtain second key point coordinates;
fusing the first key point coordinates and the second key point coordinates to obtain a target key point coordinate set;
performing continuous target detection on the first hand image and the second hand image to obtain hand motion data of the user;
and combining the target key point coordinate set and the hand motion data to determine the dynamic gesture of the user.
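As a hedged illustration of the fusion and motion steps recited in claim 2, the sketch below averages key points detected in the two views and derives coarse hand motion data across consecutive frames; the key point detector is mocked, and every name, shape, and constant is an assumption rather than the application's actual detection algorithm.

```python
# Sketch of key point fusion and motion estimation across two camera views (claim 2).
import numpy as np

def detect_keypoints(image: np.ndarray) -> np.ndarray:
    # Mocked stand-in for the "preset detection algorithm" in the claim.
    return np.random.rand(21, 3)           # 21 hand key points, (x, y, z) each

def fuse_keypoints(first_kps: np.ndarray, second_kps: np.ndarray) -> np.ndarray:
    # Simple fusion: average corresponding key points from the two views.
    return (first_kps + second_kps) / 2.0

def hand_motion_data(prev_kps: np.ndarray, curr_kps: np.ndarray, dt: float) -> dict:
    # Derive coarse motion data (speed and direction) from the fused key points.
    displacement = curr_kps.mean(axis=0) - prev_kps.mean(axis=0)
    speed = float(np.linalg.norm(displacement) / dt)
    direction = displacement / (np.linalg.norm(displacement) + 1e-9)
    return {"speed": speed, "direction": direction.tolist()}

# Two consecutive pairs of (first hand image, second hand image), mocked as blank frames.
frames = [(np.zeros((480, 640, 3)), np.zeros((480, 640, 3))) for _ in range(2)]
fused = [fuse_keypoints(detect_keypoints(a), detect_keypoints(b)) for a, b in frames]
print(hand_motion_data(fused[0], fused[1], dt=1 / 30))
```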
3. The method for processing a three-dimensional model based on dynamic gesture recognition according to claim 1, wherein the obtaining an initial three-dimensional model of an entity to be processed specifically comprises:
obtaining modeling data of the entity to be processed, wherein the modeling data comprises shape data, size data, and texture data;
and performing three-dimensional modeling on the entity to be processed according to the shape data, the size data, and the texture data to obtain the initial three-dimensional model.
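By way of a non-limiting illustration of the modeling step of claim 3, the following sketch builds a simple box mesh from shape and size data and attaches a texture reference; the box-only handling and the data layout are assumptions made purely for demonstration.

```python
# Sketch of building an initial model from shape, size, and texture data (claim 3).
import numpy as np

def build_initial_model(shape: str, size: dict, texture: str) -> dict:
    # Only axis-aligned boxes are handled in this illustrative sketch.
    if shape != "box":
        raise ValueError("this sketch only handles box-shaped entities")
    w, h, d = size["width"], size["height"], size["depth"]
    # Eight corner vertices of a box centred at the origin.
    vertices = np.array([[x, y, z]
                         for x in (-w / 2, w / 2)
                         for y in (-h / 2, h / 2)
                         for z in (-d / 2, d / 2)])
    return {"vertices": vertices, "texture": texture}

model = build_initial_model("box", {"width": 2.0, "height": 1.0, "depth": 0.5}, "wood.png")
print(model["vertices"].shape)   # prints (8, 3)
```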
4. The method for processing a three-dimensional model based on dynamic gesture recognition according to claim 1, wherein the identifying the operation of the dynamic gesture on the initial three-dimensional model to obtain the target three-dimensional model of the entity to be processed specifically comprises:
acquiring a first static gesture, wherein the dynamic gesture consists of a plurality of static gestures, and the first static gesture is any one of the plurality of static gestures;
inputting the first static gesture into a preset model for matching to obtain a corresponding target static gesture, wherein the target static gesture is a static gesture stored in the preset model in advance for the hand motion and finger key point coordinates of the user;
obtaining a model processing operation corresponding to the target static gesture;
and processing the target three-dimensional model of the entity to be processed according to the model processing operation.
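One possible, non-authoritative reading of the matching step in claim 4 is a nearest-template lookup over stored key point coordinates followed by an operation lookup; in the sketch below the templates, the distance threshold, and the operation table are all illustrative assumptions.

```python
# Sketch of matching a static gesture to a stored template and looking up the
# corresponding model processing operation (claim 4); templates are hypothetical.
import numpy as np

TEMPLATES = {
    "pinch": np.zeros((21, 3)),
    "open_palm": np.ones((21, 3)),
}
OPERATIONS = {"pinch": "cell_deletion", "open_palm": "cell_addition"}

def match_static_gesture(keypoints: np.ndarray, threshold: float = 5.0) -> str | None:
    # Pick the stored template with the smallest mean key point distance.
    best_name, best_dist = None, float("inf")
    for name, template in TEMPLATES.items():
        dist = float(np.linalg.norm(keypoints - template, axis=1).mean())
        if dist < best_dist:
            best_name, best_dist = name, dist
    return best_name if best_dist <= threshold else None

gesture = match_static_gesture(np.full((21, 3), 0.1))
print(gesture, OPERATIONS.get(gesture))   # e.g. pinch cell_deletion
```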
5. The method for processing a three-dimensional model based on dynamic gesture recognition according to claim 1, wherein the method further comprises:
monitoring, in real time, the operation of the dynamic gesture on the initial three-dimensional model;
and displaying the dynamic gesture and the target three-dimensional model through an AR glasses lens.
6. The method of claim 2, wherein the hand motion data includes hand motion angle, hand motion speed, finger bending degree, and hand motion direction.
7. The method of claim 4, wherein the target three-dimensional model comprises a plurality of cells, and wherein the model processing operations comprise a cell addition operation, a cell modification and/or replacement operation, and a cell deletion operation.
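For the cell-level operations named in claim 7, the hedged sketch below treats the model as a dictionary of named cells and shows addition, modification, and deletion; the cell structure and names are assumptions made only for illustration.

```python
# Sketch of cell addition, modification/replacement, and deletion (claim 7),
# with the model represented as a dictionary of named cells (an assumption).
def add_cell(model: dict, name: str, cell: dict) -> None:
    # Cell addition operation.
    model["cells"][name] = cell

def modify_cell(model: dict, name: str, **changes) -> None:
    # Cell modification/replacement operation.
    model["cells"][name].update(changes)

def delete_cell(model: dict, name: str) -> None:
    # Cell deletion operation.
    model["cells"].pop(name, None)

model = {"cells": {}}
add_cell(model, "roof", {"shape": "prism", "height": 1.2})
modify_cell(model, "roof", height=1.5)
delete_cell(model, "roof")
print(model)   # {'cells': {}}
```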
8. A three-dimensional model processing device based on dynamic gesture recognition, characterized in that the three-dimensional model processing device comprises an acquisition module (21) and a processing module (22), wherein,
the acquisition module (21) is used for acquiring a first hand image of a user captured by an AR glasses camera;
the acquisition module (21) is further used for acquiring a second hand image of the user captured by a preset camera, the preset camera being located opposite the AR glasses camera;
the processing module (22) is used for performing scene reconstruction on the first hand image and the second hand image to obtain a dynamic gesture of the user;
the acquisition module (21) is further used for acquiring an initial three-dimensional model of an entity to be processed;
and the processing module (22) is further used for identifying an operation of the dynamic gesture on the initial three-dimensional model to obtain a target three-dimensional model of the entity to be processed, wherein the target three-dimensional model is the three-dimensional model formed after the user operates on the initial three-dimensional model.
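Mirroring the acquisition/processing split of the device in claim 8, the following hedged sketch wires two small Python classes together; the camera sources and the reconstruction and editing logic are stubbed, and all names are assumptions rather than the device's actual modules.

```python
# Sketch of the acquisition module / processing module split of claim 8.
# Image sources and the reconstruction step are stubbed for illustration.
import numpy as np

class AcquisitionModule:
    def first_hand_image(self) -> np.ndarray:
        return np.zeros((480, 640, 3), dtype=np.uint8)   # AR glasses camera frame

    def second_hand_image(self) -> np.ndarray:
        return np.zeros((480, 640, 3), dtype=np.uint8)   # opposite preset camera frame

class ProcessingModule:
    def reconstruct(self, first: np.ndarray, second: np.ndarray) -> dict:
        return {"keypoints": np.zeros((21, 3))}          # stand-in dynamic gesture

    def apply(self, gesture: dict, initial_model: dict) -> dict:
        return {**initial_model, "edited": True}         # stand-in target model

acq, proc = AcquisitionModule(), ProcessingModule()
gesture = proc.reconstruct(acq.first_hand_image(), acq.second_hand_image())
print(proc.apply(gesture, {"name": "entity_to_process"}))
```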
9. An electronic device, characterized in that the electronic device comprises a processor (31), a memory (35), a user interface (33), and a network interface (34), the memory (35) being adapted to store instructions, the user interface (33) and the network interface (34) being adapted to communicate with other devices, and the processor (31) being adapted to execute the instructions stored in the memory (35) to cause the electronic device to perform the method according to any one of claims 1 to 7.
10. A computer readable storage medium storing instructions which, when executed, perform the method of any one of claims 1 to 7.
CN202410209841.7A 2024-02-26 2024-02-26 Three-dimensional model processing method and device based on dynamic gesture recognition Pending CN118092647A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410209841.7A CN118092647A (en) 2024-02-26 2024-02-26 Three-dimensional model processing method and device based on dynamic gesture recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202410209841.7A CN118092647A (en) 2024-02-26 2024-02-26 Three-dimensional model processing method and device based on dynamic gesture recognition

Publications (1)

Publication Number Publication Date
CN118092647A true CN118092647A (en) 2024-05-28

Family

ID=91148510

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202410209841.7A Pending CN118092647A (en) 2024-02-26 2024-02-26 Three-dimensional model processing method and device based on dynamic gesture recognition

Country Status (1)

Country Link
CN (1) CN118092647A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination