CN117111530A - Intelligent control system and method for carrier through gestures - Google Patents

Intelligent control system and method for carrier through gestures

Info

Publication number
CN117111530A
Authority
CN
China
Prior art keywords
gesture
semantic
local
training
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311257569.1A
Other languages
Chinese (zh)
Other versions
CN117111530B (en)
Inventor
戴肖肖
张汉章
张建东
蒋连杰
李博文
沈培彦
祝鹏飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Jialift Warehouse Equipment Co ltd
Original Assignee
Zhejiang Jialift Warehouse Equipment Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Jialift Warehouse Equipment Co ltd
Priority to CN202311257569.1A
Publication of CN117111530A
Application granted
Publication of CN117111530B
Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B19/00 Programme-control systems
    • G05B19/02 Programme-control systems electric
    • G05B19/04 Programme control other than numerical control, i.e. in sequence controllers or logic controllers
    • G05B19/042 Programme control other than numerical control, i.e. in sequence controllers or logic controllers using digital processors
    • G05B19/0423 Input/output
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B2219/00 Program-control systems
    • G05B2219/20 Pc systems
    • G05B2219/25 Pc structure of the system
    • G05B2219/25257 Microcontroller

Landscapes

  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application discloses an intelligent control system and method for a carrier through gestures. A camera collects the user's gesture control video in real time, and data processing and analysis algorithms introduced at the back end analyze the gesture control video, so that the user's intention is judged from the user's gesture actions. In this way, real-time recognition of user gestures and intelligent control are realized, and a more intuitive and efficient human-machine interaction mode is provided for the carrier and similar equipment: the user can control actions such as movement, steering and stopping of the carrier through gestures without touching any equipment, achieving more natural, more convenient and safer handling operations.

Description

Intelligent control system and method for carrier through gestures
Technical Field
The application relates to the technical field of intelligent control, in particular to an intelligent control system and method for a carrier through gestures.
Background
The carrier is a common piece of logistics equipment that can be used to carry various articles. At present, carriers are mainly controlled by remote controllers, buttons, touch screens and the like. All of these modes require the user to touch the carrier or other equipment, so they are neither flexible nor convenient. Gesture control, by contrast, is a natural human-machine interaction mode that lets the user control equipment through natural gesture actions, improving the user's experience and efficiency. Accordingly, there is a need for a gesture-based intelligent control system for a carrier that allows the user to control movement, steering, stopping and other actions of the carrier by gestures without touching any equipment, thereby enabling more natural, faster and safer handling operations.
Disclosure of Invention
The embodiment of the application provides an intelligent control system and method for a carrier through gestures. A camera collects the user's gesture control video in real time, and data processing and analysis algorithms introduced at the back end analyze the gesture control video, so that the user's intention is judged from the user's gesture actions. In this way, real-time recognition of user gestures and intelligent control are realized, and a more intuitive and efficient human-machine interaction mode is provided for the carrier and similar equipment. The user can thus control actions such as movement, steering and stopping of the carrier through gestures without touching any equipment, achieving more natural, more convenient and safer handling operations.
The embodiment of the application also provides an intelligent control system for a carrier through gestures, which comprises:
the user gesture control video acquisition module is used for acquiring a user gesture control video acquired by the camera;
the video segmentation module is used for video segmentation of the user gesture control video to obtain a plurality of user gesture control video clips;
the gesture motion local semantic feature extraction module is used for extracting features of the plurality of user gesture control video segments through a gesture motion semantic feature extractor based on a deep neural network model so as to obtain a plurality of local gesture semantic features;
the gesture global semantic association coding module is used for carrying out global semantic association coding on the plurality of local gesture semantic features to obtain gesture semantic global context features;
the gesture type detection and control instruction generation module is used for determining type labels of user gestures and generating control instructions of the carrier based on the gesture semantic global context characteristics.
The embodiment of the application also provides an intelligent control method of the carrier through gestures, which comprises the following steps:
acquiring a user gesture control video acquired by a camera;
video segmentation is carried out on the user gesture control video to obtain a plurality of user gesture control video clips;
respectively extracting features of the plurality of user gesture control video segments through a gesture action semantic feature extractor based on a deep neural network model to obtain a plurality of local gesture semantic features;
performing global semantic association coding on the plurality of local gesture semantic features to obtain gesture semantic global context features;
and determining a type tag of the gesture of the user based on the gesture semantic global context characteristics, and generating a control instruction of the carrier.
Drawings
In order to more clearly illustrate the embodiments of the application or the technical solutions in the prior art, the drawings required in the description of the embodiments or the prior art are briefly introduced below. It is obvious that the drawings in the following description show only some embodiments of the application, and that a person skilled in the art can obtain other drawings from them without inventive effort. In the drawings:
fig. 1 is a block diagram of an intelligent control system for a carrier through gestures according to an embodiment of the present application.
Fig. 2 is a flowchart of a method for intelligently controlling a carrier by gestures according to an embodiment of the present application.
Fig. 3 is a schematic diagram of a system architecture of a method for intelligent control of a carrier by gesture according to an embodiment of the present application.
Fig. 4 is an application scenario diagram of a gesture-based intelligent control system for a carrier, which is provided in an embodiment of the present application.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the embodiments of the present application will be described in further detail with reference to the accompanying drawings. The exemplary embodiments of the present application and their descriptions herein are for the purpose of explaining the present application, but are not to be construed as limiting the application.
Unless defined otherwise, all technical and scientific terms used in the embodiments of the application have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used in the present application is for the purpose of describing particular embodiments only and is not intended to limit the scope of the present application.
In describing embodiments of the present application, unless otherwise indicated, the term "connected" should be construed broadly: it may be an electrical connection, a communication between two elements, a direct connection, or an indirect connection via an intermediate medium. Those skilled in the art will understand that the specific meaning of the term is to be interpreted according to the circumstances.
It should be noted that the terms "first/second/third" in the embodiments of the present application merely distinguish similar objects and do not imply a specific order. Where allowed, "first/second/third" may be interchanged in a specific order or sequence, so that the embodiments of the application described herein can be practiced in sequences other than those illustrated or described herein.
The carrier is a common piece of logistics equipment for carrying, transporting and stacking various articles. It is generally used in places such as warehouses, factories, ports and airports to improve logistics efficiency and reduce labor intensity.
The design and functions of a carrier may vary according to specific needs, but a carrier generally has the following characteristics:
Load-carrying capacity: carriers generally have a high bearing capacity and can carry heavy objects. They may be designed for manual operation or equipped with an electric drive system to move heavy loads easily.
Operation mode: carriers can be operated in various ways. One common way is the manual push-pull cart, which the operator pushes and pulls to move articles. Another is the electric carrier equipped with an electric drive system, whose movement and steering can be controlled by levers or buttons.
Carrier design: the load-bearing designs of carriers also differ. Some carriers are equipped with platforms or shelves for transporting goods; other types may have clamps, jaws, suction cups or the like, suited to handling particular types of articles.
Handling mode: carriers can handle goods in different ways as required. Some carriers have a lifting function and can raise articles to a certain height for stacking or loading and unloading; other types may provide rotating or tilting functions to meet specific handling requirements.
Safety: carriers are typically provided with safety protection devices to ensure the safety of the operator and the surrounding environment. For example, some carriers are equipped with braking systems, anti-skid devices and anti-collision devices to reduce the risk of accidents.
Gesture control is a natural human-machine interaction mode in which the device is controlled by recognizing and interpreting the user's gesture actions. Compared with traditional control modes such as buttons and remote controllers, gesture control is more intuitive and natural and can improve the user's experience and efficiency.
Gesture control makes use of the hand motions and postures that humans naturally possess, so the user can interact with the device in a more natural manner: the user only needs to express intent through gesture actions, without learning complex button operations or memorizing the function keys on a remote control. Gesture control does not require the user to touch the device or use external tools; the user can manipulate the device through gesture actions in the air, and this non-contact operation helps improve hygiene and safety, particularly in environments such as medical treatment and food processing. Gesture control can also improve efficiency, because a direct gesture action often completes a specific task faster than searching through the buttons or menus of a remote control. It offers the user an immersive interactive experience: through gesture control, the user participates more directly in the operation and feels the interaction with the equipment, which strengthens the user's sense of participation and enjoyment. Finally, gesture control can be applied to various devices and scenarios, such as smartphones, smart televisions, game controllers and virtual reality devices, and can provide personalized interaction modes for different users, usage scenarios and requirements.
The application provides an intelligent control system for a carrier through gestures, with which the user can control movement, steering, stopping and other actions of the carrier through gestures without touching any equipment, thereby realizing more natural, more convenient and safer handling operations.
In one embodiment of the present application, fig. 1 is a block diagram of an intelligent control system for a carrier through gestures according to an embodiment of the present application. As shown in fig. 1, the intelligent control system 100 for a carrier through gestures according to an embodiment of the present application includes: the user gesture control video acquisition module 110, configured to acquire a user gesture control video collected by the camera; the video slicing module 120, configured to perform video slicing on the user gesture control video to obtain a plurality of user gesture control video segments; the gesture motion local semantic feature extraction module 130, configured to perform feature extraction on the plurality of user gesture control video segments through a gesture motion semantic feature extractor based on a deep neural network model, so as to obtain a plurality of local gesture semantic features; the gesture global semantic association encoding module 140, configured to perform global semantic association encoding on the plurality of local gesture semantic features to obtain gesture semantic global context features; and the gesture type detection and control instruction generation module 150, configured to determine a type label of the user's gesture based on the gesture semantic global context features and generate a manipulation instruction for the carrier.
In the user gesture control video acquisition module 110, a reliable connection and communication with the camera ensures that the user's gesture control video can be accurately obtained. The acquired video provides the input data for subsequent gesture recognition and control.
In the video slicing module 120, the user gesture control video is sliced into a plurality of slices according to a specific algorithm or rule for independent feature extraction and analysis for each slice. Through video segmentation, the user gesture control video can be decomposed into smaller fragments, and finer gesture action analysis and recognition are provided.
In the gesture motion local semantic feature extraction module 130, a gesture motion semantic feature extractor based on a deep neural network model is used, and needs to be trained and optimized to extract effective gesture features. By extracting the local gesture semantic features of each user gesture control video segment, key information of gestures can be captured, and input is provided for subsequent global semantic association coding and gesture type detection.
In the gesture global semantic association encoding module 140, a suitable algorithm or model is designed to perform global semantic association encoding on a plurality of local gesture semantic features to capture overall context information of the gesture. Through global semantic association coding, a plurality of local gesture semantic features can be integrated to generate global context features of gesture semantics, and more comprehensive gesture information is provided for subsequent gesture type detection and control instruction generation.
In the gesture type detection and control instruction generation module 150, a suitable classification algorithm or model is designed based on the gesture semantic global context feature to determine the type label of the gesture of the user, and generate a corresponding truck control instruction. Through gesture type detection and control instruction generation, the gesture type of a user can be accurately identified, and a corresponding control instruction is generated, so that accurate control and operation of the carrier are realized.
The method realizes the recognition and control of the user's gestures through the steps of video acquisition, slicing, feature extraction, global association encoding and type detection, so as to ensure that the system can accurately understand the user's gesture intent and generate corresponding control instructions.
Aiming at the above technical problems, the technical concept of the application is to collect the user's gesture control video in real time through a camera and to introduce data processing and analysis algorithms at the back end to analyze the gesture control video, so that the user's intention is judged from the user's gesture actions. This realizes real-time recognition of user gestures and intelligent control, and provides a more intuitive and efficient human-machine interaction mode for equipment such as the carrier. The user can thus control actions such as movement, steering and stopping of the carrier through gestures without touching any equipment, achieving more natural, more convenient and safer handling operations.
Specifically, in the technical solution of the application, the user gesture control video collected by the camera is first acquired. Then, considering that the user gesture control video may be long and may contain several gesture control instructions, and in order to avoid interference between gestures that follow one another over a long period, the user gesture control video is further sliced into a plurality of user gesture control video segments. In this way, each video segment can be processed independently, so the model can more easily capture the key features of a gesture, improving the accuracy of gesture recognition. Moreover, slicing the video yields more distinct gesture samples: each segment represents an independent gesture, which increases the diversity of the training data and improves the generalization capability of the model.
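The application does not fix a concrete slicing rule; a minimal sketch of the slicing step in Python, assuming fixed-length non-overlapping clips (the clip length and stride are illustrative parameters, not values from the application), could look like this:

    import numpy as np

    def slice_gesture_video(frames: np.ndarray, clip_len: int = 16, stride: int = 16) -> list:
        """Slice a user gesture control video into fixed-length clips.

        frames: decoded video frames of shape (T, H, W, C).
        Returns a list of clips, each of shape (clip_len, H, W, C);
        trailing frames that do not fill a whole clip are dropped.
        """
        return [frames[s:s + clip_len]
                for s in range(0, frames.shape[0] - clip_len + 1, stride)]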
In one embodiment of the present application, the gesture motion local semantic feature extraction module includes: the local gesture semantic feature analysis unit, used for enabling the plurality of user gesture control video segments to respectively pass through a gesture action semantic feature extractor based on a three-dimensional convolutional neural network model so as to obtain a plurality of local gesture semantic feature graphs; the local gesture semantic spatial feature visualization unit, used for performing spatial feature visualization processing on the plurality of local gesture semantic feature graphs respectively to obtain a plurality of spatially-visualized local gesture semantic feature graphs; and the local gesture semantic full-perception unit, used for performing gesture semantic full-perception processing on the plurality of spatially-visualized local gesture semantic feature graphs respectively to obtain a plurality of spatially-visualized local gesture semantic feature vectors as the plurality of local gesture semantic features.
The gesture action semantic feature extractor based on a three-dimensional convolutional neural network model needs to be trained and optimized so that it extracts effective local gesture semantic feature graphs. By extracting the local gesture semantic feature graphs of the plurality of user gesture control video segments, the local action information of the gestures can be captured, providing input for the subsequent spatial feature visualization and full-perception processing.
Spatial feature visualization of the plurality of local gesture semantic feature graphs may involve image processing and feature enhancement algorithms or techniques. Through the spatial feature visualization processing, the key information in each local gesture semantic feature graph can be highlighted, making it more salient and easier to analyze, which improves the accuracy and reliability of subsequent processing.
Gesture semantic full-perception processing of the spatially-visualized local gesture semantic feature graphs may involve algorithms or techniques such as feature fusion and context modeling. Through this processing, each local gesture semantic feature graph can be analyzed comprehensively, capturing higher-level semantic information of the gesture and providing input for the subsequent feature vector generation and comprehensive feature extraction.
Together, these units form the analysis pipeline for local gesture semantic features: through feature extraction, spatial feature visualization and full-perception processing, the user gesture control video segments are converted into local gesture semantic features with clearer and richer semantic information, ensuring that the system can accurately analyze and understand the local features of the gesture and provide useful input for subsequent processing steps.
Then, considering that each user gesture control video segment is three-dimensional data whose channel dimension is the time dimension, and in order to perform segment-level local semantic understanding of the user's gestures, in the technical solution of the application, feature mining is performed on the plurality of user gesture control video segments through the gesture action semantic feature extractor based on the three-dimensional convolutional neural network model, so as to extract the local semantic understanding feature information related to the user's gesture control in each user gesture control video segment, thereby obtaining the plurality of local gesture semantic feature graphs.
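The application does not specify the depth or layer sizes of the three-dimensional convolutional neural network; a minimal PyTorch sketch of such a gesture action semantic feature extractor, with assumed channel counts and pooling sizes, might be:

    import torch
    import torch.nn as nn

    class GestureSemanticFeatureExtractor3D(nn.Module):
        """Minimal 3D-CNN sketch: a video clip (B, 3, T, H, W) -> local gesture semantic feature map."""
        def __init__(self, in_channels: int = 3):
            super().__init__()
            self.backbone = nn.Sequential(
                nn.Conv3d(in_channels, 32, kernel_size=3, padding=1),
                nn.BatchNorm3d(32),
                nn.ReLU(inplace=True),
                nn.MaxPool3d(kernel_size=(1, 2, 2)),   # downsample space, keep time
                nn.Conv3d(32, 64, kernel_size=3, padding=1),
                nn.BatchNorm3d(64),
                nn.ReLU(inplace=True),
                nn.MaxPool3d(kernel_size=(2, 2, 2)),   # downsample time and space
            )

        def forward(self, clip: torch.Tensor) -> torch.Tensor:
            return self.backbone(clip)                  # (B, 64, T/2, H/4, W/4)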
In one embodiment of the present application, the local gesture semantic spatial feature visualization unit is configured to: and respectively passing the plurality of local gesture semantic feature graphs through a spatial attention module to obtain the plurality of spatially-visualized local gesture semantic feature graphs.
In a gesture recognition task, different parts of a gesture have different importance, and some parts contain more information about the gesture type. Therefore, in the technical solution of the application, the plurality of local gesture semantic feature graphs are further passed respectively through the spatial attention module to obtain the plurality of spatially-visualized local gesture semantic feature graphs. It should be appreciated that by applying a spatial attention mechanism, the model can focus on the important parts of the gesture features and ignore the unimportant ones, thereby improving gesture recognition performance. The spatial attention module adaptively adjusts the weight of each feature map according to the spatial distribution and importance of the gesture features, so the model can concentrate on the most informative regions of the gesture, enhancing the representation of key features and reducing the interference of irrelevant information.
In one embodiment of the present application, the local gesture semantic full-perception unit is configured to: pass the plurality of spatially-visualized local gesture semantic feature graphs respectively through a gesture semantic full-perception module based on a fully connected layer to obtain the plurality of spatially-visualized local gesture semantic feature vectors.
The spatial attention module is a technique for enhancing the information of specific regions in an image: by adaptively assigning weights it highlights the regions of interest, improving the model's attention to, and accuracy on, those regions. In the processing of the local gesture semantic features, the spatial attention module performs the spatial feature visualization processing on the plurality of local gesture semantic feature graphs to obtain the plurality of spatially-visualized local gesture semantic feature graphs.
The spatial attention module is generally composed of several main components. A feature extractor extracts features from the input local gesture semantic feature graph, using convolutional neural networks (CNNs) or another feature extraction model. An attention mechanism, learned as a weight or attention profile, decides which regions should be emphasized; common choices include the self-attention mechanism and the spatial attention mechanism. The features extracted by the feature extractor are then multiplied or added with the attention weights to obtain a weighted or salient feature representation, which makes the model focus more on the regions of interest. After this feature fusion, the resulting feature graph highlights and emphasizes the regions of interest, ready for use by subsequent processing steps. A minimal sketch of such a module follows below.
Through the spatial attention module, the spatial feature visualization processing of the local gesture semantic feature graphs can thus be realized. The module automatically learns and adjusts the weights of different regions of a feature graph so that regions carrying important information or key features receive more attention and are highlighted. This helps improve the accuracy and robustness of gesture recognition and understanding, enabling the model to make better use of the spatially local feature information.
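A minimal sketch of the spatial attention module, assuming the common average/max channel-pooling formulation (the application does not commit to this exact variant), continuing the PyTorch sketch above:

    class SpatialAttention3D(nn.Module):
        """Spatial attention sketch: re-weight each spatial location of a (B, C, T, H, W) feature map."""
        def __init__(self):
            super().__init__()
            # 2 input channels: the channel-wise average map and the channel-wise max map
            self.conv = nn.Conv3d(2, 1, kernel_size=(1, 7, 7), padding=(0, 3, 3))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            avg_map = x.mean(dim=1, keepdim=True)           # (B, 1, T, H, W)
            max_map = x.max(dim=1, keepdim=True).values     # (B, 1, T, H, W)
            attn = torch.sigmoid(self.conv(torch.cat([avg_map, max_map], dim=1)))
            return x * attn                                 # spatially-visualized feature map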
Furthermore, in the technical solution of the application, feature mining is performed on the plurality of spatially-visualized local gesture semantic feature graphs respectively through the gesture semantic full-perception module based on the fully connected layer, so as to extract the full-perception feature information about the local semantics of the user's gesture in each user gesture control video segment, thereby obtaining the plurality of spatially-visualized local gesture semantic feature vectors.
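A minimal sketch of the fully-connected full-perception module (the flattened input size and the output dimension are illustrative assumptions):

    class GestureFullPerception(nn.Module):
        """Full-perception sketch: flatten a feature map and project it to a semantic feature vector."""
        def __init__(self, in_features: int, out_features: int = 256):
            super().__init__()
            self.fc = nn.Sequential(
                nn.Flatten(),
                nn.Linear(in_features, out_features),
                nn.ReLU(inplace=True),
            )

        def forward(self, feat_map: torch.Tensor) -> torch.Tensor:
            return self.fc(feat_map)   # (B, out_features) spatially-visualized semantic feature vector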
In one embodiment of the present application, the gesture global semantic association encoding module is configured to: pass the plurality of spatially-visualized local gesture semantic feature vectors through a context semantic encoder to obtain a gesture semantic global context feature vector as the gesture semantic global context features.
The context semantic encoder encodes the local gesture semantic feature vectors over a global scope and, by considering the overall context information of the gesture action, captures the global semantic meaning and relations of the gesture action, providing a more comprehensive and accurate feature representation. Encoding the plurality of spatially-visualized local gesture semantic feature vectors also achieves semantic-consistency modeling of gesture actions: similar or related gesture actions obtain similar representations in the feature space, which improves the model's understanding of similarity and relatedness between gestures. The context semantic encoder fuses the multiple local feature vectors into a comprehensive gesture semantic global context feature vector; this fusion helps eliminate redundant information among the local features and extracts more representative and discriminative global features. By introducing global context information, the gesture semantic global context feature vector carries richer semantic information, including the overall action pattern and the target intention of the gesture, which enables the model to better understand the user's gesture intent and generate the corresponding manipulation instruction.
By encoding a plurality of spatially-visualized local gesture semantic feature vectors into gesture semantic global context feature vectors by a context semantic encoder, a more comprehensive, accurate and semantically-rich gesture feature representation may be provided. This helps to improve the performance of gesture recognition and understanding and provides more reliable and efficient feature input for subsequent gesture control tasks.
Then, in order to capture the semantic feature information of the user's gesture across the whole user gesture control video and thus fully understand the user's gesture semantics, the plurality of spatially-visualized local gesture semantic feature vectors are encoded in the context semantic encoder, so as to extract the global-context semantic association features of each local semantic understanding feature of the user's gesture, thereby obtaining the gesture semantic global context feature vector.
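The application leaves the encoder architecture open; one plausible sketch uses a Transformer encoder over the sequence of per-clip vectors followed by mean pooling (both choices are assumptions, not requirements of the application):

    class ContextSemanticEncoder(nn.Module):
        """Context encoder sketch: globally associate per-clip vectors, pool to one context vector."""
        def __init__(self, dim: int = 256, n_heads: int = 4, n_layers: int = 2):
            super().__init__()
            layer = nn.TransformerEncoderLayer(d_model=dim, nhead=n_heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, num_layers=n_layers)

        def forward(self, clip_vectors: torch.Tensor) -> torch.Tensor:
            # clip_vectors: (B, num_clips, dim)
            ctx = self.encoder(clip_vectors)   # globally context-associated clip features
            return ctx.mean(dim=1)             # (B, dim) gesture semantic global context vector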
In one embodiment of the present application, the gesture type detection and control instruction generation module includes: the gesture type detection unit is used for enabling the gesture semantic global context feature vector to pass through a classifier to obtain a classification result, wherein the classification result is used for representing type labels of gestures of a user; and a carrier manipulation instruction generation unit configured to generate a manipulation instruction of the carrier based on the classification result.
Wherein, gesture type detection element includes: the full-connection coding subunit is used for carrying out full-connection coding on the gesture semantic global context feature vector by using a plurality of full-connection layers of the classifier so as to obtain a coding classification feature vector; and the classification subunit is used for passing the coding classification feature vector through a Softmax classification function of the classifier to obtain the classification result.
The gesture type detection unit can accurately identify the gesture type of the user by classifying the gesture semantic global context feature vector, and is beneficial to correlating the gesture action with a specific control instruction or intention, so that more accurate and precise gesture control is realized. By identifying and classifying gesture types, the gesture type detection unit can help the system understand the intent and needs of the user. Different gesture types may correspond to different operations or instructions, so the system may respond better to the user's expectations through detection of the gesture type.
The carrier manipulation instruction generation unit generates the corresponding carrier manipulation instruction according to the classification result output by the gesture type detection unit, so that the carrier performs the corresponding operation, such as advancing, retreating, turning, lifting or lowering, according to the user's gesture action; accurate control and operation of the carrier is achieved by generating precise manipulation instructions. The unit can generate the corresponding manipulation instruction in real time according to the user's gesture action, enabling the carrier to respond rapidly to the user's instructions and to execute actions in real time, which improves the carrier's operating efficiency and the user experience.
Then, the gesture semantic global context feature vector is passed through the classifier to obtain the classification result, where the classification result represents the type label of the user's gesture. That is, classification is performed using the globally context-associated features of the user's gesture semantics, so as to judge the type of the user's gesture, and the manipulation instruction for the carrier is generated based on the type label of the user's gesture. In this way, the user's intention can be judged from the user's gesture actions, realizing real-time recognition of user gestures and intelligent control, and providing a more intuitive and efficient human-machine interaction mode for equipment such as the carrier.
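A minimal sketch of the classifier head, together with a purely illustrative mapping from type labels to carrier manipulation instructions (the gesture vocabulary and command names below are hypothetical, not taken from the application):

    class GestureClassifier(nn.Module):
        """Classifier sketch: fully-connected encoding followed by Softmax over gesture type labels."""
        def __init__(self, dim: int = 256, num_gesture_types: int = 5):
            super().__init__()
            self.head = nn.Sequential(
                nn.Linear(dim, 128),
                nn.ReLU(inplace=True),
                nn.Linear(128, num_gesture_types),
            )

        def forward(self, global_ctx: torch.Tensor) -> torch.Tensor:
            return torch.softmax(self.head(global_ctx), dim=-1)

    # Hypothetical label-to-instruction table for the carrier (illustrative only).
    COMMANDS = {0: "forward", 1: "backward", 2: "turn_left", 3: "turn_right", 4: "stop"}

    def to_instruction(probs: torch.Tensor) -> str:
        return COMMANDS[int(probs.argmax(dim=-1).item())]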
In one embodiment of the application, the intelligent control system for a carrier through gestures further includes a training module for training the gesture action semantic feature extractor based on the three-dimensional convolutional neural network model, the spatial attention module, the gesture semantic full-perception module based on the fully connected layer, the context semantic encoder and the classifier. The training module includes: the training data acquisition unit, used for acquiring training data, where the training data includes a training user gesture control video and a truth value of the type label of the user's gesture; the training video slicing unit, used for performing video slicing on the training user gesture control video to obtain a plurality of training user gesture control video segments; the training gesture motion local semantic feature extraction unit, used for enabling the plurality of training user gesture control video segments to respectively pass through the gesture action semantic feature extractor based on the three-dimensional convolutional neural network model so as to obtain a plurality of training local gesture semantic feature graphs; the training gesture motion local semantic spatial visualization unit, used for enabling the plurality of training local gesture semantic feature graphs to respectively pass through the spatial attention module to obtain a plurality of training spatially-visualized local gesture semantic feature graphs; the training gesture motion local semantic full-perception unit, used for enabling the plurality of training spatially-visualized local gesture semantic feature graphs to respectively pass through the gesture semantic full-perception module based on the fully connected layer so as to obtain a plurality of training spatially-visualized local gesture semantic feature vectors; the training gesture global semantic association encoding unit, used for enabling the plurality of training spatially-visualized local gesture semantic feature vectors to pass through the context semantic encoder to obtain a training gesture semantic global context feature vector; the classification loss unit, used for enabling the training gesture semantic global context feature vector to pass through the classifier to obtain a classification loss function value; and the model training unit, used for training the gesture action semantic feature extractor based on the three-dimensional convolutional neural network model, the spatial attention module, the gesture semantic full-perception module, the context semantic encoder and the classifier based on the classification loss function value and through back propagation of gradient descent, where, at each iteration of the training, weight space iterative recursive directed proposal optimization is performed on the training gesture semantic global context feature vector.
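A minimal sketch of one end-to-end training iteration with the cross-entropy classification loss, reusing the module sketches above (the per-iteration weight space iterative recursive directed proposal optimization is omitted here, since it is defined by the application's own formula):

    import torch.nn.functional as F

    def train_step(extractor, attention, perception, encoder, classifier,
                   clips, label, optimizer):
        """One training iteration; clips: (num_clips, 3, T, H, W), label: (1,) type-label truth value."""
        optimizer.zero_grad()
        feats = [perception(attention(extractor(c.unsqueeze(0)))) for c in clips]
        clip_vectors = torch.stack(feats, dim=1)   # (1, num_clips, dim)
        global_ctx = encoder(clip_vectors)         # training gesture semantic global context vector
        logits = classifier.head(global_ctx)       # raw scores; Softmax is folded into the loss
        loss = F.cross_entropy(logits, label)      # classification loss function value
        loss.backward()                            # back propagation for gradient descent
        optimizer.step()
        return loss.item()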
In particular, in the technical solution of the application, after the plurality of training user gesture control video segments pass respectively through the gesture action semantic feature extractor based on the three-dimensional convolutional neural network model, the obtained plurality of training local gesture semantic feature graphs express time-series-related image semantic features within the local time domain corresponding to each video segment. The spatial attention module then strengthens the spatial distribution of these image semantic features, and the plurality of training spatially-visualized local gesture semantic feature graphs pass respectively through the gesture semantic full-perception module based on the fully connected layer to obtain the plurality of training spatially-visualized local gesture semantic feature vectors. When these vectors then pass through the context semantic encoder to obtain the training gesture semantic global context feature vector, the temporal-context associations of the local-time-domain image semantic features are further extracted under the global time domain. As a result, the training gesture semantic global context feature vector carries image semantic feature representations that are strengthened in both the time dimension and the space dimension; however, this also makes its feature distribution inconsistent across the time-domain and space-domain feature dimensions, so that when it is classified by the classifier, the convergence of the classifier's weight matrix becomes difficult and the training effect of the classifier is affected.
Therefore, when the training gesture semantic global context feature vector is classified by the classifier, the applicant performs weight space iterative recursive directed proposal optimization on the training gesture semantic global context feature vector at each iteration. This optimization is performed according to an optimization formula in which: M_1 and M_2 are the weight matrices of the previous iteration and the current iteration respectively, where, at the first iteration, M_1 and M_2 are set by different initialization strategies (e.g., M_1 is set as a unit matrix and M_2 is set as the mean diagonal matrix of the feature vectors to be classified); V_c is the training gesture semantic global context feature vector; V̂_c is the optimized training gesture semantic global context feature vector; exp(·) denotes the exponential operation on a vector, i.e., computing the natural exponential function value of the feature value at each position of the vector; ⊗ denotes matrix multiplication; and ⊙ and ⊕ denote point-wise multiplication and point-wise addition, respectively.
Here, the weight space iterative recursive directed proposal optimization takes the initial training gesture semantic global context feature vector V_c to be classified as an anchor point and iterates in the weight space on the basis of the weight matrix corresponding to V_c, so as to obtain the anchor footprints of V_c under different time-space feature distribution dimensions as directed proposals for the iterative recursion in the weight space. Based on these predicted proposals, the class confidence and the local accuracy of the convergence of the weight matrix are improved, and so is the training effect of classifying the training gesture semantic global context feature vector through the classifier. In this way, the user's intention can be judged from the user's gesture actions, realizing real-time recognition of user gestures and intelligent control, providing a more intuitive and efficient human-machine interaction mode for equipment such as the carrier, and allowing the user to control actions such as movement, steering and stopping of the carrier through gestures without touching any equipment, thereby achieving more natural, more convenient and safer handling operations.
In summary, the intelligent control system 100 for a carrier through gestures according to the embodiments of the present application has been illustrated. It enables the user to control actions such as movement, steering and stopping of the carrier through gestures without touching any equipment, thereby realizing more natural, more convenient and safer handling operations.
As described above, the intelligent control system 100 for a carrier through gestures according to the embodiment of the present application may be implemented in various terminal devices, such as a server for intelligent control of a carrier through gestures. In one example, the intelligent control system 100 may be integrated into the terminal device as a software module and/or a hardware module. For example, it may be a software module in the operating system of the terminal device, or an application developed for the terminal device; of course, it may equally be one of the many hardware modules of the terminal device.
Alternatively, in another example, the intelligent control system 100 for a carrier through gestures and the terminal device may be separate devices, and the system may be connected to the terminal device via a wired and/or wireless network and communicate interactive information in an agreed data format.
Fig. 2 is a flowchart of a method for intelligently controlling a carrier by gestures according to an embodiment of the present application. Fig. 3 is a schematic diagram of a system architecture of a method for intelligent control of a carrier by gesture according to an embodiment of the present application. As shown in fig. 2 and 3, a method for intelligently controlling a carrier by gestures includes: 210, acquiring a user gesture control video acquired by a camera; 220, video segmentation is carried out on the user gesture control video to obtain a plurality of user gesture control video clips; 230, respectively extracting features of the plurality of user gesture control video segments through a gesture action semantic feature extractor based on the deep neural network model to obtain a plurality of local gesture semantic features; 240, performing global semantic association coding on the plurality of local gesture semantic features to obtain gesture semantic global context features; 250, determining a type tag of the gesture of the user and generating a control instruction of the carrier based on the gesture semantic global context feature.
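Combining the sketches above, an end-to-end inference pass following steps 210-250 might look as follows (the frame count, resolution and flattened feature size are illustrative assumptions, not values from the application):

    extractor = GestureSemanticFeatureExtractor3D()
    attention = SpatialAttention3D()
    perception = GestureFullPerception(in_features=64 * 8 * 28 * 28)  # matches 16-frame 112x112 clips
    encoder = ContextSemanticEncoder()
    classifier = GestureClassifier()
    for m in (extractor, attention, perception, encoder, classifier):
        m.eval()                                                      # inference mode

    frames = np.random.rand(64, 112, 112, 3).astype("float32")       # stand-in for camera video (step 210)
    clips = slice_gesture_video(frames)                               # step 220
    with torch.no_grad():
        vecs = [perception(attention(extractor(
            torch.from_numpy(c).permute(3, 0, 1, 2).unsqueeze(0))))   # step 230, clip -> (1, 3, T, H, W)
            for c in clips]
        probs = classifier(encoder(torch.stack(vecs, dim=1)))         # steps 240-250
    print(to_instruction(probs))                                      # e.g. "stop"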
It will be appreciated by those skilled in the art that the specific operations of the steps in the above intelligent control method for a carrier through gestures have been described in detail in the description of the intelligent control system for a carrier through gestures with reference to fig. 1; a repetitive description is therefore omitted.
Fig. 4 is an application scenario diagram of the intelligent control system for a carrier through gestures provided in an embodiment of the present application. As shown in fig. 4, in this application scenario, the user gesture control video collected by a camera (e.g., C as illustrated in fig. 4) is first acquired; the video is then input into a server (e.g., S as illustrated in fig. 4) on which an intelligent control algorithm for a carrier through gestures is deployed, where the server processes the user gesture control video based on the algorithm to determine the type label of the user's gesture and generate a manipulation instruction for the carrier.
The foregoing description of the embodiments has been provided for the purpose of illustrating the general principles of the application and is not intended to limit the scope of the application to the particular embodiments; any modifications, equivalent substitutions, improvements and the like made within the spirit and principles of the application are intended to be included within the scope of the application.

Claims (10)

1. An intelligent control system for a carrier through gestures, characterized by comprising:
the user gesture control video acquisition module is used for acquiring a user gesture control video acquired by the camera;
the video segmentation module is used for video segmentation of the user gesture control video to obtain a plurality of user gesture control video clips;
the gesture motion local semantic feature extraction module is used for extracting features of the plurality of user gesture control video segments through a gesture motion semantic feature extractor based on a deep neural network model so as to obtain a plurality of local gesture semantic features;
the gesture global semantic association coding module is used for carrying out global semantic association coding on the plurality of local gesture semantic features to obtain gesture semantic global context features;
the gesture type detection and control instruction generation module is used for determining type labels of user gestures and generating control instructions of the carrier based on the gesture semantic global context characteristics.
2. The intelligent control system for a carrier through gestures according to claim 1, wherein the gesture motion local semantic feature extraction module comprises:
the local gesture semantic feature analysis unit is used for enabling the plurality of user gesture control video segments to respectively pass through a gesture action semantic feature extractor based on a three-dimensional convolutional neural network model so as to obtain a plurality of local gesture semantic feature graphs;
the local gesture semantic spatial feature visualization unit is used for performing spatial feature visualization processing on the plurality of local gesture semantic feature graphs respectively to obtain a plurality of spatially-visualized local gesture semantic feature graphs;
the local gesture semantic full-perception unit is used for performing gesture semantic full-perception processing on the plurality of spatially-visualized local gesture semantic feature graphs respectively to obtain a plurality of spatially-visualized local gesture semantic feature vectors as the plurality of local gesture semantic features.
3. The intelligent control system for a carrier through gestures according to claim 2, wherein the local gesture semantic spatial feature visualization unit is configured to: pass the plurality of local gesture semantic feature graphs respectively through a spatial attention module to obtain the plurality of spatially-visualized local gesture semantic feature graphs.
4. The intelligent control system for a carrier through gestures according to claim 3, wherein the local gesture semantic full-perception unit is configured to: pass the plurality of spatially-visualized local gesture semantic feature graphs respectively through a gesture semantic full-perception module based on a fully connected layer to obtain the plurality of spatially-visualized local gesture semantic feature vectors.
5. The intelligent control system for a carrier through gestures according to claim 4, wherein the gesture global semantic association encoding module is configured to: pass the plurality of spatially-visualized local gesture semantic feature vectors through a context semantic encoder to obtain a gesture semantic global context feature vector as the gesture semantic global context features.
6. The intelligent control system for a carrier through gestures according to claim 5, wherein the gesture type detection and control instruction generation module comprises:
the gesture type detection unit is used for enabling the gesture semantic global context feature vector to pass through a classifier to obtain a classification result, wherein the classification result is used for representing type labels of gestures of a user; and
the carrier control instruction generating unit is used for generating a control instruction of the carrier based on the classification result.
7. The intelligent control system for a carrier through gestures according to claim 6, wherein the gesture type detection unit comprises:
the full-connection coding subunit is used for carrying out full-connection coding on the gesture semantic global context feature vector by using a plurality of full-connection layers of the classifier so as to obtain a coding classification feature vector; and
the classification subunit is used for passing the coding classification feature vector through a Softmax classification function of the classifier to obtain the classification result.
8. The intelligent control system for a carrier through gestures according to claim 7, further comprising a training module for training the gesture action semantic feature extractor based on the three-dimensional convolutional neural network model, the spatial attention module, the gesture semantic full-perception module based on the fully connected layer, the context semantic encoder and the classifier.
9. The intelligent control system for a carrier through gestures according to claim 8, wherein the training module comprises:
the training data acquisition unit is used for acquiring training data, wherein the training data comprises a training user gesture control video and a true value of a type label of the user gesture;
the training video segmentation unit is used for video segmentation of the training user gesture control video to obtain a plurality of training user gesture control video clips;
the training gesture motion local semantic feature extraction unit is used for enabling the plurality of training user gesture control video clips to respectively pass through the gesture motion semantic feature extractor based on the three-dimensional convolutional neural network model so as to obtain a plurality of training local gesture semantic feature graphs;
the training gesture motion local semantic spatial visualization unit is used for enabling the plurality of training local gesture semantic feature graphs to respectively pass through the spatial attention module to obtain a plurality of training spatially-visualized local gesture semantic feature graphs;
the training gesture motion local semantic full-perception unit is used for enabling the plurality of training spatially-visualized local gesture semantic feature graphs to respectively pass through the gesture semantic full-perception module based on the fully connected layer so as to obtain a plurality of training spatially-visualized local gesture semantic feature vectors;
the training gesture global semantic association coding unit is used for enabling the plurality of training spatially-visualized local gesture semantic feature vectors to pass through the context semantic encoder to obtain a training gesture semantic global context feature vector;
the classification loss unit is used for enabling the training gesture semantic global context feature vector to pass through the classifier to obtain a classification loss function value;
the model training unit is used for training the gesture action semantic feature extractor based on the three-dimensional convolutional neural network model, the spatial attention module, the gesture semantic full-perception module, the context semantic encoder and the classifier based on the classification loss function value and through back propagation of gradient descent, wherein, during each iteration of the training, weight space iterative recursive directed proposal optimization is performed on the training gesture semantic global context feature vector.
10. An intelligent control method for a carrier through gestures, characterized by comprising:
acquiring a user gesture control video acquired by a camera;
video segmentation is carried out on the user gesture control video to obtain a plurality of user gesture control video clips;
respectively extracting features of the plurality of user gesture control video segments through a gesture action semantic feature extractor based on a deep neural network model to obtain a plurality of local gesture semantic features;
performing global semantic association coding on the plurality of local gesture semantic features to obtain gesture semantic global context features;
and determining a type tag of the gesture of the user based on the gesture semantic global context characteristics, and generating a control instruction of the carrier.
CN202311257569.1A 2023-09-27 2023-09-27 Intelligent control system and method for carrier through gestures Active CN117111530B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311257569.1A CN117111530B (en) 2023-09-27 2023-09-27 Intelligent control system and method for carrier through gestures

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311257569.1A CN117111530B (en) 2023-09-27 2023-09-27 Intelligent control system and method for carrier through gestures

Publications (2)

Publication Number Publication Date
CN117111530A 2023-11-24
CN117111530B CN117111530B (en) 2024-05-03

Family

ID=88798500

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311257569.1A Active CN117111530B (en) 2023-09-27 2023-09-27 Intelligent control system and method for carrier through gestures

Country Status (1)

Country Link
CN (1) CN117111530B (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107092349A (en) * 2017-03-20 2017-08-25 重庆邮电大学 A kind of sign Language Recognition and method based on RealSense
CN107498570A (en) * 2017-09-29 2017-12-22 南京昱晟机器人科技有限公司 A kind of gesture identification robot and its recognition methods
CN108595014A (en) * 2018-05-15 2018-09-28 合肥岚钊岚传媒有限公司 A kind of real-time dynamic hand gesture recognition system and method for view-based access control model
CN111461005A (en) * 2020-03-31 2020-07-28 腾讯科技(深圳)有限公司 Gesture recognition method and device, computer equipment and storage medium
CN112180936A (en) * 2020-10-12 2021-01-05 六安智梭无人车科技有限公司 Control method and system of unmanned logistics vehicle
CN114038059A (en) * 2021-11-09 2022-02-11 燕山大学 Dynamic gesture recognition method based on double-frame rate divide and conquer behavior recognition network
CN114613006A (en) * 2022-03-09 2022-06-10 中国科学院软件研究所 Remote gesture recognition method and device
CN114662615A (en) * 2022-05-07 2022-06-24 浙江加力仓储设备股份有限公司 Electric lifting forklift with induction handle and control method thereof
CN114973416A (en) * 2022-06-07 2022-08-30 哈尔滨理工大学 Sign language recognition algorithm based on three-dimensional convolution network
CN115346269A (en) * 2022-07-15 2022-11-15 西北工业大学 Gesture motion recognition method
CN115471898A (en) * 2022-10-19 2022-12-13 北京理工华汇智能科技有限公司 Human face and gesture recognition method and product integrating attention mechanism
CN116563938A (en) * 2023-03-11 2023-08-08 江西理工大学 Dynamic gesture recognition method based on dynamic space-time convolution

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
张淑军; 张群; 李辉: "A Survey of Sign Language Recognition Based on Deep Learning", Journal of Electronics & Information Technology, no. 04
陈凯; 王永雄: "Saliency Detection Combining Spatial Attention and Multi-layer Feature Fusion", Journal of Image and Graphics, no. 06, 16 June 2020 (2020-06-16)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117765362A (en) * 2023-12-29 2024-03-26 浙江威星电子系统软件股份有限公司 Large-scale sports stadium video fusion method and system

Also Published As

Publication number Publication date
CN117111530B (en) 2024-05-03

Similar Documents

Publication Publication Date Title
Chen et al. Repetitive assembly action recognition based on object detection and pose estimation
Chen et al. Integrating stereo vision with a CNN tracker for a person-following robot
Raheja et al. Robust gesture recognition using Kinect: A comparison between DTW and HMM
CN117111530B (en) Intelligent control system and method for carrier through gestures
Sahoo et al. Hand gesture recognition using DWT and F‐ratio based feature descriptor
Mahbub et al. A template matching approach of one-shot-learning gesture recognition
Zhu et al. Vision based hand gesture recognition
US20120069168A1 (en) Gesture recognition system for tv control
Huu et al. An ANN-based gesture recognition algorithm for smart-home applications
Zhang et al. Human-object integrated assembly intention recognition for context-aware human-robot collaborative assembly
Shu et al. Deep learning-based fast recognition of commutator surface defects
Wu et al. Depth-based hand gesture recognition
Huu et al. Hand gesture recognition algorithm using SVM and HOG model for control of robotic system
CN103793056A (en) Mid-air gesture roaming control method based on distance vector
Tsai et al. A robust tracking algorithm for a human‐following mobile robot
Iashin et al. Top-1 CORSMAL challenge 2020 submission: Filling mass estimation using multi-modal observations of human-robot handovers
Huu et al. Proposing recognition algorithms for hand gestures based on machine learning model
CN116719419B (en) Intelligent interaction method and system for meta universe
Badgujar et al. Hand gesture recognition system
Tyagi et al. Sign language recognition using hand mark analysis for vision-based system (HMASL)
Saif et al. An efficient method for hand gesture recognition using robust features vector
Chaudhary et al. ABHIVYAKTI: A vision based intelligent system for elder and sick persons
Achari et al. Gesture based wireless control of robotic hand using image processing
Maleki et al. Intelligent visual mouse system based on hand pose trajectory recognition in video sequences
Ramya et al. Real time palm and finger detection for gesture recognition using convolution neural network

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant