WO2022099685A1 - Data augmentation method and apparatus for gesture recognition, computer device, and storage medium


Info

Publication number
WO2022099685A1
Authority
WO
WIPO (PCT)
Prior art keywords
gesture
video
image
data
augmented
Application number
PCT/CN2020/129017
Other languages
English (en)
French (fr)
Inventor
邵池
程骏
郭渺辰
庞建新
Original Assignee
深圳市优必选科技股份有限公司
Application filed by 深圳市优必选科技股份有限公司
Priority to PCT/CN2020/129017
Publication of WO2022099685A1


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition

Definitions

  • the present application relates to the field of computer technologies, and in particular, to a data enhancement method, apparatus, computer device and storage medium for gesture recognition.
  • Gesture is a natural form of communication between humans, and gesture recognition is also one of the important research directions of human-computer interaction.
  • the quality and quantity of training data play a very important role in the final gesture recognition results.
  • the collected data may not be comprehensive enough.
  • most people habitually perform gestures with their right hand, so that in the final data set the gesture data of the right hand accounts for a high proportion of the total data, while the gesture data performed with the left hand accounts for a low proportion.
  • a data augmentation method can be used to enrich the training data, thereby improving the accuracy of gesture recognition.
  • a data augmentation method for gesture recognition comprising:
  • the first gesture video data includes: a first gesture video and a label corresponding to the first gesture video;
  • Both the first gesture video data and the second gesture video data are used as training data for the gesture recognition model.
  • a data enhancement device for gesture recognition comprising:
  • a first acquisition module configured to acquire first gesture video data, where the first gesture video data includes: a first gesture video and a label corresponding to the first gesture video;
  • a flipping module configured to perform horizontal mirror flipping of the gesture image corresponding to each video frame in the first gesture video to obtain a second gesture video
  • a determining module configured to determine the label corresponding to the second gesture video according to the label corresponding to the first gesture video
  • an association module configured to associate and store the labels corresponding to the second gesture video and the second gesture video to obtain second gesture video data
  • a training module configured to use the first gesture video data and the second gesture video data together as training data for a gesture recognition model.
  • a computer device includes a memory and a processor, the memory stores a computer program, and when the computer program is executed by the processor, the processor performs the following steps:
  • the first gesture video data includes: a first gesture video and a label corresponding to the first gesture video;
  • Both the first gesture video data and the second gesture video data are used as training data for the gesture recognition model.
  • a computer-readable storage medium storing a computer program, when executed by a processor, the computer program causes the processor to perform the following steps:
  • the first gesture video data includes: a first gesture video and a label corresponding to the first gesture video;
  • Both the first gesture video data and the second gesture video data are used as training data for the gesture recognition model.
  • In the above method, the second gesture video is obtained by horizontally mirror-flipping the video frame images in the first gesture video, and the label corresponding to the second gesture video is determined according to the label of the first gesture video.
  • The second gesture video and its corresponding label are associated and stored to obtain the second gesture video data, and both the first gesture video data and the second gesture video data are used as training data for the gesture recognition model.
  • That is, the second gesture video data is obtained from the first gesture video data, and the two are then used together as training data, which makes the training data more comprehensive.
  • As a result, the trained model can accurately predict not only the first gesture but also the second gesture.
  • a data augmentation method for gesture recognition comprising:
  • first gesture image data includes: a first gesture image and a label corresponding to the first gesture image
  • Both the first gesture image data and the second gesture image data are used as training data for the gesture recognition model.
  • a data enhancement device for gesture recognition comprising:
  • an image acquisition module configured to acquire first gesture image data, where the first gesture image data includes: a first gesture image and a label corresponding to the first gesture image;
  • an image flipping module configured to perform horizontal mirror flipping of the first gesture image to obtain a second gesture image
  • a label determination module configured to determine the label corresponding to the second gesture image according to the label corresponding to the first gesture image
  • an image label association module configured to associate and store the second gesture image and the label corresponding to the second gesture image to obtain second gesture image data
  • a model data module configured to use both the first gesture image data and the second gesture image data as training data for a gesture recognition model.
  • a computer device includes a memory and a processor, the memory stores a computer program, and when the computer program is executed by the processor, the processor performs the following steps:
  • first gesture image data includes: a first gesture image and a label corresponding to the first gesture image
  • Both the first gesture image data and the second gesture image data are used as training data for the gesture recognition model.
  • a computer-readable storage medium storing a computer program, when executed by a processor, the computer program causes the processor to perform the following steps:
  • first gesture image data includes: a first gesture image and a label corresponding to the first gesture image
  • Both the first gesture image data and the second gesture image data are used as training data for the gesture recognition model.
  • In the above data augmentation method, apparatus, computer device and storage medium for gesture recognition, the second gesture image is obtained by horizontally mirror-flipping the first gesture image, and the label corresponding to the second gesture image is determined according to the label of the first gesture image.
  • The second gesture image and its corresponding label are associated and stored to obtain the second gesture data, and both the first gesture data and the second gesture data are used as training data of the gesture recognition model.
  • That is, the second gesture data is obtained from the first gesture data, and the two are then used together as training data, which makes the training data more comprehensive, so that the trained model can accurately predict not only the first gesture but also the second gesture.
  • FIG. 1 is a flowchart of a data augmentation method for gesture recognition in one embodiment
  • FIG. 2 is a schematic diagram of a gesture video frame image in one embodiment
  • FIG. 3 is a flowchart of a data enhancement method for gesture recognition in another embodiment
  • FIG. 4 is a schematic diagram before and after edge amplification is performed on a video frame image in one embodiment
  • FIG. 5 is a flowchart of a data enhancement method for gesture recognition in yet another embodiment
  • FIG. 6 is a flowchart of a data enhancement method for gesture recognition in yet another embodiment
  • FIG. 7 is a structural block diagram of a data enhancement apparatus for dynamic gesture recognition in one embodiment
  • FIG. 8 is a structural block diagram of a data enhancement apparatus for dynamic gesture recognition in another embodiment
  • FIG. 9 is a structural block diagram of a data enhancement apparatus for dynamic gesture recognition in yet another embodiment.
  • Figure 10 is a diagram of the internal structure of a computer device in one embodiment.
  • a data enhancement method for gesture recognition is proposed, and the data enhancement method for gesture recognition can be applied to a terminal or a server.
  • the application to a terminal is used as an example for illustration.
  • the data enhancement method for gesture recognition specifically includes the following steps:
  • Step 102 Obtain first gesture video data, where the first gesture video data includes: the first gesture video and a label corresponding to the first gesture video.
  • In this embodiment, the first gesture video data is right-hand gesture video data. Since most people are accustomed to using the right hand, when dynamic gesture data is collected the right-hand gesture data far exceeds the left-hand gesture data. However, collecting left-hand gesture data from scratch is a large undertaking that is time-consuming and labor-intensive. Based on this, a data augmentation method for dynamic gesture recognition is proposed; the method does not need to re-collect data, but only needs to augment the existing data.
  • Each gesture video has a corresponding label.
  • the gesture label is: hand swinging left or hand moving up, etc.
  • Each gesture video is composed of multiple video frame images.
  • Step 104 Perform horizontal mirror flipping of each video frame image in the first gesture video to obtain a second gesture video.
  • FIG. 2 it is a schematic diagram of a gesture video frame image.
  • the three pictures in part 2a in FIG. 2 are right hand gesture video frame images, and the three pictures in part 2b in FIG. 2 are respectively left hand gesture images obtained after horizontal mirror flipping of the right hand gesture video frame image.
  • the left-hand gesture video is obtained by combining the video frames after horizontal mirror flipping in the original order.
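  • The flipping step above can be sketched as follows. This is an illustrative sketch, not the patent's implementation: the function name and the list-of-arrays frame representation are assumptions, and NumPy slicing stands in for OpenCV's `cv2.flip(frame, 1)`.

```python
import numpy as np

def mirror_flip_video(frames):
    """Horizontally mirror-flip every frame of a gesture video.

    frames: list of H x W x 3 arrays (one per video frame image).
    The frames keep their original temporal order, so a right-hand
    gesture video becomes the corresponding left-hand gesture video.
    """
    # Reversing the column axis mirrors each image left-to-right,
    # equivalent to cv2.flip(frame, 1) in OpenCV.
    return [frame[:, ::-1, :] for frame in frames]
```

  • Only the width axis is reversed; frame order and pixel rows are untouched, which is what preserves the temporal structure of the gesture.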
  • Step 106 Determine the label corresponding to the second gesture video according to the label corresponding to the first gesture video.
  • the corresponding label needs to be adaptively changed.
  • For example, if the label corresponding to the first gesture video is: the ring finger and the index finger of the hand swing to the left at the same time, then after conversion the label corresponding to the second gesture video should be: the ring finger and the index finger of the hand swing to the right at the same time.
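  • For text labels of this form, the left/right conversion can be sketched as a simple string swap. This is a hypothetical helper (the patent does not prescribe a label format), assuming the words "left" and "right" appear literally in the label:

```python
def mirror_label(label):
    """Swap 'left' and 'right' in a gesture label after mirror flipping.

    A placeholder character avoids re-replacing a freshly swapped word.
    """
    placeholder = "\0"
    return (label.replace("left", placeholder)
                 .replace("right", "left")
                 .replace(placeholder, "right"))
```

  • For example, `mirror_label("the hand swings to the left")` yields "the hand swings to the right"; labels that mention neither direction pass through unchanged.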
  • Step 108 Associate and store the second gesture video and the tags corresponding to the second gesture video to obtain second gesture video data.
  • the obtained second gesture video and the corresponding label are stored in association, so that the second gesture video data is obtained, that is, the second gesture video data includes: the second gesture video and the corresponding label.
  • step 110 both the first gesture video data and the second gesture video data are used as training data of the gesture recognition model.
  • Specifically, both the first gesture data and the second gesture data are used as training data to train the model; that is, the model is trained on the augmented training data, which helps improve the accuracy of the model in recognizing both the first and the second gesture.
  • the trained gesture recognition model can be used to recognize dynamic gestures.
  • the above-mentioned data enhancement method for gesture recognition obtains a second gesture video by horizontally mirroring and flipping the video frame image in the first gesture video, and determines the label corresponding to the second gesture video according to the label of the first gesture video, and converts the The second gesture video and the tags corresponding to the second gesture video are associated and stored to obtain second video gesture data, and both the first gesture video data and the second gesture video data are used as training data for the gesture recognition model.
  • That is, the second gesture video data is obtained from the first gesture video data, and the first gesture video data and the second gesture video data are then used together as training data, which makes the training data more comprehensive.
  • As a result, the trained model can accurately predict not only the first gesture but also the second gesture.
  • a data enhancement method for dynamic gesture recognition is proposed, further comprising:
  • Step 302 Obtain a gesture video to be augmented, where the gesture video includes multiple video frame images.
  • The gesture video to be augmented refers to a gesture video whose edges are to be augmented, and it includes multiple video frame images whose edges are to be augmented.
  • Step 304 Perform edge augmentation on each video frame image in the gesture video to obtain a gesture video after edge augmentation, and the proportion of gestures in the gesture video after edge augmentation is reduced.
  • Edge augmentation enlarges a video frame image by adding a border to each of its four sides, so that the proportion of the gesture in the gesture video is reduced.
  • As shown in FIG. 4, which is a schematic diagram before and after edge augmentation of a video frame image in one embodiment, the left is the image before edge augmentation and the right is the image after edge augmentation.
  • Step 306 both the gesture video before augmentation and the post-augmentation gesture video are used as training data for the gesture recognition model.
  • an edge-augmented gesture video is obtained by performing edge augmentation processing on an existing gesture video.
  • performing edge augmentation on each video frame image in the gesture video to obtain an augmented edge gesture video includes: determining the augmented color of the corresponding augmented edge according to pixel values of the edge of the video frame image.
  • Then, according to the augmented color, the edges of the video frame image are augmented with a preset augmentation width to obtain the augmented video frame image.
  • the corresponding augmented color is determined according to the pixel value of the edge of the video frame image.
  • a video frame image includes four sides: upper, lower, left and right. Then, an RGB color value is calculated for each edge.
  • The augmented color of each augmented edge can be calculated from that edge's pixel values. In one embodiment, the row or column of pixels closest to each edge is obtained, and the average of its pixel values is used as the augmented color of the corresponding augmented edge.
  • Then, the edges of the video frame image are augmented according to the preset augmentation width. For example, set the augmentation width to one-fifth of the image height, and increase all four sides (top, bottom, left, right) by the same amount. Assuming the height of the original image is 100, the corresponding augmentation width is 20; then, if the original image width:height is 150:100, after adding the augmentation width to the top, bottom, left and right sides, the resulting image width:height is 190:140.
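  • The augmentation described above can be sketched as follows. The sketch combines the mean-of-nearest-row/column color rule with a border width of one-fifth of the image height; the function name and the NumPy-based padding are assumptions rather than the patent's implementation.

```python
import numpy as np

def augment_edges(frame, ratio=5):
    """Add a border to all four sides of a frame.

    Each border band takes the mean color of the nearest row or column
    of edge pixels; the border width is height // ratio (one-fifth of
    the image height by default).
    """
    h, w, _ = frame.shape
    pad = h // ratio                                  # e.g. 100 // 5 = 20
    out = np.zeros((h + 2 * pad, w + 2 * pad, 3), dtype=frame.dtype)
    out[:pad, :] = frame[0, :, :].mean(axis=0)        # top band
    out[-pad:, :] = frame[-1, :, :].mean(axis=0)      # bottom band
    out[:, :pad] = frame[:, 0, :].mean(axis=0)        # left band (covers corners)
    out[:, -pad:] = frame[:, -1, :].mean(axis=0)      # right band (covers corners)
    out[pad:pad + h, pad:pad + w] = frame             # original image in the center
    return out
```

  • A 150 x 100 frame padded this way becomes 190 x 140, matching the example in the text; OpenCV users could obtain a similar effect with `cv2.copyMakeBorder`.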
  • determining the augmented color of the corresponding augmented edge according to the pixel value of the edge of the video frame image includes: acquiring the pixel value of a preset position of the edge of the video frame image, and using the pixel value of the preset position as the corresponding augmented edge Augmented color of the edge.
  • the preset position refers to the pre-selected position.
  • As for the augmentation color of the border, a color as close to the background as possible should be selected, rather than the color of the human body, so as to achieve the purpose of extending the background color.
  • Generally, a color from the outer part of each edge is used; for example, the color at the outermost quarter position of each edge can be selected as the augmented color.
  • It can also be set to select the color at the outermost eighth position as the augmented color.
  • the specific selected location can be customized according to the actual scene needs.
  • Specifically, the RGB image is read in the form of an array by the tool OpenCV;
  • the array is named Image
  • the size of Image is (row, col, 3), which represent the number of rows, columns and channels of the array, respectively
  • the value in the array corresponds to the RGB value of a pixel in the image.
  • Let the top, bottom, left and right edges of the image be denoted top, bottom, left and right, respectively.
  • color_top = Image[0:1, int(col/3):int(col/3)+1]  (1)
  • color_bottom = Image[row-1:row, int(col/3):int(col/3)+1]  (2)
  • Formula (1) means taking the RGB value of the pixel at the first row and the one-third column of the original image as the augmented color of the top edge; formula (2) means taking the RGB value of the pixel at the last row and the one-third column as the augmented color of the bottom edge.
  • Similarly, the formulas used for the left and right edges are shown in (3) and (4):
  • color_left = Image[int(row/3):int(row/3)+1, 0:1]  (3)
  • color_right = Image[int(row/3):int(row/3)+1, col-1:col]  (4)
  • Formula (3) means taking the RGB value of the pixel at the one-third row and the first column of the original image as the augmented color of the left edge; formula (4) means taking the RGB value of the pixel at the one-third row and the last column as the augmented color of the right edge.
  • In this way, the augmented part of each edge is similar in color to the original edge.
  • The 3 in row/3 and col/3 is a preset parameter, which can be changed to other values according to the actual situation.
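  • Formulas (1) to (4) translate directly into NumPy indexing. The sketch below assumes `Image` is the (row, col, 3) array described above; the function wrapper and its name are added for illustration only.

```python
import numpy as np

def edge_colors(image):
    """Return the augmented color of each edge per formulas (1)-(4).

    Each color is a 1 x 1 x 3 slice: a single pixel taken one third of
    the way along the corresponding edge.
    """
    row, col, _ = image.shape
    color_top = image[0:1, int(col / 3):int(col / 3) + 1]             # (1)
    color_bottom = image[row - 1:row, int(col / 3):int(col / 3) + 1]  # (2)
    color_left = image[int(row / 3):int(row / 3) + 1, 0:1]            # (3)
    color_right = image[int(row / 3):int(row / 3) + 1, col - 1:col]   # (4)
    return color_top, color_bottom, color_left, color_right
```

  • Keeping each slice two-dimensional (e.g. `0:1` rather than `0`) preserves the array shape, which makes the colors easy to broadcast into border regions later.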
  • performing edge augmentation on each video frame image in the gesture video to obtain a gesture video after edge augmentation includes: acquiring a first video frame image in the gesture video; The pixel values of each edge determine the pixel values of the corresponding augmented edges, and the augmented pixel values corresponding to the four edges in the first video frame image are obtained; the augmented pixel values corresponding to the four edges in the first video frame image and The preset expansion width determines four augmented edges of the first video frame image; all four edges of other video frame images in the gesture video are augmented into the same augmented edges as the first video frame image.
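  • This per-video variant can be sketched as follows: the border colors are computed once from the first frame and reused for every frame, so the border does not change across the video. The mean-of-edge-pixels color rule and the function name are assumptions for illustration.

```python
import numpy as np

def augment_video(frames, pad):
    """Pad every frame of a video with the SAME border as the first frame.

    frames: list of H x W x 3 arrays; pad: preset augmentation width.
    """
    first = frames[0]
    # Border colors are derived from the FIRST frame only.
    top = first[0, :, :].mean(axis=0)
    bottom = first[-1, :, :].mean(axis=0)
    left = first[:, 0, :].mean(axis=0)
    right = first[:, -1, :].mean(axis=0)
    out = []
    for f in frames:
        h, w, _ = f.shape
        big = np.zeros((h + 2 * pad, w + 2 * pad, 3), dtype=f.dtype)
        big[:pad, :] = top
        big[-pad:, :] = bottom
        big[:, :pad] = left           # left/right bands also cover the corners
        big[:, -pad:] = right
        big[pad:pad + h, pad:pad + w] = f
        return_frames = big           # original frame sits in the center
        out.append(return_frames)
    return out
```

  • Reusing the first frame's edge colors keeps the augmented border visually stable over time, which is the point of determining the augmented edges from the first frame only.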
  • the data enhancement method in the above embodiment is mainly applied to dynamic gesture recognition, and the following method can be used for static gesture recognition.
  • the trained gesture recognition model can be used for both dynamic gesture recognition and static gesture recognition.
  • a data enhancement method for gesture recognition including:
  • Step 502 Obtain first gesture image data, where the first gesture image data includes: the first gesture image and a label corresponding to the first gesture image.
  • Step 504 Perform horizontal mirror flipping of the first gesture image to obtain a second gesture image.
  • Step 506 Determine the label corresponding to the second gesture image according to the label corresponding to the first gesture image.
  • Step 508 Associate and store the second gesture image and the label corresponding to the second gesture image to obtain second gesture image data.
  • Step 510 using both the first gesture image data and the second gesture image data as training data of the gesture recognition model.
  • The above data augmentation method can be applied to static gesture recognition: the second gesture image is obtained by horizontally mirror-flipping the first gesture image, and the label corresponding to the second gesture image is set according to the label of the first gesture image. For example, if the label of the first gesture image is: right hand making a fist, then the label of the second gesture image is: left hand making a fist. In this way the second gesture data is obtained, and it is also used as training data for the recognition model.
  • the trained gesture recognition model can be used to recognize static gestures.
  • In the above data augmentation method, apparatus, computer device and storage medium for gesture recognition, the second gesture image is obtained by horizontally mirror-flipping the first gesture image, and the label corresponding to the second gesture image is determined according to the label of the first gesture image.
  • The second gesture image and its corresponding label are associated and stored to obtain the second gesture data, and both the first gesture data and the second gesture data are used as training data of the static gesture recognition model.
  • That is, the second gesture data is obtained from the first gesture data, and the two are then used together as training data, which makes the training data more comprehensive, so that the trained model can accurately predict not only the first gesture but also the second gesture.
  • a data enhancement method for gesture recognition including:
  • Step 602 acquiring the gesture image to be augmented.
  • Step 604 Perform edge augmentation on the gesture image to obtain an edge augmented gesture image, where the proportion of gestures in the edge augmented gesture image is reduced.
  • Step 606 Both the pre-augmented gesture image and the augmented gesture image are used as training data for the gesture recognition model.
  • In one embodiment, performing edge augmentation on the gesture image to obtain an edge-augmented gesture image includes: determining the augmented color of the corresponding augmented edge according to the pixel values of the edge of the gesture image; and, according to the augmented color, augmenting the edge of the gesture image with a preset augmentation width to obtain the augmented gesture image.
  • In one embodiment, determining the augmented color of the corresponding augmented edge according to the pixel value of the edge of the gesture image includes: acquiring the pixel value at a preset position of the edge of the gesture image, and using the pixel value at the preset position as the augmented color of the corresponding augmented edge.
  • a data enhancement apparatus for gesture recognition including:
  • the first acquiring module 702 is configured to acquire first gesture video data, where the first gesture video data includes: the first gesture video and a label corresponding to the first gesture video.
  • the flipping module 704 is configured to perform horizontal mirror flipping of the gesture image corresponding to each video frame in the first gesture video to obtain a second gesture video.
  • the determining module 706 is configured to determine the label corresponding to the second gesture video according to the label corresponding to the first gesture video.
  • the association module 708 is configured to associate and store the second gesture video and the tag corresponding to the second gesture video to obtain second gesture video data.
  • the training module 710 is configured to use the first gesture video data and the second gesture video data together as training data for a dynamic gesture recognition model.
  • the above-mentioned apparatus further includes:
  • a second acquiring module 712 configured to acquire a gesture video to be augmented, where the gesture video includes multiple video frame images;
  • An augmentation module 714 configured to perform edge augmentation on each video frame image in the gesture video, to obtain a gesture video after edge augmentation, and the proportion of gestures in the gesture video after edge augmentation is reduced;
  • the training module 710 is further configured to use both the pre-augmented gesture video and the augmented gesture video as training data of the dynamic gesture recognition model.
  • In one embodiment, the augmentation module 714 is further configured to determine the augmented color of the corresponding augmented edge according to the pixel values of the edge of the video frame image, and, according to the augmented color, to augment the edge of the video frame image with a preset augmentation width to obtain the augmented video frame image.
  • the augmentation module 714 is further configured to acquire the pixel value of the preset position of the edge of the video frame image, and use the pixel value of the preset position as the augmented color of the corresponding augmented edge.
  • In one embodiment, the augmentation module 714 is further configured to: acquire the first video frame image in the gesture video; determine the pixel value of each corresponding augmented edge according to the pixel values of each edge in the first video frame image, obtaining the augmented pixel values corresponding to the four edges of the first video frame image; determine the four augmented edges of the first video frame image according to those augmented pixel values and the preset augmentation width; and augment the four edges of every other video frame image in the gesture video with the same augmented edges as the first video frame image.
  • a data enhancement apparatus for gesture recognition including:
  • An image acquisition module 902 configured to acquire first gesture image data, where the first gesture image data includes: a first gesture image and a label corresponding to the first gesture image;
  • an image flipping module 904 configured to perform horizontal mirror flipping of the first gesture image to obtain a second gesture image
  • a label determination module 906 configured to determine the label corresponding to the second gesture image according to the label corresponding to the first gesture image
  • an image tag association module 908, configured to associate and store the second gesture image and the tag corresponding to the second gesture image to obtain second gesture image data
  • the model data module 910 is configured to use both the first gesture image data and the second gesture image data as training data for a gesture recognition model.
  • a data enhancement apparatus for gesture recognition further comprising:
  • an augmentation module configured to acquire a gesture image to be augmented, and perform edge augmentation on the gesture image to obtain an augmented edge gesture image, where the proportion of gestures in the augmented edge gesture image is reduced; Both the pre-augmented gesture image and the augmented gesture image are used as training data for the gesture recognition model.
  • Figure 10 shows an internal structure diagram of a computer device in one embodiment.
  • the computer device may be a terminal or a server.
  • the computer device includes a processor, memory, and a network interface connected by a system bus.
  • the memory includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium of the computer device stores an operating system, and also stores a computer program, which, when executed by the processor, enables the processor to implement the above-mentioned data enhancement method for dynamic gesture recognition.
  • a computer program can also be stored in the internal memory, and when the computer program is executed by the processor, the processor can execute the above-mentioned data enhancement method for dynamic gesture recognition.
  • FIG. 10 is only a block diagram of the partial structure related to the solution of the present application and does not constitute a limitation on the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, or combine certain components, or have a different arrangement of components.
  • A computer device is proposed, including a memory and a processor, wherein the memory stores a computer program, and when the computer program is executed by the processor, the processor is caused to perform the following steps: obtaining first gesture video data, where the first gesture video data includes: a first gesture video and a label corresponding to the first gesture video; horizontally mirror-flipping each video frame image in the first gesture video to obtain a second gesture video; determining the label corresponding to the second gesture video according to the label corresponding to the first gesture video; associating and storing the second gesture video and the label corresponding to the second gesture video to obtain second gesture video data; and using both the first gesture video data and the second gesture video data as training data for the dynamic gesture recognition model.
  • In one embodiment, when the computer program is executed by the processor, the processor is caused to perform the following steps: acquiring a gesture video to be augmented, where the gesture video includes multiple video frame images; performing edge augmentation on each video frame image in the gesture video to obtain an edge-augmented gesture video, where the proportion of the gesture in the edge-augmented gesture video is reduced; and using both the pre-augmentation gesture video and the edge-augmented gesture video as training data for the dynamic gesture recognition model.
  • In one embodiment, performing edge augmentation on each video frame image in the gesture video to obtain an edge-augmented gesture video includes: determining the augmented color of the corresponding augmented edge according to the pixel values at the edge of the video frame image; and, according to the augmented color, augmenting the edge of the video frame image with a preset augmentation width to obtain an augmented video frame image.
  • In one embodiment, determining the augmented color of the corresponding augmented edge according to the pixel value of the edge of the video frame image includes: acquiring the pixel value at a preset position of the edge of the video frame image, and using the pixel value at that preset position as the augmented color of the corresponding augmented edge.
  • In one embodiment, performing edge augmentation on each video frame image in the gesture video to obtain an edge-augmented gesture video includes: acquiring the first video frame image in the gesture video; determining the pixel value of each corresponding augmented edge according to the pixel values of each edge in the first video frame image, obtaining the augmented pixel values corresponding to the four edges of the first video frame image; determining the four augmented edges of the first video frame image according to those augmented pixel values and the preset augmentation width; and augmenting the four edges of every other video frame image in the gesture video with the same augmented edges as the first video frame image.
  • A computer device includes a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the following steps: acquiring first gesture image data, the first gesture image data including a first gesture image and a label corresponding to the first gesture image; horizontally mirror-flipping the first gesture image to obtain a second gesture image; determining a label corresponding to the second gesture image according to the label corresponding to the first gesture image; storing the second gesture image and its corresponding label in association to obtain second gesture image data; and using both the first gesture image data and the second gesture image data as training data for the gesture recognition model.
  • When the computer program is executed by the processor, the processor further performs the following steps: acquiring a gesture image to be augmented; performing edge augmentation on the gesture image to obtain an edge-augmented gesture image in which the proportion of the gesture is reduced; and using both the pre-augmentation gesture image and the augmented gesture image as training data for the gesture recognition model.
  • A computer-readable storage medium stores a computer program that, when executed by a processor, causes the processor to perform the following steps: acquiring first gesture video data, the first gesture video data including a first gesture video and a label corresponding to the first gesture video; horizontally mirror-flipping each video frame image in the first gesture video to obtain a second gesture video; determining a label corresponding to the second gesture video according to the label corresponding to the first gesture video; storing the second gesture video and its corresponding label in association to obtain second gesture video data; and using both the first gesture video data and the second gesture video data as training data for a dynamic gesture recognition model.
  • When the computer program is executed by the processor, the processor further performs the following steps: acquiring a gesture video to be augmented, where the gesture video includes multiple video frame images; performing edge augmentation on each video frame image in the gesture video to obtain an edge-augmented gesture video in which the proportion of the gesture is reduced; and using both the pre-augmentation gesture video and the edge-augmented gesture video as training data for the dynamic gesture recognition model.
  • Performing edge augmentation on each video frame image in the gesture video to obtain an edge-augmented gesture video includes: determining the augmentation color of the corresponding augmented edge according to pixel values at the edge of the video frame image; and augmenting the edge of the video frame image with the augmentation color according to a preset augmentation width, to obtain an augmented video frame image.
  • Determining the augmentation color of the corresponding augmented edge according to the pixel value of the edge of the video frame image includes: acquiring the pixel value at a preset position on the edge of the video frame image, and using the pixel value at the preset position as the augmentation color of the corresponding augmented edge.
  • Performing edge augmentation on each video frame image in the gesture video to obtain an edge-augmented gesture video includes: acquiring a first video frame image in the gesture video; determining the pixel values of the corresponding augmented edges according to the pixel values of each edge in the first video frame image, to obtain the augmentation pixel values respectively corresponding to the four edges of the first video frame image; determining the four augmented edges of the first video frame image according to those augmentation pixel values and a preset augmentation width; and augmenting the four edges of the other video frame images in the gesture video into the same augmented edges as those of the first video frame image.
  • A computer-readable storage medium stores a computer program that, when executed by a processor, causes the processor to perform the following steps: acquiring first gesture image data, the first gesture image data including a first gesture image and a label corresponding to the first gesture image; horizontally mirror-flipping the first gesture image to obtain a second gesture image; determining a label corresponding to the second gesture image according to the label corresponding to the first gesture image; storing the second gesture image and its corresponding label in association to obtain second gesture image data; and using both the first gesture image data and the second gesture image data as training data for the gesture recognition model.
  • When the computer program is executed by the processor, the processor further performs the following steps: acquiring a gesture image to be augmented; performing edge augmentation on the gesture image to obtain an edge-augmented gesture image in which the proportion of the gesture is reduced; and using both the pre-augmentation gesture image and the augmented gesture image as training data for the gesture recognition model.
  • Nonvolatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
  • SRAM: static RAM
  • DRAM: dynamic RAM
  • SDRAM: synchronous DRAM
  • DDR SDRAM: double data rate SDRAM
  • ESDRAM: enhanced SDRAM
  • SLDRAM: Synchlink DRAM
  • RDRAM: Rambus direct RAM
  • DRDRAM: direct Rambus dynamic RAM

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)

Abstract

This application discloses a data augmentation method for gesture recognition, including: acquiring first gesture video data, the first gesture video data including a first gesture video and a label corresponding to the first gesture video; horizontally mirror-flipping each video frame image in the first gesture video to obtain a second gesture video; determining a label corresponding to the second gesture video according to the label corresponding to the first gesture video; and storing the second gesture video and its corresponding label in association to obtain second gesture video data; both the first gesture video data and the second gesture video data are used as training data for a gesture recognition model. The above data augmentation method for gesture recognition can enrich the training data and thereby improve the accuracy of the gesture recognition model. In addition, a data augmentation apparatus for gesture recognition, a computer device, and a storage medium are also provided.

Description

Data augmentation method and apparatus for gesture recognition, computer device, and storage medium
Technical Field
The present application relates to the field of computer technologies, and in particular to a data augmentation method and apparatus for gesture recognition, a computer device, and a storage medium.
Background
Gestures are a natural form of communication between humans, and gesture recognition is one of the important research directions in human-computer interaction. When training a gesture recognition model, the quality and quantity of the training data play a very important role in the final recognition results. To make the model perform better in actual use, data collection favors gathering as much data, covering as many scenarios, as possible. Since collecting data is time-consuming and labor-intensive, the collected data is often not comprehensive enough. For example, when gesture data is collected, most people habitually perform gestures with the right hand, so that right-hand gesture data accounts for an excessively high proportion of the final data set while left-hand gesture data accounts for a low proportion. When a model trained on such data is applied in real scenarios, its accuracy in detecting left-hand gestures is often lower than for right-hand gestures.
Technical Problem
Therefore, to improve the accuracy of the model, data augmentation can be used to enrich the training data and thereby improve the accuracy of gesture recognition.
Technical Solution
Based on this, it is necessary, in view of the above problem, to provide a data augmentation method and apparatus for gesture recognition, a computer device, and a storage medium that can enrich the training data.
A data augmentation method for gesture recognition includes:
acquiring first gesture video data, the first gesture video data including a first gesture video and a label corresponding to the first gesture video;
horizontally mirror-flipping each video frame image in the first gesture video to obtain a second gesture video;
determining a label corresponding to the second gesture video according to the label corresponding to the first gesture video;
storing the second gesture video and its corresponding label in association to obtain second gesture video data;
using both the first gesture video data and the second gesture video data as training data for a gesture recognition model.
A data augmentation apparatus for gesture recognition includes:
a first acquisition module, configured to acquire first gesture video data, the first gesture video data including a first gesture video and a label corresponding to the first gesture video;
a flipping module, configured to horizontally mirror-flip the gesture image corresponding to each video frame in the first gesture video to obtain a second gesture video;
a determination module, configured to determine a label corresponding to the second gesture video according to the label corresponding to the first gesture video;
an association module, configured to store the second gesture video and its corresponding label in association to obtain second gesture video data;
a training module, configured to use the first gesture video data and the second gesture video data together as training data for a gesture recognition model.
A computer device includes a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the following steps:
acquiring first gesture video data, the first gesture video data including a first gesture video and a label corresponding to the first gesture video;
horizontally mirror-flipping each video frame image in the first gesture video to obtain a second gesture video;
determining a label corresponding to the second gesture video according to the label corresponding to the first gesture video;
storing the second gesture video and its corresponding label in association to obtain second gesture video data;
using both the first gesture video data and the second gesture video data as training data for a gesture recognition model.
A computer-readable storage medium stores a computer program that, when executed by a processor, causes the processor to perform the following steps:
acquiring first gesture video data, the first gesture video data including a first gesture video and a label corresponding to the first gesture video;
horizontally mirror-flipping each video frame image in the first gesture video to obtain a second gesture video;
determining a label corresponding to the second gesture video according to the label corresponding to the first gesture video;
storing the second gesture video and its corresponding label in association to obtain second gesture video data;
using both the first gesture video data and the second gesture video data as training data for a gesture recognition model.
With the above data augmentation method, apparatus, computer device, and storage medium for gesture recognition, a second gesture video is obtained by horizontally mirror-flipping the video frame images in the first gesture video; the label corresponding to the second gesture video is determined according to the label of the first gesture video; the second gesture video and its label are stored in association to obtain second gesture video data; and both the first gesture video data and the second gesture video data are used as training data for the gesture recognition model. Deriving the second gesture video data from the first gesture video data and training on both together makes the training data more comprehensive, so that the trained model can accurately predict not only the first gesture but also the second gesture.
A data augmentation method for gesture recognition includes:
acquiring first gesture image data, the first gesture image data including a first gesture image and a label corresponding to the first gesture image;
horizontally mirror-flipping the first gesture image to obtain a second gesture image;
determining a label corresponding to the second gesture image according to the label corresponding to the first gesture image;
storing the second gesture image and its corresponding label in association to obtain second gesture image data;
using both the first gesture image data and the second gesture image data as training data for a gesture recognition model.
A data augmentation apparatus for gesture recognition includes:
an image acquisition module, configured to acquire first gesture image data, the first gesture image data including a first gesture image and a label corresponding to the first gesture image;
an image flipping module, configured to horizontally mirror-flip the first gesture image to obtain a second gesture image;
a label determination module, configured to determine a label corresponding to the second gesture image according to the label corresponding to the first gesture image;
an image-label association module, configured to store the second gesture image and its corresponding label in association to obtain second gesture image data;
a model data module, configured to use both the first gesture image data and the second gesture image data as training data for a gesture recognition model.
A computer device includes a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the following steps:
acquiring first gesture image data, the first gesture image data including a first gesture image and a label corresponding to the first gesture image;
horizontally mirror-flipping the first gesture image to obtain a second gesture image;
determining a label corresponding to the second gesture image according to the label corresponding to the first gesture image;
storing the second gesture image and its corresponding label in association to obtain second gesture image data;
using both the first gesture image data and the second gesture image data as training data for a gesture recognition model.
A computer-readable storage medium stores a computer program that, when executed by a processor, causes the processor to perform the following steps:
acquiring first gesture image data, the first gesture image data including a first gesture image and a label corresponding to the first gesture image;
horizontally mirror-flipping the first gesture image to obtain a second gesture image;
determining a label corresponding to the second gesture image according to the label corresponding to the first gesture image;
storing the second gesture image and its corresponding label in association to obtain second gesture image data;
using both the first gesture image data and the second gesture image data as training data for a gesture recognition model.
Beneficial Effects
With the above data augmentation method, apparatus, computer device, and storage medium for gesture recognition, a second gesture image is obtained by horizontally mirror-flipping the first gesture image; the label corresponding to the second gesture image is determined according to the label of the first gesture image; the second gesture image and its label are stored in association to obtain second gesture data; and both the first gesture data and the second gesture data are used as training data for the gesture recognition model. Deriving the second gesture data from the first gesture data and training on both together makes the training data more comprehensive, so that the trained model can accurately predict not only the first gesture but also the second gesture.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of this application or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of this application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
In the drawings:
FIG. 1 is a flowchart of a data augmentation method for gesture recognition in one embodiment;
FIG. 2 is a schematic diagram of gesture video frame images in one embodiment;
FIG. 3 is a flowchart of a data augmentation method for gesture recognition in another embodiment;
FIG. 4 is a schematic diagram of a video frame image before and after edge augmentation in one embodiment;
FIG. 5 is a flowchart of a data augmentation method for gesture recognition in yet another embodiment;
FIG. 6 is a flowchart of a data augmentation method for gesture recognition in still another embodiment;
FIG. 7 is a structural block diagram of a data augmentation apparatus for dynamic gesture recognition in one embodiment;
FIG. 8 is a structural block diagram of a data augmentation apparatus for dynamic gesture recognition in another embodiment;
FIG. 9 is a structural block diagram of a data augmentation apparatus for dynamic gesture recognition in yet another embodiment;
FIG. 10 is a diagram of the internal structure of a computer device in one embodiment.
Embodiments of the Invention
The technical solutions in the embodiments of this application are described below clearly and completely with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of this application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
As shown in FIG. 1, a data augmentation method for gesture recognition is provided. The method may be applied to a terminal or to a server; this embodiment is described using application to a terminal as an example. The method specifically includes the following steps:
Step 102: acquire first gesture video data, where the first gesture video data includes a first gesture video and a label corresponding to the first gesture video.
Here, the first gesture video data is right-hand gesture video data. Since most people are accustomed to using the right hand, when dynamic gesture data is collected there is much more right-hand gesture data than left-hand gesture data, and re-collecting left-hand gesture data would be a large, time-consuming, and labor-intensive undertaking. Based on this, a data augmentation method for dynamic gesture recognition is proposed that does not require re-collecting data; only the existing data needs to be augmented.
Each gesture video has a corresponding label, for example: the hand swings to the left, or the hand moves upward. Each gesture video is composed of multiple video frame images.
Step 104: horizontally mirror-flip each video frame image in the first gesture video to obtain a second gesture video.
Here, to convert the right-hand gesture video into a left-hand gesture video, each video frame image needs to be horizontally mirror-flipped. FIG. 2 is a schematic diagram of gesture video frame images: the three images in part 2a of FIG. 2 are right-hand gesture video frame images, and the three images in part 2b are the left-hand gesture images obtained by horizontally mirror-flipping them. Combining the flipped video frames in their original order yields the left-hand gesture video.
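The patent does not give code for this flipping step; a minimal sketch using numpy (assuming each frame is an H x W x 3 array, as OpenCV would return) could look like this:

```python
import numpy as np

def flip_video_horizontally(frames):
    """Mirror every frame of a gesture video left-to-right.

    `frames` is a list of H x W x 3 arrays. Axis 1 is the image width,
    so reversing it produces the horizontal mirror image. The frame
    order is preserved, matching the original video timeline.
    """
    return [frame[:, ::-1, :].copy() for frame in frames]

# A tiny 1 x 3 "frame": mirroring reverses the pixel order along the row.
frame = np.array([[[1, 1, 1], [2, 2, 2], [3, 3, 3]]], dtype=np.uint8)
mirrored = flip_video_horizontally([frame])[0]
print(mirrored[0, 0].tolist())  # → [3, 3, 3]
```

With OpenCV available, `cv2.flip(frame, 1)` performs the same horizontal flip per frame; the slicing form above is shown only to keep the sketch dependency-free.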
Step 106: determine the label corresponding to the second gesture video according to the label corresponding to the first gesture video.
Here, since the gesture has been horizontally mirror-flipped, the corresponding label needs to be adapted accordingly. For example, referring to FIG. 2, if the label of the first gesture video is "the ring finger and index finger of the hand swing to the left simultaneously", then the label of the converted second gesture video should be "the ring finger and index finger of the hand swing to the right simultaneously".
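This label adaptation can be sketched as a simple direction-word swap. The mapping below is an assumption for illustration (the patent does not specify a label vocabulary); a real data set would need its own lookup table:

```python
def mirror_label(label):
    """Derive the mirrored gesture's label from the original label.

    Hypothetical convention: every occurrence of "left" becomes "right"
    and vice versa, so a right-hand leftward swipe becomes a left-hand
    rightward swipe after horizontal mirroring.
    """
    swap = {"left": "right", "right": "left"}
    return " ".join(swap.get(word, word) for word in label.split())

print(mirror_label("right hand swings left"))  # → "left hand swings right"
```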
Step 108: store the second gesture video and the label corresponding to the second gesture video in association to obtain second gesture video data.
Here, the obtained second gesture video and its corresponding label are stored in association, yielding the second gesture video data; that is, the second gesture video data includes the second gesture video and its corresponding label.
Step 110: use both the first gesture video data and the second gesture video data as training data for the gesture recognition model.
Here, when the gesture recognition model is finally trained, both the first gesture data and the second gesture data are used as training data; that is, the model is trained on the augmented training data, which helps improve the model's accuracy in recognizing the second gesture. The trained gesture recognition model can be used to recognize dynamic gestures.
With the above data augmentation method for gesture recognition, a second gesture video is obtained by horizontally mirror-flipping the video frame images in the first gesture video; the label of the second gesture video is determined from the label of the first gesture video; the second gesture video and its label are stored in association to obtain second gesture video data; and both the first gesture video data and the second gesture video data are used as training data for the gesture recognition model. Deriving the second gesture video data from the first and training on both together makes the training data more comprehensive, so that the trained model can accurately predict not only the first gesture but also the second gesture.
During data collection, the closer the hand is to the camera, the larger its proportion of the frame; conversely, the farther the hand is from the camera, the smaller its proportion. If the hand occupies a large proportion of the frame in all the collected data while, in actual use, the hand may be far from the camera, then a model trained on dynamic gesture data with a large hand proportion will give inaccurate results on gestures with a small proportion. To improve the recognition of dynamic gestures that occupy a small proportion of the frame, the existing training data is augmented: the hand's proportion of the frame is reduced by augmenting the image edges.
As shown in FIG. 3, in one embodiment, a data augmentation method for dynamic gesture recognition further includes:
Step 302: acquire a gesture video to be augmented, where the gesture video includes multiple video frame images.
Here, the gesture video to be augmented refers to a gesture video whose edges are to be augmented; it contains multiple video frame images whose edges are to be augmented.
Step 304: perform edge augmentation on each video frame image in the gesture video to obtain an edge-augmented gesture video, in which the proportion of the gesture is reduced.
Here, edge augmentation enlarges a video frame image by adding borders to its four edges, so that the gesture's proportion of the gesture video is reduced. FIG. 4 is a schematic diagram of a video frame image before and after edge augmentation in one embodiment: the left side shows the image before edge augmentation, and the right side shows the image after edge augmentation.
Step 306: use both the pre-augmentation gesture video and the augmented gesture video as training data for the gesture recognition model.
Here, to improve the gesture recognition model's accuracy on gestures in which the hand occupies a small proportion of the frame, the existing gesture videos are edge-augmented to obtain edge-augmented gesture videos. Training the model on the pre-augmentation and augmented gesture videos together improves the recognition of dynamic gestures that occupy a small proportion of the frame.
In one embodiment, performing edge augmentation on each video frame image in the gesture video to obtain an edge-augmented gesture video includes: determining the augmentation color of the corresponding augmented edge according to the pixel values at the edge of the video frame image, and augmenting the edge of the video frame image with the augmentation color according to a preset augmentation width to obtain an augmented video frame image.
Here, to make the augmentation color of the added border closer to the background color, the augmentation color is determined from the pixel values at the edge of the video frame image. Specifically, a video frame image has four edges: top, bottom, left, and right. An RGB color value is computed for each edge; to stay as close as possible to the original background color, the augmentation color of each augmented edge can be computed from that edge's pixel values. In one embodiment, the row or column of pixels closest to each edge is taken, and the average of that row or column is used as the augmentation color of the augmented edge.
After the augmentation color is determined, the edge of the video frame image is augmented by the preset augmentation width. For example, if the augmentation width is set to one fifth of the image height and the same width is added on all four sides, then for an original image whose width:height is 150:100, the computed augmentation width is 20, and after each side is extended by one fifth of the height, the resulting image width:height is 190:140.
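The width arithmetic above (150 x 100 padded by height/5 on every side giving 190 x 140) can be checked with a short sketch. This is a simplified illustration, assuming a single border color for all four sides (the embodiment described later uses one color per edge):

```python
import numpy as np

def pad_frame(frame, color, ratio=5):
    """Pad all four sides of an H x W x 3 frame by H // ratio pixels.

    `color` is an RGB triple used for the whole border; the gesture's
    share of the padded image shrinks because the canvas grows while
    the hand stays the same size.
    """
    h, w = frame.shape[0], frame.shape[1]
    pad = h // ratio
    out = np.empty((h + 2 * pad, w + 2 * pad, 3), dtype=frame.dtype)
    out[:] = color                                   # fill border color
    out[pad:pad + h, pad:pad + w] = frame            # paste original image
    return out

frame = np.zeros((100, 150, 3), dtype=np.uint8)      # height 100, width 150
padded = pad_frame(frame, color=(200, 200, 200))
print(padded.shape)  # → (140, 190, 3), i.e. width:height 190:140
```

With OpenCV available, `cv2.copyMakeBorder(frame, pad, pad, pad, pad, cv2.BORDER_CONSTANT, value=color)` achieves the same padding.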
In one embodiment, determining the augmentation color of the corresponding augmented edge according to the pixel value of the edge of the video frame image includes: acquiring the pixel value at a preset position on the edge of the video frame image, and using the pixel value at the preset position as the augmentation color of the corresponding augmented edge.
Here, the preset position is a position selected in advance. When selecting the augmentation color for the augmented border, a color close to the background, rather than the color of the person's body, should be chosen as far as possible, so as to enlarge the background. Observation of the images in the data set shows that, when a person performs an action, the body is in most cases located at the center of the frame. Therefore, when selecting an augmentation color for each edge, the central portion should be avoided and a color near the outer part of the edge can be chosen; for example, the color at the outermost quarter of each edge is taken as the augmentation color. Alternatively, the color at the outermost eighth can be chosen. The specific position can be customized according to the needs of the actual scenario.
In a specific embodiment, suppose an RGB image is read by the tool OpenCV into an array named Image of size (row, col, 3), representing the numbers of rows, columns, and channels of the array; the values in the array are the RGB values of the image's pixels. Let the top, bottom, left, and right edges of the image be top, bottom, left, and right, respectively. The augmentation color color_top for the top edge is computed with the following formula:
color_top = Image[0:1, int(col/3):int(col/3)+1]                    (1)
Formula (1) takes the RGB value of the pixel in the first row of the original image, one third of the way along the columns, as the augmentation color of the top edge. Similarly, the augmentation color color_bottom for the bottom edge is computed as:
      color_bottom = Image[row-1:row, int(col/3):int(col/3)+1]      (2)
Formula (2) takes the RGB value of the pixel in the last row of the original image, one third of the way along the columns, as the augmentation color of the bottom edge. The augmentation colors color_left and color_right for the left and right edges are computed with formulas (3) and (4):
     color_left = Image[int(row/3):int(row/3)+1, 0:1]               (3)
     color_right = Image[int(row/3):int(row/3)+1, col-1:col]        (4)
Formula (3) takes the RGB value of the pixel in the first column of the original image, one third of the way down the rows, as the augmentation color of the left edge, and formula (4) takes the RGB value of the pixel in the last column, one third of the way down the rows, as the augmentation color of the right edge. The augmented part of each edge is thus close in color to that edge. In this application, the 3 in row/3 and col/3 is a configurable parameter and can be changed to other values according to the actual situation.
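Collecting formulas (1)-(4) into one helper gives the sketch below. It returns single pixels rather than the 1 x 1 array slices of the formulas, which is an equivalent simplification; the `divisor` parameter corresponds to the configurable 3 in row/3 and col/3:

```python
import numpy as np

def edge_colors(image, divisor=3):
    """Pick one border color per edge, mirroring formulas (1)-(4).

    `image` is a (row, col, 3) array as OpenCV would produce. Each edge
    takes the RGB value of a single pixel one `1/divisor` of the way
    along it, away from the (usually centered) person in the frame.
    """
    row, col = image.shape[0], image.shape[1]
    color_top = image[0, col // divisor]          # formula (1)
    color_bottom = image[row - 1, col // divisor] # formula (2)
    color_left = image[row // divisor, 0]         # formula (3)
    color_right = image[row // divisor, col - 1]  # formula (4)
    return color_top, color_bottom, color_left, color_right
```

For a 3 x 3 test image with distinct pixel values, the helper picks exactly the four pixels the formulas describe.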
In one embodiment, performing edge augmentation on each video frame image in the gesture video to obtain an edge-augmented gesture video includes: acquiring a first video frame image in the gesture video; determining the pixel values of the corresponding augmented edges according to the pixel values of each edge in the first video frame image, to obtain the augmentation pixel values respectively corresponding to the four edges of the first video frame image; determining the four augmented edges of the first video frame image according to those augmentation pixel values and a preset augmentation width; and augmenting the four edges of the other video frame images in the gesture video into the same augmented edges as those of the first video frame image.
Here, since extracting frames from each gesture video yields multiple images, when computing the augmented border (i.e., the augmented edges) for the several images of one dynamic gesture, the colors of the four edges are computed from the first frame only, and the computed colors are applied directly to the subsequent frames. The reason is that the background color or brightness changes slightly while the action is performed; for example, the hand may initially be absent from the frame and gradually appear as the action proceeds. If the colors were recomputed for every frame, the RGB values computed for each frame would differ, causing the augmented border to jump noticeably between frames, which would deviate greatly from real scenes.
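Putting the pieces together, a self-contained sketch of this first-frame strategy might look as follows (assumptions: frames are (row, col, 3) uint8 arrays, the border width is row divided by `width_ratio` on every side, and each side is filled with one solid color sampled a third of the way along that edge of the first frame):

```python
import numpy as np

def augment_video(frames, width_ratio=5, divisor=3):
    """Pad every frame of one gesture video with borders colored from
    the FIRST frame only, so the border does not jump between frames."""
    first = frames[0]
    row, col = first.shape[0], first.shape[1]
    pad = row // width_ratio
    # One color per edge, sampled from the first frame (formulas (1)-(4)).
    c_top = first[0, col // divisor]
    c_bottom = first[row - 1, col // divisor]
    c_left = first[row // divisor, 0]
    c_right = first[row // divisor, col - 1]
    out_frames = []
    for f in frames:
        out = np.empty((row + 2 * pad, col + 2 * pad, 3), dtype=f.dtype)
        out[:pad, :] = c_top          # top border
        out[-pad:, :] = c_bottom      # bottom border
        out[:, :pad] = c_left         # left border (overwrites corners)
        out[:, -pad:] = c_right       # right border
        out[pad:pad + row, pad:pad + col] = f
        out_frames.append(out)
    return out_frames
```

A later frame keeps its own image content but inherits the first frame's border colors, which is exactly the frame-to-frame stability this embodiment is after.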
The data augmentation methods in the above embodiments are mainly applied to dynamic gesture recognition; the method below can be used for static gesture recognition. In addition, in one embodiment, the trained gesture recognition model can be used to recognize both dynamic and static gestures.
As shown in FIG. 5, in one embodiment, a data augmentation method for gesture recognition is provided, including:
Step 502: acquire first gesture image data, where the first gesture image data includes a first gesture image and a label corresponding to the first gesture image.
Step 504: horizontally mirror-flip the first gesture image to obtain a second gesture image.
Step 506: determine the label corresponding to the second gesture image according to the label corresponding to the first gesture image.
Step 508: store the second gesture image and the label corresponding to the second gesture image in association to obtain second gesture image data.
Step 510: use both the first gesture image data and the second gesture image data as training data for the gesture recognition model.
The above data augmentation method can be applied to static gesture recognition. By horizontally mirror-flipping the first gesture image, a second gesture image is obtained, and the label of the second gesture image is set according to the label of the first gesture image; for example, if the label of the first gesture image is "right-hand fist", the label of the second gesture image is "left-hand fist". This yields the second gesture data, which is then also used as training data for the recognition model. The trained gesture recognition model can be used to recognize static gestures.
With the above data augmentation method, apparatus, computer device, and storage medium for gesture recognition, a second gesture image is obtained by horizontally mirror-flipping the first gesture image; the label of the second gesture image is determined from the label of the first gesture image; the second gesture image and its label are stored in association to obtain second gesture data; and both the first gesture data and the second gesture data are used as training data for the static gesture recognition model. Deriving the second gesture data from the first and training on both together makes the training data more comprehensive, so that the trained model can accurately predict not only the first gesture but also the second gesture.
As shown in FIG. 6, in one embodiment, a data augmentation method for gesture recognition is provided, including:
Step 602: acquire a gesture image to be augmented.
Step 604: perform edge augmentation on the gesture image to obtain an edge-augmented gesture image, in which the proportion of the gesture is reduced.
Step 606: use both the pre-augmentation gesture image and the augmented gesture image as training data for the gesture recognition model.
Here, edge augmentation is performed on a single gesture image to obtain an augmented gesture image in which the gesture occupies a smaller proportion, thereby improving the recognition of gestures that occupy a small proportion of the frame.
In one embodiment, performing edge augmentation on the gesture image to obtain an edge-augmented gesture image includes: determining the augmentation color of the corresponding augmented edge according to the pixel values at the edge of the gesture image, and augmenting the edge of the gesture image with the augmentation color according to a preset augmentation width to obtain an augmented gesture image.
In one embodiment, determining the augmentation color of the corresponding augmented edge according to the pixel value of the edge of the gesture image includes: acquiring the pixel value at a preset position on the edge of the gesture image, and using the pixel value at the preset position as the augmentation color of the corresponding augmented edge.
As shown in FIG. 7, in one embodiment, a data augmentation apparatus for gesture recognition is provided, including:
a first acquisition module 702, configured to acquire first gesture video data, the first gesture video data including a first gesture video and a label corresponding to the first gesture video;
a flipping module 704, configured to horizontally mirror-flip the gesture image corresponding to each video frame in the first gesture video to obtain a second gesture video;
a determination module 706, configured to determine the label corresponding to the second gesture video according to the label corresponding to the first gesture video;
an association module 708, configured to store the second gesture video and its corresponding label in association to obtain second gesture video data;
a training module 710, configured to use the first gesture video data and the second gesture video data together as training data for a dynamic gesture recognition model.
As shown in FIG. 8, in one embodiment, the above apparatus further includes:
a second acquisition module 712, configured to acquire a gesture video to be augmented, the gesture video including multiple video frame images;
an augmentation module 714, configured to perform edge augmentation on each video frame image in the gesture video to obtain an edge-augmented gesture video, in which the proportion of the gesture is reduced;
the training module 710 is further configured to use both the pre-augmentation gesture video and the augmented gesture video as training data for the dynamic gesture recognition model.
In one embodiment, the augmentation module 714 is further configured to determine the augmentation color of the corresponding augmented edge according to the pixel values at the edge of the video frame image, and to augment the edge of the video frame image with the augmentation color according to a preset augmentation width to obtain an augmented video frame image.
In one embodiment, the augmentation module 714 is further configured to acquire the pixel value at a preset position on the edge of the video frame image and use it as the augmentation color of the corresponding augmented edge.
In one embodiment, the augmentation module 714 is further configured to acquire a first video frame image in the gesture video; determine the pixel values of the corresponding augmented edges according to the pixel values of each edge in the first video frame image, obtaining the augmentation pixel values respectively corresponding to the four edges of the first video frame image; determine the four augmented edges of the first video frame image according to those augmentation pixel values and a preset augmentation width; and augment the four edges of the other video frame images in the gesture video into the same augmented edges as those of the first video frame image.
As shown in FIG. 9, in one embodiment, a data augmentation apparatus for gesture recognition is provided, including:
an image acquisition module 902, configured to acquire first gesture image data, the first gesture image data including a first gesture image and a label corresponding to the first gesture image;
an image flipping module 904, configured to horizontally mirror-flip the first gesture image to obtain a second gesture image;
a label determination module 906, configured to determine the label corresponding to the second gesture image according to the label corresponding to the first gesture image;
an image-label association module 908, configured to store the second gesture image and its corresponding label in association to obtain second gesture image data;
a model data module 910, configured to use both the first gesture image data and the second gesture image data as training data for a gesture recognition model.
In one embodiment, the data augmentation apparatus for gesture recognition further includes:
an augmentation module, configured to acquire a gesture image to be augmented, perform edge augmentation on it to obtain an edge-augmented gesture image in which the proportion of the gesture is reduced, and use both the pre-augmentation gesture image and the augmented gesture image as training data for the gesture recognition model.
FIG. 10 shows a diagram of the internal structure of a computer device in one embodiment. The computer device may specifically be a terminal or a server. As shown in FIG. 10, the computer device includes a processor, a memory, and a network interface connected through a system bus. The memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the computer device stores an operating system and may also store a computer program that, when executed by the processor, causes the processor to implement the above data augmentation method for dynamic gesture recognition. The internal memory may also store a computer program that, when executed by the processor, causes the processor to perform the above data augmentation method for dynamic gesture recognition. A person skilled in the art will understand that the structure shown in FIG. 10 is only a block diagram of part of the structure related to the solution of this application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In one embodiment, a computer device is provided, including a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the following steps: acquiring first gesture video data, the first gesture video data including a first gesture video and a label corresponding to the first gesture video; horizontally mirror-flipping each video frame image in the first gesture video to obtain a second gesture video; determining the label corresponding to the second gesture video according to the label corresponding to the first gesture video; storing the second gesture video and its corresponding label in association to obtain second gesture video data; and using both the first gesture video data and the second gesture video data as training data for a dynamic gesture recognition model.
In one embodiment, when executed by the processor, the computer program further causes the processor to perform the following steps: acquiring a gesture video to be augmented, the gesture video including multiple video frame images; performing edge augmentation on each video frame image in the gesture video to obtain an edge-augmented gesture video in which the proportion of the gesture is reduced; and using both the pre-augmentation gesture video and the augmented gesture video as training data for the dynamic gesture recognition model.
In one embodiment, performing edge augmentation on each video frame image in the gesture video to obtain an edge-augmented gesture video includes: determining the augmentation color of the corresponding augmented edge according to the pixel values at the edge of the video frame image, and augmenting the edge of the video frame image with the augmentation color according to a preset augmentation width to obtain an augmented video frame image.
In one embodiment, determining the augmentation color of the corresponding augmented edge according to the pixel value of the edge of the video frame image includes: acquiring the pixel value at a preset position on the edge of the video frame image and using it as the augmentation color of the corresponding augmented edge.
In one embodiment, performing edge augmentation on each video frame image in the gesture video to obtain an edge-augmented gesture video includes: acquiring a first video frame image in the gesture video; determining the pixel values of the corresponding augmented edges according to the pixel values of each edge in the first video frame image, obtaining the augmentation pixel values respectively corresponding to the four edges of the first video frame image; determining the four augmented edges of the first video frame image according to those augmentation pixel values and a preset augmentation width; and augmenting the four edges of the other video frame images in the gesture video into the same augmented edges as those of the first video frame image.
In one embodiment, a computer device is provided, including a memory and a processor, the memory storing a computer program that, when executed by the processor, causes the processor to perform the following steps: acquiring first gesture image data, the first gesture image data including a first gesture image and a label corresponding to the first gesture image; horizontally mirror-flipping the first gesture image to obtain a second gesture image; determining the label corresponding to the second gesture image according to the label corresponding to the first gesture image; storing the second gesture image and its corresponding label in association to obtain second gesture image data; and using both the first gesture image data and the second gesture image data as training data for a gesture recognition model.
In one embodiment, when executed by the processor, the computer program further causes the processor to perform the following steps: acquiring a gesture image to be augmented; performing edge augmentation on the gesture image to obtain an edge-augmented gesture image in which the proportion of the gesture is reduced; and using both the pre-augmentation gesture image and the augmented gesture image as training data for the gesture recognition model.
In one embodiment, a computer-readable storage medium is provided, storing a computer program that, when executed by a processor, causes the processor to perform the following steps: acquiring first gesture video data, the first gesture video data including a first gesture video and a label corresponding to the first gesture video; horizontally mirror-flipping each video frame image in the first gesture video to obtain a second gesture video; determining the label corresponding to the second gesture video according to the label corresponding to the first gesture video; storing the second gesture video and its corresponding label in association to obtain second gesture video data; and using both the first gesture video data and the second gesture video data as training data for a dynamic gesture recognition model.
In one embodiment, when executed by the processor, the computer program further causes the processor to perform the following steps: acquiring a gesture video to be augmented, the gesture video including multiple video frame images; performing edge augmentation on each video frame image in the gesture video to obtain an edge-augmented gesture video in which the proportion of the gesture is reduced; and using both the pre-augmentation gesture video and the augmented gesture video as training data for the dynamic gesture recognition model.
In one embodiment, performing edge augmentation on each video frame image in the gesture video to obtain an edge-augmented gesture video includes: determining the augmentation color of the corresponding augmented edge according to the pixel values at the edge of the video frame image, and augmenting the edge of the video frame image with the augmentation color according to a preset augmentation width to obtain an augmented video frame image.
In one embodiment, determining the augmentation color of the corresponding augmented edge according to the pixel value of the edge of the video frame image includes: acquiring the pixel value at a preset position on the edge of the video frame image and using it as the augmentation color of the corresponding augmented edge.
In one embodiment, performing edge augmentation on each video frame image in the gesture video to obtain an edge-augmented gesture video includes: acquiring a first video frame image in the gesture video; determining the pixel values of the corresponding augmented edges according to the pixel values of each edge in the first video frame image, obtaining the augmentation pixel values respectively corresponding to the four edges of the first video frame image; determining the four augmented edges of the first video frame image according to those augmentation pixel values and a preset augmentation width; and augmenting the four edges of the other video frame images in the gesture video into the same augmented edges as those of the first video frame image.
In one embodiment, a computer-readable storage medium is provided, storing a computer program that, when executed by a processor, causes the processor to perform the following steps: acquiring first gesture image data, the first gesture image data including a first gesture image and a label corresponding to the first gesture image; horizontally mirror-flipping the first gesture image to obtain a second gesture image; determining the label corresponding to the second gesture image according to the label corresponding to the first gesture image; storing the second gesture image and its corresponding label in association to obtain second gesture image data; and using both the first gesture image data and the second gesture image data as training data for a gesture recognition model.
In one embodiment, when executed by the processor, the computer program further causes the processor to perform the following steps: acquiring a gesture image to be augmented; performing edge augmentation on the gesture image to obtain an edge-augmented gesture image in which the proportion of the gesture is reduced; and using both the pre-augmentation gesture image and the augmented gesture image as training data for the gesture recognition model.
A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the related hardware. The program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The above embodiments express only several implementations of this application, and their descriptions are relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent application. It should be noted that a person of ordinary skill in the art may make several variations and improvements without departing from the concept of this application, all of which fall within the protection scope of this application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (10)

  1. A data augmentation method for gesture recognition, characterized by comprising:
    acquiring first gesture video data, the first gesture video data comprising: a first gesture video and a label corresponding to the first gesture video;
    horizontally mirror-flipping each video frame image in the first gesture video to obtain a second gesture video;
    determining a label corresponding to the second gesture video according to the label corresponding to the first gesture video;
    storing the second gesture video and the label corresponding to the second gesture video in association to obtain second gesture video data;
    using both the first gesture video data and the second gesture video data as training data for a gesture recognition model.
  2. The method according to claim 1, characterized in that the method further comprises:
    acquiring a gesture video to be augmented, the gesture video comprising multiple video frame images;
    performing edge augmentation on each video frame image in the gesture video to obtain an edge-augmented gesture video, wherein the proportion of the gesture in the edge-augmented gesture video is reduced;
    using both the pre-augmentation gesture video and the augmented gesture video as training data for the gesture recognition model.
  3. The method according to claim 2, characterized in that performing edge augmentation on each video frame image in the gesture video to obtain an edge-augmented gesture video comprises:
    determining an augmentation color of a corresponding augmented edge according to pixel values at an edge of the video frame image;
    augmenting the edge of the video frame image with the augmentation color according to a preset augmentation width, to obtain an augmented video frame image.
  4. The method according to claim 3, characterized in that determining the augmentation color of the corresponding augmented edge according to the pixel value of the edge of the video frame image comprises:
    acquiring a pixel value at a preset position on the edge of the video frame image, and using the pixel value at the preset position as the augmentation color of the corresponding augmented edge.
  5. The method according to claim 2, characterized in that performing edge augmentation on each video frame image in the gesture video to obtain an edge-augmented gesture video comprises:
    acquiring a first video frame image in the gesture video;
    determining pixel values of corresponding augmented edges according to the pixel values of each edge in the first video frame image, to obtain augmentation pixel values respectively corresponding to the four edges of the first video frame image;
    determining four augmented edges of the first video frame image according to the augmentation pixel values respectively corresponding to the four edges of the first video frame image and a preset augmentation width;
    augmenting the four edges of the other video frame images in the gesture video into the same augmented edges as those of the first video frame image.
  6. A data augmentation method for gesture recognition, characterized by comprising:
    acquiring first gesture image data, the first gesture image data comprising: a first gesture image and a label corresponding to the first gesture image;
    horizontally mirror-flipping the first gesture image to obtain a second gesture image;
    determining a label corresponding to the second gesture image according to the label corresponding to the first gesture image;
    storing the second gesture image and the label corresponding to the second gesture image in association to obtain second gesture image data;
    using both the first gesture image data and the second gesture image data as training data for a gesture recognition model.
  7. The method according to claim 6, characterized in that the method further comprises:
    acquiring a gesture image to be augmented;
    performing edge augmentation on the gesture image to obtain an edge-augmented gesture image, wherein the proportion of the gesture in the edge-augmented gesture image is reduced;
    using both the pre-augmentation gesture image and the augmented gesture image as training data for the gesture recognition model.
  8. A data augmentation apparatus for gesture recognition, characterized by comprising:
    a first acquisition module, configured to acquire first gesture video data, the first gesture video data comprising: a first gesture video and a label corresponding to the first gesture video;
    a flipping module, configured to horizontally mirror-flip the gesture image corresponding to each video frame in the first gesture video to obtain a second gesture video;
    a determination module, configured to determine a label corresponding to the second gesture video according to the label corresponding to the first gesture video;
    an association module, configured to store the second gesture video and the label corresponding to the second gesture video in association to obtain second gesture video data;
    a training module, configured to use the first gesture video data and the second gesture video data together as training data for a gesture recognition model.
  9. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, causes the processor to perform the steps of the data augmentation method for gesture recognition according to any one of claims 1 to 7.
  10. A computer device comprising a memory and a processor, the memory storing a computer program, wherein the computer program, when executed by the processor, causes the processor to perform the steps of the data augmentation method for gesture recognition according to any one of claims 1 to 7.
PCT/CN2020/129017 2020-11-16 2020-11-16 Data augmentation method and apparatus for gesture recognition, computer device, and storage medium WO2022099685A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/129017 WO2022099685A1 (zh) 2020-11-16 2020-11-16 Data augmentation method and apparatus for gesture recognition, computer device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/129017 WO2022099685A1 (zh) 2020-11-16 2020-11-16 Data augmentation method and apparatus for gesture recognition, computer device, and storage medium

Publications (1)

Publication Number Publication Date
WO2022099685A1 true WO2022099685A1 (zh) 2022-05-19

Family

ID=81602098

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/129017 WO2022099685A1 (zh) 2020-11-16 2020-11-16 用于手势识别的数据增强方法、装置、计算机设备及存储介质

Country Status (1)

Country Link
WO (1) WO2022099685A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106980365A * 2017-02-21 2017-07-25 华南理工大学 First-person-view dynamic gesture recognition method based on a deep convolutional neural network framework
US9891711B1 (en) * 2016-07-26 2018-02-13 Toyota Motor Engineering & Manufacturing North America, Inc. Human machine interface with haptic response based on phased array LIDAR
CN109684803A * 2018-12-19 2019-04-26 西安电子科技大学 Human-machine verification method based on gesture sliding
CN109871781A * 2019-01-28 2019-06-11 山东大学 Dynamic gesture recognition method and system based on multimodal 3D convolutional neural networks
CN111126108A * 2018-10-31 2020-05-08 北京市商汤科技开发有限公司 Training of an image detection model, and image detection method and apparatus
CN111401438A * 2020-03-13 2020-07-10 德联易控科技(北京)有限公司 Image sorting method, apparatus and system


Similar Documents

Publication Publication Date Title
JP6636154B2 (ja) Face image processing method and apparatus, and storage medium
WO2020192483A1 (zh) Image display method and device
US20220051000A1 (en) Method and apparatus for detecting face key point, computer device and storage medium
US11055516B2 (en) Behavior prediction method, behavior prediction system, and non-transitory recording medium
CN108537749B (zh) Image processing method and apparatus, mobile terminal, and computer-readable storage medium
US20200134795A1 (en) Image processing method, image processing system, and storage medium
CN111556336B (zh) Multimedia file processing method and apparatus, terminal device, and medium
CN107730444A (zh) Image processing method and apparatus, readable storage medium, and computer device
CN107085654B (zh) Health analysis method and apparatus based on face images
US11367196B2 (en) Image processing method, apparatus, and storage medium
CN103369238B (zh) Image generating device and image generating method
JP2007087345A (ja) Information processing apparatus, control method therefor, computer program, and storage medium
CN111127309B (zh) Portrait style transfer model training method, and portrait style transfer method and apparatus
WO2022002262A1 (zh) Computer-vision-based character sequence recognition method, apparatus, device, and medium
CN108875667A (zh) Target recognition method and apparatus, terminal device, and storage medium
CN114390209A (zh) Photographing method, photographing apparatus, electronic device, and readable storage medium
Han et al. Hybrid high dynamic range imaging fusing neuromorphic and conventional images
US11222208B2 (en) Portrait image evaluation based on aesthetics
CN114119373A (zh) Image cropping method and apparatus, and electronic device
WO2022099685A1 (zh) Data augmentation method and apparatus for gesture recognition, computer device, and storage medium
CN113014817A (zh) Method and apparatus for acquiring high-definition high-frame-rate video, and electronic device
Yan et al. Deeper multi-column dilated convolutional network for congested crowd understanding
WO2024041108A1 (zh) Image correction model training, image correction method and apparatus, and computer device
CN113610864B (zh) Image processing method and apparatus, electronic device, and computer-readable storage medium
CN112101306B (zh) RGB-image-based refined facial expression capture method and apparatus

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20961240

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20961240

Country of ref document: EP

Kind code of ref document: A1