WO2021139340A1 - Method, Apparatus and Computer Device for Expanding Data - Google Patents

Method, Apparatus and Computer Device for Expanding Data

Info

Publication number
WO2021139340A1
WO2021139340A1 (PCT/CN2020/124728)
Authority
WO
WIPO (PCT)
Prior art keywords
picture
face
background
background picture
data
Prior art date
Application number
PCT/CN2020/124728
Other languages
English (en)
French (fr)
Inventor
罗天文
孟桂国
张国辉
宋晨
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2021139340A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Definitions

  • This application relates to the field of big data, in particular to methods, devices and computer equipment for expanding data.
  • The training of a deep neural network requires a large amount of labeled data, so that the network can learn and extract the hidden patterns in the data and use the learned patterns to make inferences on new data.
  • When deep neural networks are applied to face detection, the most commonly used open-source face detection dataset, WiderFace, contains only 12,880 pictures; even counted by the number of faces, there are only about 170,000 faces, and the data is unbalanced.
  • The label values of a face detection dataset are the coordinates of the rectangular boxes marking face positions.
  • Commonly used data expansion methods apply the same affine transformation, such as rotation, scaling or translation, to the picture and its rectangular boxes to generate new data. However, the inventor realized that this kind of expansion only deforms the pictures geometrically and does not change their content; for example, the people in the pictures remain against the same background, so the problem of data diversity is not solved.
  • The main purpose of this application is to provide a data processing method that aims to solve the technical problem that existing data expansion methods cannot provide data diversity.
  • This application proposes a method for expanding data, including: obtaining a face picture set and a background picture set, where the background pictures in the background picture set contain no face images; performing a Cartesian product operation on the face picture set and the background picture set to obtain a combined data set, where each data element in the combined data set includes one face picture and one background picture; fusing the face picture and the background picture included in each data element of the combined data set into a new picture; and combining the new pictures corresponding to the data elements into a data expansion set.
  • This application also provides a device for expanding data, including:
  • an obtaining module, configured to obtain a face picture set and a background picture set, where the background pictures in the background picture set contain no face images;
  • an operation module, configured to perform a Cartesian product operation on the face picture set and the background picture set to obtain a combined data set, where each data element in the combined data set includes one face picture and one background picture;
  • a fusion module, configured to fuse the face picture and the background picture included in each data element of the combined data set into a new picture; and
  • a combination module, configured to combine the new pictures corresponding to the data elements into a data expansion set.
  • The present application also provides a computer device, including a memory and a processor, where the memory stores a computer program and the processor, when executing the computer program, implements a method for expanding data, including: obtaining a face picture set and a background picture set, where the background pictures in the background picture set contain no face images; performing a Cartesian product operation on the two sets to obtain a combined data set, where each data element includes one face picture and one background picture; fusing the face picture and the background picture of each data element into a new picture; and combining the new pictures corresponding to the data elements into a data expansion set.
  • The present application also provides a computer-readable storage medium on which a computer program is stored; the computer program, when executed by a processor, implements a method for expanding data, including: obtaining a face picture set and a background picture set, where the background pictures in the background picture set contain no face images; performing a Cartesian product operation on the two sets to obtain a combined data set, where each data element includes one face picture and one background picture; fusing the face picture and the background picture of each data element into a new picture; and combining the new pictures corresponding to the data elements into a data expansion set.
  • This application keeps the face-frame label values of each face picture unchanged during fusion; without altering the label values that affect the accuracy of the face recognition model, it only replaces different background pictures to change the substantive content of the original face picture. This increases the diversity and richness of the picture data and expands its quantity, and the expanded picture data greatly promotes the training of deep-learning-based face detection models, improving their accuracy and generalization performance.
  • FIG. 1 is a schematic flowchart of a method for expanding data according to an embodiment of the present application
  • FIG. 2 is a schematic diagram of the structure of an apparatus for expanding data according to an embodiment of the present application
  • FIG. 3 is a schematic diagram of the internal structure of a computer device according to an embodiment of the present application.
  • Referring to FIG. 1, a method for expanding data according to an embodiment of the present application includes:
  • S1: Obtain a face picture set and a background picture set, where the background pictures in the background picture set contain no face images.
  • The face picture set refers to a picture dataset composed of face pictures; a face picture is a picture that contains at least one face.
  • The background picture set refers to a picture dataset composed of background pictures; a background picture contains no faces at all.
  • The face picture set and the background picture set can be obtained by linking to the storage addresses of the above picture datasets.
  • S2: Perform a Cartesian product operation on the face picture set and the background picture set to obtain a combined data set, where each data element in the combined data set includes one face picture and one background picture.
  • The Cartesian product operation in this embodiment extracts one picture from each of the two picture datasets to form a picture pair; all pairs form the combined data set, and each pair is one data element.
  • Specifically, a face picture m is extracted from the face picture set, and background pictures i_n are extracted in turn from the background picture set, where n is a positive integer; the data element corresponding to face picture m is written (m, i_n), and the set of all data elements formed by the face pictures and background pictures is the combined data set.
  • The face-frame label values of each face picture remain unchanged, and because one face picture m corresponds to n background pictures i_n, one group of face-frame label values is copied and reused many times.
  • In the embodiment of this application, the face picture set is the WiderFace dataset, the background picture set is the ImageNet dataset, and the number of data elements after combination is the product of the sizes of the two datasets.
  • For example, with 12,880 WiderFace pictures and 830,000 filtered ImageNet pictures, the combined data set contains 12,880 × 830,000 = 10,690,400,000 data elements, an 830,000-fold increase over the picture count of the original WiderFace face detection dataset.
  • This application uses pixel fusion to merge the pixels of two or more pictures into one picture at a specified fusion ratio, so that the pixels of all source pictures are displayed in the same picture at the same time.
  • The fusion process does not change the face-frame label values of the original face picture; that is, the coordinate range of the rectangular box corresponding to each face frame remains unchanged.
  • The data expansion set of this application not only raises the data volume by orders of magnitude; each fused picture contains the data content of both a face picture and a background picture, which approximates picture data of the same person appearing against different backgrounds and in different scenes.
  • By keeping the face-frame label values unchanged during fusion, without altering the label values that affect the accuracy of the face recognition model, and only replacing different background pictures, the data expansion set changes the substantive content of the original face pictures, increases the diversity and richness of the picture data, and expands its quantity; the expanded picture data greatly promotes the training of deep-learning-based face detection models and improves their accuracy and generalization performance.
  • Step S3 of fusing the face picture and the background picture in each data element of the combined data set into a new picture includes:
  • S31: Calculate the union area of the area of the face picture and the area of the background picture;
  • S32: Generate a blank picture over the union area;
  • S33: Overlay the face picture and the background picture on the blank picture, aligned at the top-left corner;
  • S34: Fuse the face picture and the background picture onto the blank picture at a specified fusion ratio to form the new picture.
  • The union area is obtained by taking the union of the area of the face picture and the area of the background picture.
  • Each area can be represented by the coordinates of the four vertices of the picture, and the union takes the coordinate data of the picture with the larger area as the union area, so that the union area can hold both the face picture and the background picture currently to be fused.
  • The union area is greater than or equal to the area of the face picture; that is, fusion may enlarge the original face picture, but the picture undergoes no translation, rotation or any other change of position coordinates, so the origin of the picture's coordinates is unchanged, the face-frame label values of the fused picture still equal those of the corresponding face picture before fusion, and the face-frame label values of the original face picture are not changed.
  • A blank picture of the same size as the union area is generated, onto which the face picture and the background picture are fused step by step.
  • The face picture and the background picture are stacked with their top-left corners aligned; that is, the top-left coordinate of each picture is the same and serves as the starting point, and the pictures are aligned pixel by pixel according to pixel coordinates, which matches the usual conventions for handling picture data and makes processing easier.
  • In other embodiments, the reading rule for picture data may be changed so as to align at the top-right, bottom-left or bottom-right corner instead. The pixels of the stacked pictures at the same pixel coordinates are then fused at the specified fusion ratio, so that the pixels of all fused pictures are displayed in the same picture at the same time.
  • The fused face-picture region contains both the pixel content of the original face picture and the pixel content of the background picture, a semi-transparent superposition/blend of the two.
  • The degree of translucency depends on the value of the specified fusion ratio.
  • The specified fusion ratio may take any value in [0,1].
  • Through fusion, the pixels of the face picture and the pixels of the background picture are both displayed on the blank picture, so that the same face picture can be fused with different background pictures, realizing different backgrounds and different scenes.
  • The number of face pictures is thereby expanded, increasing data richness.
  • Step S34 of fusing the face picture and the background picture onto the blank picture at the specified fusion ratio to form the new picture includes:
  • S341: According to p`(e,x,y) = p(b(i),x,y) if (x,y) lies within b(i), and p`(e,x,y) = p(e,x,y) otherwise, fuse the pixels of the background picture onto the blank picture to generate a first fused picture, where (x,y) denotes a pixel position on the blank picture, b(i) denotes the background picture, p(e,x,y) denotes the pixel value at (x,y) on the blank picture, p(b(i),x,y) denotes the pixel value at (x,y) on the background picture, and p`(e,x,y) denotes the pixel value at (x,y) on the first fused picture;
  • S342: According to p``(e,x,y) = r·p(a(m),x,y) + (1−r)·p`(e,x,y) if (x,y) lies within a(m), and p``(e,x,y) = p`(e,x,y) otherwise, fuse the pixels of the face picture into the first fused picture to generate a second fused picture, where r denotes the specified fusion ratio with value range [0.5,1], a(m) denotes the face picture, p(a(m),x,y) denotes the pixel value at (x,y) on the face picture, and p``(e,x,y) denotes the pixel value at (x,y) on the second fused picture.
  • In the picture fusion of this embodiment, fusion is differentiated according to the characteristics of the data regions of the pictures; that is, different data regions are fused in different ways.
  • For the first fused picture, the pixel positions covered by the background picture are identified:
  • at pixel positions inside the background picture, the pixel value is that of the background picture;
  • at pixel positions outside the background picture, the pixel value is that of the blank picture.
  • On the basis of the first fused picture, all pixel values of the face picture are then fused in to form the second fused picture.
  • In this embodiment, the blank picture, the background picture and the face picture are stacked from bottom to top with their top-left corners aligned.
  • For the second fused picture, the pixel values of the face picture are identified first: if the current pixel lies in the face picture, the pixel value of the face picture and the pixel value of the background picture are displayed together according to the fusion ratio; for pixels outside the face picture, the pixel value of the first fused picture is displayed. This ensures that the pixel values of the face picture remain the main consideration in the fused picture, so that the expanded data can be used to train a face detection model.
  • Further, before step S34, the method includes: obtaining a randomly generated random number r` in the range [0,1], and adjusting it into the specified fusion ratio r by mapping the range [0,1] onto [0.5,1].
  • The fusion of pixels inside the face region is still dominated by the pixel values of the face picture, while the fusion of pixels outside the face region is dominated by the pixel values of the background picture.
  • The proportion of the original face picture's pixel values is kept greater than or equal to 0.5; that is, the pixel values of the face region are guaranteed to be the main component, so that the trained face detection model is more accurate.
  • This application restricts the value range of the specified fusion ratio to [0.5,1] to ensure that the proportion of the original face picture's pixel values is greater than or equal to 0.5. In the embodiment of this application, the randomly generated random number is range-adjusted and then used as the specified fusion ratio.
  • Step S1 of acquiring a face picture set and a background picture set includes:
  • S11: Obtain the face detection dataset WiderFace, where the face detection dataset WiderFace includes a face picture set M and a face-frame label set F;
  • S12: Extract each face picture m in WiderFace, where the face-frame label set of m is Fm, a single face-frame label value is f, Fm = {f | f ∈ F and f is in face picture m}, and F = {Fm | m ∈ M};
  • S13: Perform an affine transformation on each face picture m to obtain the affine transformation set A corresponding to m;
  • S14: According to the generation process of the set A, affine-transform every face picture m in WiderFace, with A(M) = {a(m) | m ∈ M, a ∈ A} and a(Fm) = {a(f) | a ∈ A, f ∈ Fm}, obtaining the affine-transformed face picture set A(M) and the affine-transformed face label set A(F) corresponding to WiderFace.
  • The picture set obtained by affine-transforming the face detection dataset WiderFace is used as the face picture set, and the background dataset ImageNet is used as the background picture set.
  • Before fusion, the pictures in the original face detection dataset WiderFace are affine-transformed to further increase the number of face pictures available for fusion.
  • The affine transformation of this embodiment works as follows: each original picture in the face detection dataset WiderFace yields one result picture after an affine transformation.
  • The affine transformation covers three operations: rotation, scaling and translation.
  • It is realized by multiplying by a 2×3 affine transformation matrix.
  • By randomly assigning the parameters of the 2×3 affine transformation matrix, the three operations are randomly combined and applied in a single transformation. During the affine transformation, the rectangular coordinates of the face frames in the face picture change accordingly; the new coordinate values are likewise obtained by multiplying the rectangle coordinates by the affine transformation matrix.
  • Step S1 of acquiring the face picture set and the background picture set further includes:
  • S101: Obtain the background dataset ImageNet; S102: Remove the designated background pictures containing face images from ImageNet to obtain the background picture set I; S103: Extract each background picture i in I; S104: Perform an affine transformation on each background picture i to obtain the affine transformation set B corresponding to i; S105: According to the affine transformation process for i, affine-transform every background picture i in I, with B(I) = {b(i) | i ∈ I, b ∈ B}, obtaining the affine-transformed background picture set B(I).
  • The picture set obtained by affine-transforming the face detection dataset WiderFace is used as the face picture set, and the picture set obtained by affine-transforming the background dataset ImageNet is used as the background picture set, further expanding the amount of picture data.
  • Step S2 of performing the Cartesian product operation on the face picture set and the background picture set to obtain the combined data set includes:
  • performing the Cartesian product of the affine-transformed WiderFace picture set, used as the face picture set, with the affine-transformed ImageNet background picture set. Compared with the Cartesian product of the face detection dataset WiderFace and the background dataset ImageNet alone, the data volume of the resulting combined data set is expanded another million-fold, further increasing the amount of data expansion.
  • The apparatus for expanding data includes:
  • the obtaining module 1, configured to obtain a face picture set and a background picture set, where the background pictures in the background picture set contain no face images;
  • the operation module 2, configured to perform a Cartesian product operation on the face picture set and the background picture set to obtain a combined data set, where each data element in the combined data set includes one face picture and one background picture;
  • the fusion module 3, configured to fuse the face picture and the background picture included in each data element of the combined data set into a new picture;
  • the combination module 4, configured to combine the new pictures corresponding to the data elements into a data expansion set.
  • The fusion module 3 includes:
  • the calculation unit, configured to calculate the union area of the area of the face picture and the area of the background picture;
  • the generation unit, configured to generate a blank picture over the union area;
  • the overlay unit, configured to overlay the face picture and the background picture on the blank picture, aligned at the top-left corner;
  • the fusion unit, configured to fuse the face picture and the background picture onto the blank picture at a specified fusion ratio to form the new picture.
  • The fusion unit includes:
  • the first fusion subunit, configured to fuse, according to p`(e,x,y) = p(b(i),x,y) if (x,y) lies within b(i) and p`(e,x,y) = p(e,x,y) otherwise, the pixels of the background picture onto the blank picture to generate a first fused picture, where (x,y) denotes a pixel position on the blank picture, b(i) denotes the background picture, p(e,x,y) denotes the pixel value at (x,y) on the blank picture, p(b(i),x,y) denotes the pixel value at (x,y) on the background picture, and p`(e,x,y) denotes the pixel value at (x,y) on the first fused picture;
  • the second fusion subunit, configured to fuse, according to p``(e,x,y) = r·p(a(m),x,y) + (1−r)·p`(e,x,y) if (x,y) lies within a(m) and p``(e,x,y) = p`(e,x,y) otherwise, the pixels of the face picture into the first fused picture to generate a second fused picture, where r denotes the specified fusion ratio with value range [0.5,1], a(m) denotes the face picture, p(a(m),x,y) denotes the pixel value at (x,y) on the face picture, and p``(e,x,y) denotes the pixel value at (x,y) on the second fused picture.
  • The fusion module 3 further includes:
  • the first obtaining unit, configured to obtain a randomly generated random number r` in the range [0,1];
  • the adjustment unit, configured to adjust the random number r` into the specified fusion ratio r by mapping the range [0,1] onto [0.5,1].
  • The obtaining module 1 includes:
  • the second obtaining unit, configured to obtain the face detection dataset WiderFace, where the face detection dataset WiderFace includes a face picture set M and a face-frame label set F;
  • the first extraction unit, configured to extract each face picture m in WiderFace, where the face-frame label set of m is Fm, a single face-frame label value is f, Fm = {f | f ∈ F and f is in face picture m}, and F = {Fm | m ∈ M};
  • the first transformation unit, configured to perform an affine transformation on each face picture m to obtain the affine transformation set A corresponding to the face picture m;
  • the first result unit, configured to affine-transform every face picture m in WiderFace according to the generation process of the set A, with A(M) = {a(m) | m ∈ M, a ∈ A} and a(Fm) = {a(f) | a ∈ A, f ∈ Fm}, obtaining the affine-transformed face picture set A(M) and the affine-transformed face label set A(F) corresponding to WiderFace, where a(m) is each affine-transformed face picture, a(Fm) denotes the face-frame label set of a(m), and a(f) denotes a single face-frame label value in a(m).
  • The obtaining module 1 further includes:
  • the third obtaining unit, configured to obtain the background dataset ImageNet;
  • the removal unit, configured to remove the designated background pictures containing face images from the background dataset ImageNet to obtain the background picture set I;
  • the second extraction unit, configured to extract each background picture i in the background picture set I;
  • the second transformation unit, configured to perform an affine transformation on each background picture i to obtain the affine transformation set B corresponding to the background picture i;
  • the second result unit, configured to affine-transform every background picture i in I according to the affine transformation process for i, with B(I) = {b(i) | i ∈ I, b ∈ B}, obtaining the affine-transformed background picture set B(I), where b(i) is each affine-transformed background picture and b denotes a single affine transformation.
  • The operation module 2 includes:
  • the third result unit, configured to obtain the combined data set A(M) × B(I) according to A(M) × B(I) = {(a(m), b(i)) | m ∈ M, a ∈ A and i ∈ I, b ∈ B}, where the face-frame label value of each data element (a(m), b(i)) in the combined data set A(M) × B(I) is the face-frame label value a(Fm) corresponding to the affine-transformed face picture a(m).
  • Referring to FIG. 3, an embodiment of the present application also provides a computer device.
  • The computer device may be a server, and its internal structure may be as shown in FIG. 3.
  • The computer device includes a processor, a memory, a network interface and a database connected through a system bus, where the processor of the computer device provides computing and control capabilities.
  • The memory of the computer device includes a non-volatile storage medium and an internal memory.
  • The non-volatile storage medium stores an operating system, a computer program and a database.
  • The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium.
  • The database of the computer device stores all the data required by the data expansion process.
  • The network interface of the computer device communicates with external terminals through a network connection.
  • The computer program, when executed by the processor, implements the method for expanding data.
  • The method for expanding data executed by the processor includes: acquiring a face picture set and a background picture set, where the background pictures in the background picture set contain no face images; performing a Cartesian product operation on the face picture set and the background picture set to obtain a combined data set, where each data element in the combined data set includes one face picture and one background picture; fusing the face picture and the background picture included in each data element of the combined data set into a new picture; and combining the new pictures corresponding to the data elements into a data expansion set.
  • The above computer device keeps the face-frame label values of each face picture unchanged during fusion; without altering the label values that affect the accuracy of the face recognition model, it only replaces different background pictures to change the substantive content of the original face picture, increasing the diversity and richness of the picture data and expanding its quantity. The expanded picture data greatly promotes the training of deep-learning-based face detection models and improves their accuracy and generalization performance.
  • In one embodiment, the step in which the processor fuses the face picture and the background picture in each data element of the combined data set into a new picture includes: calculating the union area of the area of the face picture and the area of the background picture; generating a blank picture over the union area; overlaying the face picture and the background picture on the blank picture, aligned at the top-left corner; and fusing the face picture and the background picture onto the blank picture at the specified fusion ratio to form the new picture.
  • In one embodiment, the step in which the processor fuses the face picture and the background picture onto the blank picture at the specified fusion ratio to form the new picture includes: according to p`(e,x,y) = p(b(i),x,y) if (x,y) lies within b(i) and p`(e,x,y) = p(e,x,y) otherwise, fusing the pixels of the background picture onto the blank picture to generate a first fused picture, where (x,y) denotes a pixel position on the blank picture, b(i) denotes the background picture, p(e,x,y) denotes the pixel value at (x,y) on the blank picture, p(b(i),x,y) denotes the pixel value at (x,y) on the background picture, and p`(e,x,y) denotes the pixel value at (x,y) on the first fused picture; and according to p``(e,x,y) = r·p(a(m),x,y) + (1−r)·p`(e,x,y) if (x,y) lies within a(m) and p``(e,x,y) = p`(e,x,y) otherwise, fusing the pixels of the face picture into the first fused picture to generate a second fused picture, where r denotes the specified fusion ratio with value range [0.5,1], a(m) denotes the face picture, p(a(m),x,y) denotes the pixel value at (x,y) on the face picture, and p``(e,x,y) denotes the pixel value at (x,y) on the second fused picture.
  • In one embodiment, before the step in which the processor fuses the face picture and the background picture onto the blank picture at the specified fusion ratio to form the new picture, the method includes: obtaining a randomly generated random number r` in the range [0,1]; and adjusting the random number r` into the specified fusion ratio r by mapping the range [0,1] onto [0.5,1].
  • In one embodiment, the step in which the processor acquires the face picture set and the background picture set includes: obtaining the face detection dataset WiderFace, where WiderFace includes a face picture set M and a face-frame label set F; extracting each face picture m in WiderFace, where the face-frame label set of m is Fm, a single face-frame label value is f, Fm = {f | f ∈ F and f is in face picture m}, and F = {Fm | m ∈ M}; performing an affine transformation on each face picture m to obtain the affine transformation set A corresponding to m; and, according to the generation process of the set A, affine-transforming every face picture m in WiderFace, with A(M) = {a(m) | m ∈ M, a ∈ A} and a(Fm) = {a(f) | a ∈ A, f ∈ Fm}, to obtain the affine-transformed face picture set A(M) and the affine-transformed face label set A(F) corresponding to WiderFace.
  • In one embodiment, the step in which the processor acquires the face picture set and the background picture set further includes: obtaining the background dataset ImageNet; removing the designated background pictures containing face images from ImageNet to obtain the background picture set I; extracting each background picture i in I; performing an affine transformation on each background picture i to obtain the affine transformation set B corresponding to i; and affine-transforming every background picture i in I, with B(I) = {b(i) | i ∈ I, b ∈ B}, to obtain the affine-transformed background picture set B(I).
  • In one embodiment, the step in which the processor performs the Cartesian product operation on the face picture set and the background picture set to obtain the combined data set includes: according to A(M) × B(I) = {(a(m), b(i)) | m ∈ M, a ∈ A and i ∈ I, b ∈ B}, obtaining the combined data set A(M) × B(I), where the face-frame label value of each data element (a(m), b(i)) in A(M) × B(I) is the face-frame label value a(Fm) corresponding to the affine-transformed face picture a(m).
  • FIG. 3 is only a block diagram of part of the structure related to the solution of the present application and does not limit the computer device to which the solution of the present application is applied.
  • An embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored.
  • The method for expanding data implemented when the computer program is executed by a processor includes: acquiring a face picture set and a background picture set, where the background pictures in the background picture set contain no face images; performing a Cartesian product operation on the face picture set and the background picture set to obtain a combined data set, where each data element in the combined data set includes one face picture and one background picture; fusing the face picture and the background picture included in each data element of the combined data set into a new picture; and combining the new pictures corresponding to the data elements into a data expansion set.
  • The above computer-readable storage medium keeps the face-frame label values of each face picture unchanged during fusion; without altering the label values that affect the accuracy of the face recognition model, it only replaces different background pictures to change the substantive content of the original face picture, increasing the diversity and richness of the picture data and expanding its quantity. The expanded picture data greatly promotes the training of deep-learning-based face detection models and improves their accuracy and generalization performance.
  • In one embodiment, the step in which the processor fuses the face picture and the background picture in each data element of the combined data set into a new picture includes: calculating the union area of the area of the face picture and the area of the background picture; generating a blank picture over the union area; overlaying the face picture and the background picture on the blank picture, aligned at the top-left corner; and fusing the face picture and the background picture onto the blank picture at the specified fusion ratio to form the new picture.
  • In one embodiment, the step in which the processor fuses the face picture and the background picture onto the blank picture at the specified fusion ratio to form the new picture includes: according to p`(e,x,y) = p(b(i),x,y) if (x,y) lies within b(i) and p`(e,x,y) = p(e,x,y) otherwise, fusing the pixels of the background picture onto the blank picture to generate a first fused picture, where (x,y) denotes a pixel position on the blank picture, b(i) denotes the background picture, p(e,x,y) denotes the pixel value at (x,y) on the blank picture, p(b(i),x,y) denotes the pixel value at (x,y) on the background picture, and p`(e,x,y) denotes the pixel value at (x,y) on the first fused picture; and according to p``(e,x,y) = r·p(a(m),x,y) + (1−r)·p`(e,x,y) if (x,y) lies within a(m) and p``(e,x,y) = p`(e,x,y) otherwise, fusing the pixels of the face picture into the first fused picture to generate a second fused picture, where r denotes the specified fusion ratio with value range [0.5,1], a(m) denotes the face picture, p(a(m),x,y) denotes the pixel value at (x,y) on the face picture, and p``(e,x,y) denotes the pixel value at (x,y) on the second fused picture.
  • In one embodiment, before the step in which the processor fuses the face picture and the background picture onto the blank picture at the specified fusion ratio to form the new picture, the method includes: obtaining a randomly generated random number r` in the range [0,1]; and adjusting the random number r` into the specified fusion ratio r by mapping the range [0,1] onto [0.5,1].
  • In one embodiment, the step in which the processor acquires the face picture set and the background picture set includes: obtaining the face detection dataset WiderFace, where WiderFace includes a face picture set M and a face-frame label set F; extracting each face picture m in WiderFace, where the face-frame label set of m is Fm, a single face-frame label value is f, Fm = {f | f ∈ F and f is in face picture m}, and F = {Fm | m ∈ M}; performing an affine transformation on each face picture m to obtain the affine transformation set A corresponding to m; and, according to the generation process of the set A, affine-transforming every face picture m in WiderFace, with A(M) = {a(m) | m ∈ M, a ∈ A} and a(Fm) = {a(f) | a ∈ A, f ∈ Fm}, to obtain the affine-transformed face picture set A(M) and the affine-transformed face label set A(F) corresponding to WiderFace.
  • In one embodiment, the step in which the processor acquires the face picture set and the background picture set further includes: obtaining the background dataset ImageNet; removing the designated background pictures containing face images from ImageNet to obtain the background picture set I; extracting each background picture i in I; performing an affine transformation on each background picture i to obtain the affine transformation set B corresponding to i; and affine-transforming every background picture i in I, with B(I) = {b(i) | i ∈ I, b ∈ B}, to obtain the affine-transformed background picture set B(I).
  • In one embodiment, the step in which the processor performs the Cartesian product operation on the face picture set and the background picture set to obtain the combined data set includes: according to A(M) × B(I) = {(a(m), b(i)) | m ∈ M, a ∈ A and i ∈ I, b ∈ B}, obtaining the combined data set A(M) × B(I), where the face-frame label value of each data element (a(m), b(i)) in A(M) × B(I) is the face-frame label value a(Fm) corresponding to the affine-transformed face picture a(m).
  • Non-volatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM), etc.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Processing (AREA)
  • Processing Or Creating Images (AREA)

Abstract

This application relates to big data technology and discloses a method for expanding data, including: obtaining a face picture set and a background picture set, where the background pictures in the background picture set contain no face images; performing a Cartesian product operation on the face picture set and the background picture set to obtain a combined data set, where each data element in the combined data set includes one face picture and one background picture; fusing the face picture and the background picture included in each data element of the combined data set into a new picture; and combining the new pictures corresponding to the data elements into a data expansion set. By keeping the face-frame label values of each face picture unchanged during fusion and only replacing different background pictures, the substantive content of the original face pictures is changed, the diversity and richness of the picture data are increased, and the quantity of picture data is expanded.

Description

Method, Apparatus and Computer Device for Expanding Data
This application claims priority to the Chinese patent application with application number 202010733099.1, entitled "Method, Apparatus and Computer Device for Expanding Data", filed with the China National Intellectual Property Administration on July 27, 2020, the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of big data, and in particular to a method, an apparatus and a computer device for expanding data.
Background
Training a deep neural network requires a large amount of labeled data, so that the network can learn and extract the hidden patterns in the data and use the learned patterns to make inferences on new data. When deep neural networks are applied to face detection, the most commonly used open-source face detection dataset, WiderFace, contains only 12,880 pictures; even counted by the number of faces, there are only about 170,000 faces, and the data is unbalanced. The label values of a face detection dataset are the coordinates of the rectangular boxes marking face positions. To obtain more data and improve the training of deep neural networks, the commonly used data expansion methods apply the same affine transformation, such as rotation, scaling or translation, to the picture and its rectangular boxes to generate new data. However, the inventor realized that this kind of expansion only deforms the pictures geometrically and does not change their content; for example, the people in the pictures remain against the same background, so the problem of data diversity is not solved.
Technical Problem
The main purpose of this application is to provide a data processing method that aims to solve the technical problem that existing data expansion methods cannot provide data diversity.
Technical Solution
This application proposes a method for expanding data, including:
obtaining a face picture set and a background picture set, where the background pictures in the background picture set contain no face images;
performing a Cartesian product operation on the face picture set and the background picture set to obtain a combined data set, where each data element in the combined data set includes one face picture and one background picture;
fusing the face picture and the background picture included in each data element of the combined data set into a new picture; and
combining the new pictures corresponding to the data elements into a data expansion set.
This application also provides an apparatus for expanding data, including:
an obtaining module, configured to obtain a face picture set and a background picture set, where the background pictures in the background picture set contain no face images;
an operation module, configured to perform a Cartesian product operation on the face picture set and the background picture set to obtain a combined data set, where each data element in the combined data set includes one face picture and one background picture;
a fusion module, configured to fuse the face picture and the background picture included in each data element of the combined data set into a new picture; and
a combination module, configured to combine the new pictures corresponding to the data elements into a data expansion set.
This application also provides a computer device, including a memory and a processor, where the memory stores a computer program and the processor, when executing the computer program, implements a method for expanding data, including:
obtaining a face picture set and a background picture set, where the background pictures in the background picture set contain no face images;
performing a Cartesian product operation on the face picture set and the background picture set to obtain a combined data set, where each data element in the combined data set includes one face picture and one background picture;
fusing the face picture and the background picture included in each data element of the combined data set into a new picture; and
combining the new pictures corresponding to the data elements into a data expansion set.
This application also provides a computer-readable storage medium on which a computer program is stored; the computer program, when executed by a processor, implements a method for expanding data, including:
obtaining a face picture set and a background picture set, where the background pictures in the background picture set contain no face images;
performing a Cartesian product operation on the face picture set and the background picture set to obtain a combined data set, where each data element in the combined data set includes one face picture and one background picture;
fusing the face picture and the background picture included in each data element of the combined data set into a new picture; and
combining the new pictures corresponding to the data elements into a data expansion set.
Beneficial Effects
This application keeps the face-frame label values of each face picture unchanged during fusion; without altering the label values that affect the accuracy of the face recognition model, it only replaces different background pictures to change the substantive content of the original face picture, increasing the diversity and richness of the picture data and expanding its quantity. The expanded picture data greatly promotes the training of deep-learning-based face detection models and improves their accuracy and generalization performance.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a method for expanding data according to an embodiment of this application;
FIG. 2 is a schematic structural diagram of an apparatus for expanding data according to an embodiment of this application;
FIG. 3 is a schematic diagram of the internal structure of a computer device according to an embodiment of this application.
Best Mode for Carrying Out the Invention
Referring to FIG. 1, a method for expanding data according to an embodiment of this application includes:
S1: Obtain a face picture set and a background picture set, where the background pictures in the background picture set contain no face images.
The face picture set refers to a picture dataset composed of face pictures; a face picture is a picture that contains at least one face. The background picture set refers to a picture dataset composed of background pictures; a background picture contains no faces at all. In this application, the face picture set and the background picture set can be obtained by linking to the storage addresses of the above picture datasets.
S2: Perform a Cartesian product operation on the face picture set and the background picture set to obtain a combined data set, where each data element in the combined data set includes one face picture and one background picture.
The Cartesian product operation in this embodiment extracts one picture from each of the two picture datasets to form a picture pair; all pairs form the combined data set, and each pair is one data element. Specifically, a face picture m is extracted from the face picture set, and background pictures i_n are extracted in turn from the background picture set, where n is a positive integer; the data element corresponding to the face picture m is written (m, i_n), and the set of all data elements formed by the face pictures and the background pictures is the combined data set. During the Cartesian product operation, the face-frame label values of each face picture remain unchanged, and because one face picture m corresponds to n background pictures i_n, one group of face-frame label values is copied and reused many times. In the embodiment of this application, the face picture set is the WiderFace dataset, the background picture set is the ImageNet dataset, and the number of data elements after combination is the product of the sizes of the two datasets. For example, with 12,880 WiderFace pictures and 830,000 filtered ImageNet pictures, the combined data set contains 12,880 × 830,000 = 10,690,400,000, about 10.69 billion data elements; the data volume is thus greatly expanded, an 830,000-fold increase over the picture count of the original face detection dataset WiderFace.
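For illustration only (this sketch is not part of the original disclosure), the Cartesian product step can be written in a few lines of Python; the file names and the box-label structure below are hypothetical stand-ins.

```python
from itertools import product

# Hypothetical inputs: each face entry carries its picture path and its
# face-frame labels; background entries are plain picture paths.
faces = [("face_0001.jpg", [(30, 40, 120, 160)]),   # (path, [x1, y1, x2, y2] boxes)
         ("face_0002.jpg", [(55, 20, 140, 130)])]
backgrounds = ["bg_0001.jpg", "bg_0002.jpg", "bg_0003.jpg"]

# Cartesian product: every data element pairs one face picture with one
# background picture, and the face-frame labels are carried along unchanged.
combined = [
    {"face": face_path, "background": bg_path, "boxes": boxes}
    for (face_path, boxes), bg_path in product(faces, backgrounds)
]

# The element count is the product of the two set sizes
# (12,880 x 830,000 in the WiderFace/ImageNet example above).
assert len(combined) == len(faces) * len(backgrounds)
```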
S3: Fuse the face picture and the background picture included in each data element of the combined data set into a new picture.
This application uses pixel fusion to merge the pixels of two or more pictures into one picture at a specified fusion ratio, so that the pixels of the source pictures are displayed in the same picture at the same time. The fusion process does not change the face-frame label values of the original face picture; that is, the coordinate range of the rectangular box corresponding to each face frame remains unchanged.
S4: Combine the new pictures corresponding to the data elements into a data expansion set.
The data expansion set of this application not only raises the data volume by orders of magnitude; through the above data expansion, each fused picture contains the data content of both a face picture and a background picture, which approximates picture data of the same person appearing against different backgrounds and in different scenes. By keeping the face-frame label values of each face picture unchanged during fusion, that is, without altering the label values that affect the accuracy of the face recognition model, and only replacing different background pictures, the data expansion set changes the substantive content of the original face pictures, increases the diversity and richness of the picture data, and expands the quantity of picture data; the expanded picture data greatly promotes the training of deep-learning-based face detection models and improves their accuracy and generalization performance.
Further, step S3 of fusing the face picture and the background picture in each data element of the combined data set into a new picture includes:
S31: Calculate the union area of the area of the face picture and the area of the background picture;
S32: Generate a blank picture over the union area;
S33: Overlay the face picture and the background picture on the blank picture, aligned at the top-left corner;
S34: Fuse the face picture and the background picture onto the blank picture at a specified fusion ratio to form the new picture.
In this embodiment, the union area is obtained by taking the union of the area of the face picture and the area of the background picture. Each area can be represented by the coordinates of the four vertices of the picture, and the union takes the coordinate data of the picture with the larger area as the union area, so that the union area can hold both the face picture and the background picture currently to be fused. The union area is greater than or equal to the area of the face picture; that is, fusion may enlarge the original face picture, but the picture undergoes no translation, rotation or any other change of position coordinates, so the origin of the picture's coordinates is unchanged, and the face-frame label values of the fused picture still equal those of the corresponding face picture before fusion; the face-frame label values of the original face picture are not changed. A blank picture of the same size as the union area is generated, onto which the face picture and the background picture are fused step by step. Before fusion, the face picture and the background picture are stacked with their top-left corners aligned: the top-left coordinate of each picture is the same and serves as the starting point, and the pictures are aligned pixel by pixel according to pixel coordinates, which matches the usual conventions for handling picture data and makes processing easier. Other embodiments of this application may instead align at the top-right, bottom-left or bottom-right corner by changing the rule used to read the picture data. The pixels of the stacked pictures at the same pixel coordinates are then fused at the specified fusion ratio, so that the pixels of the fused pictures are displayed in the same picture at the same time. The fused face-picture region contains both the pixel content of the original face picture and the pixel content of the background picture, a semi-transparent superposition/blend of the two; the degree of translucency depends on the value of the specified fusion ratio, which may be any number in [0,1]. In the embodiment of this application, fusion displays the pixels of the face picture and the pixels of the background picture together on the blank picture, so that the same face picture can be fused with different background pictures, expanding the number of face pictures across different backgrounds and scenes and increasing data richness.
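As a minimal sketch of the union-area canvas and the top-left alignment (not part of the original disclosure; reading the union area as the elementwise maximum of the two picture sizes is an assumption of this sketch):

```python
import numpy as np

def union_canvas(face: np.ndarray, background: np.ndarray) -> np.ndarray:
    """Blank picture sized to hold both pictures (the union area)."""
    h = max(face.shape[0], background.shape[0])
    w = max(face.shape[1], background.shape[1])
    return np.zeros((h, w, 3), dtype=np.float32)

def paste_top_left(canvas: np.ndarray, picture: np.ndarray) -> np.ndarray:
    """Overlay a picture on the canvas with the top-left corners aligned."""
    out = canvas.copy()
    h, w = picture.shape[:2]
    out[:h, :w] = picture  # pixel (0, 0) of canvas and picture coincide
    return out
```

Because the origin stays at the top-left corner and nothing is translated or rotated, the face-frame coordinates remain valid on the enlarged canvas.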
Further, step S34 of fusing the face picture and the background picture onto the blank picture at the specified fusion ratio to form the new picture includes:
S341: According to
p`(e,x,y) = p(b(i),x,y), if (x,y) lies within b(i); p`(e,x,y) = p(e,x,y), otherwise,
fuse the pixels of the background picture onto the blank picture to generate a first fused picture, where (x,y) denotes a pixel position on the blank picture, b(i) denotes the background picture, p(e,x,y) denotes the pixel value at pixel position (x,y) on the blank picture, p(b(i),x,y) denotes the pixel value at pixel position (x,y) on the background picture, and p`(e,x,y) denotes the pixel value at pixel position (x,y) on the first fused picture;
S342: According to
p``(e,x,y) = r·p(a(m),x,y) + (1−r)·p`(e,x,y), if (x,y) lies within a(m); p``(e,x,y) = p`(e,x,y), otherwise,
fuse the pixels of the face picture into the first fused picture to generate a second fused picture, where r denotes the specified fusion ratio, whose value range is [0.5,1], a(m) denotes the face picture, p(a(m),x,y) denotes the pixel value at pixel position (x,y) on the face picture, and p``(e,x,y) denotes the pixel value at pixel position (x,y) on the second fused picture.
In the picture fusion of this embodiment, fusion is differentiated according to the characteristics of the data regions of the pictures; that is, different data regions are fused in different ways. For the first fused picture, the pixel positions covered by the background picture are identified: at pixel positions inside the background picture the pixel value is that of the background picture, and at pixel positions outside the background picture the pixel value is that of the blank picture. On the basis of the first fused picture, all pixel values of the face picture are then fused in to form the second fused picture. In this embodiment, the blank picture, the background picture and the face picture are stacked from bottom to top with their top-left corners aligned. For the second fused picture, the pixel values of the face picture are identified first: if the current pixel lies in the face picture, the pixel value of the face picture and the pixel value of the background picture are displayed together according to the fusion ratio; for pixels outside the face picture, the pixel value of the first fused picture is displayed. This ensures that the pixel values of the face picture remain the main consideration in the fused picture, so that the expanded data can be used to train a face detection model.
Further, before step S34 of fusing the face picture and the background picture onto the blank picture at the specified fusion ratio to form the new picture, the method includes:
S3401: Obtain a randomly generated random number r` in the range [0,1].
S3402: Adjust the random number r` into the specified fusion ratio r by mapping the range [0,1] onto [0.5,1].
In this embodiment, the fusion of pixels inside the face region is still dominated by the pixel values of the face picture, while the fusion of pixels outside the face region is dominated by the pixel values of the background picture. To ensure that, within the face region of the fused picture, the proportion of the original face picture's pixel values is greater than or equal to 0.5, that is, that the pixel values of the face region are the main component, so that the trained face detection model is more accurate, this application restricts the value range of the specified fusion ratio to [0.5,1]. In the embodiment of this application, the randomly generated random number is range-adjusted and then used as the specified fusion ratio.
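The range adjustment itself appears only as a formula figure in the published text; the sketch below assumes the simplest linear map that sends [0,1] onto [0.5,1].

```python
import random

def specified_fusion_ratio() -> float:
    # Assumed linear range adjustment: r = r`/2 + 0.5 (the exact formula is
    # shown only as a figure in the original publication).
    r_prime = random.random()    # r` ~ U[0, 1]
    return r_prime / 2.0 + 0.5   # keeps the face-pixel weight >= 0.5
```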
Further, step S1 of acquiring the face picture set and the background picture set includes:
S11: Obtain the face detection dataset WiderFace, where the face detection dataset WiderFace includes a face picture set M and a face-frame label set F;
S12: Extract each face picture m in the face detection dataset WiderFace, where the face-frame label set of the face picture m is Fm, a single face-frame label value is f, Fm = {f | f ∈ F and f is in the face picture m}, and F = {Fm | m ∈ M};
S13: Perform an affine transformation on each face picture m to obtain the affine transformation set A corresponding to the face picture m;
S14: According to the generation process of the affine transformation set A corresponding to the face picture m, perform an affine transformation on every face picture m in the face detection dataset WiderFace, with A(M) = {a(m) | m ∈ M, a ∈ A} and a(Fm) = {a(f) | a ∈ A, f ∈ Fm}, to obtain the affine-transformed face picture set A(M) corresponding to the face detection dataset WiderFace and the affine-transformed face label set A(F) corresponding to the face detection dataset WiderFace, where a(m) is each affine-transformed face picture, a(Fm) denotes the face-frame label set of the affine-transformed face picture a(m), and a(f) denotes a single face-frame label value in the affine-transformed face picture a(m).
In this embodiment, to further expand the picture data, the picture set obtained by affine-transforming the face detection dataset WiderFace is used as the face picture set, and the background dataset ImageNet is used as the background picture set. Before fusion, the pictures in the original face detection dataset WiderFace are affine-transformed, further increasing the number of face pictures available for fusion. The affine transformation of this embodiment works as follows: each original picture in the face detection dataset WiderFace yields one result picture after an affine transformation. The affine transformation covers three operations, rotation, scaling and translation, and is realized by multiplying by a 2×3 affine transformation matrix; by randomly assigning the parameters of the 2×3 affine transformation matrix, the three operations are randomly combined and applied in a single transformation. During the affine transformation, the rectangular coordinates of the face frames in the face picture change accordingly; the new coordinate values are likewise obtained by multiplying the rectangle coordinates by the affine transformation matrix.
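A sketch of the random 2×3 affine transformation and the accompanying update of the face-box coordinates, using OpenCV (the parameter ranges below are illustrative assumptions, not values from the original disclosure):

```python
import cv2
import numpy as np

def random_affine(img: np.ndarray, boxes: np.ndarray):
    """Apply one random rotation/scaling/translation; move the boxes with it.

    boxes: (N, 4) float array of [x1, y1, x2, y2] face frames.
    """
    h, w = img.shape[:2]
    angle = np.random.uniform(-15, 15)     # rotation in degrees (illustrative)
    scale = np.random.uniform(0.8, 1.2)    # scaling factor (illustrative)
    M = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)  # 2x3 matrix
    M[:, 2] += np.random.uniform(-0.1, 0.1, size=2) * (w, h)   # translation

    warped = cv2.warpAffine(img, M, (w, h))

    # Multiply each box corner by the same 2x3 matrix, then re-box.
    corners = boxes[:, [0, 1, 2, 1, 2, 3, 0, 3]].reshape(-1, 2)  # 4 corners/box
    ones = np.ones((corners.shape[0], 1))
    moved = (np.hstack([corners, ones]) @ M.T).reshape(-1, 4, 2)
    new_boxes = np.concatenate([moved.min(axis=1), moved.max(axis=1)], axis=1)
    return warped, new_boxes
```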
Further, step S1 of acquiring the face picture set and the background picture set also includes:
S101: Obtain the background dataset ImageNet;
S102: Remove the designated background pictures containing face images from the background dataset ImageNet to obtain the background picture set I;
S103: Extract each background picture i in the background picture set I;
S104: Perform an affine transformation on each background picture i to obtain the affine transformation set B corresponding to the background picture i;
S105: According to the affine transformation process corresponding to the background picture i, perform an affine transformation on every background picture i in the background picture set I, with B(I) = {b(i) | i ∈ I, b ∈ B}, to obtain the affine-transformed background picture set B(I) corresponding to the background picture set I, where b(i) is each affine-transformed background picture and b denotes a single affine transformation.
In the embodiment of this application, not only is the affine-transformed WiderFace picture set used as the face picture set, but the affine-transformed ImageNet picture set is also used as the background picture set, further expanding the amount of picture data.
Further, step S2 of performing the Cartesian product operation on the face picture set and the background picture set to obtain the combined data set includes:
S21: According to A(M) × B(I) = {(a(m), b(i)) | m ∈ M, a ∈ A and i ∈ I, b ∈ B}, obtain the combined data set A(M) × B(I), where the face-frame label value of each data element (a(m), b(i)) in the combined data set A(M) × B(I) is the face-frame label value a(Fm) corresponding to the affine-transformed face picture a(m).
In the embodiment of this application, taking the affine-transformed WiderFace picture set as the face picture set and performing the Cartesian product operation with the affine-transformed ImageNet background picture set yields a combined data set whose data volume, compared with the Cartesian product of the face detection dataset WiderFace and the background dataset ImageNet alone, is expanded another million-fold, further increasing the amount of data expansion.
Referring to FIG. 2, an apparatus for expanding data according to an embodiment of this application includes:
the obtaining module 1, configured to obtain a face picture set and a background picture set, where the background pictures in the background picture set contain no face images;
the operation module 2, configured to perform a Cartesian product operation on the face picture set and the background picture set to obtain a combined data set, where each data element in the combined data set includes one face picture and one background picture;
the fusion module 3, configured to fuse the face picture and the background picture included in each data element of the combined data set into a new picture; and
the combination module 4, configured to combine the new pictures corresponding to the data elements into a data expansion set.
For the explanation of the apparatus embodiments of this application, refer to the explanation of the corresponding method items; it is not repeated here.
Further, the fusion module 3 includes:
the calculation unit, configured to calculate the union area of the area of the face picture and the area of the background picture;
the generation unit, configured to generate a blank picture over the union area;
the overlay unit, configured to overlay the face picture and the background picture on the blank picture, aligned at the top-left corner; and
the fusion unit, configured to fuse the face picture and the background picture onto the blank picture at a specified fusion ratio to form the new picture.
Further, the fusion unit includes:
the first fusion subunit, configured to fuse, according to p`(e,x,y) = p(b(i),x,y) if (x,y) lies within b(i) and p`(e,x,y) = p(e,x,y) otherwise, the pixels of the background picture onto the blank picture to generate a first fused picture, where (x,y) denotes a pixel position on the blank picture, b(i) denotes the background picture, p(e,x,y) denotes the pixel value at pixel position (x,y) on the blank picture, p(b(i),x,y) denotes the pixel value at pixel position (x,y) on the background picture, and p`(e,x,y) denotes the pixel value at pixel position (x,y) on the first fused picture; and
the second fusion subunit, configured to fuse, according to p``(e,x,y) = r·p(a(m),x,y) + (1−r)·p`(e,x,y) if (x,y) lies within a(m) and p``(e,x,y) = p`(e,x,y) otherwise, the pixels of the face picture into the first fused picture to generate a second fused picture, where r denotes the specified fusion ratio with value range [0.5,1], a(m) denotes the face picture, p(a(m),x,y) denotes the pixel value at pixel position (x,y) on the face picture, and p``(e,x,y) denotes the pixel value at pixel position (x,y) on the second fused picture.
Further, the fusion module 3 includes:
the first obtaining unit, configured to obtain a randomly generated random number r` in the range [0,1]; and
the adjustment unit, configured to adjust the random number r` into the specified fusion ratio r by mapping the range [0,1] onto [0.5,1].
Further, the obtaining module 1 includes:
the second obtaining unit, configured to obtain the face detection dataset WiderFace, where the face detection dataset WiderFace includes a face picture set M and a face-frame label set F;
the first extraction unit, configured to extract each face picture m in the face detection dataset WiderFace, where the face-frame label set of the face picture m is Fm, a single face-frame label value is f, Fm = {f | f ∈ F and f is in the face picture m}, and F = {Fm | m ∈ M};
the first transformation unit, configured to perform an affine transformation on each face picture m to obtain the affine transformation set A corresponding to the face picture m; and
the first result unit, configured to affine-transform, according to the generation process of the affine transformation set A corresponding to the face picture m, every face picture m in the face detection dataset WiderFace, with A(M) = {a(m) | m ∈ M, a ∈ A} and a(Fm) = {a(f) | a ∈ A, f ∈ Fm}, to obtain the affine-transformed face picture set A(M) corresponding to WiderFace and the affine-transformed face label set A(F) corresponding to WiderFace, where a(m) is each affine-transformed face picture, a(Fm) denotes the face-frame label set of the affine-transformed face picture a(m), and a(f) denotes a single face-frame label value in a(m).
Further, the obtaining module 1 also includes:
the third obtaining unit, configured to obtain the background dataset ImageNet;
the removal unit, configured to remove the designated background pictures containing face images from the background dataset ImageNet to obtain the background picture set I;
the second extraction unit, configured to extract each background picture i in the background picture set I;
the second transformation unit, configured to perform an affine transformation on each background picture i to obtain the affine transformation set B corresponding to the background picture i; and
the second result unit, configured to affine-transform, according to the affine transformation process corresponding to the background picture i, every background picture i in the background picture set I, with B(I) = {b(i) | i ∈ I, b ∈ B}, to obtain the affine-transformed background picture set B(I) corresponding to I, where b(i) is each affine-transformed background picture and b denotes a single affine transformation.
Further, the operation module 2 includes:
the third result unit, configured to obtain, according to A(M) × B(I) = {(a(m), b(i)) | m ∈ M, a ∈ A and i ∈ I, b ∈ B}, the combined data set A(M) × B(I), where the face-frame label value of each data element (a(m), b(i)) in the combined data set A(M) × B(I) is the face-frame label value a(Fm) corresponding to the affine-transformed face picture a(m).
Referring to FIG. 3, an embodiment of this application also provides a computer device. The computer device may be a server, and its internal structure may be as shown in FIG. 3. The computer device includes a processor, a memory, a network interface and a database connected through a system bus, where the processor of the computer device provides computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for running the operating system and the computer program in the non-volatile storage medium. The database of the computer device stores all the data required by the data expansion process. The network interface of the computer device communicates with external terminals through a network connection. The computer program, when executed by the processor, implements the method for expanding data.
The processor executes the above method for expanding data, including: acquiring a face picture set and a background picture set, where the background pictures in the background picture set contain no face images; performing a Cartesian product operation on the face picture set and the background picture set to obtain a combined data set, where each data element in the combined data set includes one face picture and one background picture; fusing the face picture and the background picture included in each data element of the combined data set into a new picture; and combining the new pictures corresponding to the data elements into a data expansion set.
The above computer device keeps the face-frame label values of each face picture unchanged during fusion; without altering the label values that affect the accuracy of the face recognition model, it only replaces different background pictures to change the substantive content of the original face picture, increasing the diversity and richness of the picture data and expanding its quantity. The expanded picture data greatly promotes the training of deep-learning-based face detection models and improves their accuracy and generalization performance.
In one embodiment, the step in which the processor fuses the face picture and the background picture in each data element of the combined data set into a new picture includes: calculating the union area of the area of the face picture and the area of the background picture; generating a blank picture over the union area; overlaying the face picture and the background picture on the blank picture, aligned at the top-left corner; and fusing the face picture and the background picture onto the blank picture at a specified fusion ratio to form the new picture.
In one embodiment, the step in which the processor fuses the face picture and the background picture onto the blank picture at the specified fusion ratio to form the new picture includes: according to p`(e,x,y) = p(b(i),x,y) if (x,y) lies within b(i) and p`(e,x,y) = p(e,x,y) otherwise, fusing the pixels of the background picture onto the blank picture to generate a first fused picture, where (x,y) denotes a pixel position on the blank picture, b(i) denotes the background picture, p(e,x,y) denotes the pixel value at pixel position (x,y) on the blank picture, p(b(i),x,y) denotes the pixel value at pixel position (x,y) on the background picture, and p`(e,x,y) denotes the pixel value at pixel position (x,y) on the first fused picture; and according to p``(e,x,y) = r·p(a(m),x,y) + (1−r)·p`(e,x,y) if (x,y) lies within a(m) and p``(e,x,y) = p`(e,x,y) otherwise, fusing the pixels of the face picture into the first fused picture to generate a second fused picture, where r denotes the specified fusion ratio with value range [0.5,1], a(m) denotes the face picture, p(a(m),x,y) denotes the pixel value at pixel position (x,y) on the face picture, and p``(e,x,y) denotes the pixel value at pixel position (x,y) on the second fused picture.
In one embodiment, before the step in which the processor fuses the face picture and the background picture onto the blank picture at the specified fusion ratio to form the new picture, the method includes: obtaining a randomly generated random number r` in the range [0,1]; and adjusting the random number r` into the specified fusion ratio r by mapping the range [0,1] onto [0.5,1].
In one embodiment, the step in which the processor acquires the face picture set and the background picture set includes: obtaining the face detection dataset WiderFace, where the face detection dataset WiderFace includes a face picture set M and a face-frame label set F; extracting each face picture m in WiderFace, where the face-frame label set of the face picture m is Fm, a single face-frame label value is f, Fm = {f | f ∈ F and f is in the face picture m}, and F = {Fm | m ∈ M}; performing an affine transformation on each face picture m to obtain the affine transformation set A corresponding to the face picture m; and, according to the generation process of the affine transformation set A corresponding to the face picture m, performing an affine transformation on every face picture m in WiderFace, with A(M) = {a(m) | m ∈ M, a ∈ A} and a(Fm) = {a(f) | a ∈ A, f ∈ Fm}, to obtain the affine-transformed face picture set A(M) and the affine-transformed face label set A(F) corresponding to WiderFace, where a(m) is each affine-transformed face picture, a(Fm) denotes the face-frame label set of a(m), and a(f) denotes a single face-frame label value in a(m).
In one embodiment, the step in which the processor acquires the face picture set and the background picture set further includes: obtaining the background dataset ImageNet; removing the designated background pictures containing face images from ImageNet to obtain the background picture set I; extracting each background picture i in the background picture set I; performing an affine transformation on each background picture i to obtain the affine transformation set B corresponding to the background picture i; and, according to the affine transformation process corresponding to the background picture i, performing an affine transformation on every background picture i in I, with B(I) = {b(i) | i ∈ I, b ∈ B}, to obtain the affine-transformed background picture set B(I), where b(i) is each affine-transformed background picture and b denotes a single affine transformation.
In one embodiment, the step in which the processor performs the Cartesian product operation on the face picture set and the background picture set to obtain the combined data set includes: according to A(M) × B(I) = {(a(m), b(i)) | m ∈ M, a ∈ A and i ∈ I, b ∈ B}, obtaining the combined data set A(M) × B(I), where the face-frame label value of each data element (a(m), b(i)) in A(M) × B(I) is the face-frame label value a(Fm) corresponding to the affine-transformed face picture a(m).
Those skilled in the art can understand that the structure shown in FIG. 3 is only a block diagram of part of the structure related to the solution of this application and does not limit the computer device to which the solution of this application is applied.
An embodiment of this application also provides a computer-readable storage medium on which a computer program is stored; the computer program, when executed by a processor, implements a method for expanding data, including: acquiring a face picture set and a background picture set, where the background pictures in the background picture set contain no face images; performing a Cartesian product operation on the face picture set and the background picture set to obtain a combined data set, where each data element in the combined data set includes one face picture and one background picture; fusing the face picture and the background picture included in each data element of the combined data set into a new picture; and combining the new pictures corresponding to the data elements into a data expansion set.
The above computer-readable storage medium keeps the face-frame label values of each face picture unchanged during fusion; without altering the label values that affect the accuracy of the face recognition model, it only replaces different background pictures to change the substantive content of the original face picture, increasing the diversity and richness of the picture data and expanding its quantity. The expanded picture data greatly promotes the training of deep-learning-based face detection models and improves their accuracy and generalization performance.
In one embodiment, the step in which the processor fuses the face picture and the background picture in each data element of the combined data set into a new picture includes: calculating the union area of the area of the face picture and the area of the background picture; generating a blank picture over the union area; overlaying the face picture and the background picture on the blank picture, aligned at the top-left corner; and fusing the face picture and the background picture onto the blank picture at a specified fusion ratio to form the new picture.
In one embodiment, the step in which the processor fuses the face picture and the background picture onto the blank picture at the specified fusion ratio to form the new picture includes: according to p`(e,x,y) = p(b(i),x,y) if (x,y) lies within b(i) and p`(e,x,y) = p(e,x,y) otherwise, fusing the pixels of the background picture onto the blank picture to generate a first fused picture, where (x,y) denotes a pixel position on the blank picture, b(i) denotes the background picture, p(e,x,y) denotes the pixel value at pixel position (x,y) on the blank picture, p(b(i),x,y) denotes the pixel value at pixel position (x,y) on the background picture, and p`(e,x,y) denotes the pixel value at pixel position (x,y) on the first fused picture; and according to p``(e,x,y) = r·p(a(m),x,y) + (1−r)·p`(e,x,y) if (x,y) lies within a(m) and p``(e,x,y) = p`(e,x,y) otherwise, fusing the pixels of the face picture into the first fused picture to generate a second fused picture, where r denotes the specified fusion ratio with value range [0.5,1], a(m) denotes the face picture, p(a(m),x,y) denotes the pixel value at pixel position (x,y) on the face picture, and p``(e,x,y) denotes the pixel value at pixel position (x,y) on the second fused picture.
In one embodiment, before the step in which the processor fuses the face picture and the background picture onto the blank picture at the specified fusion ratio to form the new picture, the method includes: obtaining a randomly generated random number r` in the range [0,1]; and adjusting the random number r` into the specified fusion ratio r by mapping the range [0,1] onto [0.5,1].
In one embodiment, the step in which the processor acquires the face picture set and the background picture set includes: obtaining the face detection dataset WiderFace, where the face detection dataset WiderFace includes a face picture set M and a face-frame label set F; extracting each face picture m in WiderFace, where the face-frame label set of the face picture m is Fm, a single face-frame label value is f, Fm = {f | f ∈ F and f is in the face picture m}, and F = {Fm | m ∈ M}; performing an affine transformation on each face picture m to obtain the affine transformation set A corresponding to the face picture m; and, according to the generation process of the affine transformation set A corresponding to the face picture m, performing an affine transformation on every face picture m in WiderFace, with A(M) = {a(m) | m ∈ M, a ∈ A} and a(Fm) = {a(f) | a ∈ A, f ∈ Fm}, to obtain the affine-transformed face picture set A(M) and the affine-transformed face label set A(F) corresponding to WiderFace, where a(m) is each affine-transformed face picture, a(Fm) denotes the face-frame label set of a(m), and a(f) denotes a single face-frame label value in a(m).
In one embodiment, the step in which the processor acquires the face picture set and the background picture set further includes: obtaining the background dataset ImageNet; removing the designated background pictures containing face images from ImageNet to obtain the background picture set I; extracting each background picture i in the background picture set I; performing an affine transformation on each background picture i to obtain the affine transformation set B corresponding to the background picture i; and, according to the affine transformation process corresponding to the background picture i, performing an affine transformation on every background picture i in I, with B(I) = {b(i) | i ∈ I, b ∈ B}, to obtain the affine-transformed background picture set B(I), where b(i) is each affine-transformed background picture and b denotes a single affine transformation.
In one embodiment, the step in which the processor performs the Cartesian product operation on the face picture set and the background picture set to obtain the combined data set includes: according to A(M) × B(I) = {(a(m), b(i)) | m ∈ M, a ∈ A and i ∈ I, b ∈ B}, obtaining the combined data set A(M) × B(I), where the face-frame label value of each data element (a(m), b(i)) in A(M) × B(I) is the face-frame label value a(Fm) corresponding to the affine-transformed face picture a(m).
Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware; the computer program may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, database or other media provided in this application and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchronous link (Synchlink) DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
It should be noted that, in this document, the terms "include", "comprise" and any other variants thereof are intended to cover non-exclusive inclusion, so that a process, apparatus, article or method that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, apparatus, article or method. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, apparatus, article or method that includes that element.

Claims (20)

  1. A method for expanding data, comprising:
    obtaining a face picture set and a background picture set, wherein the background pictures in the background picture set contain no face images;
    performing a Cartesian product operation on the face picture set and the background picture set to obtain a combined data set, wherein each data element in the combined data set comprises one face picture and one background picture;
    fusing the face picture and the background picture comprised in each data element of the combined data set into a new picture; and
    combining the new pictures corresponding to the data elements into a data expansion set.
  2. The method for expanding data according to claim 1, wherein the step of fusing the face picture and the background picture in each data element of the combined data set into a new picture comprises:
    calculating the union area of the area of the face picture and the area of the background picture;
    generating a blank picture over the union area;
    overlaying the face picture and the background picture on the blank picture, aligned at the top-left corner; and
    fusing the face picture and the background picture onto the blank picture at a specified fusion ratio to form the new picture.
  3. The method for expanding data according to claim 2, wherein the step of fusing the face picture and the background picture onto the blank picture at the specified fusion ratio to form the new picture comprises:
    according to p`(e,x,y) = p(b(i),x,y) if (x,y) lies within b(i), and p`(e,x,y) = p(e,x,y) otherwise, fusing the pixels of the background picture onto the blank picture to generate a first fused picture, wherein (x,y) denotes a pixel position on the blank picture, b(i) denotes the background picture, p(e,x,y) denotes the pixel value at pixel position (x,y) on the blank picture, p(b(i),x,y) denotes the pixel value at pixel position (x,y) on the background picture, and p`(e,x,y) denotes the pixel value at pixel position (x,y) on the first fused picture; and
    according to p``(e,x,y) = r·p(a(m),x,y) + (1−r)·p`(e,x,y) if (x,y) lies within a(m), and p``(e,x,y) = p`(e,x,y) otherwise, fusing the pixels of the face picture into the first fused picture to generate a second fused picture, wherein r denotes the specified fusion ratio, whose value range is [0.5,1], a(m) denotes the face picture, p(a(m),x,y) denotes the pixel value at pixel position (x,y) on the face picture, and p``(e,x,y) denotes the pixel value at pixel position (x,y) on the second fused picture.
  4. The method for expanding data according to claim 3, wherein before the step of fusing the face picture and the background picture onto the blank picture at the specified fusion ratio to form the new picture, the method comprises:
    obtaining a randomly generated random number r` in the range [0,1]; and
    adjusting the random number r` into the specified fusion ratio r by mapping the range [0,1] onto [0.5,1].
  5. The method for expanding data according to claim 1, wherein the step of obtaining the face picture set and the background picture set comprises:
    obtaining the face detection dataset WiderFace, wherein the face detection dataset WiderFace comprises a face picture set M and a face-frame label set F;
    extracting each face picture m in the face detection dataset WiderFace, wherein the face-frame label set of the face picture m is Fm, a single face-frame label value is f, Fm = {f | f ∈ F and f is in the face picture m}, and F = {Fm | m ∈ M};
    performing an affine transformation on each face picture m to obtain an affine transformation set A corresponding to the face picture m; and
    according to the generation process of the affine transformation set A corresponding to the face picture m, performing an affine transformation on each face picture m in the face detection dataset WiderFace, with A(M) = {a(m) | m ∈ M, a ∈ A} and a(Fm) = {a(f) | a ∈ A, f ∈ Fm}, to obtain the affine-transformed face picture set A(M) corresponding to the face detection dataset WiderFace and the affine-transformed face label set A(F) corresponding to the face detection dataset WiderFace, wherein a(m) is each affine-transformed face picture, a(Fm) denotes the face-frame label set of the affine-transformed face picture a(m), and a(f) denotes a single face-frame label value in the affine-transformed face picture a(m).
  6. The method for expanding data according to claim 5, wherein the step of obtaining the face picture set and the background picture set further comprises:
    obtaining the background dataset ImageNet;
    removing the designated background pictures containing face images from the background dataset ImageNet to obtain a background picture set I;
    extracting each background picture i in the background picture set I;
    performing an affine transformation on each background picture i to obtain an affine transformation set B corresponding to the background picture i; and
    according to the affine transformation process corresponding to the background picture i, performing an affine transformation on each background picture i in the background picture set I, with B(I) = {b(i) | i ∈ I, b ∈ B}, to obtain the affine-transformed background picture set B(I) corresponding to the background picture set I, wherein b(i) is each affine-transformed background picture and b denotes a single affine transformation.
  7. The method for expanding data according to claim 6, wherein the step of performing the Cartesian product operation on the face picture set and the background picture set to obtain the combined data set comprises:
    according to A(M) × B(I) = {(a(m), b(i)) | m ∈ M, a ∈ A and i ∈ I, b ∈ B}, obtaining the combined data set A(M) × B(I), wherein the face-frame label value of each data element (a(m), b(i)) in the combined data set A(M) × B(I) is the face-frame label value a(Fm) corresponding to the affine-transformed face picture a(m).
  8. An apparatus for expanding data, comprising:
    an obtaining module, configured to obtain a face picture set and a background picture set, wherein the background pictures in the background picture set contain no face images;
    an operation module, configured to perform a Cartesian product operation on the face picture set and the background picture set to obtain a combined data set, wherein each data element in the combined data set comprises one face picture and one background picture;
    a fusion module, configured to fuse the face picture and the background picture comprised in each data element of the combined data set into a new picture; and
    a combination module, configured to combine the new pictures corresponding to the data elements into a data expansion set.
  9. A computer device, comprising a memory and a processor, wherein the memory stores a computer program, and wherein the processor, when executing the computer program, implements a method for expanding data, comprising: obtaining a face picture set and a background picture set, wherein the background pictures in the background picture set contain no face images;
    performing a Cartesian product operation on the face picture set and the background picture set to obtain a combined data set, wherein each data element in the combined data set comprises one face picture and one background picture;
    fusing the face picture and the background picture comprised in each data element of the combined data set into a new picture; and
    combining the new pictures corresponding to the data elements into a data expansion set.
  10. The computer device according to claim 9, wherein the step of fusing the face picture and the background picture in each data element of the combined data set into a new picture comprises:
    calculating the union area of the area of the face picture and the area of the background picture;
    generating a blank picture over the union area;
    overlaying the face picture and the background picture on the blank picture, aligned at the top-left corner; and
    fusing the face picture and the background picture onto the blank picture at a specified fusion ratio to form the new picture.
  11. The computer device according to claim 10, wherein the step of fusing the face picture and the background picture onto the blank picture at the specified fusion ratio to form the new picture comprises:
    according to p`(e,x,y) = p(b(i),x,y) if (x,y) lies within b(i), and p`(e,x,y) = p(e,x,y) otherwise, fusing the pixels of the background picture onto the blank picture to generate a first fused picture, wherein (x,y) denotes a pixel position on the blank picture, b(i) denotes the background picture, p(e,x,y) denotes the pixel value at pixel position (x,y) on the blank picture, p(b(i),x,y) denotes the pixel value at pixel position (x,y) on the background picture, and p`(e,x,y) denotes the pixel value at pixel position (x,y) on the first fused picture; and
    according to p``(e,x,y) = r·p(a(m),x,y) + (1−r)·p`(e,x,y) if (x,y) lies within a(m), and p``(e,x,y) = p`(e,x,y) otherwise, fusing the pixels of the face picture into the first fused picture to generate a second fused picture, wherein r denotes the specified fusion ratio, whose value range is [0.5,1], a(m) denotes the face picture, p(a(m),x,y) denotes the pixel value at pixel position (x,y) on the face picture, and p``(e,x,y) denotes the pixel value at pixel position (x,y) on the second fused picture.
  12. The computer device according to claim 11, wherein before the step of fusing the face picture and the background picture onto the blank picture at the specified fusion ratio to form the new picture, the method comprises:
    obtaining a randomly generated random number r` in the range [0,1]; and
    adjusting the random number r` into the specified fusion ratio r by mapping the range [0,1] onto [0.5,1].
  13. The computer device according to claim 9, wherein the step of obtaining the face picture set and the background picture set comprises:
    obtaining the face detection dataset WiderFace, wherein the face detection dataset WiderFace comprises a face picture set M and a face-frame label set F;
    extracting each face picture m in the face detection dataset WiderFace, wherein the face-frame label set of the face picture m is Fm, a single face-frame label value is f, Fm = {f | f ∈ F and f is in the face picture m}, and F = {Fm | m ∈ M};
    performing an affine transformation on each face picture m to obtain an affine transformation set A corresponding to the face picture m; and
    according to the generation process of the affine transformation set A corresponding to the face picture m, performing an affine transformation on each face picture m in the face detection dataset WiderFace, with A(M) = {a(m) | m ∈ M, a ∈ A} and a(Fm) = {a(f) | a ∈ A, f ∈ Fm}, to obtain the affine-transformed face picture set A(M) corresponding to the face detection dataset WiderFace and the affine-transformed face label set A(F) corresponding to the face detection dataset WiderFace, wherein a(m) is each affine-transformed face picture, a(Fm) denotes the face-frame label set of the affine-transformed face picture a(m), and a(f) denotes a single face-frame label value in the affine-transformed face picture a(m).
  14. The computer device according to claim 13, wherein the step of obtaining the face picture set and the background picture set further comprises:
    obtaining the background dataset ImageNet;
    removing the designated background pictures containing face images from the background dataset ImageNet to obtain a background picture set I;
    extracting each background picture i in the background picture set I;
    performing an affine transformation on each background picture i to obtain an affine transformation set B corresponding to the background picture i; and
    according to the affine transformation process corresponding to the background picture i, performing an affine transformation on each background picture i in the background picture set I, with B(I) = {b(i) | i ∈ I, b ∈ B}, to obtain the affine-transformed background picture set B(I) corresponding to the background picture set I, wherein b(i) is each affine-transformed background picture and b denotes a single affine transformation.
  15. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements a method for expanding data, comprising:
    obtaining a face picture set and a background picture set, wherein the background pictures in the background picture set contain no face images;
    performing a Cartesian product operation on the face picture set and the background picture set to obtain a combined data set, wherein each data element in the combined data set comprises one face picture and one background picture;
    fusing the face picture and the background picture comprised in each data element of the combined data set into a new picture; and
    combining the new pictures corresponding to the data elements into a data expansion set.
  16. The computer-readable storage medium according to claim 15, wherein the step of fusing the face picture and the background picture in each data element of the combined data set into a new picture comprises:
    calculating the union area of the area of the face picture and the area of the background picture;
    generating a blank picture over the union area;
    overlaying the face picture and the background picture on the blank picture, aligned at the top-left corner; and
    fusing the face picture and the background picture onto the blank picture at a specified fusion ratio to form the new picture.
  17. The computer-readable storage medium according to claim 16, wherein the step of fusing the face picture and the background picture onto the blank picture at the specified fusion ratio to form the new picture comprises:
    according to p`(e,x,y) = p(b(i),x,y) if (x,y) lies within b(i), and p`(e,x,y) = p(e,x,y) otherwise, fusing the pixels of the background picture onto the blank picture to generate a first fused picture, wherein (x,y) denotes a pixel position on the blank picture, b(i) denotes the background picture, p(e,x,y) denotes the pixel value at pixel position (x,y) on the blank picture, p(b(i),x,y) denotes the pixel value at pixel position (x,y) on the background picture, and p`(e,x,y) denotes the pixel value at pixel position (x,y) on the first fused picture; and
    according to p``(e,x,y) = r·p(a(m),x,y) + (1−r)·p`(e,x,y) if (x,y) lies within a(m), and p``(e,x,y) = p`(e,x,y) otherwise, fusing the pixels of the face picture into the first fused picture to generate a second fused picture, wherein r denotes the specified fusion ratio, whose value range is [0.5,1], a(m) denotes the face picture, p(a(m),x,y) denotes the pixel value at pixel position (x,y) on the face picture, and p``(e,x,y) denotes the pixel value at pixel position (x,y) on the second fused picture.
  18. The computer-readable storage medium according to claim 17, wherein before the step of fusing the face picture and the background picture onto the blank picture at the specified fusion ratio to form the new picture, the method comprises:
    obtaining a randomly generated random number r` in the range [0,1]; and
    adjusting the random number r` into the specified fusion ratio r by mapping the range [0,1] onto [0.5,1].
  19. The computer-readable storage medium according to claim 15, wherein the step of obtaining the face picture set and the background picture set comprises:
    obtaining the face detection dataset WiderFace, wherein the face detection dataset WiderFace comprises a face picture set M and a face-frame label set F;
    extracting each face picture m in the face detection dataset WiderFace, wherein the face-frame label set of the face picture m is Fm, a single face-frame label value is f, Fm = {f | f ∈ F and f is in the face picture m}, and F = {Fm | m ∈ M};
    performing an affine transformation on each face picture m to obtain an affine transformation set A corresponding to the face picture m; and
    according to the generation process of the affine transformation set A corresponding to the face picture m, performing an affine transformation on each face picture m in the face detection dataset WiderFace, with A(M) = {a(m) | m ∈ M, a ∈ A} and a(Fm) = {a(f) | a ∈ A, f ∈ Fm}, to obtain the affine-transformed face picture set A(M) corresponding to the face detection dataset WiderFace and the affine-transformed face label set A(F) corresponding to the face detection dataset WiderFace, wherein a(m) is each affine-transformed face picture, a(Fm) denotes the face-frame label set of the affine-transformed face picture a(m), and a(f) denotes a single face-frame label value in the affine-transformed face picture a(m).
  20. The computer-readable storage medium according to claim 19, wherein the step of obtaining the face picture set and the background picture set further comprises:
    obtaining the background dataset ImageNet;
    removing the designated background pictures containing face images from the background dataset ImageNet to obtain a background picture set I;
    extracting each background picture i in the background picture set I;
    performing an affine transformation on each background picture i to obtain an affine transformation set B corresponding to the background picture i; and
    according to the affine transformation process corresponding to the background picture i, performing an affine transformation on each background picture i in the background picture set I, with B(I) = {b(i) | i ∈ I, b ∈ B}, to obtain the affine-transformed background picture set B(I) corresponding to the background picture set I, wherein b(i) is each affine-transformed background picture and b denotes a single affine transformation.
PCT/CN2020/124728 2020-07-27 2020-10-29 Method, Apparatus and Computer Device for Expanding Data WO2021139340A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010733099.1A CN111860387B (zh) 2020-07-27 2020-07-27 Method, apparatus and computer device for expanding data
CN202010733099.1 2020-07-27

Publications (1)

Publication Number Publication Date
WO2021139340A1 (zh) 2021-07-15

Family

ID=72947876

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/124728 WO2021139340A1 (zh) 2020-07-27 2020-10-29 扩充数据的方法、装置和计算机设备

Country Status (2)

Country Link
CN (1) CN111860387B (zh)
WO (1) WO2021139340A1 (zh)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170161592A1 (en) * 2015-12-04 2017-06-08 Pilot Ai Labs, Inc. System and method for object detection dataset application for deep-learning algorithm training
CN108492343A (zh) * 2018-03-28 2018-09-04 东北大学 一种扩充目标识别的训练数据的图像合成方法
CN110276779A (zh) * 2019-06-04 2019-09-24 华东师范大学 一种基于前后景分割的密集人群图像生成方法
CN110287988A (zh) * 2019-05-16 2019-09-27 平安科技(深圳)有限公司 数据增强方法、装置及计算机可读存储介质
CN110852172A (zh) * 2019-10-15 2020-02-28 华东师范大学 一种基于Cycle Gan图片拼贴并增强的扩充人群计数数据集的方法
CN111415293A (zh) * 2020-03-12 2020-07-14 上海数川数据科技有限公司 基于图像目标-背景变换的数据集增强方法及系统

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2001082593A1 (en) * 2000-04-24 2001-11-01 The Government Of The United States Of America, As Represented By The Secretary Of The Navy Apparatus and method for color image fusion
CN109948093B (zh) * 2017-07-18 2023-05-23 腾讯科技(深圳)有限公司 表情图片生成方法、装置及电子设备
CN109920538B (zh) * 2019-03-07 2022-11-25 中南大学 一种基于数据增强的零样本学习方法

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170161592A1 (en) * 2015-12-04 2017-06-08 Pilot Ai Labs, Inc. System and method for object detection dataset application for deep-learning algorithm training
CN108492343A (zh) * 2018-03-28 2018-09-04 东北大学 一种扩充目标识别的训练数据的图像合成方法
CN110287988A (zh) * 2019-05-16 2019-09-27 平安科技(深圳)有限公司 数据增强方法、装置及计算机可读存储介质
CN110276779A (zh) * 2019-06-04 2019-09-24 华东师范大学 一种基于前后景分割的密集人群图像生成方法
CN110852172A (zh) * 2019-10-15 2020-02-28 华东师范大学 一种基于Cycle Gan图片拼贴并增强的扩充人群计数数据集的方法
CN111415293A (zh) * 2020-03-12 2020-07-14 上海数川数据科技有限公司 基于图像目标-背景变换的数据集增强方法及系统

Also Published As

Publication number Publication date
CN111860387B (zh) 2023-08-25
CN111860387A (zh) 2020-10-30

Similar Documents

Publication Publication Date Title
Ning et al. Multi‐view frontal face image generation: a survey
US11403874B2 (en) Virtual avatar generation method and apparatus for generating virtual avatar including user selected face property, and storage medium
CN110826395B (zh) 人脸旋转模型的生成方法、装置、计算机设备及存储介质
CN103279936B (zh) 基于画像的人脸伪照片自动合成及修正方法
US20120070042A1 (en) Automatic Face Detection and Identity Masking In Images, and Applications Thereof
CN111968134B (zh) 目标分割方法、装置、计算机可读存储介质及计算机设备
JP2016085579A (ja) 対話装置のための画像処理装置及び方法、並びに対話装置
CN113343878A (zh) 基于生成对抗网络的高保真人脸隐私保护方法和系统
CN112580572B (zh) 多任务识别模型的训练方法及使用方法、设备及存储介质
CN111753782A (zh) 一种基于双流网络的假脸检测方法、装置及电子设备
WO2021139340A1 (zh) 扩充数据的方法、装置和计算机设备
WO2022116161A1 (zh) 人像卡通化方法、机器人及存储介质
WO2022160773A1 (zh) 基于虚拟样本的行人重识别方法
Shahreza et al. Template inversion attack against face recognition systems using 3d face reconstruction
CN113160079A (zh) 人像修复模型的训练方法、人像修复方法和装置
CN112634152A (zh) 基于图像深度信息的人脸样本数据增强方法及系统
US20230260176A1 (en) System and method for face swapping with single/multiple source images using attention mechanism
WO2023066142A1 (zh) 全景图像的目标检测方法、装置、计算机设备和存储介质
CN110689063A (zh) 一种基于神经网络的证件识别的训练方法及装置
Mao et al. Robust convolutional neural network cascade for facial landmark localization exploiting training data augmentation
CN113077379A (zh) 特征潜码的提取方法及装置、设备及存储介质
CN113191942A (zh) 生成图像的方法、训练人物检测模型的方法、程序及装置
CN114943799A (zh) 一种面部图像处理方法、装置和计算机可读存储介质
Batchelor et al. The role of focus in object instance recognition
Sultana et al. Background/Foreground Separation: Guided Attention based Adversarial Modeling (GAAM) versus Robust Subspace Learning Methods

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 20912842; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 20912842; Country of ref document: EP; Kind code of ref document: A1)