WO2022116163A1 - Portrait segmentation method, robot and storage medium - Google Patents

Portrait segmentation method, robot and storage medium

Info

Publication number
WO2022116163A1
Authority
WO
WIPO (PCT)
Prior art keywords
face
segmentation model
image
portrait
training
Prior art date
Application number
PCT/CN2020/133932
Other languages
English (en)
French (fr)
Inventor
曾钰胜
庞建新
程骏
Original Assignee
深圳市优必选科技股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市优必选科技股份有限公司
Priority to PCT/CN2020/133932
Publication of WO2022116163A1

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition

Definitions

  • the present application relates to the field of computer technology, and in particular to a portrait segmentation method, a robot and a storage medium.
  • portrait segmentation plays a very important role; for example, applications such as background removal and face cartoonization require portrait segmentation first.
  • the portrait segmentation method achieves accurate and real-time portrait segmentation.
  • a portrait segmentation method including:
  • a face segmentation model is used to perform portrait segmentation, and the face segmentation model is a lightweight network model.
  • a robot includes a memory and a processor, the memory stores a computer program, and when the computer program is executed by the processor, the processor executes the following steps:
  • a face segmentation model is used to perform portrait segmentation, and the face segmentation model is a lightweight network model.
  • a computer-readable storage medium storing a computer program, when executed by a processor, the computer program causes the processor to perform the following steps:
  • a face segmentation model is used to perform portrait segmentation, and the face segmentation model is a lightweight network model.
  • the original person image to be segmented is obtained first; the face in the original person image is then recognized, and face alignment is performed based on the identified face key points to obtain an aligned standard person image; a face segmentation model is then used to perform portrait segmentation on the standard person image. Since the original person image is aligned before portrait segmentation, the portrait segmentation model only needs to segment the aligned standard person image, which helps improve the accuracy of portrait segmentation, and no complex algorithm is required.
  • the face segmentation model can be implemented by a lightweight network model, which is suitable for deployment on robots with limited computing power.
  • Fig. 1 is the flowchart of the portrait segmentation method in one embodiment
  • Fig. 2 is a schematic diagram of portrait alignment in one embodiment
  • Fig. 3 is the schematic diagram before and after single person portrait segmentation in one embodiment
  • Fig. 4 is the structural block diagram of the portrait segmentation device in one embodiment
  • FIG. 5 is a schematic diagram of obtaining training character images and corresponding segmentation annotations in one embodiment
  • FIG. 6 is a structural block diagram of a portrait segmentation device in another embodiment
  • FIG. 7 is a diagram of the internal structure of the robot in one embodiment.
  • a portrait segmentation method is proposed, and the portrait segmentation method can be applied to a terminal or a server.
  • the application to a terminal is used as an example for illustration.
  • the portrait segmentation method specifically includes the following steps:
  • Step 102 Obtain the original person image to be segmented.
  • the original person image contains the person image to be segmented.
  • the proportion of the human body in different original person images is often inconsistent. For example, some original person images are annotated down to the stomach area, some down to the legs, and some down to the shoulders. If the person images in these original images were segmented directly, the portrait segmentation model would have to adapt to many situations, the model design would be particularly complex, and the amount of computation would be large; since computing power on the robot side is often limited, such a model is not suitable for use there.
  • the original character image may be an image directly captured by a camera, or an image obtained from an album.
  • the terminal is a robot terminal.
  • Step 104 identify the face in the original character image, and perform face alignment according to the identified face key points to obtain an aligned standard character image.
  • the face key points refer to the feature points that reflect the facial features of the face, including: the feature points of eyebrows, eyes, nose, mouth and facial contour.
  • the face alignment method is used to align the original person images to obtain the aligned standard person images.
  • the standard person image refers to a preset normalized person image.
  • for example, the hair can be set as the starting position of the standard person image, and the shoulder area as the ending position.
  • the process of face alignment is equivalent to an equidistant transform plus uniform scaling; its effect preserves angles, parallelism and perpendicularity.
  • the face in the aligned standard person image is frontal, and the proportion of the human body conforms to a preset proportion rule.
  • FIG. 2 is a schematic diagram of portrait alignment in one embodiment.
  • the goal of face alignment is to map five key points of the face (left eye, right eye, nose, mouth-left and mouth-right) to specified positions in the target space, while the other parts undergo a distortion-free change.
  • the role of the 5 key points is to map the face to a frontal pose; the other parts are then mapped into the target space accordingly.
  • the target space is selected according to the proportion of the human body: if the proportion is small, the corresponding target space is also relatively small.
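  • as a concrete illustration of this mapping, the alignment can be sketched with a similarity transform estimated from the five key points; the sketch below uses OpenCV, and the source and target coordinates are illustrative placeholders rather than values taken from this application.

```python
import cv2
import numpy as np

# Five detected face key points in the original image
# (left eye, right eye, nose, mouth-left, mouth-right) - placeholder values.
src_pts = np.float32([[120, 150], [180, 148], [151, 190], [130, 225], [175, 223]])

# Their target positions in the 256x256 standard space; also placeholders,
# chosen only so that hair and shoulders fit inside the frame.
dst_pts = np.float32([[98, 100], [158, 100], [128, 135], [104, 170], [152, 170]])

# A similarity transform (rotation + uniform scale + translation) realizes the
# "equidistant transform plus uniform scaling" described above.
M, _ = cv2.estimateAffinePartial2D(src_pts, dst_pts)

original = cv2.imread("person.jpg")
standard = cv2.warpAffine(original, M, (256, 256))  # aligned standard image
```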
  • Step 106 using a face segmentation model to perform portrait segmentation based on the aligned standard person images, and the face segmentation model is a lightweight network model.
  • because standard person images have a uniform, single layout, the face segmentation model does not need a complex network when training and learning on them and can be implemented with a lightweight network model; this not only helps improve the accuracy of portrait segmentation but also its speed, making the model suitable for deployment on robots with limited computing power and thereby enabling portrait segmentation on the robot side.
  • the above portrait segmentation method is especially suitable for single-portrait segmentation; FIG. 3 is a schematic diagram before and after single-portrait segmentation in one embodiment.
  • the above-mentioned portrait segmentation method first obtains the original person image to be segmented, then recognizes the face in the original person image, aligns the face based on the identified key points of the face, and obtains the aligned standard person image.
  • the face segmentation model then performs portrait segmentation on the standard person image. Since the original person image is aligned before portrait segmentation, the portrait segmentation model only needs to segment the aligned standard person image, which helps improve the accuracy of portrait segmentation, and no complex algorithm is required.
  • the face segmentation model can be implemented by a lightweight network model, which is suitable for deployment on robots with limited computing power.
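  • taken together, steps 102 to 106 form a short pipeline: detect the five face key points, align the frame into the standard space, then run the lightweight segmenter. A minimal sketch follows; `detector` and `seg_model` are stand-ins for whatever face detector and segmentation network a deployment actually uses, not components named by this application.

```python
import cv2
import numpy as np

def segment_portrait(frame, detector, seg_model, dst_pts, size=256):
    """Sketch of steps 102-106: detect key points, align, segment."""
    face_pts = detector(frame)                         # step 104: 5 x 2 key points
    M, _ = cv2.estimateAffinePartial2D(np.float32(face_pts),
                                       np.float32(dst_pts))
    standard = cv2.warpAffine(frame, M, (size, size))  # aligned standard image
    mask = seg_model(standard)                         # step 106: portrait mask
    return standard, mask
```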
  • recognizing the face in the original person image and performing face alignment according to the identified face key points to obtain an aligned standard person image further includes: mapping the face key points to specified positions in a preset space to obtain the aligned standard person image in the preset space.
  • traditional face alignment only aligns the face itself, and the preset space is often relatively small, for example 112×112; within this limited space, the mapped coordinate positions of the five key points are set.
  • for example, the coordinate positions of the 5 key points (left eye, right eye, nose, mouth-left, mouth-right) are {[38.2946, 51.6963], [73.5318, 51.5014], [56.0252, 71.7366], [41.5493, 92.3655], [70.7299, 92.2041]}.
  • if the standard person image needs to contain not only the face area but also extend to other parts, for example down to the shoulders, then the corresponding space needs to be enlarged.
  • for example, the size needs to be set to 256×256, and the coordinate positions of the corresponding 5 key points also need to change, so that the hair above the face and the region from below the face to the shoulders can be displayed in the standard person image.
  • the target coordinate positions to which the face key points are mapped in the preset space are first determined according to the preset proportion of the human body in the image; the target coordinate positions are the specified positions. For example, if the preset body proportion runs from the hair to the shoulders, then during face alignment, in order to reserve mapping space for the hair and for the parts below the head, the face key point coordinates need to be mapped as close to the middle of the image as possible.
  • taking the lower-left corner of the image as the coordinate origin: compared with traditional face alignment, the ordinates of the left and right eyes can be lowered, and their abscissas moved toward the middle of the image, i.e. the abscissa of the left eye is increased and the abscissa of the right eye is decreased. This reserves space for the hair above and for the left and right sides of the face; by the same principle, the abscissa of the nose stays unchanged while its ordinate decreases to reserve space below the face, and the abscissas of mouth-left and mouth-right move toward the middle while their ordinates increase.
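  • one hedged way to realize this is to rescale the classic 112×112 five-point template into the larger canvas while pulling the points toward the centre; the sketch below does exactly that, and the `scale` and `y_shift` values are illustrative assumptions, not coordinates disclosed by this application.

```python
import numpy as np

# The 112x112 five-point template quoted above
# (left eye, right eye, nose, mouth-left, mouth-right).
TEMPLATE_112 = np.float32([
    [38.2946, 51.6963], [73.5318, 51.5014], [56.0252, 71.7366],
    [41.5493, 92.3655], [70.7299, 92.2041],
])

def portrait_template(size=256, scale=0.55, y_shift=0.06):
    """Shrink the face template and push it toward the image centre so the
    hair (above) and shoulders (below) stay inside the frame."""
    pts = TEMPLATE_112 / 112.0       # normalise to the unit square
    pts = (pts - 0.5) * scale + 0.5  # pull every point toward the centre
    pts[:, 1] += y_shift             # nudge the face region slightly down
    return pts * size                # back to pixel coordinates

dst_pts = portrait_template()        # target positions in the 256x256 space
```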
  • using a face segmentation model to perform portrait segmentation based on the aligned standard person image includes: using the aligned person image as the input of a portrait segmentation model, the portrait segmentation model being used to segment the target person image out of the standard person image; and obtaining the segmented target person image output by the portrait segmentation model.
  • the portrait segmentation model is used to segment the target person image in the standard person image to obtain the target person image.
  • the portrait segmentation model is implemented with the lightweight convolutional neural network MobileNetV2.
  • the training method of the portrait segmentation model is as follows: a training dataset is obtained, the training dataset including training person images and corresponding segmentation annotations, the training person images and corresponding segmentation annotations being obtained by aligning and segmenting an existing human body image set and corresponding human body segmentation annotations; the training person images are used as the input of the portrait segmentation model, and the corresponding segmentation annotations are used as the expected output for training, to obtain the target portrait segmentation model.
  • the training of the portrait segmentation model requires the collection of training data sets.
  • the collection of training datasets often costs considerable manpower and material resources, because not only must the training person images be obtained, they must also be annotated with segmentation masks.
  • the existing human body image set and the corresponding human body segmentation annotations are innovatively aligned and segmented to obtain the training person images and the corresponding segmentation annotations; some open-source human body image sets with corresponding body segmentation annotations already exist on the Internet, whereas person images with corresponding segmentation annotations do not.
  • a human body image refers to a person image containing the whole body, while a person image refers to one that mainly contains the face.
  • FIG. 5 is a schematic diagram of obtaining training person images and corresponding segmentation annotations in one embodiment: the person image and its segmentation annotation are obtained by aligning the body image and the segmentation annotation corresponding to the body image.
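  • a minimal sketch of this data-generation idea follows: one similarity transform is estimated from the five face key points of the body image and then applied to the image and to its mask with the same matrix, so the pair stays pixel-aligned. The function name and arguments are illustrative, not from this application.

```python
import cv2
import numpy as np

def make_portrait_pair(body_img, body_mask, face_pts, dst_pts, size=256):
    """Cut an aligned portrait crop and its segmentation label out of a
    full-body image and its mask, using one shared similarity transform."""
    M, _ = cv2.estimateAffinePartial2D(np.float32(face_pts),
                                       np.float32(dst_pts))
    person = cv2.warpAffine(body_img, M, (size, size), flags=cv2.INTER_LINEAR)
    # Nearest-neighbour keeps the label values discrete; using the same
    # matrix keeps image and annotation aligned pixel for pixel.
    label = cv2.warpAffine(body_mask, M, (size, size), flags=cv2.INTER_NEAREST)
    return person, label
```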
  • using the training person images as the input of the portrait segmentation model and the corresponding segmentation annotations as the expected output to train the target portrait segmentation model includes: using the training person images as the input of the portrait segmentation model and obtaining the actual output of the portrait segmentation model; computing a loss value from the actual output and the expected output with the dice loss function, and updating the weights in the portrait segmentation model according to the loss value with the back-propagation algorithm, so that the loss value changes in a decreasing direction until convergence.
  • the training dataset is used for supervised training of the portrait segmentation model.
  • the training person image is used as the input of the portrait segmentation model, the actual output of the model is obtained, and a loss value is computed from the actual output and the expected output; the weights in the portrait segmentation model are then adjusted backward according to the loss value so as to reduce it, and the above steps are repeated until the loss value converges and the model training is complete.
  • experiments show that using the dice function as the loss function helps improve the training accuracy of the portrait segmentation model.
  • the dice loss function is computed as L_dice = 1 − 2|X∩Y| / (|X| + |Y|), where X∩Y represents the intersection between X and Y, and |X| and |Y| represent the numbers of elements of X and Y respectively;
  • the coefficient 2 in the numerator is present because the common elements of X and Y are counted twice in the denominator.
  • X and Y represent the expected output and the actual output, respectively.
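  • a soft version of this loss is straightforward to write down; the sketch below uses PyTorch, which is an assumption on our part (the application does not name a framework), and folds the loss into one supervised training step.

```python
import torch

def dice_loss(pred, target, eps=1e-6):
    """Dice loss = 1 - 2|X n Y| / (|X| + |Y|), on soft predictions.
    pred: probabilities in [0, 1]; target: {0, 1} ground-truth masks."""
    pred, target = pred.flatten(1), target.flatten(1)
    inter = (pred * target).sum(dim=1)            # |X n Y|
    union = pred.sum(dim=1) + target.sum(dim=1)   # |X| + |Y|
    return (1 - (2 * inter + eps) / (union + eps)).mean()

# One supervised step, with `model`, `optimizer`, `images`, `masks` supplied
# by the surrounding training loop:
# logits = model(images)                          # B x 1 x 256 x 256
# loss = dice_loss(torch.sigmoid(logits), masks)
# loss.backward(); optimizer.step(); optimizer.zero_grad()
```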
  • the portrait segmentation model is obtained by training a convolutional neural network and includes a plurality of convolutional layers, the convolutional layers being used to perform feature extraction on images; before the convolutional layers are used for feature extraction, the method further includes: performing edge augmentation on the image, so that the resolution of the image obtained after convolution is consistent with the resolution of the input standard person image.
  • before each convolution, the edges of the image are augmented, i.e. the image is enlarged, and the convolution operation is then performed on the enlarged image, so that the resolution of the standard person image obtained after the convolution operation matches the resolution of the originally input person image; this preserves the accuracy of the portrait segmentation.
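  • in practice this edge augmentation is what padding does in common deep-learning frameworks; the PyTorch snippet below (an assumed framework, as above) shows how one pixel of padding keeps a 3×3 convolution from shrinking a 256×256 input.

```python
import torch
import torch.nn as nn

x = torch.randn(1, 3, 256, 256)

# Without padding, a 3x3 convolution shrinks the feature map: 256 -> 254.
print(nn.Conv2d(3, 16, kernel_size=3, padding=0)(x).shape)  # [1, 16, 254, 254]

# Padding the edges by one pixel first keeps the map at 256x256, so the
# segmentation output can match the input resolution exactly.
print(nn.Conv2d(3, 16, kernel_size=3, padding=1)(x).shape)  # [1, 16, 256, 256]
```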
  • the portrait segmentation model is obtained by improving the original Unet network.
  • Unet is an existing deep learning segmentation network.
  • the Unet network was originally applied to medical images, whose characteristics are high resolution and clear detail, which makes them easy to segment.
  • for the portrait segmentation task, in order to improve the accuracy and speed of portrait segmentation, the portraits and annotations are simplified so that 256×256 images can be used as input; and to avoid losing semantic information, the image edges are augmented before convolution to keep the input and output consistent, finally yielding a 256×256 output.
  • compared with the traditional 512×384 size, this solution trains at 256×256 resolution, which further improves speed while guaranteeing accuracy and has the advantages of high speed and low memory usage.
  • a device for segmenting a portrait, including:
  • an acquisition module 502, configured to obtain the original person image to be segmented;
  • an alignment module 504, configured to recognize the face in the original person image and perform face alignment according to the identified face key points to obtain an aligned standard person image;
  • the segmentation module 506 is configured to use a face segmentation model to perform portrait segmentation based on the aligned standard person images, where the face segmentation model is a lightweight network model.
  • the alignment module is further configured to map the face key points to designated positions in a preset space to obtain a standard person image aligned in the preset space.
  • the alignment module is further configured to determine the target coordinate position of the key point of the face in the preset space according to the preset proportion of the human body, and the target coordinate position is used as the designated position.
  • the segmentation module is further configured to use the aligned person image as the input of a portrait segmentation model, the portrait segmentation model being used to segment the target person image out of the standard person image, and to obtain the target person image output by the portrait segmentation model.
  • the above-mentioned apparatus for segmenting a portrait further includes:
  • the training module 501 is used to obtain a training dataset, the training dataset including training person images and corresponding segmentation annotations, the training person images and corresponding segmentation annotations being obtained by aligning and segmenting the existing human body image set and the corresponding human body segmentation annotations; the training person images are used as the input of the portrait segmentation model, and the corresponding segmentation annotations are used as the expected output for training to obtain the target portrait segmentation model.
  • the training module is further configured to use the training person images as the input of the portrait segmentation model and obtain the actual output of the portrait segmentation model; and to compute a loss value from the actual output and the expected output with the dice loss function, and update the weights in the portrait segmentation model according to the loss value with a back-propagation algorithm, so that the loss value changes in a decreasing direction until convergence.
  • the portrait segmentation model is obtained by training a convolutional neural network and includes a plurality of convolutional layers, the convolutional layers being used to perform feature extraction on the image; the segmentation module is also used to perform edge augmentation on the image, so that the resolution of the image obtained after convolution is consistent with the resolution of the input standard person image.
  • Figure 7 shows an internal structure diagram of the robot in one embodiment.
  • the robot includes a processor, memory, camera, and network interface connected through a system bus.
  • the memory includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium of the robot stores an operating system, and also stores a computer program.
  • when this computer program is executed by the processor, the processor can implement the above portrait segmentation method.
  • a computer program may also be stored in the internal memory, and when that computer program is executed by the processor, the processor may likewise execute the above portrait segmentation method.
  • FIG. 7 is only a block diagram of a partial structure related to the solution of the present application and does not constitute a limitation on the robot to which the solution is applied; a specific robot may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
  • a robot including a memory and a processor, wherein the memory stores a computer program, and when the computer program is executed by the processor, the processor is caused to perform the following steps: obtaining an original person image to be segmented; recognizing the face in the original person image, and performing face alignment according to the identified face key points to obtain an aligned standard person image; and performing portrait segmentation on the aligned standard person image with a face segmentation model, the face segmentation model being a lightweight network model.
  • recognizing the face in the original person image and performing face alignment according to the identified face key points to obtain an aligned standard person image further includes: mapping the face key points to specified positions in a preset space to obtain the aligned standard person image in the preset space.
  • before mapping the face key points to the specified positions in the preset space, the method further includes: determining the target coordinate positions of the face key points in the preset space according to a preset proportion of the human body, the target coordinate positions serving as the specified positions.
  • using a face segmentation model to perform portrait segmentation based on the aligned standard person image includes: using the aligned person image as the input of a portrait segmentation model, the portrait segmentation model being used to segment the target person image out of the standard person image; and obtaining the target person image output by the portrait segmentation model.
  • the training method of the portrait segmentation model is as follows: a training dataset is obtained, the training dataset including training person images and corresponding segmentation annotations, the training person images and corresponding segmentation annotations being obtained by aligning and segmenting an existing human body image set and corresponding human body segmentation annotations; the training person images are used as the input of the portrait segmentation model, and the corresponding segmentation annotations are used as the expected output for training, to obtain the target portrait segmentation model.
  • using the training person images as the input of the portrait segmentation model and the corresponding segmentation annotations as the expected output to train the target portrait segmentation model includes: using the training person images as the input of the portrait segmentation model and obtaining the actual output of the portrait segmentation model; computing a loss value from the actual output and the expected output with the dice loss function, and updating the weights in the portrait segmentation model according to the loss value with the back-propagation algorithm, so that the loss value changes in a decreasing direction until convergence.
  • the portrait segmentation model is obtained by training a convolutional neural network and includes a plurality of convolutional layers, the convolutional layers being used to perform feature extraction on images; before the convolutional layers are used for feature extraction, the method further includes: performing edge augmentation on the image, so that the resolution of the image obtained after convolution is consistent with the resolution of the input standard person image.
  • a computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the following steps: obtaining an original person image to be segmented; recognizing the face in the original person image, and performing face alignment according to the identified face key points to obtain an aligned standard person image; and performing portrait segmentation on the aligned standard person image with a face segmentation model, the face segmentation model being a lightweight network model.
  • recognizing the face in the original person image and performing face alignment according to the identified face key points to obtain an aligned standard person image further includes: mapping the face key points to specified positions in a preset space to obtain the aligned standard person image in the preset space.
  • before mapping the face key points to the specified positions in the preset space, the method further includes: determining the target coordinate positions of the face key points in the preset space according to a preset proportion of the human body, the target coordinate positions serving as the specified positions.
  • using a face segmentation model to perform portrait segmentation based on the aligned standard person image includes: using the aligned person image as the input of a portrait segmentation model, the portrait segmentation model being used to segment the target person image out of the standard person image; and obtaining the target person image output by the portrait segmentation model.
  • the training method of the portrait segmentation model is as follows: a training dataset is obtained, the training dataset including training person images and corresponding segmentation annotations, the training person images and corresponding segmentation annotations being obtained by aligning and segmenting an existing human body image set and corresponding human body segmentation annotations; the training person images are used as the input of the portrait segmentation model, and the corresponding segmentation annotations are used as the expected output for training, to obtain the target portrait segmentation model.
  • using the training person images as the input of the portrait segmentation model and the corresponding segmentation annotations as the expected output to train the target portrait segmentation model includes: using the training person images as the input of the portrait segmentation model and obtaining the actual output of the portrait segmentation model; computing a loss value from the actual output and the expected output with the dice loss function, and updating the weights in the portrait segmentation model according to the loss value with the back-propagation algorithm, so that the loss value changes in a decreasing direction until convergence.
  • the portrait segmentation model is obtained by training a convolutional neural network and includes a plurality of convolutional layers, the convolutional layers being used to perform feature extraction on images; before the convolutional layers are used for feature extraction, the method further includes: performing edge augmentation on the image, so that the resolution of the image obtained after convolution is consistent with the resolution of the input standard person image.
  • Nonvolatile memory may include read only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory.
  • Volatile memory may include random access memory (RAM) or external cache memory.
  • RAM is available in various forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Image Analysis (AREA)
  • Processing Or Creating Images (AREA)

Abstract

A portrait segmentation method, including: obtaining an original person image to be segmented (102); recognizing the face in the original person image, and performing face alignment according to the identified face key points to obtain an aligned standard person image (104); and performing portrait segmentation on the aligned standard person image with a face segmentation model, the face segmentation model being a lightweight network model (106). The above portrait segmentation method does not require a complex face segmentation model and is suitable for use on robots with limited computing power. A robot and a storage medium are also provided.

Description

Portrait segmentation method, robot and storage medium
Technical field
This application relates to the field of computer technology, and in particular to a portrait segmentation method, a robot and a storage medium.
Background
At present, portrait segmentation plays a very important role in some face applications; for example, applications such as background removal and face cartoonization require portrait segmentation first.
There is currently no very clear standard for the definition of portrait segmentation datasets, and the proportion of the human body in portrait data is inconsistent: some images are annotated down to the shoulders, some down to the stomach area, and some down to the legs. This diversity of the data brings certain challenges to portrait segmentation; in this situation, to guarantee a certain accuracy, the segmentation model has to be designed to be rather complex.
Because computing power on the robot side is limited, complex portrait segmentation models are difficult to use there; a portrait segmentation method that can be used on the robot side is therefore urgently needed.
Summary
On this basis, it is necessary to address the above problem by providing a portrait segmentation method suitable for use on the robot side, a robot and a storage medium; the portrait segmentation method achieves accurate and real-time portrait segmentation.
A portrait segmentation method, including:
obtaining an original person image to be segmented;
recognizing a face in the original person image, and performing face alignment according to the identified face key points to obtain an aligned standard person image; and
performing portrait segmentation on the aligned standard person image with a face segmentation model, the face segmentation model being a lightweight network model.
A robot, including a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the following steps:
obtaining an original person image to be segmented;
recognizing a face in the original person image, and performing face alignment according to the identified face key points to obtain an aligned standard person image; and
performing portrait segmentation on the aligned standard person image with a face segmentation model, the face segmentation model being a lightweight network model.
A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the following steps:
obtaining an original person image to be segmented;
recognizing a face in the original person image, and performing face alignment according to the identified face key points to obtain an aligned standard person image; and
performing portrait segmentation on the aligned standard person image with a face segmentation model, the face segmentation model being a lightweight network model.
According to the above portrait segmentation method, robot and storage medium, the original person image to be segmented is obtained first; the face in the original person image is then recognized, face alignment is performed based on the identified face key points to obtain an aligned standard person image, and the face segmentation model is then used to perform portrait segmentation on the standard person image. Since the original person image is aligned before portrait segmentation, the portrait segmentation model only needs to segment the aligned standard person image, which helps improve the accuracy of portrait segmentation; moreover, no complex algorithm needs to be designed, since the face segmentation model can be implemented with a lightweight network model, making it suitable for deployment on robots with limited computing power.
Brief description of the drawings
To describe the technical solutions in the embodiments of this application or in the prior art more clearly, the drawings required in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of this application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
In the drawings:
Fig. 1 is a flowchart of the portrait segmentation method in one embodiment;
Fig. 2 is a schematic diagram of portrait alignment in one embodiment;
Fig. 3 is a schematic diagram before and after single-portrait segmentation in one embodiment;
Fig. 4 is a structural block diagram of the portrait segmentation apparatus in one embodiment;
Fig. 5 is a schematic diagram of obtaining training person images and corresponding segmentation annotations in one embodiment;
Fig. 6 is a structural block diagram of the portrait segmentation apparatus in another embodiment;
Fig. 7 is a diagram of the internal structure of the robot in one embodiment.
Detailed description
The technical solutions in the embodiments of this application will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only a part of the embodiments of this application rather than all of them. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of this application without creative effort fall within the protection scope of this application.
As shown in Fig. 1, a portrait segmentation method is proposed; the method can be applied to a terminal or to a server, and this embodiment takes application to a terminal as an example. The portrait segmentation method specifically includes the following steps:
Step 102: obtain an original person image to be segmented.
Here, the original person image contains the person image to be segmented. The proportion of the human body in different original person images is often inconsistent; for example, some original person images are annotated down to the stomach area, some down to the legs, and some down to the shoulders. If the person images in these original person images were segmented directly, the portrait segmentation model would inevitably have to adapt to many situations, the corresponding model design would be particularly complex, and the amount of computation would be large; since computing power on the robot side is often limited, such a model is not suitable for use on the robot side. The original person image may be an image captured directly by a camera, or an image obtained from an album. In one embodiment, the terminal is a robot terminal.
Step 104: recognize the face in the original person image, and perform face alignment according to the identified face key points to obtain an aligned standard person image.
Here, face key points are feature points that reflect the facial features of a face, including the feature points of the eyebrows, eyes, nose, mouth and facial contour. To facilitate subsequent segmentation, face alignment is used to align the original person image and obtain the aligned standard person image.
The standard person image is a preset, normalized person image; for example, the hair can be set as the starting position of the standard person image and the shoulder area as the ending position. The process of face alignment is equivalent to an equidistant transform plus uniform scaling; its effect preserves angles, parallelism and perpendicularity. In the aligned standard person image the face is frontal, and the proportion of the human body conforms to a preset proportion rule. Fig. 2 is a schematic diagram of portrait alignment in one embodiment.
Specifically, the goal of face alignment is to map five key points of the face (left eye, right eye, nose, mouth-left, mouth-right) to specified positions in a target space, while the other parts undergo a distortion-free change. The role of the five key points is to map the face to a frontal pose; the other parts are then mapped into the target space accordingly. The target space is selected according to the proportion of the human body: if the proportion is small, the corresponding target space is also relatively small.
Step 106: perform portrait segmentation on the aligned standard person image with a face segmentation model, the face segmentation model being a lightweight network model.
Here, because standard person images have a uniform, single layout, the face segmentation model does not need to involve a complex network when training and learning on them and can be implemented with a lightweight network model; this not only helps improve the accuracy of portrait segmentation but also its speed, making the model suitable for deployment on robots with limited computing power and thereby enabling portrait segmentation on the robot side. The above portrait segmentation method is especially suitable for single-portrait segmentation; Fig. 3 is a schematic diagram before and after single-portrait segmentation in one embodiment.
According to the above portrait segmentation method, the original person image to be segmented is obtained first; the face in the original person image is then recognized, face alignment is performed based on the identified face key points to obtain an aligned standard person image, and the face segmentation model is then used to perform portrait segmentation on the standard person image. Since the original person image is aligned before portrait segmentation, the portrait segmentation model only needs to segment the aligned standard person image, which helps improve the accuracy of portrait segmentation; no complex algorithm needs to be designed, and the face segmentation model can be implemented with a lightweight network model, suitable for deployment on robots with limited computing power.
In one embodiment, recognizing the face in the original person image and performing face alignment according to the identified face key points to obtain the aligned standard person image further includes: mapping the face key points to specified positions in a preset space to obtain the aligned standard person image in the preset space.
Here, traditional face alignment only aligns the face itself, and the preset space is often relatively small, for example 112×112; within this limited space, the mapped coordinate positions of the five key points are set. For example, the coordinate positions of the five key points (left eye, right eye, nose, mouth-left, mouth-right) are {[38.2946, 51.6963], [73.5318, 51.5014], [56.0252, 71.7366], [41.5493, 92.3655], [70.7299, 92.2041]}. If the standard person image needs to contain not only the face region but also extend to other parts, for example down to the shoulders, then the corresponding space needs to be enlarged; for example, the size needs to be set to 256×256, and the coordinate positions of the five key points need to change accordingly, so that the hair above the face and the region from below the face to the shoulders can be displayed in the standard person image.
In one embodiment, the target coordinate positions to which the face key points are mapped in the preset space are first determined according to the preset proportion of the human body in the image, the target coordinate positions being the specified positions. For example, if the preset body proportion runs from the hair to the shoulders, then during face alignment, in order to reserve mapping space for the hair and for the parts below the head, the face key point coordinates need to be mapped as close to the middle of the image as possible. Taking the lower-left corner of the image as the coordinate origin: compared with traditional face alignment, the ordinates of the left and right eyes can be lowered and their abscissas moved toward the middle of the image, i.e. the abscissa of the left eye is increased and the abscissa of the right eye is decreased; this reserves space for the hair above and for the left and right sides of the face. By the same principle, the abscissa of the nose stays unchanged while its ordinate decreases, reserving space below the face, and the abscissas of mouth-left and mouth-right move toward the middle of the image while their ordinates increase, and so on.
In one embodiment, performing portrait segmentation on the aligned standard person image with a face segmentation model includes: using the aligned person image as the input of a portrait segmentation model, the portrait segmentation model being used to segment the target person image out of the standard person image; and obtaining the segmented target person image output by the portrait segmentation model.
Here, the portrait segmentation model is used to segment the target person image in the standard person image to obtain the target person image. The portrait segmentation model is implemented with the lightweight convolutional neural network MobileNetV2.
In one embodiment, the portrait segmentation model is trained as follows: a training dataset is obtained, the training dataset including training person images and corresponding segmentation annotations, the training person images and corresponding segmentation annotations being obtained by aligning and segmenting an existing human body image set and corresponding human body segmentation annotations; the training person images are used as the input of the portrait segmentation model, and the corresponding segmentation annotations are used as the expected output for training, to obtain the target portrait segmentation model.
Here, training the portrait segmentation model requires collecting a training dataset. Collecting a training dataset usually costs considerable manpower and material resources, because not only must training person images be obtained, they must also be annotated with segmentation masks. To speed up the collection of the training dataset, this application innovatively aligns and segments the existing human body image set and the corresponding human body segmentation annotations to obtain the training person images and the corresponding segmentation annotations. Some open-source human body image sets with corresponding body segmentation annotations already exist on the Internet, whereas person images with corresponding segmentation annotations do not. A human body image refers to a person image containing the whole body, while a person image refers to one that mainly contains the face. Fig. 5 is a schematic diagram of obtaining training person images and corresponding segmentation annotations in one embodiment: the person image and its segmentation annotation are obtained by aligning the body image and the segmentation annotation corresponding to the body image.
In one embodiment, using the training person images as the input of the portrait segmentation model and the corresponding segmentation annotations as the expected output to train the target portrait segmentation model includes: using the training person images as the input of the portrait segmentation model and obtaining the actual output of the portrait segmentation model; computing a loss value from the actual output and the expected output with the dice loss function, and updating the weights in the portrait segmentation model according to the loss value with the back-propagation algorithm, so that the loss value changes in a decreasing direction until convergence.
Here, the portrait segmentation model is trained on the training dataset in a supervised manner. During training, the training person image is used as the input of the portrait segmentation model, the actual output of the model is obtained, a loss value is computed from the actual output and the expected output, and the weights in the model are then adjusted backward according to the loss value to reduce it; the above steps are repeated until the loss value converges and the model training is complete. Experiments show that using the dice function as the loss function helps improve the training accuracy of the portrait segmentation model. Specifically, the dice loss function is computed as:
L_dice = 1 − 2|X∩Y| / (|X| + |Y|)
where X∩Y denotes the intersection between X and Y, and |X| and |Y| denote the numbers of elements of X and Y respectively. The coefficient 2 in the numerator accounts for the common elements of X and Y being counted twice in the denominator. For the semantic segmentation problem, X and Y represent the expected output and the actual output, respectively.
In one embodiment, the portrait segmentation model is obtained by training a convolutional neural network and includes a plurality of convolutional layers, the convolutional layers being used to perform feature extraction on the image; before the convolutional layers are used for feature extraction, the method further includes: performing edge augmentation on the image, so that the resolution of the image obtained after convolution is consistent with the resolution of the input standard person image.
Here, to keep the image resolution unchanged before and after convolution, the edges of the image are augmented before convolution, i.e. the image is enlarged, and the convolution operation is then performed on the enlarged image, so that the resolution of the standard person image obtained after the convolution operation matches the resolution of the originally input person image; this guarantees the accuracy of the portrait segmentation.
In one embodiment, the portrait segmentation model is obtained by improving the original Unet network. Unet is an existing deep-learning segmentation network, originally applied to medical images, whose characteristics are high resolution and clear detail, which makes them easy to segment. For the portrait segmentation task, in order to improve the accuracy and speed of portrait segmentation, the portraits and annotations are simplified so that 256×256 images can be used as input; and to avoid losing semantic information, the image edges are augmented before convolution to keep the input and output consistent, finally yielding a 256×256 output. Compared with the traditional 512×384 size, this solution trains at 256×256 resolution, which further improves speed while guaranteeing accuracy and has the advantages of high speed and low video-memory usage.
On top of the above 256×256 Unet, to further improve speed, the feature extraction in the encoder part of the Unet network is replaced with MobileNetV2 (a lightweight network model), and the convolutions in the decoder part are replaced with separable convolutions, which speeds the model up further.
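As a hedged sketch of this encoder/decoder design (assuming PyTorch and torchvision, which this application does not name, and with stage indices and channel widths chosen purely for illustration), the encoder can be taken directly from a MobileNetV2 feature stack, while each decoder step upsamples and fuses features through a depthwise-separable convolution:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import mobilenet_v2

class SeparableConv(nn.Module):
    """Depthwise 3x3 followed by pointwise 1x1: a separable convolution,
    far cheaper than a dense 3x3 convolution of the same width."""
    def __init__(self, cin, cout):
        super().__init__()
        self.dw = nn.Conv2d(cin, cin, 3, padding=1, groups=cin, bias=False)
        self.pw = nn.Conv2d(cin, cout, 1, bias=False)
        self.bn = nn.BatchNorm2d(cout)
        self.act = nn.ReLU6(inplace=True)

    def forward(self, x):
        return self.act(self.bn(self.pw(self.dw(x))))

encoder = mobilenet_v2().features      # MobileNetV2 backbone as the Unet encoder

x = torch.randn(1, 3, 256, 256)
skips = []
for i, layer in enumerate(encoder):    # collect skip features at a few stages
    x = layer(x)
    if i in (1, 3, 6, 13, 18):         # illustrative stage picks, not the patent's
        skips.append(x)

# One decoder step: upsample the deepest map and fuse it with a skip feature.
up = F.interpolate(skips[-1], scale_factor=2, mode="bilinear", align_corners=False)
fuse = SeparableConv(up.shape[1] + skips[-2].shape[1], 96)
out = fuse(torch.cat([up, skips[-2]], dim=1))   # 96-channel map at 16x16
```

Repeating the upsample-and-fuse step back to 256×256 and finishing with a 1×1 convolution plus sigmoid would yield the single-channel portrait mask.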
As shown in Fig. 4, in one embodiment, a portrait segmentation apparatus is proposed, including:
an acquisition module 502, configured to obtain an original person image to be segmented;
an alignment module 504, configured to recognize the face in the original person image and perform face alignment according to the identified face key points to obtain an aligned standard person image; and
a segmentation module 506, configured to perform portrait segmentation on the aligned standard person image with a face segmentation model, the face segmentation model being a lightweight network model.
In one embodiment, the alignment module is further configured to map the face key points to specified positions in a preset space to obtain the aligned standard person image in the preset space.
In one embodiment, the alignment module is further configured to determine the target coordinate positions of the face key points in the preset space according to a preset proportion of the human body, the target coordinate positions serving as the specified positions.
In one embodiment, the segmentation module is further configured to use the aligned person image as the input of a portrait segmentation model, the portrait segmentation model being used to segment the target person image out of the standard person image, and to obtain the target person image output by the portrait segmentation model.
As shown in Fig. 6, in one embodiment, the above portrait segmentation apparatus further includes:
a training module 501, configured to obtain a training dataset, the training dataset including training person images and corresponding segmentation annotations, the training person images and corresponding segmentation annotations being obtained by aligning and segmenting an existing human body image set and corresponding human body segmentation annotations; the training person images are used as the input of the portrait segmentation model, and the corresponding segmentation annotations are used as the expected output for training to obtain the target portrait segmentation model.
In one embodiment, the training module is further configured to use the training person images as the input of the portrait segmentation model and obtain the actual output of the portrait segmentation model; and to compute a loss value from the actual output and the expected output with the dice loss function and update the weights in the portrait segmentation model according to the loss value with the back-propagation algorithm, so that the loss value changes in a decreasing direction until convergence.
In one embodiment, the portrait segmentation model is obtained by training a convolutional neural network and includes a plurality of convolutional layers, the convolutional layers being used to perform feature extraction on the image; the segmentation module is further configured to perform edge augmentation on the image, so that the resolution of the image obtained after convolution is consistent with the resolution of the input standard person image.
Fig. 7 shows a diagram of the internal structure of the robot in one embodiment. As shown in Fig. 7, the robot includes a processor, a memory, a camera and a network interface connected through a system bus, where the memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium of the robot stores an operating system and may also store a computer program which, when executed by the processor, enables the processor to implement the above portrait segmentation method. The internal memory may also store a computer program which, when executed by the processor, enables the processor to execute the above portrait segmentation method. Those skilled in the art can understand that the structure shown in Fig. 7 is only a block diagram of a partial structure related to the solution of this application and does not constitute a limitation on the robot to which the solution is applied; a specific robot may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In one embodiment, a robot is proposed, including a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the following steps: obtaining an original person image to be segmented; recognizing the face in the original person image, and performing face alignment according to the identified face key points to obtain an aligned standard person image; and performing portrait segmentation on the aligned standard person image with a face segmentation model, the face segmentation model being a lightweight network model.
In one embodiment, recognizing the face in the original person image and performing face alignment according to the identified face key points to obtain the aligned standard person image further includes: mapping the face key points to specified positions in a preset space to obtain the aligned standard person image in the preset space.
In one embodiment, before mapping the face key points to the specified positions in the preset space, the method further includes: determining the target coordinate positions of the face key points in the preset space according to a preset proportion of the human body, the target coordinate positions serving as the specified positions.
In one embodiment, performing portrait segmentation on the aligned standard person image with a face segmentation model includes: using the aligned person image as the input of a portrait segmentation model, the portrait segmentation model being used to segment the target person image out of the standard person image; and obtaining the target person image output by the portrait segmentation model.
In one embodiment, the portrait segmentation model is trained as follows: a training dataset is obtained, the training dataset including training person images and corresponding segmentation annotations, the training person images and corresponding segmentation annotations being obtained by aligning and segmenting an existing human body image set and corresponding human body segmentation annotations; the training person images are used as the input of the portrait segmentation model, and the corresponding segmentation annotations are used as the expected output for training, to obtain the target portrait segmentation model.
In one embodiment, using the training person images as the input of the portrait segmentation model and the corresponding segmentation annotations as the expected output to train the target portrait segmentation model includes: using the training person images as the input of the portrait segmentation model and obtaining the actual output of the portrait segmentation model; computing a loss value from the actual output and the expected output with the dice loss function, and updating the weights in the portrait segmentation model according to the loss value with the back-propagation algorithm, so that the loss value changes in a decreasing direction until convergence.
In one embodiment, the portrait segmentation model is obtained by training a convolutional neural network and includes a plurality of convolutional layers, the convolutional layers being used to perform feature extraction on the image; before the convolutional layers are used for feature extraction, the method further includes: performing edge augmentation on the image, so that the resolution of the image obtained after convolution is consistent with the resolution of the input standard person image.
In one embodiment, a computer-readable storage medium is proposed, storing a computer program which, when executed by a processor, causes the processor to perform the following steps: obtaining an original person image to be segmented; recognizing the face in the original person image, and performing face alignment according to the identified face key points to obtain an aligned standard person image; and performing portrait segmentation on the aligned standard person image with a face segmentation model, the face segmentation model being a lightweight network model.
In one embodiment, recognizing the face in the original person image and performing face alignment according to the identified face key points to obtain the aligned standard person image further includes: mapping the face key points to specified positions in a preset space to obtain the aligned standard person image in the preset space.
In one embodiment, before mapping the face key points to the specified positions in the preset space, the method further includes: determining the target coordinate positions of the face key points in the preset space according to a preset proportion of the human body, the target coordinate positions serving as the specified positions.
In one embodiment, performing portrait segmentation on the aligned standard person image with a face segmentation model includes: using the aligned person image as the input of a portrait segmentation model, the portrait segmentation model being used to segment the target person image out of the standard person image; and obtaining the target person image output by the portrait segmentation model.
In one embodiment, the portrait segmentation model is trained as follows: a training dataset is obtained, the training dataset including training person images and corresponding segmentation annotations, the training person images and corresponding segmentation annotations being obtained by aligning and segmenting an existing human body image set and corresponding human body segmentation annotations; the training person images are used as the input of the portrait segmentation model, and the corresponding segmentation annotations are used as the expected output for training, to obtain the target portrait segmentation model.
In one embodiment, using the training person images as the input of the portrait segmentation model and the corresponding segmentation annotations as the expected output to train the target portrait segmentation model includes: using the training person images as the input of the portrait segmentation model and obtaining the actual output of the portrait segmentation model; computing a loss value from the actual output and the expected output with the dice loss function, and updating the weights in the portrait segmentation model according to the loss value with the back-propagation algorithm, so that the loss value changes in a decreasing direction until convergence.
In one embodiment, the portrait segmentation model is obtained by training a convolutional neural network and includes a plurality of convolutional layers, the convolutional layers being used to perform feature extraction on the image; before the convolutional layers are used for feature extraction, the method further includes: performing edge augmentation on the image, so that the resolution of the image obtained after convolution is consistent with the resolution of the input standard person image.
Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a non-volatile computer-readable storage medium, and when executed it may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database or other media used in the embodiments provided in this application may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM) or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM) and Rambus dynamic RAM (RDRAM).
The technical features of the above embodiments can be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features contains no contradiction, it should be considered to be within the scope of this specification.
The above embodiments only express several implementations of this application, and their descriptions are relatively specific and detailed, but they must not therefore be understood as limiting the patent scope of this application. It should be noted that those of ordinary skill in the art can make several variations and improvements without departing from the concept of this application, and these all fall within the protection scope of this application. Therefore, the protection scope of this patent shall be subject to the appended claims.

Claims (21)

  1. A portrait segmentation method, characterized by comprising:
    obtaining an original person image to be segmented;
    recognizing a face in the original person image, and performing face alignment according to the identified face key points to obtain an aligned standard person image; and
    performing portrait segmentation on the aligned standard person image with a face segmentation model, the face segmentation model being a lightweight network model.
  2. The method according to claim 1, characterized in that the recognizing the face in the original person image and performing face alignment according to the identified face key points to obtain the aligned standard person image further comprises:
    mapping the face key points to specified positions in a preset space to obtain the aligned standard person image in the preset space.
  3. The method according to claim 2, characterized in that, before the mapping the face key points to the specified positions in the preset space, the method further comprises:
    determining target coordinate positions of the face key points in the preset space according to a preset proportion of the human body, the target coordinate positions serving as the specified positions.
  4. The method according to claim 1, characterized in that the performing portrait segmentation on the aligned standard person image with a face segmentation model comprises:
    using the aligned person image as an input of a portrait segmentation model, the portrait segmentation model being used to segment a target person image out of the standard person image; and
    obtaining the target person image output by the portrait segmentation model.
  5. The method according to claim 1, characterized in that the portrait segmentation model is trained as follows:
    obtaining a training dataset, the training dataset comprising training person images and corresponding segmentation annotations, the training person images and the corresponding segmentation annotations being obtained by aligning and segmenting an existing human body image set and corresponding human body segmentation annotations; and
    using the training person images as the input of the portrait segmentation model and the corresponding segmentation annotations as the expected output for training, to obtain a target portrait segmentation model.
  6. The method according to claim 5, characterized in that the using the training person images as the input of the portrait segmentation model and the corresponding segmentation annotations as the expected output for training to obtain the target portrait segmentation model comprises:
    using the training person images as the input of the portrait segmentation model, and obtaining the actual output of the portrait segmentation model; and
    computing a loss value from the actual output and the expected output with a dice loss function, and updating the weights in the portrait segmentation model according to the loss value with a back-propagation algorithm, so that the loss value changes in a decreasing direction until convergence.
  7. The method according to claim 1, characterized in that the portrait segmentation model is obtained by training a convolutional neural network and comprises a plurality of convolutional layers, the convolutional layers being used to perform feature extraction on an image;
    before feature extraction is performed with the convolutional layers, the method further comprises: performing edge augmentation on the image, so that the resolution of the image obtained after convolution is consistent with the resolution of the input standard person image.
  8. A robot, comprising a memory and a processor, the memory storing a computer program which, when executed by the processor, causes the processor to perform the following steps:
    obtaining an original person image to be segmented;
    recognizing a face in the original person image, and performing face alignment according to the identified face key points to obtain an aligned standard person image; and
    performing portrait segmentation on the aligned standard person image with a face segmentation model, the face segmentation model being a lightweight network model.
  9. The robot according to claim 8, characterized in that the recognizing the face in the original person image and performing face alignment according to the identified face key points to obtain the aligned standard person image further comprises:
    mapping the face key points to specified positions in a preset space to obtain the aligned standard person image in the preset space.
  10. The robot according to claim 9, characterized in that, before the mapping the face key points to the specified positions in the preset space, the steps further comprise:
    determining target coordinate positions of the face key points in the preset space according to a preset proportion of the human body, the target coordinate positions serving as the specified positions.
  11. The robot according to claim 8, characterized in that the performing portrait segmentation on the aligned standard person image with a face segmentation model comprises:
    using the aligned person image as an input of a portrait segmentation model, the portrait segmentation model being used to segment a target person image out of the standard person image; and
    obtaining the target person image output by the portrait segmentation model.
  12. The robot according to claim 8, characterized in that the portrait segmentation model is trained as follows:
    obtaining a training dataset, the training dataset comprising training person images and corresponding segmentation annotations, the training person images and the corresponding segmentation annotations being obtained by aligning and segmenting an existing human body image set and corresponding human body segmentation annotations; and
    using the training person images as the input of the portrait segmentation model and the corresponding segmentation annotations as the expected output for training, to obtain a target portrait segmentation model.
  13. The robot according to claim 12, characterized in that the using the training person images as the input of the portrait segmentation model and the corresponding segmentation annotations as the expected output for training to obtain the target portrait segmentation model comprises:
    using the training person images as the input of the portrait segmentation model, and obtaining the actual output of the portrait segmentation model; and
    computing a loss value from the actual output and the expected output with a dice loss function, and updating the weights in the portrait segmentation model according to the loss value with a back-propagation algorithm, so that the loss value changes in a decreasing direction until convergence.
  14. The robot according to claim 8, characterized in that the portrait segmentation model is obtained by training a convolutional neural network and comprises a plurality of convolutional layers, the convolutional layers being used to perform feature extraction on an image;
    before feature extraction is performed with the convolutional layers, the steps further comprise: performing edge augmentation on the image, so that the resolution of the image obtained after convolution is consistent with the resolution of the input standard person image.
  15. A computer-readable storage medium storing a computer program which, when executed by a processor, causes the processor to perform the following steps:
    obtaining an original person image to be segmented;
    recognizing a face in the original person image, and performing face alignment according to the identified face key points to obtain an aligned standard person image; and
    performing portrait segmentation on the aligned standard person image with a face segmentation model, the face segmentation model being a lightweight network model.
  16. The storage medium according to claim 15, characterized in that the recognizing the face in the original person image and performing face alignment according to the identified face key points to obtain the aligned standard person image further comprises:
    mapping the face key points to specified positions in a preset space to obtain the aligned standard person image in the preset space.
  17. The storage medium according to claim 16, characterized in that, before the mapping the face key points to the specified positions in the preset space, the steps further comprise:
    determining target coordinate positions of the face key points in the preset space according to a preset proportion of the human body, the target coordinate positions serving as the specified positions.
  18. The storage medium according to claim 15, characterized in that the performing portrait segmentation on the aligned standard person image with a face segmentation model comprises:
    using the aligned person image as an input of a portrait segmentation model, the portrait segmentation model being used to segment a target person image out of the standard person image; and
    obtaining the target person image output by the portrait segmentation model.
  19. The storage medium according to claim 15, characterized in that the portrait segmentation model is trained as follows:
    obtaining a training dataset, the training dataset comprising training person images and corresponding segmentation annotations, the training person images and the corresponding segmentation annotations being obtained by aligning and segmenting an existing human body image set and corresponding human body segmentation annotations; and
    using the training person images as the input of the portrait segmentation model and the corresponding segmentation annotations as the expected output for training, to obtain a target portrait segmentation model.
  20. The storage medium according to claim 19, characterized in that the using the training person images as the input of the portrait segmentation model and the corresponding segmentation annotations as the expected output for training to obtain the target portrait segmentation model comprises:
    using the training person images as the input of the portrait segmentation model, and obtaining the actual output of the portrait segmentation model; and
    computing a loss value from the actual output and the expected output with a dice loss function, and updating the weights in the portrait segmentation model according to the loss value with a back-propagation algorithm, so that the loss value changes in a decreasing direction until convergence.
  21. The storage medium according to claim 15, characterized in that the portrait segmentation model is obtained by training a convolutional neural network and comprises a plurality of convolutional layers, the convolutional layers being used to perform feature extraction on an image;
    before feature extraction is performed with the convolutional layers, the steps further comprise: performing edge augmentation on the image, so that the resolution of the image obtained after convolution is consistent with the resolution of the input standard person image.
PCT/CN2020/133932 2020-12-04 2020-12-04 Portrait segmentation method, robot and storage medium WO2022116163A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/133932 WO2022116163A1 (zh) 2020-12-04 2020-12-04 Portrait segmentation method, robot and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2020/133932 WO2022116163A1 (zh) 2020-12-04 2020-12-04 Portrait segmentation method, robot and storage medium

Publications (1)

Publication Number Publication Date
WO2022116163A1 (zh)

Family

ID=81852721

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/133932 WO2022116163A1 (zh) 2020-12-04 2020-12-04 Portrait segmentation method, robot and storage medium

Country Status (1)

Country Link
WO (1) WO2022116163A1 (zh)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106203395A (zh) * 2016-07-26 2016-12-07 厦门大学 Face attribute recognition method based on multi-task deep learning
CN107220990A (zh) * 2017-06-22 2017-09-29 成都品果科技有限公司 Hair segmentation method based on deep learning
CN109523558A (zh) * 2018-10-16 2019-03-26 清华大学 Portrait segmentation method and system
CN110189340A (zh) * 2019-06-03 2019-08-30 北京达佳互联信息技术有限公司 Image segmentation method and apparatus, electronic device and storage medium
US20200058126A1 (en) * 2018-08-17 2020-02-20 12 Sigma Technologies Image segmentation and object detection using fully convolutional neural network


Similar Documents

Publication Publication Date Title
CN109325398B A face attribute analysis method based on transfer learning
US11842487B2 Detection model training method and apparatus, computer device and storage medium
WO2021120752A1 Domain-adaptive model training and image detection method, apparatus, device and medium
WO2020119458A1 Face key point detection method and apparatus, computer device and storage medium
WO2018153265A1 Keyword extraction method, computer device and storage medium
WO2020042975A1 Face pose estimation/three-dimensional face reconstruction method and apparatus, and electronic device
CN110197146B Face image analysis method based on deep learning, electronic apparatus and storage medium
WO2020252917A1 Blurred face image recognition method and apparatus, terminal device and medium
WO2020015076A1 Face image comparison method and apparatus, computer device and storage medium
WO2021051543A1 Face rotation model generation method and apparatus, computer device and storage medium
WO2020207177A1 Image augmentation and neural network training method, apparatus, device and storage medium
CN109740537B Method and system for accurately annotating pedestrian image attributes in crowd video images
WO2022033513A1 Target segmentation method and apparatus, computer-readable storage medium and computer device
US10977767B2 Propagation of spot healing edits from one image to multiple images
WO2022057309A1 Lung feature recognition method and apparatus, computer device and storage medium
WO2021169754A1 Composition prompting method and apparatus, storage medium and electronic device
CN112465936A Portrait cartoonization method and apparatus, robot and storage medium
WO2020037962A1 Facial image correction method and apparatus, and storage medium
WO2021189959A1 Brain midline recognition method and apparatus, computer device and storage medium
WO2022178997A1 Medical image registration method and apparatus, computer device and storage medium
WO2022134354A1 Vehicle damage detection model training and vehicle damage detection method, apparatus, device and medium
CN110543906A Automatic skin type recognition method based on data augmentation and the Mask R-CNN model
CN112464839A Portrait segmentation method and apparatus, robot and storage medium
Du High-precision portrait classification based on mtcnn and its application on similarity judgement
CN112101127A Face shape recognition method and apparatus, computing device and computer storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20964009

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20964009

Country of ref document: EP

Kind code of ref document: A1