CN110349164A

CN110349164A - Image semantic segmentation method, device and terminal device

Info

Publication number: CN110349164A
Application number: CN201910655931.8A
Authority: CN
Inventors: 古迎冬; 李骊
Original assignee: Beijing HJIMI Technology Co Ltd
Current assignee: Beijing HJIMI Technology Co Ltd
Priority date: 2019-07-19
Filing date: 2019-07-19
Publication date: 2019-10-18

Abstract

This application discloses an image semantic segmentation method, device and terminal device. An image to be segmented that contains a target object is first obtained; the image is then pre-processed to obtain a pre-processed image; finally, a trained image semantic segmentation model performs semantic segmentation on the pre-processed image to obtain the segmented image. The model effectively improves the real-time performance of image semantic segmentation, and the resulting segmented image has a relatively high resolution. The encoding and decoding structure of the model's FPN architecture ensures fine segmentation. The method suits terminal devices with limited processing power, such as mobile phones, and improves both their segmentation of high-resolution images and the user experience.

Description

Image semantic segmentation method, device and terminal device

Technical field

This application relates to the technical field of image processing, and in particular to an image semantic segmentation method, device and terminal device.

Background technique

Image processing techniques are widely used in many fields such as media, scientific research and industrial design. Image segmentation is one such technique; its purpose is to divide an image into several specific regions with distinctive properties and to extract the targets of interest.

Image semantic segmentation performs image segmentation at the pixel level, assigning pixels that belong to the same class to one category. In short, it understands the image at the pixel level and segments it accordingly. The current mainstream approach is deep-learning-based segmentation of color images. For different segmentation tasks there are many classic image segmentation networks, such as FCN, UNet, SegNet and the DeepLab series.

Among these, FCN is the pioneering work of deep learning in the field of image segmentation, but its segmentation precision is low and its speed is slow. DeepLab v3+ raised the precision of image semantic segmentation to a new level, but the heavy computation of its feature extraction module, Xception65, deters many application devices. Although DeepLab v3+ also supports the faster MobileNet v2 as a feature extractor, its precision is then much lower than with Xception65, and it still runs very slowly on mobile phones.

In recent years, many classic lightweight neural networks, such as the MobileNet series, the ShuffleNet series and SqueezeNet, have been proposed for mobile devices such as mobile phones, where computing power is weak and speed requirements are high; they aim to reduce the computation of the network model. Among them, MobileNet v2 took the speed of lightweight networks a step further. Used as the backbone of the image semantic segmentation network DeepLab v3+, MobileNet v2 allows deep-learning-based image segmentation to run on mobile phones. Its speed is nevertheless still very slow: with a typical model input resolution of 513*513 on a Qualcomm Snapdragon 845 processor, the frame rate is below 1 frame/second in the CPU environment and below 5 frames/second in the GPU environment, far short of real-time output.

If the input resolution is reduced to speed up the model for the sake of real-time segmentation, the edges of the segmentation result show very obvious jagged artifacts once the output is enlarged to a resolution of 1920*1080. That is, on terminal devices with limited processing power such as mobile phones, existing image semantic segmentation models cannot satisfy the requirements of real-time segmentation and segmentation precision at the same time.

Summary of the invention

In view of the above problems, this application provides an image semantic segmentation method, device and terminal device, so as to achieve real-time, high-precision image semantic segmentation.

The embodiments of this application disclose the following technical solutions:

In a first aspect, this application provides an image semantic segmentation method, comprising:

obtaining an image to be segmented that contains a target object;

pre-processing the image to be segmented to obtain a pre-processed image;

performing semantic segmentation on the pre-processed image using a trained image semantic segmentation model to obtain a segmented image;

The image semantic segmentation model has a feature pyramid network (FPN) structure and includes an encoding structure and a decoding structure. The encoding structure includes m standard convolution blocks and n residual convolution blocks; the decoding structure includes n standard convolution blocks and m+1 standard convolution blocks, where m is an integer greater than or equal to 1 and n is an integer greater than 1;

In the encoding structure, the m standard convolution blocks perform standard convolution and down-sampling on the pre-processed image, and each residual convolution block performs residual convolution and down-sampling on the output of the preceding convolution block;

In the decoding structure, each of the n standard convolution blocks corresponds one-to-one with a residual convolution block and performs standard convolution, or standard convolution and up-sampling, on the output of its corresponding residual convolution block, yielding n feature maps with identical resolution and channel count. The sum obtained by adding the n feature maps serves as the input of the m+1 standard convolution blocks, which perform standard convolution and up-sampling on the sum to obtain the segmented image.

Optionally, pre-processing the image to be segmented specifically comprises:

normalizing the pixel value of each pixel in the image to be segmented to a preset interval, the preset interval being [0,1] or [-1,1].

Optionally, before normalizing the pixel value of each pixel in the image to be segmented to the preset interval, the method further comprises:

judging whether the image to be segmented satisfies a first condition and a second condition, the first condition being that the aspect ratio of the image to be segmented is consistent with the aspect ratio of the input image allowed by the image semantic segmentation model, and the second condition being that the resolution of the image to be segmented is consistent with the resolution of the input image allowed by the image semantic segmentation model;

if the image to be segmented does not satisfy the first condition and/or the second condition, adjusting the image to be segmented so that the aspect ratio of the adjusted image is consistent with the aspect ratio of the input image allowed by the image semantic segmentation model, and the resolution of the adjusted image is consistent with the resolution of the input image allowed by the image semantic segmentation model;

in which case normalizing the pixel value of each pixel in the image to be segmented to the preset interval specifically comprises:

normalizing the pixel value of each pixel in the adjusted image to the preset interval.

Optionally, after the segmented image is obtained, the method further comprises:

adjusting the segmented image to obtain an adjusted segmented image, the aspect ratio of the adjusted segmented image being consistent with the aspect ratio of the image to be segmented, and the resolution of the adjusted segmented image being consistent with the resolution of the image to be segmented.

Optionally, the method further comprises:

obtaining a training image that contains a first object;

pre-processing the training image to obtain a pre-processed training image;

obtaining the segmentation result corresponding to the pre-processed training image, the pre-processed training image and its corresponding segmentation result together forming a training set;

training a model to be trained using the training set to obtain the image semantic segmentation model.

Optionally, before the model to be trained is trained using the training set, the method further comprises:

setting an initial learning rate and a decayed learning rate, the initial learning rate being greater than the decayed learning rate;

in which case training the model to be trained using the training set specifically comprises:

training the model to be trained using the training set, first at the initial learning rate and then at the decayed learning rate.
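The two-phase training described above can be sketched as follows. This is a minimal illustration only: the function name `learning_rate`, the switch point `switch_step` and the default rate values are hypothetical, since the application states only that the initial learning rate is greater than the decayed learning rate and that the two are applied in succession.

```python
def learning_rate(step, switch_step, initial_lr=1e-3, decayed_lr=1e-4):
    """Return the learning rate for a training step in a two-phase schedule.

    The numeric defaults and the switch point are illustrative assumptions.
    """
    if initial_lr <= decayed_lr:
        raise ValueError("the initial learning rate must exceed the decayed one")
    # Phase 1 uses the initial rate; phase 2 uses the smaller decayed rate.
    return initial_lr if step < switch_step else decayed_lr


# Rates for five consecutive steps, switching phase at step 3.
schedule = [learning_rate(step, switch_step=3) for step in range(5)]
```

Training at the larger rate first and the smaller rate afterwards is a common way to move quickly at the start and refine the weights later; when the switch happens is left open by the text.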

In a second aspect, this application provides an image semantic segmentation device, comprising:

an image acquisition module, for obtaining an image to be segmented that contains a target object;

an image pre-processing module, for pre-processing the image to be segmented to obtain a pre-processed image;

an image segmentation module, for performing semantic segmentation on the pre-processed image using a trained image semantic segmentation model to obtain a segmented image.

Optionally, the image pre-processing module specifically comprises:

a normalization unit, for normalizing the pixel value of each pixel in the image to be segmented to a preset interval, the preset interval being [0,1] or [-1,1].

Optionally, the image pre-processing module further comprises:

a judging unit, for judging whether the image to be segmented satisfies a first condition and a second condition, the first condition being that the aspect ratio of the image to be segmented is consistent with the aspect ratio of the input image allowed by the image semantic segmentation model, and the second condition being that the resolution of the image to be segmented is consistent with the resolution of the input image allowed by the image semantic segmentation model;

an image adjustment unit, for adjusting the image to be segmented when it does not satisfy the first condition and/or the second condition, so that the aspect ratio of the adjusted image is consistent with the aspect ratio of the input image allowed by the image semantic segmentation model, and the resolution of the adjusted image is consistent with the resolution of the input image allowed by the image semantic segmentation model;

the normalization unit being specifically used to normalize the pixel value of each pixel in the adjusted image to the preset interval.

Optionally, the device further comprises:

an image adjustment module, for adjusting the segmented image to obtain an adjusted segmented image, the aspect ratio of the adjusted segmented image being consistent with the aspect ratio of the image to be segmented, and the resolution of the adjusted segmented image being consistent with the resolution of the image to be segmented.

Optionally, the device further comprises a model training module, for obtaining a training image that contains a first object; pre-processing the training image to obtain a pre-processed training image; obtaining the segmentation result corresponding to the pre-processed training image, the pre-processed training image and its corresponding segmentation result together forming a training set; and training a model to be trained using the training set to obtain the image semantic segmentation model.

Optionally, the device further comprises a learning rate setting module, for setting an initial learning rate and a decayed learning rate, the initial learning rate being greater than the decayed learning rate;

the model training module being specifically used to train the model to be trained using the training set, first at the initial learning rate and then at the decayed learning rate.

In a third aspect, this application provides a terminal device, comprising a camera device and a processor;

the camera device is used to capture an image to be segmented that contains a target object and to send the image to be segmented to the processor;

the processor is used to run a computer program which, when run, executes the image semantic segmentation method provided in the first aspect.

Compared with the prior art, this application has the following advantages:

The image semantic segmentation method provided by this application first obtains an image to be segmented that contains a target object; then pre-processes the image to obtain a pre-processed image; and finally uses a trained image semantic segmentation model to perform semantic segmentation on the pre-processed image to obtain the segmented image. The method uses an image semantic segmentation model with a feature pyramid network (FPN) structure. The model includes an encoding structure, comprising m standard convolution blocks and n residual convolution blocks, and a decoding structure, comprising n standard convolution blocks and m+1 standard convolution blocks. In the encoding structure, the standard convolution blocks and residual convolution blocks each down-sample the output of the preceding convolution block, and the convolutions produce feature maps with few channels, which raises the processing speed of the model and ensures real-time segmentation. In the decoding structure, standard convolution and up-sampling of the residual convolution block outputs yield feature maps of identical resolution and channel count, which are then added together; this keeps computation fast while producing feature maps of relatively high resolution. The method therefore effectively improves the real-time performance of image semantic segmentation, and the segmented image has a relatively high resolution. In addition, the encoding and decoding structure of the model's FPN architecture ensures fine segmentation. The method suits terminal devices with limited processing power, such as mobile phones, and improves both their segmentation of high-resolution images and the user experience.

Brief description of the drawings

In order to explain the technical solutions in the embodiments of this application or in the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some embodiments of this application; those of ordinary skill in the art can obtain other drawings from them without creative effort.

Fig. 1 is a flow chart of an image semantic segmentation method provided by an embodiment of this application;

Fig. 2 is a structural schematic diagram of an image semantic segmentation model provided by an embodiment of this application;

Fig. 3 is a flow chart of another image semantic segmentation method provided by an embodiment of this application;

Fig. 4 is a flow diagram of model training provided by an embodiment of this application;

Fig. 5 is a structural schematic diagram of an image semantic segmentation device provided by an embodiment of this application;

Fig. 6 is a structural schematic diagram of a terminal device provided by an embodiment of this application;

Fig. 7 is a structural schematic diagram of another terminal device provided by an embodiment of this application.

Detailed description of the embodiments

As described above, current image semantic segmentation methods cannot achieve both high real-time performance and high-precision segmentation on terminal devices with limited processing power, such as mobile phones. The reason is that improving real-time performance usually means reducing the computation of the segmentation model, which makes the segmentation less fine; after the result is restored to the original resolution, the lack of fineness becomes very noticeable. Conversely, guaranteeing a higher resolution usually requires heavy computation, which makes real-time operation on terminal devices such as mobile phones impractical.

In view of this problem, the inventors, after research, provide an image semantic segmentation method, device and terminal device. In this application, an image semantic segmentation model with a feature pyramid network (FPN) structure performs the segmentation. In the model structure, down-sampling in the encoding stage raises processing speed, and up-sampling in the decoding stage preserves resolution. Real-time performance is greatly improved while an improvement in segmentation precision remains possible, so the method gives good segmentation results on terminal devices with limited processing power such as mobile phones.

To enable those skilled in the art to better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on these embodiments without creative effort shall fall within the protection scope of the present invention.

First embodiment

Refer to Fig. 1, a flow chart of an image semantic segmentation method provided by an embodiment of this application.

As shown in Fig. 1, the image semantic segmentation method provided in this embodiment comprises:

Step 101: obtain an image to be segmented that contains a target object.

In this embodiment, the specific type of the target object is not limited. As an example, the target object may be a person or an animal.

As one specific implementation, the image to be segmented in this embodiment is a color image.

Step 102: pre-process the image to be segmented to obtain a pre-processed image.

In practical applications, the image to be segmented can be pre-processed in many ways.

As one specific implementation, if the aspect ratio of the image to be segmented is inconsistent with the aspect ratio of the input image allowed by the image semantic segmentation model, the aspect ratio of the image to be segmented can be adjusted so that the two are consistent.

As another specific implementation, if the resolution of the image to be segmented is inconsistent with the resolution of the input image allowed by the image semantic segmentation model, the resolution of the image to be segmented can be adjusted so that the two are consistent.

As yet another specific implementation, the pixel value of each pixel of the image to be segmented can be normalized to a preset interval. For example, the preset interval may be [0,1] or [-1,1]. The preset interval is not limited here.
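The normalization just described can be sketched in plain Python. This is a minimal sketch under stated assumptions: the image is a nested list of shape height x width x channels, pixel values lie in [0, 255] (8-bit color, which the application does not state), and the name `normalize_pixels` is hypothetical.

```python
def normalize_pixels(image, interval=(0.0, 1.0), max_value=255.0):
    """Linearly map pixel values from [0, max_value] to the preset interval.

    `image` is a nested list: rows -> pixels -> channel values.
    The 8-bit range (max_value=255) is an assumption of this sketch.
    """
    low, high = interval
    return [
        [[low + (high - low) * value / max_value for value in pixel] for pixel in row]
        for row in image
    ]


# One row with two RGB pixels.
image = [[[0, 128, 255], [255, 255, 255]]]
unit_range = normalize_pixels(image, interval=(0.0, 1.0))
signed_range = normalize_pixels(image, interval=(-1.0, 1.0))
```

Both intervals named in the text are covered by the same linear mapping; only the end points differ.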

The implementations above show that, in practical applications, the way the image to be segmented is pre-processed can be adjusted according to actual needs and to the image semantic segmentation model. That is, the image can be pre-processed by a combination of the example implementations above or by other implementations. The specific manner of step 102 is therefore not limited here.
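One possible way to combine the aspect-ratio and resolution checks is sketched below. The application does not say how the image is adjusted, so the scale-then-pad plan in `plan_adjustment` is purely an assumption for illustration, and both function names are hypothetical. Sizes are (width, height) pairs.

```python
def check_conditions(image_size, model_size):
    """Return (same_aspect_ratio, same_resolution) for (width, height) pairs."""
    width, height = image_size
    model_width, model_height = model_size
    # Compare aspect ratios by cross-multiplication to avoid float error.
    same_aspect_ratio = width * model_height == height * model_width
    same_resolution = (width, height) == (model_width, model_height)
    return same_aspect_ratio, same_resolution


def plan_adjustment(image_size, model_size):
    """Hypothetical plan: scale to fit the model input, then pad the rest."""
    same_aspect_ratio, same_resolution = check_conditions(image_size, model_size)
    if same_aspect_ratio and same_resolution:
        return {"scaled_size": image_size, "padding": (0, 0)}
    width, height = image_size
    model_width, model_height = model_size
    scale = min(model_width / width, model_height / height)
    scaled_width = min(model_width, round(width * scale))
    scaled_height = min(model_height, round(height * scale))
    padding = (model_width - scaled_width, model_height - scaled_height)
    return {"scaled_size": (scaled_width, scaled_height), "padding": padding}


# A 1280*720 image has the model's aspect ratio but not its resolution.
conditions = check_conditions((1280, 720), (1920, 1080))
plan = plan_adjustment((1080, 1080), (1920, 1080))
```

Recording the scale and padding also makes it straightforward to undo the adjustment on the segmented output later.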

Step 103: perform semantic segmentation on the pre-processed image using a trained image semantic segmentation model to obtain a segmented image.

The pre-processed image serves as the input image of the image semantic segmentation model in this step. In this embodiment, the image semantic segmentation model has a feature pyramid network (FPN) structure. The model includes an encoding structure and a decoding structure: the encoding structure can be understood as the structure of the model in the encoding direction, and the decoding structure as the structure of the model in the decoding direction.

The encoding structure includes m standard convolution blocks and n residual convolution blocks, and the decoding structure includes n+m+1 standard convolution blocks in total. For this image semantic segmentation model, m is an integer greater than or equal to 1 and n is an integer greater than 1. In practical applications, m < n; as an example, m=1 and n=4.

Along the encoding direction, the encoding structure consists of the m standard convolution blocks followed by the n residual convolution blocks. The m standard convolution blocks perform standard convolution, normalization, activation and down-sampling on the pre-processed image: the first standard convolution block applies these operations to the pre-processed image input to the model, and each subsequent standard convolution block applies them to the output of the preceding standard convolution block. Each residual convolution block performs residual convolution and down-sampling on the output of the preceding convolution block along the encoding direction. That is, among the n residual convolution blocks, the first operates on the output of the last of the m standard convolution blocks, and each subsequent residual convolution block operates on the output of the residual convolution block before it.
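The order of blocks in the encoding structure, and how much each one has shrunk the image, can be sketched as follows. The sketch assumes every block down-samples by the same factor (`stride=2`, the value used in the Fig. 2 example); the function name `encoder_stages` is hypothetical.

```python
def encoder_stages(m, n, stride=2):
    """List (block_kind, cumulative_downsampling_factor) along the encoding direction."""
    if m < 1 or n <= 1:
        raise ValueError("m must be >= 1 and n must be > 1")
    stages = []
    factor = 1
    for index in range(m + n):
        # The first m blocks are standard convolution blocks, the rest residual.
        kind = "standard" if index < m else "residual"
        factor *= stride  # each block down-samples the previous block's output
        stages.append((kind, factor))
    return stages


# The example configuration from the text: m=1, n=4.
stages = encoder_stages(m=1, n=4)
```

With m=1 and n=4 the last residual block sees the input reduced by a factor of 32 per side, so a 1920-pixel-wide input becomes 60 pixels wide.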

Along the decoding direction, the decoding structure consists of n standard convolution blocks followed by m+1 standard convolution blocks. Each of the n standard convolution blocks corresponds one-to-one with a residual convolution block of the encoding structure. As an example, suppose the n residual convolution blocks of the encoding structure are a1, a2, ..., an along the encoding direction, and the n standard convolution blocks of the decoding structure are bn, b(n-1), ..., b1 along the decoding direction. Then a1 corresponds to b1, a2 to b2, ..., a(n-1) to b(n-1), and an to bn. Each of the n standard convolution blocks of the decoding structure performs standard convolution, or standard convolution and up-sampling, on the output of its corresponding residual convolution block. For example, b1 processes the output of a1, and b2 processes the output of a2.

Among the first n standard convolution blocks of the decoding structure along the decoding direction, each performs standard convolution, or standard convolution and up-sampling, on the output of its corresponding residual convolution block. For example, the 1st to (n-1)th standard convolution blocks along the decoding direction, i.e. bn, b(n-1), ..., b2, perform standard convolution and up-sampling on the outputs of their corresponding residual convolution blocks, while the nth standard convolution block, i.e. b1, performs only standard convolution. Note that in this embodiment the images output by the n standard convolution blocks of the decoding structure all have the same resolution and the same number of channels; for example, each of the n blocks outputs a 480*270 feature map with 64 channels. The element-wise sum of these output images, which are identical in resolution and channel count, serves as the input of the m+1 standard convolution blocks that follow the n standard convolution blocks along the decoding direction. Element-wise summation can be understood as adding the images together: because their channel counts and resolutions match, the effect of the addition is that corresponding elements (which can also be understood as pixels) of the images are summed.
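The element-wise summation of the n feature maps can be sketched in plain Python. The sketch assumes each feature map is a nested list of shape height x width x channels; the function name `sum_feature_maps` is hypothetical, and a real implementation would normally rely on a tensor library.

```python
def sum_feature_maps(feature_maps):
    """Add feature maps element by element; all must share one shape."""
    first = feature_maps[0]
    shape = (len(first), len(first[0]), len(first[0][0]))
    for feature_map in feature_maps:
        current = (len(feature_map), len(feature_map[0]), len(feature_map[0][0]))
        if current != shape:
            raise ValueError("feature maps must match in resolution and channels")
    height, width, channels = shape
    return [
        [
            [sum(fm[y][x][c] for fm in feature_maps) for c in range(channels)]
            for x in range(width)
        ]
        for y in range(height)
    ]


# Two tiny 1*2 feature maps with 2 channels each.
map_a = [[[1, 2], [3, 4]]]
map_b = [[[10, 20], [30, 40]]]
total = sum_feature_maps([map_a, map_b])
```

The shape check mirrors the requirement in the text that the summed maps agree in both resolution and channel count.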

Specifically, of the last m+1 standard convolution blocks of the decoding structure along the decoding direction, the first takes as input the sum obtained by adding the output images of the preceding n standard convolution blocks and performs standard convolution, normalization, activation and up-sampling on it; each of the following m standard convolution blocks takes the output of the standard convolution block before it as input and performs standard convolution, normalization, activation and up-sampling. Finally, the output of the last standard convolution block along the decoding direction is the final output image of the image semantic segmentation model.

For those skilled in the art, image summation is a relatively mature technique and is not described in detail here.

In practical applications, depending on the specific pre-processing applied, the final output image of the model can be adjusted and restored accordingly to obtain the segmented image; alternatively, the final output image of the model can be used directly as the segmented image.
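Restoring the model output to the resolution of the original image might look like the sketch below. The application does not name an interpolation method; nearest-neighbor resampling is assumed here because it keeps class labels discrete. The function name `resize_nearest` and the 2-D label-mask representation are likewise assumptions.

```python
def resize_nearest(mask, target_width, target_height):
    """Resize a 2-D label mask (list of rows) by nearest-neighbor sampling."""
    source_height = len(mask)
    source_width = len(mask[0])
    return [
        [
            # Map each target pixel back to its source pixel by integer scaling.
            mask[y * source_height // target_height][x * source_width // target_width]
            for x in range(target_width)
        ]
        for y in range(target_height)
    ]


# Enlarge a 2*2 mask to 4*4.
small_mask = [[0, 1], [1, 0]]
restored = resize_nearest(small_mask, target_width=4, target_height=4)
```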

To aid understanding of the structure of the image semantic segmentation model, refer to Fig. 2, a structural schematic diagram of an image semantic segmentation model provided by an embodiment of this application. Fig. 2 takes m=1, n=4 as an example, with a convolution stride of 2 throughout. As Fig. 2 shows, the input image of the model is a high-resolution color image of 1920*1080*3, where 3 is the number of channels of the image.

In the image feature extraction stage, i.e. along the encoding direction, one standard convolution block first performs convolution and down-sampling on the input image, yielding a 960*540*16 feature map, where 16 is the number of channels. Four residual convolution blocks then perform residual convolution and down-sampling in turn on the output of the preceding convolution block, yielding images of 480*270*64, 240*140*196, 120*70*196 and 60*35*196 in sequence.

In the decoding stage, the standard convolution blocks perform standard convolution, or standard convolution and up-sampling, on the feature maps output by the corresponding residual convolution blocks. For example, the topmost feature map in the encoding direction, of size 60*35*196, is processed as conv → 2x → conv → 2x → conv → 2x, i.e. convolution, up-sampling, convolution, up-sampling, convolution, up-sampling, yielding 480*270*64. Similarly, for the other layers, conv → 2x → conv → 2x denotes convolution, up-sampling, convolution, up-sampling; conv → 2x denotes convolution, up-sampling; and conv denotes convolution only.
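The pattern of operations per decoder block in this example can be generated programmatically. This is a sketch under the assumption, read off the Fig. 2 description, that block b_i applies i-1 rounds of convolution plus 2x up-sampling and that b1 applies a single convolution; the function name `lateral_ops` is hypothetical.

```python
def lateral_ops(n):
    """For each decoder block b1..bn, list the operations applied to a_i's output."""
    ops = {}
    for i in range(1, n + 1):
        sequence = []
        # b_i needs i-1 doublings to reach the resolution of a1's output.
        for _ in range(i - 1):
            sequence += ["conv", "2x"]
        if not sequence:
            sequence = ["conv"]  # b1 already has the target resolution
        ops[f"b{i}"] = sequence
    return ops


ops = lateral_ops(4)
summary = {name: " → ".join(sequence) for name, sequence in ops.items()}
```

For n=4 this reproduces the four sequences listed above, from a single conv for b1 up to three conv and 2x pairs for b4.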

As Fig. 2 shows, the higher the layer a feature map comes from (i.e. the more times it has been down-sampled), the more rounds of convolution and up-sampling it undergoes in the decoding stage. The feature maps output by the first 4 standard convolution blocks of the decoding structure are identical in resolution and channel count: the resolution is 480*270 and the number of channels is 64. The plus sign "+" in Fig. 2 denotes adding the feature maps output by these standard convolution blocks, i.e. element-wise summation of the images. The sum is passed to the subsequent standard convolution block along the decoding direction, which performs convolution and up-sampling on it to obtain a 960*540*2 feature map; the last standard convolution block then performs convolution and up-sampling on this 960*540*2 feature map to obtain the image semantic segmentation result of the model, i.e. the 1920*1080*2 segmented image.

Note that in the embodiments of this application, the number of channels of each convolution block of the image semantic segmentation model can be configured as needed. For example, the number of channels of the model's input image is set to 3, and the number of channels of the feature map output by the topmost layer in the encoding direction (the topmost residual convolution block in Fig. 2) is set to 196. Setting small channel counts produces feature maps with few channels during convolution and keeps the model's computation low, so that the image semantic segmentation model segments images quickly and real-time performance is ensured.
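The link between channel count and computation can be made concrete with the usual multiply-accumulate estimate for a standard convolution layer. This is a rough sketch: the application does not give kernel sizes, so `kernel_size=3` is an assumption, bias terms are ignored, and the function name `conv_macs` is hypothetical.

```python
def conv_macs(out_height, out_width, in_channels, out_channels, kernel_size=3):
    """Approximate multiply-accumulate count of one standard convolution layer."""
    return (
        out_height * out_width * in_channels * out_channels * kernel_size * kernel_size
    )


# Cost at the 60*35 layer with 196 channels versus twice as many channels.
narrow = conv_macs(60, 35, in_channels=196, out_channels=196)
wide = conv_macs(60, 35, in_channels=392, out_channels=392)
ratio = wide // narrow
```

Because the cost scales with the product of input and output channels, doubling both quadruples the work, which is why keeping channel counts small matters on a mobile processor.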

More than, image, semantic dividing method as provided by the present application.This obtains to be split comprising target object first Image；Thereafter, it treats segmented image to be pre-processed, obtains pretreated image；Finally, utilizing trained image, semantic Parted pattern carries out semantic segmentation to pretreated image, the image after being divided.This method uses a feature gold word The image, semantic parted pattern of tower network FPN structure, in the coding structure of model, due to Standard convolution block and residual error convolution block Down-sampling respectively is carried out to the output of previous convolution block, and convolution process generates the characteristic pattern of less port number, to mention The processing speed of model has been risen, has guaranteed that model carries out the real-time of image, semantic segmentation；In solution to model code structure, by right The output of residual error convolution block carries out Standard convolution and obtains that resolution ratio is consistent and the consistent characteristic pattern of port number with up-sampling, it is unified into Row add operation not only can guarantee the high speed of operation, but also can obtain the characteristic pattern of resolution ratio with higher.As it can be seen that this method is answered The real-time of image, semantic segmentation can be effectively promoted with image, semantic parted pattern, image is with higher after obtained segmentation Resolution ratio.In addition, the coding and decoding structure of the FPN structure of model guarantees the segmentation fineness of image.This method is suitable for hand The weaker terminal device of the processing capacities such as machine promotes terminal device to the segmentation effect of high-definition picture, promotes making for user With experience.

In practical applications, the resolution and aspect ratio of an image to be segmented may differ from the resolution and aspect ratio of the input image permitted by the model. To ensure that the model can segment the image normally, the image needs to be adjusted accordingly during preprocessing. The application therefore further provides another image semantic segmentation method. A specific implementation of this method is described below with reference to an embodiment and the accompanying drawings.

Second embodiment

Referring to Fig. 3, this figure is a flowchart of another image semantic segmentation method provided by an embodiment of this application.

As shown in Fig. 3, the image semantic segmentation method provided in this embodiment includes:

Step 301: Obtain a training image containing a first object.

Here, the first object may be an object of the same type as the aforementioned target object, or an object of a different type. In this embodiment, the training image containing the first object is a color image.

Step 302: Preprocess the training image to obtain a preprocessed training image.

In practical applications, the training image can be preprocessed in several ways.

As one specific implementation, if the aspect ratio of the training image is inconsistent with the aspect ratio of the input image permitted by the model to be trained, the aspect ratio of the training image can be adjusted so that it is consistent with the aspect ratio of the input image permitted by the model to be trained.

As another specific implementation, if the resolution of the training image is inconsistent with the resolution of the input image permitted by the model to be trained, the resolution of the training image can be adjusted so that it is consistent with the resolution of the input image permitted by the model to be trained.

As yet another specific implementation, the pixel value of each pixel of the training image can be normalized to a preset interval. For example, the preset interval may be [0, 1] or [-1, 1]. The preset interval is not limited here.
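The normalization described above can be sketched as a linear mapping. The sketch assumes 8-bit pixel values in the range 0 to 255 and a single-channel image stored as nested lists; neither assumption comes from the text, and the function name is hypothetical.

```python
def normalize_pixels(pixels, low=0.0, high=1.0, max_value=255):
    """Linearly map pixel values from [0, max_value] to [low, high]."""
    scale = (high - low) / max_value
    return [[low + value * scale for value in row] for row in pixels]


# Hypothetical 2x2 single-channel 8-bit image.
image = [[0, 51], [204, 255]]

unit = normalize_pixels(image)                # preset interval [0, 1]
signed = normalize_pixels(image, -1.0, 1.0)   # preset interval [-1, 1]
```

The same function would be applied to the image to be segmented at inference time so that training and inference inputs share one value range.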

In practical applications, the preprocessing of the training image can be adjusted according to actual needs and the model to be trained. That is, the image may be preprocessed using a combination of the example implementations above or using other implementations. The specific manner of step 302 is therefore not limited here.

Step 303: Obtain the segmentation result corresponding to the preprocessed training image.

In this embodiment, the segmentation result obtained in step 303 for the preprocessed training image has a relatively high segmentation fineness. The implementation of step 303 is not limited; for example, the segmentation result may be obtained with a relatively mature segmentation method of high fineness, or a finer segmentation result may be obtained with manual participation. In addition, the time consumed in obtaining the segmentation result is not limited. That is, in the model training stage there is no strict requirement on the speed at which the high-quality segmentation results in the training set are obtained. It is the use stage, after the model has been trained, that has a high real-time requirement in practical applications.

Thereafter, the preprocessed training image obtained in the preceding step and the corresponding segmentation result obtained in this step together form the training set. The segmentation result serves as the training target of the model to be trained for the preprocessed training image.

In this embodiment, the size of the training set is not limited.

Step 304: Train the model to be trained using the training set to obtain the image semantic segmentation model.

Referring to Fig. 4, this figure is a schematic flowchart of model training. In actual training, the model to be trained and the training set are loaded and model training is started. To judge whether training has ended, some optimization functions may optionally be applied, for example stopping training when the optimization function is satisfied; optionally, a number of training iterations may also be set, and training stops once the preset number is reached. After training ends, the trained model is saved for later use.
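The two stopping rules mentioned for Fig. 4 can be sketched with a generic loop. This is only an illustration under stated assumptions: the "optimization function" is interpreted here as a loss threshold, and ToyModel is a hypothetical stand-in (gradient descent on a one-parameter quadratic) rather than the segmentation model itself.

```python
def train(step_fn, max_iterations, loss_threshold=None):
    """Call step_fn until a stopping rule fires; return (iterations, last_loss)."""
    loss = None
    for iteration in range(1, max_iterations + 1):
        loss = step_fn(iteration)
        # Rule 1 (assumed form): stop once the loss criterion is met.
        if loss_threshold is not None and loss <= loss_threshold:
            return iteration, loss
    # Rule 2: stop after the preset number of training iterations.
    return max_iterations, loss


class ToyModel:
    """Hypothetical stand-in: minimizes (w - 3)^2 by gradient descent."""

    def __init__(self):
        self.w = 0.0

    def step(self, iteration, lr=0.1):
        grad = 2.0 * (self.w - 3.0)
        self.w -= lr * grad
        return (self.w - 3.0) ** 2


model = ToyModel()
iterations, final_loss = train(model.step, max_iterations=1000, loss_threshold=1e-3)
print(iterations, final_loss)
```

In a real training run, step_fn would perform one parameter update on a batch from the training set, and the trained model would be saved after the loop returns.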

As an optional implementation, an initial learning rate and a decaying learning rate can be set before model training, where the initial learning rate is greater than the decaying learning rate. This embodiment places no numerical limits on the initial learning rate or the decaying learning rate. When step 304 is executed, the model to be trained is trained successively according to the initial learning rate and then the decaying learning rate. For ease of understanding, an example is given below.

For example, during the first 500 training iterations, the model parameters are updated using the larger initial learning rate. In the subsequent training, iterations 501 to 1000 update the model parameters using decaying learning rate a1, iterations 1001 to 1500 use decaying learning rate a2, iterations 1501 to 2000 use decaying learning rate a3, and so on, where a1 > a2 > a3. This speeds up the convergence of the model, so that an image semantic segmentation model with stable performance is obtained as early as possible. Once the model has been trained, it can be deployed on terminal devices such as mobile phones and tablet computers for image semantic segmentation.
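The staged schedule in this example can be written as a small lookup function. The numeric learning rates below are hypothetical (the embodiment places no numerical limits on them), and keeping the last decaying rate after the listed stages run out is an assumption of this sketch.

```python
def learning_rate(iteration, initial_lr, decay_lrs, stage_length=500):
    """Return the learning rate for a 1-based training iteration."""
    stage = (iteration - 1) // stage_length
    if stage == 0:
        return initial_lr  # first 500 iterations use the initial learning rate
    # Later stages use a1, a2, a3, ...; the last value is kept once the list
    # is exhausted (an assumption, not stated in the text).
    return decay_lrs[min(stage - 1, len(decay_lrs) - 1)]


INITIAL_LR = 1e-2                  # hypothetical value
DECAY_LRS = [5e-3, 1e-3, 5e-4]     # hypothetical a1 > a2 > a3

for it in (1, 500, 501, 1000, 1001, 1501):
    print(it, learning_rate(it, INITIAL_LR, DECAY_LRS))
```

A training loop would call learning_rate once per iteration and pass the result to the parameter update.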

Steps 301 to 304 describe the training of the image semantic segmentation model. Steps 305 to 310 describe the use of the trained image semantic segmentation model.

Step 305: Obtain an image to be segmented that contains a target object.

This step is implemented in the same way as step 101 of the foregoing embodiment; for a description of this step, refer to the foregoing embodiment.

Step 306: Judge whether the image to be segmented satisfies a first condition and a second condition; if the first condition and/or the second condition is not satisfied, perform step 307.

In this embodiment, the first condition is that the aspect ratio of the image to be segmented is consistent with the aspect ratio of the input image permitted by the image semantic segmentation model; the second condition is that the resolution of the image to be segmented is consistent with the resolution of the input image permitted by the image semantic segmentation model.
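The judgment in step 306 can be sketched as two comparisons. The function names are hypothetical, and "consistent" is interpreted here as exact equality of the aspect ratio and of the pixel dimensions; exact fractions are used so that the aspect ratio comparison is not affected by floating-point rounding.

```python
from fractions import Fraction


def check_conditions(width, height, model_width, model_height):
    """Return (first_condition, second_condition) for an image of the given size."""
    # First condition: same aspect ratio as the model's permitted input.
    first = Fraction(width, height) == Fraction(model_width, model_height)
    # Second condition: same resolution as the model's permitted input.
    second = (width, height) == (model_width, model_height)
    return first, second


def needs_adjustment(width, height, model_width, model_height):
    """Step 307 is performed if either condition fails."""
    first, second = check_conditions(width, height, model_width, model_height)
    return not (first and second)


# Hypothetical sizes: a 1280x720 image against a model that permits 1920x1080.
print(check_conditions(1280, 720, 1920, 1080))
```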

Step 307: Adjust the image to be segmented so that the aspect ratio of the adjusted image is consistent with the aspect ratio of the input image permitted by the image semantic segmentation model, and the resolution of the adjusted image is consistent with the resolution of the input image permitted by the image semantic segmentation model.

If the image to be segmented does not satisfy the first condition, one possible implementation is to pad the image to be segmented with pixels so that the aspect ratio of the padded image is consistent with the aspect ratio of the input image permitted by the image semantic segmentation model. For example, before padding, the width-to-height ratio of the image to be segmented is 16:9 and the aspect ratio of the input image permitted by the model is 4:3; after padding, the aspect ratio of the final image is also 4:3. In this way, the original content of the image is guaranteed not to be deformed.
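A minimal sketch of the pixel padding follows. Centering the original content, filling with zeros, and the function names are assumptions of the sketch; the text only requires that the padded image reach the permitted aspect ratio without deforming the content.

```python
def padded_size(width, height, ratio_w, ratio_h):
    """Smallest size with aspect ratio ratio_w:ratio_h that contains the image."""
    if width * ratio_h > height * ratio_w:
        # Image is too wide for the target ratio: add rows.
        return width, -(-width * ratio_h // ratio_w)  # ceiling division
    # Image is too tall (or already matches): add columns.
    return -(-height * ratio_w // ratio_h), height


def pad_to_aspect_ratio(pixels, ratio_w, ratio_h, fill=0):
    """Pad a 2-D image (nested lists) and return it with the (top, left) offset."""
    height, width = len(pixels), len(pixels[0])
    new_w, new_h = padded_size(width, height, ratio_w, ratio_h)
    top, left = (new_h - height) // 2, (new_w - width) // 2
    padded = [[fill] * new_w for _ in range(new_h)]
    for y, row in enumerate(pixels):
        padded[top + y][left:left + width] = row
    return padded, (top, left)


# Hypothetical 16x9 image padded to 4:3.
image = [[1] * 16 for _ in range(9)]
padded, offset = pad_to_aspect_ratio(image, 4, 3)
print(len(padded[0]), len(padded), offset)
```

Returning the offset makes it possible to cut the padding away again after segmentation.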

If the image to be segmented does not satisfy the second condition, one possible implementation is to scale the image to be segmented. For example, the resolution of the image to be segmented (or of the image after pixel padding) is x*y, and the resolution of the input image permitted by the model is 1920*1080, where x ≠ 1920 and y ≠ 1080. Since the aspect ratio already meets the requirement, the width and height of the image can be enlarged or reduced by the same factor so that its resolution becomes 1920*1080, meeting the model's resolution requirement for input images.
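The equal-factor scaling can be checked with a short calculation. The function name is hypothetical; the sketch returns the common scale factor as an exact fraction and rejects inputs whose aspect ratio has not yet been brought into line.

```python
from fractions import Fraction


def uniform_scale(width, height, model_width, model_height):
    """Return the single factor that maps (width, height) to the model's size."""
    scale_w = Fraction(model_width, width)
    scale_h = Fraction(model_height, height)
    if scale_w != scale_h:
        # Width and height would need different factors, which deforms content.
        raise ValueError("aspect ratios differ; pad the image first")
    return scale_w


# Hypothetical x*y values against the 1920*1080 example from the text.
print(uniform_scale(1280, 720, 1920, 1080))   # enlargement
print(uniform_scale(3840, 2160, 1920, 1080))  # reduction
```

A factor greater than 1 corresponds to enlarging the image and a factor smaller than 1 to reducing it.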

Step 308: Normalize the pixel value of each pixel in the adjusted image to a preset interval to obtain the preprocessed image.

Step 309: Perform semantic segmentation on the preprocessed image using the trained image semantic segmentation model to obtain a segmented image.

Step 309 is implemented in essentially the same way as step 103 of the foregoing embodiment; for a description of step 309, refer to the foregoing embodiment, which is not repeated here.

The image semantic segmentation model applied in this step is the model obtained by the training in step 304.

Step 310: Adjust the segmented image to obtain an adjusted segmented image, where the aspect ratio of the adjusted segmented image is consistent with the aspect ratio of the image to be segmented, and the resolution of the adjusted segmented image is consistent with the resolution of the image to be segmented.

The specific implementation of this step is tied to the specific implementation of step 307; step 310 can be understood as the inverse operation of step 307.

For example, if step 307 only padded the image, this step crops the image output by the model so that the aspect ratio of the cropped image is consistent with that before the padding in step 307.

For example, if step 307 only reduced the image, this step enlarges the image output by the model so that the resolution of the enlarged image is consistent with that before the reduction in step 307.

For example, if step 307 only enlarged the image, this step reduces the image output by the model so that the resolution of the reduced image is consistent with that before the enlargement in step 307.

For example, if step 307 first padded the image and then enlarged or reduced it, this step first reduces or enlarges the image output by the model and then crops it. The aspect ratio of the image finally obtained is consistent with the aspect ratio of the image to be segmented, and its resolution is also consistent with the resolution of the image to be segmented.
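The last case (padding followed by scaling, undone by scaling followed by cropping) can be sketched as follows. Nearest-neighbour resampling is an assumption chosen because it keeps class labels intact; the text does not name an interpolation method, and all function names are hypothetical.

```python
def resize_nearest(pixels, new_width, new_height):
    """Nearest-neighbour resize of a 2-D image stored as nested lists."""
    height, width = len(pixels), len(pixels[0])
    return [
        [pixels[y * height // new_height][x * width // new_width]
         for x in range(new_width)]
        for y in range(new_height)
    ]


def crop(pixels, top, left, height, width):
    """Cut a height x width window whose upper-left corner is (top, left)."""
    return [row[left:left + width] for row in pixels[top:top + height]]


def restore_segmentation(mask, padded_size, offset, original_size):
    """Undo step 307: scale back to the padded size, then cut the padding away."""
    padded_w, padded_h = padded_size
    top, left = offset
    orig_w, orig_h = original_size
    unscaled = resize_nearest(mask, padded_w, padded_h)
    return crop(unscaled, top, left, orig_h, orig_w)


# Hypothetical case: a 16x9 image was padded to 16x12 (one row on top) and
# reduced to an 8x6 model input; the model returned an 8x6 label mask.
mask = [[0] * 8] + [[1] * 8 for _ in range(5)]
restored = restore_segmentation(mask, (16, 12), (1, 0), (16, 9))
print(len(restored[0]), len(restored))
```

The padded size and offset are the values recorded when the image was adjusted in step 307.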

After the adjustment of step 310, the segmentation result finally presented to the user is the adjusted image, or the region corresponding to the target object segmented out of the adjusted image. For the user, the target object in the viewed segmentation result is not deformed, and the resolution is consistent with that before segmentation, which meets the user's needs for subsequent use.

Based on the image semantic segmentation method provided by the foregoing embodiments, the application correspondingly also provides an image semantic segmentation device. The implementation of the device is described below with reference to an embodiment and the accompanying drawings.

Third embodiment

Referring to Fig. 5, this figure is a schematic structural diagram of an image semantic segmentation device provided by an embodiment of this application.

As shown in Fig. 5, the image semantic segmentation device provided in this embodiment includes:

An image acquisition module 501, configured to obtain an image to be segmented that contains a target object;

An image preprocessing module 502, configured to preprocess the image to be segmented to obtain a preprocessed image;

An image segmentation module 503, configured to perform semantic segmentation on the preprocessed image using a trained image semantic segmentation model to obtain a segmented image;

In the encoding structure, the m standard convolution blocks are used to perform standard convolution, normalization, activation and downsampling on the preprocessed image, and each residual convolution block is used to perform residual convolution and downsampling on the output of the previous convolution block;

In the decoding structure, each of the n standard convolution blocks corresponds one-to-one to a residual convolution block and is used to perform standard convolution, or standard convolution and upsampling, on the output of the corresponding residual convolution block, yielding n feature map results whose resolutions and numbers of channels are all consistent; the sum obtained by adding the n feature map results serves as the input of the (m+1)-th standard convolution block; the (m+1)-th standard convolution block is used to perform standard convolution, normalization, activation and upsampling on the sum to obtain the segmented image.
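The addition in the decoding structure can be illustrated with a simplified sketch. It is not the model itself: the standard convolutions are omitted (each map is assumed to have already been brought to the same number of channels, reduced here to a single channel), nearest-neighbour upsampling is an assumed choice, and the function names and values are hypothetical.

```python
def upsample_nearest(feature, factor):
    """Repeat every value factor times along both axes."""
    return [
        [value for value in row for _ in range(factor)]
        for row in feature
        for _ in range(factor)
    ]


def fuse(features):
    """Bring every map to the resolution of the first one, then add them."""
    target_h = len(features[0])
    aligned = [upsample_nearest(f, target_h // len(f)) for f in features]
    return [
        [sum(values) for values in zip(*rows)]
        for rows in zip(*aligned)
    ]


# Hypothetical outputs of n = 3 residual convolution blocks at 4x4, 2x2 and 1x1.
f1 = [[1] * 4 for _ in range(4)]
f2 = [[1, 2], [3, 4]]
f3 = [[10]]

fused = fuse([f1, f2, f3])
for row in fused:
    print(row)
```

The first map needs no upsampling (factor 1), which corresponds to the "standard convolution only" branch; the fused map would then be passed to the (m+1)-th standard convolution block.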

The device uses an image semantic segmentation model with a feature pyramid network (FPN) structure. In the encoding structure of the model, the standard convolution blocks and the residual convolution blocks each downsample the output of the previous convolution block, and the convolutions produce feature maps with fewer channels, which increases the processing speed of the model and ensures that the model performs image semantic segmentation in real time. In the decoding structure of the model, the standard convolution blocks apply standard convolution and upsampling to the outputs of the residual convolution blocks to obtain feature maps with a consistent resolution and a consistent number of channels, which are then added together; this keeps the operation fast while yielding a feature map with a relatively high resolution. It can be seen that, by applying the image semantic segmentation model, the device effectively improves the real-time performance of image semantic segmentation, and the segmented image has a relatively high resolution. In addition, the encoding and decoding structures of the model's FPN structure ensure the fineness of the segmentation. The device is suitable for terminal devices with relatively weak processing capability, such as mobile phones; it improves the segmentation of high-resolution images on such devices and improves the user experience.

Optionally, the image preprocessing module specifically includes:

In practical applications, the resolution and aspect ratio of an image to be segmented may differ from the resolution and aspect ratio of the input image permitted by the model. To ensure that the model can segment the image normally, the image needs to be adjusted accordingly during preprocessing. Optionally, the image preprocessing module further includes:

To ensure that the image segmentation result presented to the user is not deformed compared with the image before segmentation while keeping the original resolution, the image semantic segmentation device provided in this embodiment further includes:

Optionally, the image semantic segmentation device further includes:

A model training module, configured to obtain a training image containing a first object; preprocess the training image to obtain a preprocessed training image; obtain the segmentation result corresponding to the preprocessed training image; use the preprocessed training image and its corresponding segmentation result together as a training set; and train the model to be trained using the training set to obtain the image semantic segmentation model.

Optionally, the image semantic segmentation device further includes:

A learning rate setting module, configured to set an initial learning rate and a decaying learning rate; the initial learning rate is greater than the decaying learning rate;

By setting the learning rates and training the model with the learning rates that have been set, the device speeds up the convergence of the model, so that an image semantic segmentation model with stable performance is obtained as early as possible. Once the model has been trained, it can be deployed on terminal devices such as mobile phones and tablet computers for image semantic segmentation.

On the basis of the image semantic segmentation method and the image semantic segmentation device provided by the foregoing embodiments, the application correspondingly also provides a terminal device. A specific implementation of the terminal device is described below with reference to an embodiment and the accompanying drawings.

Fourth embodiment

Referring to Fig. 6, this figure is a schematic structural diagram of a terminal device provided by an embodiment of this application.

As shown in Fig. 6, the terminal device provided in this embodiment includes:

A camera device 601 and a processor 602;

The camera device 601 is configured to capture an image to be segmented that contains a target object and to send the image to be segmented to the processor 602;

The processor 602 is configured to run a computer program; when the program runs, it executes the image semantic segmentation method described in the foregoing method embodiments.

In practical applications, the terminal device may be a device with relatively weak CPU processing capability, such as a mobile phone or a tablet computer. The specific type of the terminal device is not limited in this embodiment.

As mentioned above, the image semantic segmentation method provided by the embodiments of this application ensures that image segmentation has high real-time performance while also ensuring resolution and fineness. Accordingly, the terminal device provided in this embodiment can also segment the images it captures in real time and with high precision.

As shown in Fig. 7, the terminal device provided in this embodiment may optionally further include a display device 603.

As an example, the display device 603 may be a display screen. After the processor 602 runs the computer program and obtains the segmented image or the adjusted segmented image, it can send the segmented image or the adjusted segmented image to the display device 603 for display.

Optionally, the terminal device provided in this embodiment may further include a memory 604. The memory 604 is used to store the aforementioned computer program.

In addition, the memory 604 may also store the coefficients of the image semantic segmentation model, so that while the processor 602 runs the computer program to perform image segmentation, it can load the image semantic segmentation model at any time to perform the corresponding segmentation operation.

It should be noted that the embodiments in this specification are described in a progressive manner; for parts that are the same or similar among the embodiments, reference may be made to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the device and system embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the description of the method embodiments. The device and system embodiments described above are merely illustrative. Units described as separate components may or may not be physically separate, and components shown as units may or may not be physical units; that is, they may be located in one place or distributed over multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of an embodiment. Those of ordinary skill in the art can understand and implement this without creative effort.

The above is only one specific implementation of this application, but the protection scope of this application is not limited to it. Any change or substitution that a person skilled in the art can readily conceive of within the technical scope disclosed in this application shall fall within the protection scope of this application. Therefore, the protection scope of this application shall be subject to the protection scope of the claims.

Claims

1. An image semantic segmentation method, characterized by comprising:

Obtaining an image to be segmented that contains a target object;

Preprocessing the image to be segmented to obtain a preprocessed image;

Performing semantic segmentation on the preprocessed image using a trained image semantic segmentation model to obtain a segmented image;

The image semantic segmentation model has a feature pyramid network (FPN) structure and includes an encoding structure and a decoding structure; the encoding structure includes m standard convolution blocks and n residual convolution blocks; the decoding structure includes n standard convolution blocks and an (m+1)-th standard convolution block; m is an integer greater than or equal to 1, and n is an integer greater than 1;

In the encoding structure, the m standard convolution blocks are used to perform standard convolution and downsampling on the preprocessed image, and each residual convolution block is used to perform residual convolution and downsampling on the output of the previous convolution block;

In the decoding structure, each of the n standard convolution blocks corresponds one-to-one to a residual convolution block and is used to perform standard convolution, or standard convolution and upsampling, on the output of the corresponding residual convolution block, yielding n feature map results whose resolutions and numbers of channels are all consistent; the sum obtained by adding the n feature map results serves as the input of the (m+1)-th standard convolution block; the (m+1)-th standard convolution block is used to perform standard convolution and upsampling on the sum to obtain the segmented image.

2. The image semantic segmentation method according to claim 1, characterized in that preprocessing the image to be segmented specifically comprises:

Normalizing the pixel value of each pixel in the image to be segmented to a preset interval; the preset interval is [0, 1] or [-1, 1].

3. The image semantic segmentation method according to claim 2, characterized in that, before the pixel value of each pixel in the image to be segmented is normalized to the preset interval, the method further comprises:

Judging whether the image to be segmented satisfies a first condition and a second condition; the first condition is that the aspect ratio of the image to be segmented is consistent with the aspect ratio of the input image permitted by the image semantic segmentation model; the second condition is that the resolution of the image to be segmented is consistent with the resolution of the input image permitted by the image semantic segmentation model;

If the image to be segmented does not satisfy the first condition and/or the second condition, adjusting the image to be segmented so that the aspect ratio of the adjusted image is consistent with the aspect ratio of the input image permitted by the image semantic segmentation model, and the resolution of the adjusted image is consistent with the resolution of the input image permitted by the image semantic segmentation model;

4. The image semantic segmentation method according to claim 3, characterized in that, after the segmented image is obtained, the method further comprises:

Adjusting the segmented image to obtain an adjusted segmented image; the aspect ratio of the adjusted segmented image is consistent with the aspect ratio of the image to be segmented, and the resolution of the adjusted segmented image is consistent with the resolution of the image to be segmented.

5. The image semantic segmentation method according to any one of claims 1 to 4, characterized by further comprising:

Obtaining a training image containing a first object;

Preprocessing the training image to obtain a preprocessed training image;

Obtaining the segmentation result corresponding to the preprocessed training image; the preprocessed training image and its corresponding segmentation result together serve as a training set;

6. The image semantic segmentation method according to claim 5, characterized in that, before the model to be trained is trained using the training set, the method further comprises:

Using the training set, training the model to be trained successively according to the initial learning rate and the decaying learning rate.

7. An image semantic segmentation device, characterized by comprising:

An image segmentation module, configured to perform semantic segmentation on the preprocessed image using a trained image semantic segmentation model to obtain a segmented image;

8. The image semantic segmentation device according to claim 7, characterized in that the image preprocessing module specifically comprises:

A normalization unit, configured to normalize the pixel value of each pixel in the image to be segmented to a preset interval; the preset interval is [0, 1] or [-1, 1].

9. The image semantic segmentation device according to claim 8, characterized in that the image preprocessing module further comprises:

A judging unit, configured to judge whether the image to be segmented satisfies a first condition and a second condition; the first condition is that the aspect ratio of the image to be segmented is consistent with the aspect ratio of the input image permitted by the image semantic segmentation model; the second condition is that the resolution of the image to be segmented is consistent with the resolution of the input image permitted by the image semantic segmentation model;

An image adjustment unit, configured to, when the image to be segmented does not satisfy the first condition and/or the second condition, adjust the image to be segmented so that the aspect ratio of the adjusted image is consistent with the aspect ratio of the input image permitted by the image semantic segmentation model, and the resolution of the adjusted image is consistent with the resolution of the input image permitted by the image semantic segmentation model;

The normalization unit is specifically configured to normalize the pixel value of each pixel in the adjusted image to the preset interval.

10. The image semantic segmentation device according to claim 9, characterized by further comprising:

An image adjustment module, configured to adjust the segmented image to obtain an adjusted segmented image; the aspect ratio of the adjusted segmented image is consistent with the aspect ratio of the image to be segmented, and the resolution of the adjusted segmented image is consistent with the resolution of the image to be segmented.

11. The image semantic segmentation device according to any one of claims 7 to 10, characterized by further comprising: a model training module, configured to obtain a training image containing a first object; preprocess the training image to obtain a preprocessed training image; obtain the segmentation result corresponding to the preprocessed training image, the preprocessed training image and its corresponding segmentation result together serving as a training set; and train the model to be trained using the training set to obtain the image semantic segmentation model.

12. The image semantic segmentation device according to claim 11, characterized by further comprising: a learning rate setting module, configured to set an initial learning rate and a decaying learning rate; the initial learning rate is greater than the decaying learning rate;

The model training module is specifically configured to use the training set to train the model to be trained successively according to the initial learning rate and the decaying learning rate.

13. A terminal device, characterized by comprising: a camera device and a processor;

The camera device is configured to capture an image to be segmented that contains a target object and to send the image to be segmented to the processor;

The processor is configured to run a computer program; when the program runs, it executes the image semantic segmentation method according to any one of claims 1 to 6.