CN110807784A - Method and device for segmenting an object - Google Patents

Method and device for segmenting an object

Info

Publication number
CN110807784A
Authority
CN
China
Prior art keywords
target object
object image
information
position information
pixel
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911047016.7A
Other languages
Chinese (zh)
Other versions
CN110807784B (en)
Inventor
李莹莹
谭啸
孙昊
文石磊
丁二锐
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201911047016.7A priority Critical patent/CN110807784B/en
Publication of CN110807784A publication Critical patent/CN110807784A/en
Application granted granted Critical
Publication of CN110807784B publication Critical patent/CN110807784B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/10 Segmentation; Edge detection
    • G06T 7/11 Region-based segmentation
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 7/00 Image analysis
    • G06T 7/70 Determining position or orientation of objects or cameras
    • G06T 7/73 Determining position or orientation of objects or cameras using feature-based methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/10 Image acquisition modality
    • G06T 2207/10024 Color image
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20081 Training; Learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/20 Special algorithmic details
    • G06T 2207/20084 Artificial neural networks [ANN]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06T IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T 2207/00 Indexing scheme for image analysis or image enhancement
    • G06T 2207/30 Subject of image; Context of image processing
    • G06T 2207/30248 Vehicle exterior or interior
    • G06T 2207/30252 Vehicle exterior; Vicinity of vehicle

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)

Abstract

Embodiments of the present disclosure disclose a method and a device for segmenting an object. The method comprises the following steps: acquiring a target object image; encoding the position information of the target object image to obtain encoded information; embedding the encoded information as a constraint into the following positions of an object segmentation model: at least one layer of an encoder and at least one layer of a decoder; and inputting the target object image into the object segmentation model with the embedded encoded information to obtain a segmentation result for the target object in the target object image. Because the encoder and the decoder of the object segmentation model each introduce the encoded position information of the image as a constraint, the segmentation result output by the model is more accurate.

Description

Method and device for segmenting an object
Technical Field
The present disclosure relates to the field of computer technologies, in particular to the field of image recognition, and more specifically to a method and an apparatus for segmenting an object.
Background
In scenarios such as V2X (vehicle-to-everything communication), the road surface condition must be analyzed, and lane line segmentation is one of the important links.
Currently, lane line segmentation mainly relies on deep semantic segmentation models. The more general segmentation models include FCN, the DeepLab series, PSPNet, and the like; lightweight semantic segmentation models include ENet, ShuffleNet, and the like; lane line segmentation models include LaneNet and the like. These segmentation models are mainly trained on RGB images to predict a category for each pixel, thereby obtaining a segmentation result.
Most existing segmentation models extract image features directly with a CNN and commonly use an encoder-decoder structure, in which the encoder module gradually reduces the feature-map resolution and captures high-level semantic information, while the decoder module gradually restores spatial information.
Disclosure of Invention
The embodiment of the disclosure provides a method and a device for segmenting an object.
In a first aspect, an embodiment of the present disclosure provides a method for segmenting an object, including: acquiring a target object image; coding the position information of the target object image to obtain coding information; embedding the coding information as constraints into the following positions of the object segmentation model: at least one layer of an encoder and at least one layer of a decoder; and inputting the target object image into the object segmentation model embedded with the coding information to obtain a segmentation result of the target object in the target object image.
In some embodiments, encoding the position information of the target object image to obtain the encoded information includes: dividing the position information of the pixels in the target object image by a reference value to obtain the angle information of the position information encoded in each dimension, where the reference value is a power whose base is the ratio of the pixel amplitude of the target object image to 2π and whose exponent is the ratio of the current dimension to the total number of dimensions; and acquiring the sine code of the angle information encoded in each dimension and the cosine code of the angle information encoded in each dimension.
In some embodiments, encoding the position information of the target object image comprises: encoding width position information of pixels in the target object image and encoding height position information of pixels in the target object image; the pixel amplitudes of the target object image include: when the width position information of the pixel in the target object image is coded, the pixel amplitude of the target object image is the pixel width of the target object image; when encoding the height position information of the pixels in the target object image, the pixel amplitude of the target object image is the pixel height of the target object image.
In some embodiments, encoding the position information of the target object image comprises: encoding pixel position information of the target object image after the target object image is stretched into one dimension; the pixel amplitudes of the target object image include: the pixel amplitude of the target object image is the pixel size of the target object image.
In some embodiments, embedding the encoding information as constraints into an encoder and a decoder of the object segmentation model comprises: the coding information is embedded as constraints to the first layer of the encoder and the last layer of the decoder of the object segmentation model.
In some embodiments, the target object comprises one or more of: lane lines, traffic lights, traffic signs, curbs, trash cans, billboards, trees, and buildings.
In a second aspect, an embodiment of the present disclosure provides an apparatus for segmenting an object, including: an image acquisition unit configured to acquire a target object image; the information coding unit is configured to code the position information of the target object image to obtain coded information; an information embedding unit configured to embed the encoded information as a constraint to the following positions of the object segmentation model: at least one layer of an encoder and at least one layer of a decoder; and the result determining unit is configured to input the target object image into the object segmentation model embedded with the coding information, and obtain a segmentation result of the target object in the target object image.
In some embodiments, the information encoding unit includes: an angle information determining unit configured to divide the position information of the pixels in the target object image by a reference value to obtain the angle information of the position information encoded in each dimension, where the reference value is a power whose base is the ratio of the pixel amplitude of the target object image to 2π and whose exponent is the ratio of the current dimension to the total number of dimensions; and an angle information encoding unit configured to acquire the sine code of the angle information encoded in each dimension and the cosine code of the angle information encoded in each dimension.
In some embodiments, the information encoding unit is further configured to: encoding width position information of pixels in the target object image and encoding height position information of pixels in the target object image; the pixel amplitude of the target object image in the angle information determination unit includes: when the width position information of the pixel in the target object image is coded, the pixel amplitude of the target object image is the pixel width of the target object image; when encoding the height position information of the pixels in the target object image, the pixel amplitude of the target object image is the pixel height of the target object image.
In some embodiments, the information encoding unit is further configured to: encoding pixel position information of the target object image after the target object image is stretched into one dimension; the pixel amplitude of the target object image in the angle information determination unit includes: the pixel amplitude of the target object image is the pixel size of the target object image.
In some embodiments, the information embedding unit is further configured to: the coding information is embedded as constraints to the first layer of the encoder and the last layer of the decoder of the object segmentation model.
In some embodiments, the target object comprises one or more of: lane lines, traffic lights, traffic signs, curbs, trash cans, billboards, trees, and buildings.
In a third aspect, an embodiment of the present disclosure provides an electronic device/terminal/server, including: one or more processors; storage means for storing one or more programs; when the one or more programs are executed by the one or more processors, the one or more processors implement the method for segmenting an object as described in any one of the above.
In a fourth aspect, the embodiments of the present disclosure provide a computer readable medium, on which a computer program is stored, which when executed by a processor, implements the method for segmenting an object as described in any one of the above.
According to the method and the device for segmenting an object, a target object image is first acquired; the position information of the target object image is then encoded to obtain encoded information; the encoded information is next embedded as a constraint into the following positions of the object segmentation model: at least one layer of an encoder and at least one layer of a decoder; finally, the target object image is input into the object segmentation model with the embedded encoded information to obtain a segmentation result for the target object in the target object image. In this process, the encoder and the decoder of the object segmentation model each introduce the encoded position information of the image as a constraint, so the segmentation result output by the model is more accurate.
Drawings
Other features, objects, and advantages of the disclosure will become more apparent upon reading of the following detailed description of non-limiting embodiments thereof, with reference to the accompanying drawings in which:
FIG. 1 is an exemplary system architecture diagram in which the present disclosure may be applied;
FIG. 2 is a schematic flow chart diagram illustrating one embodiment of a method for segmenting an object in accordance with an embodiment of the present disclosure;
FIG. 3 is an exemplary application scenario of a method for segmenting objects according to an embodiment of the present disclosure;
FIG. 4 is a schematic flow chart diagram illustrating one embodiment of a method for determining encoding information in a method for segmenting an object according to the present disclosure;
FIG. 5 is an exemplary block diagram of one embodiment of an apparatus for segmenting objects of the present disclosure;
FIG. 6 is a schematic block diagram of a computer system suitable for use with a server embodying embodiments of the present disclosure.
Detailed Description
The present disclosure is described in further detail below with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the relevant invention and not restrictive of the invention. It should be noted that, for convenience of description, only the portions related to the related invention are shown in the drawings.
It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments.
Fig. 1 shows an exemplary system architecture 100 to which embodiments of the presently disclosed method for segmenting an object or apparatus for segmenting an object may be applied.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as a browser application, a shopping application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal devices 101, 102, and 103 may be hardware or software. When they are hardware, they may be various electronic devices that support browser applications, including but not limited to tablet computers, laptop computers, desktop computers, and the like. When they are software, they may be installed in the electronic devices listed above and implemented, for example, as multiple pieces of software or software modules providing distributed services, or as a single piece of software or software module. No particular limitation is imposed here.
The server 105 may be a server providing various services, such as a background server providing support for browser applications running on the terminal devices 101, 102, 103. The background server can analyze and process the received data such as the request and feed back the processing result to the terminal equipment.
The server may be hardware or software. When the server is hardware, it may be implemented as a distributed cluster of multiple servers or as a single server. When the server is software, it may be implemented as multiple pieces of software or software modules (for example, to provide distributed services) or as a single piece of software or software module. No particular limitation is imposed here.
In practice, the method for segmenting an object provided by the embodiments of the present disclosure may be performed by the terminal devices 101, 102, 103 and/or the server 105, and the apparatus for segmenting an object may likewise be disposed in the terminal devices 101, 102, 103 and/or the server 105.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
With continued reference to fig. 2, fig. 2 illustrates a flow 200 of one embodiment of a method for segmenting an object according to the present disclosure. The method for segmenting an object comprises the steps of:
step 201, acquiring a target object image.
In the present embodiment, the execution subject of the method for segmenting an object (e.g., the terminal or the server shown in fig. 1) may acquire, from a local or remote database, a target object image from which a target object needs to be segmented. The target object here is whatever object the practitioner needs to segment. In one specific example scenario, the target object may include one or more of: lane lines, traffic lights, traffic signs, curbs, trash cans, billboards, trees, buildings, and the like.
Step 202, encoding the position information of the target object image to obtain encoded information.
In this embodiment, the executing body may acquire position information of the target object image, and then encode the position information, thereby obtaining encoded information. The position information of the target object image may be position information that can indicate an image feature of the target object. For example, the position information of the target object image may be: position information of a pixel of the target object image, position information of a feature point of the target object image, position information of a color of the target object image, position information of a texture of the target object image, or the like.
Step 203, embedding the coded information as a constraint into the following positions of the object segmentation model: at least one layer of an encoder and at least one layer of a decoder.
In this embodiment, the execution body may use the encoding information as a constraint and embed the constraint in any one layer of the multilayer network of the encoder and any one layer of the multilayer network of the decoder of the object segmentation model, or embed the constraint in all layers of the multilayer network of the encoder and all layers of the multilayer network of the decoder of the object segmentation model, to improve the accuracy of the features employed by the object segmentation model in the segmentation process.
The object segmentation model may be any encoder-decoder model for segmenting an object, whether from the prior art or from technology developed in the future; this application does not limit it. Such a model is mainly trained on RGB images and predicts a category for each pixel to obtain a segmentation result. It uses an encoder-decoder structure: the encoder module gradually reduces the feature-map resolution and captures high-level semantic information, and the decoder module gradually restores spatial information.
In some specific examples, the object segmentation model may be based on a Fully Convolutional Network (FCN), on the DeepLab series, on the Pyramid Scene Parsing Network (PSPNet), and the like.
In other specific examples, the object segmentation model may also be a lightweight semantic segmentation model, such as an ENet-based or ShuffleNet-based object segmentation model, or a LaneNet-based lane line segmentation model, and the like.
In some optional implementations of this embodiment, embedding the encoding information as constraints into the encoder and the decoder of the object segmentation model comprises: the coding information is embedded as constraints to the first layer of the encoder and the last layer of the decoder of the object segmentation model.
In the present implementation, by embedding the constraint of encoding information in the first layer of the encoder and the last layer of the decoder of the object segmentation model, it is possible to refer to the position information of the target object image when the encoder first reduces the resolution and captures high-level semantic information, and to refer to the position information of the target object image when the decoder last restores spatial information, thereby taking into account the efficiency and accuracy of object segmentation.
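As a concrete illustration, a minimal sketch of this embedding follows. The patent does not prescribe how the constraint is wired in, so the concatenation of encoding channels onto the feature maps shown here, and all tensor sizes, are assumptions:

```python
import numpy as np

def embed_encoding(features, encoding):
    """Concatenate positional-encoding channels onto a (channels, H, W) feature map.

    The encoding must share the feature map's spatial size, so in a real model
    it would be recomputed or resized for each resolution at which it is used.
    """
    assert features.shape[1:] == encoding.shape[1:]
    return np.concatenate([features, encoding], axis=0)

# Toy tensors: an 8-channel feature map and a 4-channel positional encoding.
feats = np.random.rand(8, 16, 16)
pos_enc = np.random.rand(4, 16, 16)

# Constraint at the first encoder layer and again at the last decoder layer.
first_encoder_input = embed_encoding(feats, pos_enc)
last_decoder_input = embed_encoding(feats, pos_enc)
print(first_encoder_input.shape)  # (12, 16, 16)
```

Concatenation lets the subsequent convolutions read the position channels alongside the image features; adding the encoding element-wise would be another plausible choice.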
Step 204, inputting the target object image into the object segmentation model embedded with the coding information, and obtaining the segmentation result of the target object in the target object image.
In the present embodiment, after embedding the encoding information as constraints into the encoder and the decoder of the object segmentation model, the target object image may be input into the object segmentation model so as to obtain the segmentation result of the target object in the target object image output by the object segmentation model.
The method for segmenting an object according to the above embodiment of the present disclosure takes the encoded information obtained from the position information of the target object image as a constraint on object segmentation and embeds it into the following positions of the object segmentation model: at least one layer of the encoder and at least one layer of the decoder. After the target object image is input into the model with the embedded encoded information, the model outputs a segmentation result that conforms to this constraint. Because the position information of the target object image is referenced during both encoding and decoding, a more accurate segmentation result can be obtained.
An exemplary application scenario of the method for segmenting an object of the present disclosure is described below in conjunction with fig. 3.
As shown in fig. 3, fig. 3 illustrates one exemplary application scenario of the method for segmenting an object according to the present disclosure.
As shown in fig. 3, a method 300 for segmenting an object operates in an electronic device 310 and may include:
firstly, acquiring a target object image 301;
then, encoding the position information 302 of the target object image 301 to obtain encoded information 303;
thereafter, the encoding information 303 is embedded as constraints in at least one layer of the encoder 305 and at least one layer of the decoder 306 of the object segmentation model 304;
finally, the target object image 301 is input to the object segmentation model 304 in which the encoding information is embedded, and the segmentation result 307 of the target object in the target object image 301 is obtained.
It should be understood that the application scenario of the method for segmenting an object illustrated in fig. 3 is only an exemplary description of the method for segmenting an object, and does not represent a limitation of the method. For example, the steps shown in fig. 3 above may be implemented in further detail.
With further reference to fig. 4, fig. 4 shows a schematic flow chart of an embodiment of a method of determining encoding information in a method for segmenting an object according to the present disclosure.
As shown in fig. 4, the method 400 for determining coding information of the present embodiment may include the following steps:
step 401, dividing the position information of the pixel in the target object image by the reference value to obtain the angle information of the position information encoded in each dimension.
In this embodiment, the implementation subject (e.g., the terminal or the server shown in fig. 1) of the method for segmenting the object may first determine the reference value, and then divide the position information of the pixels in the target object image by the reference value, thereby obtaining the angle information of the position information encoded in each dimension.
The reference value is a power whose base is the ratio of the pixel amplitude of the target object image to 2π and whose exponent is the ratio of the current dimension to the total number of dimensions. The encoding dimension can be determined by a skilled person as needed, for example empirically or according to the application scenario.
The position information of a pixel here may include the width position information and/or the height position information of the pixel. When the position information of a pixel is encoded, the power takes the ratio of the pixel amplitude of the target object image to 2π as the base and the ratio of the current dimension to the total number of dimensions as the exponent. This guarantees that the encoding contains no repeated angles; moreover, the smaller the reference value, the larger the differences between encoded values, so the pixel features can be exploited more fully, which improves the discriminative power and accuracy of the encoded position information.
Step 402, obtaining the sine code of the angle information coded by the position information in each dimension and the cosine code of the angle information coded by the position information in each dimension.
In this embodiment, after obtaining the angle information of the position information encoded in each dimension, the sine code of the angle information of the position information encoded in each dimension and the cosine code of the angle information of the position information encoded in each dimension may be obtained, and the sine code and the cosine code of the angle information of the position information encoded in each dimension are taken as the determined encoded information.
Here, the sine code of the angle information encoded in each dimension is the projection of that angle onto the y direction, and the cosine code is its projection onto the x direction.
In some optional implementations of the present embodiment, encoding the position information of the target object image includes: encoding width position information of pixels in the target object image and encoding height position information of pixels in the target object image; the pixel amplitudes of the target object image include: when the width position information of the pixel in the target object image is coded, the pixel amplitude of the target object image is the pixel width of the target object image; when encoding the height position information of the pixels in the target object image, the pixel amplitude of the target object image is the pixel height of the target object image.
In this implementation, assume the width and the height of the target object image are W and H, respectively. When encoding the width position information of the pixels in the target object image, the width position information (the X coordinate) POS_W of every pixel in the image is extracted and encoded with dimension C:

sin_code(POS_W_i, C_j) = sin( POS_W_i / bandwidth_j )
cos_code(POS_W_i, C_j) = cos( POS_W_i / bandwidth_j )
bandwidth_j = (W / 2π)^(2·C_j / C)

where POS_W_i / bandwidth_j is the angle information encoded in dimension C_j, of which the two formulas then take the sin and cos values; POS_W_i denotes the width position of the i-th pixel; and the reference value bandwidth_j is a power with base W/2π and exponent 2C_j/C.

Here, when W/bandwidth < 2π, no repeated angle occurs, while a smaller bandwidth yields larger differences between encoded values. Therefore the maximum of bandwidth is taken as W/2π, i.e., max(bandwidth) = W/2π, which is why the base of the power is W/2π.
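The no-repetition condition can be checked numerically. In this sketch, W = 32 is an arbitrary example width:

```python
import numpy as np

W = 32
pos_w = np.arange(W, dtype=float)  # X coordinate of every pixel column

# At the maximum reference value, bandwidth = W/2*pi, the angles stay inside
# [0, 2*pi), so every column receives a distinct (sin, cos) pair.
bw_max = W / (2 * np.pi)
angles = pos_w / bw_max
assert angles.max() < 2 * np.pi
codes = np.round(np.sin(angles), 12) + 1j * np.round(np.cos(angles), 12)
assert len(np.unique(codes)) == W  # no repeated angle

# A 4x smaller bandwidth spreads neighbouring codes further apart, but now
# W/bandwidth = 8*pi > 2*pi, so the angles wrap and codes start to repeat.
angles_small = pos_w / (bw_max / 4)
assert np.isclose(np.sin(angles_small[0]), np.sin(angles_small[8]))
assert np.isclose(np.cos(angles_small[0]), np.cos(angles_small[8]))
```

This illustrates the trade-off stated in the text: the largest bandwidth is the last one that keeps all angles unique, while smaller bandwidths increase the difference between neighbouring positions at the cost of wraparound.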
When encoding the height position information of the pixels in the target object image, the height position information (the Y coordinate) POS_H of every pixel in the image is extracted and encoded with dimension C:

sin_code(POS_H_i, C_j) = sin( POS_H_i / bandwidth_j )
cos_code(POS_H_i, C_j) = cos( POS_H_i / bandwidth_j )
bandwidth_j = (H / 2π)^(2·C_j / C)

where POS_H_i / bandwidth_j is the angle information encoded in dimension C_j, of which the two formulas then take the sin and cos values; POS_H_i denotes the height position of the i-th pixel; and the reference value bandwidth_j is a power with base H/2π and exponent 2C_j/C.

Here, when H/bandwidth < 2π, no repeated angle occurs, while a smaller bandwidth yields larger differences between encoded values. Therefore the maximum of bandwidth is taken as H/2π, i.e., max(bandwidth) = H/2π, which is why the base of the power is H/2π.
In this implementation of the method for determining the encoded information, the width position information and the height position information of the pixels in the target object image are each encoded with C dimensions, so 2C-dimensional encoded information is obtained in each of the x and y directions. This increases the information content of the encoding, allows the position information of the pixels to be fully exploited during segmentation, and thereby further improves the segmentation precision.
In some optional implementations of the present embodiment, encoding the position information of the target object image includes: encoding pixel position information of the target object image after the target object image is stretched into one dimension; the pixel amplitudes of the target object image include: the pixel amplitude of the target object image is the pixel size of the target object image.
In this implementation, assuming that the width and height of the target object image are W and H, respectively, when the width and height information of the position information of the target object image is encoded together, the width and height position information POS _ WH (pixel position after the picture is stretched into one dimension) of all pixels in the target object image may be extracted and encoded with the dimension C:
PE(POS_WH_i, 2C_j) = sin(POS_WH_i / (W×H/2π)^(2C_j/C))

PE(POS_WH_i, 2C_j+1) = cos(POS_WH_i / (W×H/2π)^(2C_j/C))

wherein POS_WH_i / (W×H/2π)^(2C_j/C) denotes the encoded angle information in the dimension C_j, of which the sin and cos values are then taken respectively; POS_WH_i denotes the position of the i-th pixel; and (W×H/2π)^(2C_j/C) denotes the reference value, which is the value of an exponentiation with a base of W×H/2π and an exponent of 2C_j/C.
Here, when W×H/bandth < 2π, no repeated angle occurs, while the smaller bandth is, the larger the data difference. Therefore, max(bandth) should be a minimum value greater than W×H/2π, i.e., max(bandth) = W×H/2π, so the base number is W×H/2π.
In the method for determining the encoding information in this implementation, C-dimensional encoding is performed on the combined width and height position information of the pixels in the target object image, so that C-dimensional encoding information covering both the x direction and the y direction can be obtained. This increases the information content of the encoding information and reduces the computation amount of the constraint, so that the position information of the pixels in the target object image can be fully utilized when the object is segmented, improving both the segmentation precision and the segmentation efficiency.
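The flattened variant can be sketched the same way: the image is stretched into one dimension, so a single pixel index runs over [0, W×H) and the base of the exponentiation becomes W×H/2π. As above, the function name, C = 8, and the index range j = 0, …, C−1 are assumptions for illustration.

```python
import numpy as np

def flat_position_encoding(W, H, C):
    """Encode pixel positions of an image stretched into one dimension.

    Each pixel index i in [0, W*H) is divided by the reference value
    (W*H / 2π) ** (2*C_j/C) to get the angle for dimension C_j,
    then sin and cos of that angle are taken.
    Returns an array of shape (W*H, 2*C).
    """
    pos = np.arange(W * H, dtype=np.float64)  # POS_WH: row-major pixel index
    j = np.arange(C, dtype=np.float64)
    # reference value: base W*H/2π, exponent 2*C_j/C
    bandth = (W * H / (2 * np.pi)) ** (2 * j / C)
    angles = pos[:, None] / bandth[None, :]
    return np.concatenate([np.sin(angles), np.cos(angles)], axis=1)

enc = flat_position_encoding(64, 48, C=8)  # one code row per pixel of a 64x48 image
```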
The method for segmenting the object in the embodiment of fig. 4 of the present disclosure may obtain the encoding information based on the position information of the target object image on the basis of the method for segmenting the object shown in fig. 2, and improves the distinguishing effect of the encoding information on the position information of the pixel and the accuracy of the encoding information.
With further reference to fig. 5, as an implementation of the methods shown in the above-mentioned figures, an embodiment of the present disclosure provides an embodiment of an apparatus for segmenting an object, where the embodiment of the apparatus corresponds to the embodiments of the method shown in fig. 2 to 4, and the apparatus may be specifically applied to various electronic devices such as a terminal device or a server.
As shown in fig. 5, the apparatus 500 for segmenting an object of the present embodiment may include: an image acquisition unit 510 configured to acquire a target object image; an information encoding unit 520 configured to encode the position information of the target object image to obtain encoded information; an information embedding unit 530 configured to embed the encoding information as constraints to the following positions of the object segmentation model: at least one layer of an encoder and at least one layer of a decoder; and a result determining unit 540 configured to input the target object image into the object segmentation model embedded with the encoding information, and obtain a segmentation result of the target object in the target object image.
In some embodiments, the information encoding unit 520 includes (not shown in the figures): an angle information determining unit configured to divide the position information of the pixel in the target object image by the reference value to obtain angle information of position information coded in each dimension; the reference value is a power operation value, the power operation is based on the ratio of the pixel amplitude of the target object image to 2 pi, and the ratio of the current dimension to the total dimension is an exponent; an angle information encoding unit configured to acquire a sine code of angle information of which position information is encoded in each dimension and a cosine code of angle information of which position information is encoded in each dimension.
In some embodiments, the information encoding unit 520 is further configured to: encoding width position information of pixels in the target object image and encoding height position information of pixels in the target object image; the pixel amplitude of the target object image in the angle information determination unit includes: when the width position information of the pixel in the target object image is coded, the pixel amplitude of the target object image is the pixel width of the target object image; when encoding the height position information of the pixels in the target object image, the pixel amplitude of the target object image is the pixel height of the target object image.
In some embodiments, the information encoding unit 520 is further configured to: encoding pixel position information of the target object image after the target object image is stretched into one dimension; the pixel amplitude of the target object image in the angle information determination unit includes: the pixel amplitude of the target object image is the pixel size of the target object image.
In some embodiments, the information embedding unit 530 is further configured to: the coding information is embedded as constraints to the first layer of the encoder and the last layer of the decoder of the object segmentation model.
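The text does not specify the exact operation by which the encoding information is embedded as a constraint into the chosen layers; concatenating the position codes onto the feature channels of those layers is one plausible reading. A minimal sketch under that assumption (all names and shapes are hypothetical):

```python
import numpy as np

def embed_position_constraint(features, pos_encoding):
    """Concatenate per-pixel position codes onto a layer's feature channels.

    features: (H, W, C_feat) activations of, e.g., the first encoder layer
              or the last decoder layer of the segmentation model.
    pos_encoding: (H, W, C_pos) position codes laid out on the same pixel grid.
    Returns a (H, W, C_feat + C_pos) tensor carrying both.
    """
    assert features.shape[:2] == pos_encoding.shape[:2]
    return np.concatenate([features, pos_encoding], axis=-1)

# hypothetical shapes: a 48x64 feature map with 32 channels, 16-dim position codes
features = np.zeros((48, 64, 32))
pos_codes = np.zeros((48, 64, 16))
constrained = embed_position_constraint(features, pos_codes)
```

In this reading, the subsequent convolutions of the layer see the position codes as extra input channels, which is how the codes act as a constraint on the layer's output.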
In some embodiments, the target object comprises one or more of: lane lines, traffic lights, traffic signs, curbs, trash cans, billboards, trees, and buildings.
It should be understood that the various elements recited in the apparatus 500 correspond to the various steps recited in the method described with reference to fig. 2-4. Thus, the operations and features described above for the method are equally applicable to the apparatus 500 and the various units included therein and will not be described again here.
Referring now to fig. 6, a schematic diagram of an electronic device (e.g., the server or terminal device of fig. 1) 600 suitable for use in implementing embodiments of the present disclosure is shown. Terminal devices in embodiments of the present disclosure may include, but are not limited to, devices such as notebook computers, desktop computers, and the like. The terminal device/server shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present disclosure.
As shown in fig. 6, electronic device 600 may include a processing means (e.g., central processing unit, graphics processor, etc.) 601 that may perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 602 or a program loaded from a storage means 608 into a Random Access Memory (RAM) 603. In the RAM 603, various programs and data necessary for the operation of the electronic device 600 are also stored. The processing device 601, the ROM 602, and the RAM 603 are connected to each other via a bus 604. An input/output (I/O) interface 605 is also connected to bus 604.
Generally, the following devices may be connected to the I/O interface 605: input devices 606 including, for example, a touch screen, touch pad, keyboard, mouse, camera, microphone, accelerometer, gyroscope, etc.; output devices 607 including, for example, a Liquid Crystal Display (LCD), a speaker, a vibrator, and the like; storage 608 including, for example, tape, hard disk, etc.; and a communication device 609. The communication means 609 may allow the electronic device 600 to communicate with other devices wirelessly or by wire to exchange data. While fig. 6 illustrates an electronic device 600 having various means, it is to be understood that not all illustrated means are required to be implemented or provided. More or fewer devices may alternatively be implemented or provided. Each block shown in fig. 6 may represent one device or may represent multiple devices as desired.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication means 609, or may be installed from the storage means 608, or may be installed from the ROM 602. The computer program, when executed by the processing device 601, performs the above-described functions defined in the methods of embodiments of the present disclosure. It should be noted that the computer readable medium described in the embodiments of the present disclosure may be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In embodiments of the disclosure, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. 
In embodiments of the present disclosure, however, a computer readable signal medium may comprise a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: electrical wires, optical cables, RF (radio frequency), etc., or any suitable combination of the foregoing.
The computer readable medium may be embodied in the electronic device; or may exist separately without being assembled into the electronic device. The computer readable medium carries one or more programs which, when executed by the electronic device, cause the electronic device to: acquiring a target object image; coding the position information of the target object image to obtain coding information; embedding the coding information as constraints into the following positions of the object segmentation model: at least one layer of an encoder and at least one layer of a decoder; and inputting the target object image into the object segmentation model embedded with the coding information to obtain a segmentation result of the target object in the target object image.
Computer program code for carrying out operations for embodiments of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units described in the embodiments of the present disclosure may be implemented by software or hardware. The described units may also be provided in a processor, and may be described as: a processor includes an image acquisition unit, an information encoding unit, an information embedding unit, and a result determination unit. The names of these units do not in some cases constitute a limitation on the unit itself, and for example, the image acquisition unit may also be described as a "unit that acquires an image of a target object".
The foregoing description is only exemplary of the preferred embodiments of the disclosure and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention in the present disclosure is not limited to the specific combination of the above-mentioned features, but also encompasses other embodiments in which any combination of the above-mentioned features or their equivalents is possible without departing from the inventive concept as defined above. For example, the above features and (but not limited to) the features disclosed in this disclosure having similar functions are replaced with each other to form the technical solution.

Claims (14)

1. A method for segmenting an object, comprising:
acquiring a target object image;
coding the position information of the target object image to obtain coding information;
embedding the encoded information as constraints into the following locations of an object segmentation model: at least one layer of an encoder and at least one layer of a decoder;
and inputting the target object image into the object segmentation model embedded with the coding information to obtain a segmentation result of the target object in the target object image.
2. The method of claim 1, wherein encoding the position information of the target object image, resulting in encoded information comprises:
dividing the position information of the pixels in the target object image by the reference value to obtain angle information of the position information in each dimension code; the reference value is a value of power operation, the power operation takes the ratio of the pixel amplitude of the target object image to 2 pi as a base and the ratio of the current dimension to the total dimension as an exponent;
and acquiring sine codes of the angle information of the position information coded in each dimension and cosine codes of the angle information of the position information coded in each dimension.
3. The method of claim 2, wherein the encoding the position information of the target object image comprises: encoding width position information of pixels in the target object image and encoding height position information of pixels in the target object image;
the pixel amplitude of the target object image comprises: when the width position information of the pixel in the target object image is coded, the pixel amplitude of the target object image is the pixel width of the target object image; when encoding the height position information of the pixels in the target object image, the pixel amplitude of the target object image is the pixel height of the target object image.
4. The method of claim 2, wherein the encoding the position information of the target object image comprises: encoding pixel position information of the target object image after being stretched into one dimension;
the pixel amplitude of the target object image comprises: the pixel amplitude of the target object image is the pixel size of the target object image.
5. The method of claim 1, wherein said embedding the coding information as constraints into an encoder and a decoder of an object segmentation model comprises:
embedding the coding information as a constraint into a first layer of an encoder and a last layer of a decoder of an object segmentation model.
6. The method of claim 1, wherein the target object comprises one or more of: lane lines, traffic lights, traffic signs, curbs, trash cans, billboards, trees, and buildings.
7. An apparatus for segmenting an object, comprising:
an image acquisition unit configured to acquire a target object image;
an information encoding unit configured to encode the position information of the target object image to obtain encoded information;
an information embedding unit configured to embed the encoded information as a constraint to the following locations of an object segmentation model: at least one layer of an encoder and at least one layer of a decoder;
and the result determining unit is configured to input the target object image into the object segmentation model embedded with the coding information to obtain a segmentation result of the target object in the target object image.
8. The apparatus of claim 7, wherein the information encoding unit comprises:
an angle information determining unit configured to divide the position information of the pixel in the target object image by the reference value to obtain angle information of position information coded in each dimension; the reference value is a value of power operation, the power operation takes the ratio of the pixel amplitude of the target object image to 2 pi as a base and the ratio of the current dimension to the total dimension as an exponent;
an angle information encoding unit configured to acquire a sine code of angle information encoded in each dimension by the position information and a cosine code of angle information encoded in each dimension by the position information.
9. The apparatus of claim 8, wherein the information encoding unit is further configured to: encoding width position information of pixels in the target object image and encoding height position information of pixels in the target object image;
the pixel amplitude of the target object image in the angle information determination unit includes: when the width position information of the pixel in the target object image is coded, the pixel amplitude of the target object image is the pixel width of the target object image; when encoding the height position information of the pixels in the target object image, the pixel amplitude of the target object image is the pixel height of the target object image.
10. The apparatus of claim 8, wherein the information encoding unit is further configured to: encoding pixel position information of the target object image after being stretched into one dimension;
the pixel amplitude of the target object image in the angle information determination unit includes: the pixel amplitude of the target object image is the pixel size of the target object image.
11. The apparatus of claim 7, wherein the information embedding unit is further configured to:
embedding the coding information as a constraint into a first layer of an encoder and a last layer of a decoder of an object segmentation model.
12. The apparatus of claim 7, wherein the target object comprises one or more of: lane lines, traffic lights, traffic signs, curbs, trash cans, billboards, trees, and buildings.
13. An electronic device/terminal/server comprising:
one or more processors;
storage means for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-6.
14. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the method according to any one of claims 1-6.
CN201911047016.7A 2019-10-30 2019-10-30 Method and device for segmenting an object Active CN110807784B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911047016.7A CN110807784B (en) 2019-10-30 2019-10-30 Method and device for segmenting an object

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911047016.7A CN110807784B (en) 2019-10-30 2019-10-30 Method and device for segmenting an object

Publications (2)

Publication Number Publication Date
CN110807784A true CN110807784A (en) 2020-02-18
CN110807784B CN110807784B (en) 2022-07-26

Family

ID=69489668

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911047016.7A Active CN110807784B (en) 2019-10-30 2019-10-30 Method and device for segmenting an object

Country Status (1)

Country Link
CN (1) CN110807784B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113610754A (en) * 2021-06-28 2021-11-05 浙江文谷科技有限公司 Defect detection method and system based on Transformer

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103971377A (en) * 2014-05-27 2014-08-06 中国科学院遥感与数字地球研究所 Building extraction method based on prior shape level set segmentation
CN108648194A (en) * 2018-04-23 2018-10-12 清华大学 Based on the segmentation of CAD model Three-dimensional target recognition and pose measuring method and device
CN110176027A (en) * 2019-05-27 2019-08-27 腾讯科技(深圳)有限公司 Video target tracking method, device, equipment and storage medium
CN110264522A (en) * 2019-06-24 2019-09-20 北京百度网讯科技有限公司 Detection method, device, equipment and the storage medium of object manipulation person

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103971377A (en) * 2014-05-27 2014-08-06 中国科学院遥感与数字地球研究所 Building extraction method based on prior shape level set segmentation
CN108648194A (en) * 2018-04-23 2018-10-12 清华大学 Based on the segmentation of CAD model Three-dimensional target recognition and pose measuring method and device
CN110176027A (en) * 2019-05-27 2019-08-27 腾讯科技(深圳)有限公司 Video target tracking method, device, equipment and storage medium
CN110264522A (en) * 2019-06-24 2019-09-20 北京百度网讯科技有限公司 Detection method, device, equipment and the storage medium of object manipulation person

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
马北川 等: "基于形状先验和轮廓预定位的目标分割", 《北京工业大学学报》 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113610754A (en) * 2021-06-28 2021-11-05 浙江文谷科技有限公司 Defect detection method and system based on Transformer
CN113610754B (en) * 2021-06-28 2024-05-07 浙江文谷科技有限公司 Defect detection method and system based on Transformer

Also Published As

Publication number Publication date
CN110807784B (en) 2022-07-26

Similar Documents

Publication Publication Date Title
US10650236B2 (en) Road detecting method and apparatus
CN112396613B (en) Image segmentation method, device, computer equipment and storage medium
CN110413812B (en) Neural network model training method and device, electronic equipment and storage medium
CN112668588B (en) Parking space information generation method, device, equipment and computer readable medium
CN110222775B (en) Image processing method, image processing device, electronic equipment and computer readable storage medium
CN111310770B (en) Target detection method and device
CN109118456B (en) Image processing method and device
CN111209856B (en) Invoice information identification method and device, electronic equipment and storage medium
CN113408507B (en) Named entity identification method and device based on resume file and electronic equipment
CN110827341A (en) Picture depth estimation method and device and storage medium
CN114581336A (en) Image restoration method, device, equipment, medium and product
CN114463769A (en) Form recognition method and device, readable medium and electronic equipment
CN110807784B (en) Method and device for segmenting an object
CN110751251B (en) Method and device for generating and transforming two-dimensional code image matrix
CN112884780A (en) Estimation method and system for human body posture
US20140101180A1 (en) Mapping Infrastructure Layout Between Non-Corresponding Datasets
CN109816670B (en) Method and apparatus for generating image segmentation model
CN111967332A (en) Visibility information generation method and device for automatic driving
CN115984868A (en) Text processing method, device, medium and equipment
CN111611420B (en) Method and device for generating image description information
CN115760607A (en) Image restoration method, device, readable medium and electronic equipment
CN115393423A (en) Target detection method and device
CN114627023A (en) Image restoration method, device, equipment, medium and product
CN114596203A (en) Method and apparatus for generating images and for training image generation models
CN114004229A (en) Text recognition method and device, readable medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant