CN111950570B - Target image extraction method, neural network training method and device - Google Patents

Target image extraction method, neural network training method and device

Info

Publication number
CN111950570B
Authority
CN
China
Prior art keywords
coordinate
feature
target
feature map
image
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010871161.3A
Other languages
Chinese (zh)
Other versions
CN111950570A (en)
Inventor
颜波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong Oppo Mobile Telecommunications Corp Ltd
Original Assignee
Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong Oppo Mobile Telecommunications Corp Ltd filed Critical Guangdong Oppo Mobile Telecommunications Corp Ltd
Priority to CN202010871161.3A
Publication of CN111950570A
Priority to PCT/CN2021/106582 (WO2022042120A1)
Application granted
Publication of CN111950570B


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 10/00 Arrangements for image or video recognition or understanding
    • G06V 10/40 Extraction of image or video features
    • G06V 10/44 Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 7/00 Computing arrangements based on specific mathematical models
    • G06N 7/01 Probabilistic graphical models, e.g. probabilistic networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00 Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07 Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Software Systems (AREA)
  • Mathematical Physics (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Mathematical Analysis (AREA)
  • Computational Mathematics (AREA)
  • Algebra (AREA)
  • Mathematical Optimization (AREA)
  • Pure & Applied Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

The disclosure provides a target image extraction method, a neural network training method and a device, relating to the technical field of image processing. The method comprises the following steps: performing first feature extraction on an input image to obtain a reference feature map; performing second feature extraction on the reference feature map to obtain target feature maps whose number is the same as the number of vertices of the target image in the input image; normalizing the feature values of each target feature map to obtain probability feature maps in one-to-one correspondence with the target feature maps; acquiring the coordinate feature map corresponding to each probability feature map, and determining the target vertex coordinates of each vertex of the target image by using each probability feature map and the corresponding coordinate feature map; and completing the extraction of the target image according to the target vertex coordinates. The present disclosure improves the accuracy of target image extraction.

Description

Target image extraction method, neural network training method and device
Technical Field
The present disclosure relates to the field of image processing technologies, and in particular, to a neural network training method for target image extraction, a target image extraction method, a neural network training apparatus for target image extraction, a target image extraction apparatus, a computer-readable medium, and an electronic device.
Background
With the popularization of intelligent devices, the extraction of target images from input images has a wide range of application scenarios in daily life.
However, prior-art target image extraction methods mainly calculate the probability that each pixel of the input image belongs to an edge of the target image, without considering the association between pixels, so the extraction result is not accurate enough.
Therefore, it is necessary to design a new extraction method of the target image.
It should be noted that the information disclosed in the above background section is only for enhancing understanding of the background of the present disclosure and thus may include information that does not constitute prior art known to those of ordinary skill in the art.
Disclosure of Invention
The present disclosure aims to provide a neural network training method for target image extraction, a target image extraction method, a neural network training device for target image extraction, a target image extraction device, a computer readable medium and an electronic device, so as to overcome the problem in the prior art that the extraction result of the target image extraction method is not accurate enough at least to a certain extent.
According to a first aspect of the present disclosure, there is provided a target image extraction method including:
Acquiring an input image to be detected, and determining the number of vertexes of a target image in the input image;
extracting first features of the input image to obtain a reference feature map;
performing second feature extraction on the reference feature map to obtain target feature maps, the number of which is the same as that of the vertexes of the target image in the input image;
normalizing the feature values of each target feature map to obtain probability feature maps which are in one-to-one correspondence with each target feature map;
acquiring a coordinate feature map corresponding to each probability feature map according to each probability feature map, and determining the target vertex coordinates of each vertex of the target image by using each probability feature map and the corresponding coordinate feature map;
and completing the extraction of the target image according to the target vertex coordinates.
According to a second aspect of the present disclosure, there is provided a neural network training method for target image extraction, comprising:
acquiring sample data, wherein the sample data comprises an input image and standard vertex coordinates of a target image in the input image;
extracting first features of the input image to obtain a reference feature map;
performing second feature extraction on the reference feature map to obtain target feature maps, the number of which is the same as that of the vertexes of the target image in the input image;
Normalizing the feature values of each target feature map to obtain probability feature maps which are in one-to-one correspondence with each target feature map;
acquiring a coordinate feature map corresponding to each probability feature map according to each probability feature map, and determining the target vertex coordinates of each vertex of the target image by using each probability feature map and the corresponding coordinate feature map;
and updating parameters of the neural network according to the target vertex coordinates and the standard vertex coordinates.
According to a third aspect of the present disclosure, there is provided a neural network training device for target image extraction, comprising:
the data acquisition module is used for acquiring sample data, wherein the sample data comprises an input image and standard vertex coordinates of a target image in the input image, and determining the vertex number of the target image in the input image;
the first feature extraction module is used for carrying out first feature extraction on the input image to obtain a reference feature map;
the second feature extraction module is used for carrying out second feature extraction on the reference feature image to obtain target feature images, the number of which is the same as that of the vertexes of the target image in the input image;
The first image processing module is used for carrying out normalization processing on the characteristic values of the target characteristic images to obtain probability characteristic images corresponding to the target characteristic images one by one;
the first coordinate determining module is used for acquiring coordinate feature graphs corresponding to the probability feature graphs according to the probability feature graphs and determining the target vertex coordinates of each vertex of the target image by utilizing the probability feature graphs and the corresponding coordinate feature graphs;
and the parameter updating module is used for updating the parameters of the neural network according to the target vertex coordinates and the standard vertex coordinates.
According to a fourth aspect of the present disclosure, there is provided a target image extraction apparatus characterized by comprising:
the third feature extraction module is used for carrying out first feature extraction on the input image to obtain a reference feature map;
the fourth feature extraction module is used for carrying out second feature extraction on the reference feature image to obtain target feature images, the number of which is the same as that of the vertexes of the target image in the input image;
the second image processing module is used for carrying out normalization processing on the characteristic values of the target characteristic images to obtain probability characteristic images corresponding to the target characteristic images one by one;
The second coordinate determining module is used for acquiring coordinate feature graphs corresponding to the probability feature graphs according to the probability feature graphs and determining the target vertex coordinates of each vertex of the target image by utilizing the probability feature graphs and the corresponding coordinate feature graphs;
and the target image extraction module is used for completing the extraction of the target image according to the target vertex coordinates.
According to a fifth aspect of the present disclosure, there is provided a computer readable medium having stored thereon a computer program which, when executed by a processor, implements the method described above.
According to a sixth aspect of the present disclosure, there is provided an electronic apparatus, comprising:
a processor; and
and a memory for storing one or more programs that, when executed by the one or more processors, cause the one or more processors to implement the methods described above.
According to the target image extraction method provided by the embodiments of the present disclosure, second feature extraction is performed on the reference feature map obtained by performing first feature extraction on the input image, so as to obtain target feature maps whose number is the same as the number of vertices of the target image in the input image. The feature values of the target feature maps are then normalized to obtain probability feature maps. Finally, coordinate feature maps matching the probability feature maps are obtained from the probability feature maps, the target vertex coordinates of each vertex of the target image in the input image are determined, and the extraction of the target image is completed. Compared with the prior art, the coordinate feature map is obtained from the probability feature map and the two are combined to determine the target vertex coordinates, so that the information of the probability feature map is used more comprehensively and the extraction precision of the target image is improved.
Further, the specification discloses determining the target vertex coordinates of the target image by matrix dot multiplication of the probability feature map and the coordinate feature map, so that every feature value in the probability feature map is taken into account; the obtained target vertex coordinates are therefore more accurate, which further improves the accuracy of the extracted target image.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the disclosure and together with the description, serve to explain the principles of the disclosure. It will be apparent to those of ordinary skill in the art that the drawings in the following description are merely examples of the disclosure and that other drawings may be derived from them without undue effort. In the drawings:
FIG. 1 illustrates a schematic diagram of an exemplary system architecture to which embodiments of the present disclosure may be applied;
FIG. 2 shows a schematic diagram of an electronic device to which embodiments of the present disclosure may be applied;
FIG. 3 schematically illustrates a flow chart of a target image extraction method in an exemplary embodiment of the present disclosure;
FIG. 4 schematically illustrates a flow chart of a first feature extraction of an input image in an exemplary embodiment of the disclosure;
fig. 5 schematically illustrates a schematic diagram of a bottleneck with a step size of 1 in an exemplary embodiment of the present disclosure;
fig. 6 schematically illustrates a schematic diagram of a structure of a bottleneck with a step size of 2 in an exemplary embodiment of the present disclosure;
FIG. 7 schematically illustrates a schematic diagram of a probability signature in an exemplary embodiment of the present disclosure;
FIG. 8 schematically illustrates a schematic view of a first coordinate direction coordinate feature map in an exemplary embodiment of the present disclosure;
FIG. 9 schematically illustrates a schematic diagram of a second coordinate direction coordinate feature map in an exemplary embodiment of the present disclosure;
fig. 10 schematically illustrates a frame diagram of a target image extraction method in an exemplary embodiment of the present disclosure;
FIG. 11 schematically illustrates a schematic diagram of a user interface of a target image extraction method in an exemplary embodiment of the present disclosure;
FIG. 12 schematically illustrates a flowchart of a neural network training method for target image extraction in an exemplary embodiment of the present disclosure;
FIG. 13 schematically illustrates a composition diagram of a neural network training device for target image extraction in an exemplary embodiment of the present disclosure;
Fig. 14 schematically illustrates a composition diagram of a target image extraction apparatus in an exemplary embodiment of the present disclosure.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. However, the exemplary embodiments may be embodied in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of the example embodiments to those skilled in the art. The described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments.
Furthermore, the drawings are merely schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and thus a repetitive description thereof will be omitted. Some of the block diagrams shown in the figures are functional entities and do not necessarily correspond to physically or logically separate entities. These functional entities may be implemented in software or in one or more hardware modules or integrated circuits or in different networks and/or processor devices and/or microcontroller devices.
Fig. 1 illustrates a schematic diagram of a system architecture of an exemplary application environment to which a target image extraction method and apparatus of embodiments of the present disclosure may be applied.
As shown in fig. 1, the system architecture 100 may include one or more of the terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others. The terminal devices 101, 102, 103 may be various electronic devices having image processing functions including, but not limited to, desktop computers, portable computers, smart phones, tablet computers, and the like. It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation. For example, the server 105 may be a server cluster formed by a plurality of servers.
The target image extraction method provided by the embodiments of the present disclosure is generally performed in the terminal apparatuses 101, 102, 103, and accordingly, the target image extraction device is generally provided in the terminal apparatuses 101, 102, 103. However, it will be readily understood by those skilled in the art that the method for extracting a target image provided in the embodiment of the present disclosure may be performed by the server 105, and accordingly, the target image extracting apparatus may be provided in the server 105, which is not particularly limited in the present exemplary embodiment. For example, in an exemplary embodiment, the user may acquire an input image through the terminal device 101, 102, 103, and then upload the input image data to the server 105, and after the server completes extraction of the target image through the target image extraction method provided by the embodiment of the present disclosure, the target image is transmitted to the terminal device 101, 102, 103, and so on.
Exemplary embodiments of the present disclosure provide an electronic device for implementing a target image extraction method, which may be the terminal device 101, 102, 103 or the server 105 in fig. 1. The electronic device comprises at least a processor and a memory for storing executable instructions of the processor, the processor being configured to perform the target image extraction method via execution of the executable instructions.
The configuration of the electronic device will be exemplarily described below using the mobile terminal 200 of fig. 2 as an example. It will be appreciated by those skilled in the art that, apart from the components intended specifically for mobile use, the configuration of fig. 2 can also be applied to stationary devices. In other embodiments, the mobile terminal 200 may include more or fewer components than illustrated, certain components may be combined or split, or the components may be arranged differently. The illustrated components may be implemented in hardware, software, or a combination of software and hardware. The interfacing relationship between the components is shown schematically only and does not constitute a structural limitation of the mobile terminal 200. In other embodiments, the mobile terminal 200 may also employ an interface different from that of fig. 2, or a combination of interfaces.
As shown in fig. 2, the mobile terminal 200 may specifically include: processor 210, internal memory 221, external memory interface 222, universal serial bus (Universal Serial Bus, USB) interface 230, charge management module 240, power management module 241, battery 242, antenna 1, antenna 2, mobile communication module 250, wireless communication module 260, audio module 270, speaker 271, receiver 272, microphone 273, headset interface 274, sensor module 280, display screen 290, camera module 291, indicator 292, motor 293, keys 294 and user identification module (subscriber identification module, SIM) card interface 295, NFC module 296, etc. Wherein the sensor module 280 may include a depth sensor 2801, a pressure sensor 2802, a gyro sensor 2803, and the like.
Processor 210 may include one or more processing units such as, for example: the processor 210 may include an application processor (Application Processor, AP), a modem processor, a graphics processor (Graphics Processing Unit, GPU), an image signal processor (Image Signal Processor, ISP), a controller, a video codec, a digital signal processor (Digital Signal Processor, DSP), a baseband processor, and/or a Neural network processor (Neural-Network Processing Unit, NPU), etc. Wherein the different processing units may be separate devices or may be integrated in one or more processors.
The NPU is a Neural-Network (NN) computing processor, and can rapidly process input information by referencing a biological Neural Network structure, for example, referencing a transmission mode between human brain neurons, and can also continuously perform self-learning. Applications such as intelligent awareness of the mobile terminal 200 may be implemented by the NPU, for example: image recognition, face recognition, speech recognition, text understanding, etc.
The processor 210 has a memory disposed therein. The memory may store instructions for implementing six modular functions: detection instructions, connection instructions, information management instructions, analysis instructions, data transfer instructions, and notification instructions, and are controlled to be executed by the processor 210.
The charge management module 240 is configured to receive a charge input from a charger. The power management module 241 is used for connecting the battery 242, the charge management module 240 and the processor 210. The power management module 241 receives input from the battery 242 and/or the charge management module 240 and provides power to the processor 210, the internal memory 221, the display 290, the camera module 291, the wireless communication module 260, and the like.
The wireless communication function of the mobile terminal 200 may be implemented by the antenna 1, the antenna 2, the mobile communication module 250, the wireless communication module 260, a modem processor, a baseband processor, and the like. Wherein the antenna 1 and the antenna 2 are used for transmitting and receiving electromagnetic wave signals; the mobile communication module 250 may provide a solution including 2G/3G/4G/5G wireless communication applied to the mobile terminal 200; the modem processor may include a modulator and a demodulator; the Wireless communication module 260 may provide a solution for Wireless communication including Wireless local area network (Wireless Local Area Networks, WLAN) (e.g., wireless fidelity (Wi-Fi) network), bluetooth (BT), etc. applied on the mobile terminal 200. In some embodiments, antenna 1 and mobile communication module 250 of mobile terminal 200 are coupled, and antenna 2 and wireless communication module 260 are coupled, so that mobile terminal 200 may communicate with a network and other devices through wireless communication techniques.
The mobile terminal 200 implements display functions through a GPU, a display screen 290, an application processor, and the like. The GPU is a microprocessor for image processing, and is connected to the display screen 290 and the application processor. The GPU is used to perform mathematical and geometric calculations for graphics rendering. Processor 210 may include one or more GPUs that execute program instructions to generate or change display information.
The mobile terminal 200 may implement a photographing function through an ISP, a camera module 291, a video codec, a GPU, a display screen 290, an application processor, and the like. The ISP is used for processing the data fed back by the camera module 291; the camera module 291 is used for capturing still images or videos; the digital signal processor is used for processing digital signals, and can process other digital signals besides digital image signals; video codec is used to compress or decompress digital video, and the mobile terminal 200 may also support one or more video codecs.
The external memory interface 222 may be used to connect an external memory card, such as a Micro SD card, to enable expansion of the memory capabilities of the mobile terminal 200. The external memory card communicates with the processor 210 via an external memory interface 222 to implement data storage functions. For example, files such as music, video, etc. are stored in an external memory card.
The internal memory 221 may be used to store computer executable program code that includes instructions. The internal memory 221 may include a storage program area and a storage data area. The storage program area may store an application program (such as a sound playing function, an image playing function, etc.) required for at least one function of the operating system, etc. The storage data area may store data (e.g., audio data, phonebook, etc.) created during use of the mobile terminal 200, and the like. In addition, the internal memory 221 may include a high-speed random access memory, and may further include a nonvolatile memory such as at least one magnetic disk storage device, a flash memory device, a universal flash memory (Universal Flash Storage, UFS), and the like. The processor 210 performs various functional applications of the mobile terminal 200 and data processing by executing instructions stored in the internal memory 221 and/or instructions stored in a memory provided in the processor.
The mobile terminal 200 may implement audio functions through an audio module 270, a speaker 271, a receiver 272, a microphone 273, an earphone interface 274, an application processor, and the like. Such as music playing, recording, etc.
The depth sensor 2801 is used to acquire depth information of a scene. In some embodiments, a depth sensor may be provided at the camera module 291.
The pressure sensor 2802 is used to sense a pressure signal, and may convert the pressure signal into an electrical signal. In some embodiments, pressure sensor 2802 may be disposed on display 290. The pressure sensor 2802 is of various types, such as a resistive pressure sensor, an inductive pressure sensor, a capacitive pressure sensor, and the like.
The gyro sensor 2803 may be used to determine the motion posture of the mobile terminal 200. In some embodiments, the angular velocity of the mobile terminal 200 about three axes (i.e., the first coordinate direction, the second coordinate direction, and the z-axis) may be determined by the gyro sensor 2803. The gyro sensor 2803 can be used for shooting anti-shake, navigation, motion-sensing game scenarios, and the like.
In addition, sensors for other functions, such as an air pressure sensor, a magnetic sensor, an acceleration sensor, a distance sensor, a proximity sensor, a fingerprint sensor, a temperature sensor, a touch sensor, an ambient light sensor, a bone conduction sensor, etc., may be provided in the sensor module 280 according to actual needs.
Other devices that provide auxiliary functionality may also be included in mobile terminal 200. For example, the keys 294 include a power-on key, a volume key, etc., by which a user can generate key signal inputs related to user settings and function controls of the mobile terminal 200. As another example, indicator 292, motor 293, SIM card interface 295, and the like.
The specific details of each module in the above apparatus are already described in the method section, and the details that are not disclosed can be referred to the embodiment of the method section, so that they will not be described in detail.
Those skilled in the art will appreciate that the various aspects of the present disclosure may be implemented as a system, method, or program product. Accordingly, various aspects of the disclosure may be embodied in the following forms, namely: an entirely hardware embodiment, an entirely software embodiment (including firmware, micro-code, etc.), or an embodiment combining hardware and software aspects, which may be referred to herein as a "circuit," "module," or "system."
Exemplary embodiments of the present disclosure also provide a computer-readable storage medium having stored thereon a program product capable of implementing the method described above in the present specification. In some possible implementations, various aspects of the disclosure may also be implemented in the form of a program product comprising program code for causing a terminal device to carry out the steps according to the various exemplary embodiments of the disclosure described in the "exemplary methods" section of this specification, when the program product is run on the terminal device, e.g. any one or more of fig. 1, 2 and 10 may be carried out.
It should be noted that the computer readable medium shown in the present disclosure may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
In the context of this disclosure, a computer-readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present disclosure, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Furthermore, the program code for carrying out operations of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, C++ or the like and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device, partly on a remote computing device, or entirely on the remote computing device or server. In the case of remote computing devices, the remote computing device may be connected to the user computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., connected via the Internet using an Internet service provider).
In the related art, the extraction of the target image is generally completed in one of two ways. One is to perform gray-scale conversion and binarization on the input image and then perform edge detection; this approach has poor adaptability and is easily affected by illumination, deformation, and the like. The other is the deep learning approach, which can automatically extract high-level features from the image, can handle more complex use scenarios, and has good robustness and adaptability. However, existing deep learning methods are mainly based on fully convolutional neural network models, which output the probability that each pixel of the input image belongs to an edge of the target image; the fully convolutional neural network does not consider the association between pixels and loses part of the spatial information, so the prediction result is not fine enough. At the same time, the general structure of a fully convolutional neural network is complex and computationally expensive, so detection cannot be performed in real time on mobile terminal devices such as mobile phones.
In view of the above drawbacks, the present disclosure first proposes a target image extraction method, which can be used to extract a target image from an input image, for example, to extract an identification card, a bank card, a traffic card, a homework assignment, a letter, a book, an invoice, and the like from the input image.
The target image extraction method according to the exemplary embodiment of the present disclosure is specifically described below.
Fig. 3 shows a flow of a target image extraction method in the present exemplary embodiment, including the following steps S310 to S350:
step S310, carrying out first feature extraction on the input image to obtain a reference feature map;
step S320, performing second feature extraction on the reference feature map to obtain target feature maps, the number of which is the same as that of the vertexes of the target image in the input image;
step S330, carrying out normalization processing on the feature values of the target feature graphs to obtain probability feature graphs corresponding to the target feature graphs one by one;
step S340, acquiring coordinate feature maps corresponding to the probability feature maps according to the probability feature maps, and determining the target vertex coordinates of each vertex of the target image by utilizing the probability feature maps and the corresponding coordinate feature maps;
And step S350, extracting the target image according to the target vertex coordinates.
Compared with the prior art, the coordinate feature map is obtained from the probability feature map, the coordinate feature map and the probability feature map are combined to obtain the coordinates of the target vertices, and the information of the probability feature map is used more comprehensively through the coordinate feature map, thereby improving the extraction precision of the target image.
The above steps are described in detail with reference to examples.
In step S310, a first feature extraction is performed on the input image to obtain a reference feature map.
In an exemplary embodiment of the present disclosure, the server may first obtain the input image from which the target image is to be extracted, where the image format of the input image may be the bmp format, or may be the jpg format, png format, tif format, gif format, or the like, which is not specifically limited in this exemplary embodiment.
The target image in the input image may be an identification card, a bank card, a traffic card, a homework assignment, a letter, a book, an invoice, or another common document such as a slide show, a poster, or the like, which is not specifically limited in this exemplary embodiment.
In the present exemplary embodiment, the input image may be acquired by receiving a photograph taken directly by the user with a device capable of performing the above-described object detection method, or by receiving an input image uploaded locally from the user terminal. After the input image is acquired, the number of vertices of the target image in the input image may be determined. For example, when the target image is an identification card image, the number of vertices of the target image is 4. The number of vertices may vary with the target image; for example, when the target image is a poster, whose shape may be a triangle, a pentagon, or the like, the number of vertices of the target image may be three, five, and so on.
In an exemplary embodiment of the present disclosure, the server may perform the first feature extraction on the input image to obtain the reference feature map, and may employ a ResNet network, a MobileNetV1 network, a MobileNetV2 network, or the like to perform the feature extraction on the input image, which is not specifically limited in this exemplary embodiment.
In the present exemplary embodiment, as shown in fig. 4, the first feature extraction of the input image to obtain the reference feature map may include steps S410 to S430,
in step S410, an input feature image of the input image is acquired, and a first feature map is obtained by performing feature dimension up-scaling on the input feature image by using a first convolution layer, where the first convolution layer is configured with a nonlinear activation function.
In this exemplary embodiment, the input feature image of the input image may be obtained first, for example by means of a conv2d function; the first convolution layer may then be used to perform feature dimension up-scaling on the input feature image to obtain the first feature map. The expansion multiple may be 2, 3, 4, or the like, or may be customized according to requirements, and is not specifically limited in this exemplary embodiment. The first convolution layer is configured with a nonlinear activation function. Specifically, the first convolution layer may include a 1×1 two-dimensional convolution kernel that first convolves the input image, and a larger number of channels may be set so as to obtain a larger first feature map. For example, if the size of the input image is 800×600×3, the number of channels of the first convolution layer may be set to a value greater than 3, such as 12 or 15, so that up-scaling of the input image can be completed; this may be customized according to the user's requirements.
In this exemplary embodiment, the first convolution layer is configured with a nonlinear activation function, which may be a ReLU (Rectified Linear Unit) function, or another nonlinear activation function such as a logistic function or a hyperbolic tangent function, which is not specifically limited in this exemplary embodiment.
In step S420, feature extraction is performed on the first feature map by using a depth-separable convolution layer to obtain a second feature map, where the depth-separable convolution layer is configured with a nonlinear activation function.
In this exemplary embodiment, the server may perform feature extraction on the first feature map through the depth-separable convolution layer to obtain the second feature map. The depth-separable convolution layer may include a 3×3 convolution kernel for performing a depthwise separable convolution on the first feature map to complete the feature extraction, and is configured with a nonlinear activation function, which has been described in detail above and is therefore not repeated here.
In step S430, the second feature map is subjected to dimension reduction processing by using a second convolution layer to obtain the reference feature map, where the second convolution layer is configured with a linear activation function.
In this example embodiment, after the second feature map is obtained, the server may perform dimension reduction processing on it by using the second convolution layer. The second convolution layer may include a 1×1 two-dimensional convolution kernel; it filters the input channels but cannot combine the features of the channels into new features, so it performs a convolution operation that linearly combines the outputs of the depthwise convolution, compressing the channels and reducing the dimension back to the original dimension to obtain the reference feature map. Since a nonlinear activation function may destroy features in the low-dimensional space, the second convolution layer is configured with a linear activation function. There are many kinds of linear activation functions; in this example embodiment, a linear function of the form f(x)=x may be used, or a linear activation function of another form may be set according to user requirements, which is not specifically limited in this example embodiment.
In an example embodiment of the present disclosure, the server may employ a MobileNetV2 network to perform feature extraction on the input image so as to obtain the reference feature map. Specifically, a target output layer of the MobileNetV2 network may first be determined, the input image may be input to the MobileNetV2 network, and the output of the target output layer of the MobileNetV2 network is used as the reference feature map.
In this exemplary embodiment, when the size of the input image is 224×224×3, the specific operation procedure of the MobileNetV2 network may refer to the following table:
Table 1. Operation of the MobileNetV2 network with an input image of size 224×224×3
In this exemplary embodiment, referring to fig. 5 and fig. 6, when the size of the input image is 224×224×3, feature extraction may be performed on the input image 510 with reference to Table 1, where each bottleneck may include the first convolution layer 520, the depth-separable convolution layer 530, and the above-mentioned second convolution layer 540. A bottleneck with a step size of 1 in Table 1 is a residual convolutional neural network block, and a bottleneck with a step size of 2 in Table 1 is a conventional convolutional neural network block.
Specifically, when performing feature extraction, if the step size in a bottleneck is 1, the bottleneck may be set as a residual convolutional neural network block; specifically, the input feature image and the obtained reference feature image may be added via the addition layer 550 to obtain a new reference feature image.
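By way of a non-limiting illustration of steps S410 to S430, the following Python (PyTorch) sketch shows one possible structure of such a bottleneck. The expansion factor of 6, the ReLU6 activations and the batch normalization layers are assumptions borrowed from the publicly known MobileNetV2 design rather than details specified in this example embodiment.

```python
import torch
import torch.nn as nn

class Bottleneck(nn.Module):
    """One possible bottleneck: 1x1 expansion conv with a nonlinear activation
    (step S410), 3x3 depthwise conv with a nonlinear activation (step S420),
    and a 1x1 linear projection conv (step S430)."""

    def __init__(self, in_ch: int, out_ch: int, stride: int = 1, expansion: int = 6):
        super().__init__()
        mid_ch = in_ch * expansion
        # residual connection only when the spatial size and channel count are unchanged
        self.use_residual = stride == 1 and in_ch == out_ch
        self.expand = nn.Sequential(                       # step S410: dimension up-scaling
            nn.Conv2d(in_ch, mid_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(mid_ch),
            nn.ReLU6(inplace=True),
        )
        self.depthwise = nn.Sequential(                    # step S420: depth-separable convolution
            nn.Conv2d(mid_ch, mid_ch, kernel_size=3, stride=stride,
                      padding=1, groups=mid_ch, bias=False),
            nn.BatchNorm2d(mid_ch),
            nn.ReLU6(inplace=True),
        )
        self.project = nn.Sequential(                      # step S430: linear dimension reduction
            nn.Conv2d(mid_ch, out_ch, kernel_size=1, bias=False),
            nn.BatchNorm2d(out_ch),                        # no nonlinear activation here
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = self.project(self.depthwise(self.expand(x)))
        return x + out if self.use_residual else out
```

When the step size is 1 and the channel counts match, the input is added to the output, which corresponds to the residual connection through the addition layer 550 described above.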
In the present exemplary embodiment, steps S410 to S430 may be performed a plurality of times; the number of times may be 1, 2, 3 or more, and the optimal number of times may be determined experimentally, which is not specifically limited in the present exemplary embodiment.
In this exemplary embodiment, if steps S410 to S430 need to be performed multiple times, the reference feature image obtained in step S430 of the previous iteration may be substituted as the input image into step S410 of the current iteration, so as to update the reference feature image. For example, the reference feature image obtained by performing steps S410 to S430 for the first time is used as the input image for performing steps S410 to S430 for the second time.
In this exemplary embodiment, the feature extraction of the input image may be performed the optimal number of times, that is, 17 times; that is, the determined target output layer is the 19th layer. For example, if the size of the input image is 224×224×3, the size of the obtained reference feature map is 7×7×1280; if the size of the input image is 800×600×3, the size of the obtained reference feature map is 25×19×1280.
In this example embodiment, performing the first feature extraction on the input image with the MobileNetV2 network reduces the model size and running time and makes the model structure lightweight, so that real-time document detection can be implemented on a mobile terminal, which facilitates completing the extraction of the target image on mobile terminals such as portable computers and mobile phones.
In step S320, the second feature extraction is performed on the reference feature map to obtain target feature maps with the same number of vertices as the target image in the input image.
In this exemplary embodiment, the second feature extraction operation may be performed on the reference feature map to obtain the target feature maps. Specifically, the target feature maps may be obtained by performing the second feature extraction on the reference feature map with a target convolution layer, where the convolution kernel in the target convolution layer may be a 1×1 convolution kernel, or another type of convolution kernel, such as a 2×2 convolution kernel or a 3×3 convolution kernel, which is not specifically limited in this exemplary embodiment.
In this exemplary embodiment, the number of channels of the target convolution layer may be determined according to the number of vertices of the target image in the input image, for example, if the target image is an id card, the number of vertices of the target image is 4, and at this time, the number of channels of the target convolution layer may be set to 4; if the target image is a pentagonal poster, the number of vertices of the target image is 5, and at this time, the number of channels of the target convolution layer is set to 5.
In this exemplary embodiment, the details will be described by taking the case where the number of vertices of the target image is 4, that is, the number of channels of the target convolution layer is 4. If the size of the obtained reference feature map is 25×19×1280, the size of the output obtained after passing through the target convolution layer is 25×19×4; that is, it contains 4 target feature maps, each of size 25×19, respectively representing the feature maps of the 4 vertices of the target image.
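As a minimal sketch of the target convolution layer described above, assuming a PyTorch implementation, the 1280-channel reference feature map of this example embodiment, and 4 vertices:

```python
import torch
import torch.nn as nn

# Hypothetical head for the second feature extraction: a 1x1 convolution whose
# number of output channels equals the number of vertices of the target image
# (4 for an identification card), so one target feature map is produced per vertex.
num_vertices = 4
target_conv = nn.Conv2d(in_channels=1280, out_channels=num_vertices, kernel_size=1)

reference = torch.randn(1, 1280, 19, 25)   # a 25x19x1280 reference feature map in NCHW layout
target_maps = target_conv(reference)       # shape (1, 4, 19, 25): one 25x19 map per vertex
```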
In step S330, the feature values of each of the target feature graphs are normalized to obtain probability feature graphs corresponding to each of the target feature graphs one by one.
In the present exemplary embodiment, the feature values of the target feature map obtained as described above may be normalized. Specifically, the feature values of the target feature map may be normalized by a softmax function (normalized exponential function), or by a normalization method such as max-min normalization or Z-score normalization, which is not particularly limited in the present exemplary embodiment. The softmax function is specifically:

$\mathrm{softmax}(x_i) = \dfrac{e^{x_i}}{\sum_{k=1}^{n} e^{x_k}}$

where n is the number of feature values in the target feature map, $x_i$ is the i-th feature value in the target feature map, and e is the natural constant.
In this exemplary embodiment, the target feature map obtained by normalizing the feature values is taken as a probability feature map, where each target feature map corresponds to one probability feature map.
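A minimal sketch of the normalization of step S330, assuming NumPy; it applies the softmax over all feature values of a single target feature map so that the resulting probability feature map sums to 1:

```python
import numpy as np

def spatial_softmax(target_map: np.ndarray) -> np.ndarray:
    """Normalize all feature values of one target feature map so that they are
    positive and sum to 1, yielding the corresponding probability feature map."""
    shifted = target_map - target_map.max()   # subtracting the max is a numerical-stability detail
    exp = np.exp(shifted)
    return exp / exp.sum()

# Example: a 5x5 target feature map becomes a 5x5 probability feature map.
# prob_map = spatial_softmax(np.random.randn(5, 5))  # prob_map.sum() is 1.0
```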
In step S340, a coordinate feature map corresponding to each of the probability feature maps is obtained according to each of the probability feature maps, and the target vertex coordinates of each vertex of the target image are determined using each of the probability feature maps and the corresponding coordinate feature map.
In this exemplary embodiment, the server may first obtain the size information of the probability feature map in the first coordinate direction and in the second coordinate direction, where the size information may be pixel size information or physical size information, which is not specifically limited in this exemplary embodiment; then, according to the size information of the probability feature map in the first coordinate direction and in the second coordinate direction, the coordinate feature maps of each probability feature map in the first coordinate direction and in the second coordinate direction are acquired by using coordinate mapping relationships.
Specifically, the above-described size information is exemplified as pixel size information. First, the size information of the probability feature map is acquired, which may include the size information in the first coordinate direction and the size information in the second coordinate direction; in this example embodiment the size information is a pixel size. For example, the size in the first coordinate direction is 5 and the size in the second coordinate direction is 5, that is, the probability feature map is a 5×5 feature map; or the size in the first coordinate direction is 25 and the size in the second coordinate direction is 19, that is, the probability feature map is 25×19. Then the coordinate feature map in each direction is calculated by using the coordinate mapping relationship of that direction.
The coordinate mapping relationship may include a first coordinate direction coordinate mapping relationship and a second coordinate direction coordinate mapping relationship, the coordinate feature map of the first coordinate direction may be calculated by using the first coordinate direction mapping relationship, and the coordinate feature map of the second coordinate direction may be calculated by using the second coordinate direction coordinate mapping relationship.
In this example embodiment, the first coordinate direction coordinate mapping relationship is:

$X_{i,j} = \dfrac{2i - n - 1}{n}$

The second coordinate direction coordinate mapping relationship is:

$Y_{i,j} = \dfrac{2j - m - 1}{m}$

where i = 1, 2, ..., n, j = 1, 2, ..., m, and n and m represent the size information of the probability feature map in the first coordinate direction and in the second coordinate direction, respectively.
The above-described coordinate feature map acquisition is described in detail below with reference to the accompanying drawings and a specific exemplary embodiment.
In the present exemplary embodiment, a detailed description will be given with reference to fig. 7, in which the probability feature map is 5×5, so that n is 5 and m is also 5. Substituting n and m into the first coordinate direction coordinate mapping relationship and the second coordinate direction coordinate mapping relationship yields the first coordinate direction coordinate feature map shown in fig. 8 and the second coordinate direction coordinate feature map shown in fig. 9. From the two coordinate mapping relationships it can be seen that the calculated coordinate feature maps are identical in size to the probability feature map.
For example, when i=1 and j=1, substituting i=1 and n=5 into the first coordinate direction coordinate mapping relationship gives $X_{1,1} = -0.8$, the value of the first row and first column of the first coordinate direction coordinate feature map in fig. 8; substituting j=1 and m=5 into the second coordinate direction coordinate mapping relationship gives $Y_{1,1} = -0.8$, i.e., the value of the first row and first column of the second coordinate direction coordinate feature map in fig. 9. The coordinate feature maps shown in fig. 8 and fig. 9 are obtained by letting i and j take their different values respectively.
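The coordinate feature maps of fig. 8 and fig. 9 can be generated mechanically from the two coordinate mapping relationships; the following NumPy sketch illustrates this (the axis convention, with i along the first array axis, is an assumption):

```python
import numpy as np

def coordinate_feature_maps(n: int, m: int):
    """Build the first- and second-coordinate-direction coordinate feature maps
    for an n x m probability feature map from the mapping relationships above:
    X[i, j] = (2i - n - 1) / n and Y[i, j] = (2j - m - 1) / m, with 1-based i, j."""
    i = np.arange(1, n + 1).reshape(n, 1)   # index along the first coordinate direction
    j = np.arange(1, m + 1).reshape(1, m)   # index along the second coordinate direction
    X = np.broadcast_to((2 * i - n - 1) / n, (n, m)).copy()
    Y = np.broadcast_to((2 * j - m - 1) / m, (n, m)).copy()
    return X, Y

# For n = m = 5 this reproduces the grids of fig. 8 and fig. 9,
# e.g. X[0, 0] == Y[0, 0] == -0.8.
```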
In this exemplary embodiment, after the coordinate feature maps are obtained, the coordinate feature maps may be weighted by the probability feature map to obtain the vertex relative coordinates; then, according to the vertex relative coordinates and the size information of the input image in the first coordinate direction and in the second coordinate direction, the target vertex coordinates are obtained by using pixel mapping relationships.
Specifically, the probability feature map and the coordinate feature map in the first direction are used to calculate the relative coordinates of the vertices in the first direction, and the probability feature map and the coordinate feature map in the second direction are used to calculate the relative coordinates of the vertices in the second direction. In this example embodiment, the accuracy of extracting the coordinates of the target vertex is improved by fully utilizing each probability value in the plurality of probability feature maps, so that the accuracy of extracting the target image is improved.
In this exemplary embodiment, the vertex relative coordinates in the first direction may be obtained by performing dot multiplication between the matrices using the probability feature map and the coordinate features in the first direction, and the vertex relative coordinates in the second direction may be obtained by performing dot multiplication between the matrices using the probability feature map and the coordinate features in the second direction. For example, referring to fig. 7, 8 and 9, matrix dot multiplication is performed using the probability feature map shown in fig. 7 and the coordinate feature map in the first coordinate direction shown in fig. 8 to obtain the vertex relative coordinates in the first coordinate direction. Matrix dot multiplication is performed by using the probability feature map shown in fig. 7 and the coordinate feature map in the second coordinate direction shown in fig. 9 to obtain the vertex relative coordinates in the second coordinate direction. The value of the relative coordinates of the vertices in the first direction is calculated to be 0.1, and the value of the relative coordinates of the vertices in the second direction is calculated to be-0.44.
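A sketch of this weighting step, assuming NumPy arrays for the probability feature map and the two coordinate feature maps:

```python
import numpy as np

def vertex_relative_coordinates(prob_map: np.ndarray, X: np.ndarray, Y: np.ndarray):
    """Element-wise (matrix dot) multiplication of the probability feature map with
    each coordinate feature map, summed over all positions, gives the relative
    coordinates of one vertex; every probability value contributes to the result."""
    x_rel = float((prob_map * X).sum())
    y_rel = float((prob_map * Y).sum())
    return x_rel, y_rel

# With the maps of figs. 7 to 9 this yields the values quoted above, 0.1 and -0.44.
```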
In the present exemplary embodiment, after the above-described vertex relative coordinates are obtained, the target vertex coordinates may be calculated using the vertex relative coordinates and the size information of the input image.
Specifically, the size information of the first coordinate direction and the size information of the second coordinate direction of the input image may be first acquired, then the pixel coordinate value of the target vertex coordinate in the first coordinate direction may be calculated using the first coordinate direction pixel mapping relationship, and the pixel coordinate value of the target vertex coordinate in the second coordinate direction may be calculated using the second coordinate direction pixel mapping relationship.
In this example embodiment, the first coordinate direction pixel mapping relationship is:
the second coordinate direction pixel mapping relation is as follows:
wherein x is the value in the first coordinate direction of the vertex relative coordinates, y is the value in the second coordinate direction of the vertex relative coordinates, p is the pixel coordinate value of the target vertex coordinates in the first coordinate direction, q is the pixel coordinate value of the target vertex coordinates in the second coordinate direction, a is the size information of the input image in the first coordinate direction, and b is the size information of the input image in the second coordinate direction.
For example, the calculated vertex relative coordinate value in the first coordinate direction is 0.1, that is, x = 0.1, and a is the size information of the input image in the first coordinate direction; when the input image is fixed, the value of a is fixed. Assuming a = 244, substituting x = 0.1 and a = 244 into the first coordinate direction pixel mapping relationship gives the pixel coordinate value of the target vertex coordinate in the first coordinate direction. Similarly, the calculated vertex relative coordinate value in the second coordinate direction is -0.44, that is, y = -0.44; assuming b = 244, substituting y = -0.44 and b = 244 into the second coordinate direction pixel mapping relationship gives the pixel coordinate value in the second coordinate direction. The target vertex coordinates may then be constructed from the pixel coordinate value in the first coordinate direction and the pixel coordinate value in the second coordinate direction.
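The pixel mapping relation itself is not reproduced here. A common convention, assumed in the sketch below, is that a relative coordinate in [-1, 1] maps linearly onto the image extent, i.e. p = a(x + 1)/2 and q = b(y + 1)/2; the mapping actually used may differ:

```python
def to_pixel_coords(x_rel, y_rel, a, b):
    """Assumed linear mapping from relative coordinates in [-1, 1] to pixel coordinates."""
    p = a * (x_rel + 1.0) / 2.0   # pixel coordinate, first coordinate direction
    q = b * (y_rel + 1.0) / 2.0   # pixel coordinate, second coordinate direction
    return p, q

print(to_pixel_coords(0.1, -0.44, 244, 244))   # (134.2, 68.32) under this assumed mapping
```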
In this exemplary embodiment, the above steps may be performed for each probability feature map. Since the number of target feature maps is the same as the number of vertices, the number of probability feature maps is also the same as the number of vertices, and therefore the number of target vertex coordinates obtained is the same as the number of vertices; that is, the obtained target vertex coordinates correspond one-to-one to the vertices of the target image.
In step S350, the extraction of the target image is completed according to the target vertex coordinates.
In the present exemplary embodiment, after determining the plurality of target vertex coordinates, the contour of the target image may be determined according to the target vertex coordinates, and then the position of the target image in the input image may be determined by the contour, and the extraction of the target image may be completed.
In an exemplary embodiment, after the plurality of target vertex coordinates are obtained, they may be connected by straight lines to form the outline of the target image; the outline may also be formed by curve fitting. The position of the target image in the input image is then determined from the outline, and the extraction is completed.
In the present exemplary embodiment, after the extraction is completed according to the outline, the extracted target image may further be subjected to stretching adjustment, position adjustment, and color adjustment to obtain a more accurate target image.
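As an illustration of outlining and rectifying the detected region (not the patented procedure), the four target vertex coordinates can be used with OpenCV's perspective transform; the vertex ordering and output size below are assumptions:

```python
import cv2
import numpy as np

def extract_quad(image, vertices, out_w=800, out_h=600):
    """vertices: four (p, q) pixel coordinates, assumed ordered
    top-left, top-right, bottom-right, bottom-left."""
    src = np.asarray(vertices, dtype=np.float32)
    dst = np.float32([[0, 0], [out_w, 0], [out_w, out_h], [0, out_h]])
    M = cv2.getPerspectiveTransform(src, dst)             # homography from the detected quadrilateral
    return cv2.warpPerspective(image, M, (out_w, out_h))  # stretched/rectified target image
```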
The above target image extraction method will now be described in detail with reference to fig. 10, taking a document as an example of the target image.
In this exemplary embodiment, an input image 510 may first be obtained, where the input image includes a target image, i.e., an image corresponding to a document. The input image may then be fed into the MobileNetV2 network 1010 for the first feature extraction to obtain a reference feature map, and the second feature extraction may be performed on the reference feature map by the target convolution layer 1020 to obtain the target feature maps. The target vertex coordinates of each vertex of the target image in the input image are obtained from the target feature maps, the contour of the target image is determined from the target vertex coordinates, the target image is extracted according to the contour, and finally the extracted target image may be corrected to complete the extraction of the target image.
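A minimal PyTorch-style sketch of this pipeline, assuming a torchvision MobileNetV2 backbone, a 1×1 target convolution layer with one channel per vertex, a spatial softmax, and the coordinate weighting described above; layer choices and sizes are illustrative assumptions, not the patented configuration:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import mobilenet_v2

class VertexRegressor(nn.Module):
    def __init__(self, num_vertices=4):
        super().__init__()
        self.backbone = mobilenet_v2(weights=None).features   # first feature extraction
        self.target_conv = nn.Conv2d(1280, num_vertices, 1)   # second feature extraction

    def forward(self, image):                                  # image: (B, 3, H, W)
        feat = self.target_conv(self.backbone(image))          # target feature maps: (B, V, h, w)
        b, v, h, w = feat.shape
        prob = F.softmax(feat.view(b, v, -1), dim=-1).view(b, v, h, w)   # probability feature maps
        # assumed normalized coordinate feature maps in [-1, 1]
        xs = (2 * torch.arange(1, h + 1, device=feat.device, dtype=feat.dtype) - h - 1) / h
        ys = (2 * torch.arange(1, w + 1, device=feat.device, dtype=feat.dtype) - w - 1) / w
        x_rel = (prob * xs.view(1, 1, h, 1)).sum(dim=(2, 3))   # (B, V) relative coords, 1st direction
        y_rel = (prob * ys.view(1, 1, 1, w)).sum(dim=(2, 3))   # (B, V) relative coords, 2nd direction
        return x_rel, y_rel
```

The relative coordinates returned here would then be converted to pixel coordinates with the (assumed) pixel mapping sketched earlier.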
In this exemplary embodiment, referring to fig. 11, when obtaining an input image, the user may select the acquisition manner in graphical user interface A of the mobile terminal 1100: either take a photograph directly with a device capable of executing the above target detection method by triggering the scan image 1110 control, or, for example, upload an input image from local storage by triggering the upload local 1120 control. After the input image 1130 is acquired, the start extraction 1140 control may be triggered; graphical user interface A then jumps to graphical user interface B, which shows the vertex image 1150 of the target image. The user may then trigger the save and continue 1160 control, upon which graphical user interface B jumps to graphical user interface C while the target image is acquired in the background according to the vertex image; after stretching adjustment, the target image 1170 is displayed on the interface. The user may finally trigger the end and exit 1180 control to complete the extraction of the target image and exit the graphical user interface for target image extraction.
In summary, in the present exemplary embodiment, the coordinate feature map is obtained by using the obtained probability feature map, and the coordinates of the target vertex are obtained by combining the coordinate feature map and the probability feature map, so that the information of the probability feature map is more comprehensively used by using the coordinate feature map, and the accuracy of extracting the target image is improved.
The present disclosure also provides a neural network training method for target image extraction; referring to fig. 12, the method may include the following steps:
step S1210, obtaining sample data, wherein the sample data comprises an input image and standard vertex coordinates of a target image in the input image;
step S1220, performing a first feature extraction on the input image to obtain a reference feature map;
step S1230, performing second feature extraction on the reference feature map to obtain target feature maps with the same number as the number of vertexes;
step S1240, performing normalization processing on the feature values of each target feature map to obtain probability feature maps corresponding to the target feature maps one by one;
step S1250, acquiring a coordinate feature map corresponding to each probability feature map according to each probability feature map, and determining the target vertex coordinates of each vertex of the target image by utilizing each probability feature map and the corresponding coordinate feature map;
Step S1260, updating parameters of the neural network according to the target vertex coordinates and the standard vertex coordinates.
Compared with the prior art, the coordinate feature map is obtained from the probability feature map, and the coordinate feature map and the probability feature map are combined to obtain the target vertex coordinates; by using the coordinate feature map, the information in the probability feature map is used more comprehensively, which improves the accuracy with which the trained neural network extracts the target image.
In step S1210, sample data is acquired, where the sample data includes an input image and standard vertex coordinates of a target image in the input image;
in this exemplary embodiment, a plurality of sets of sample data may first be acquired, where each set of sample data includes an input image and the standard vertex coordinates of the target image in that input image; the standard vertex coordinates may be regarded as calibrated ground truth. The input image may be in bmp format, or, for example, jpg, png, tif, or gif format, which is not specifically limited in this exemplary embodiment.
The target image in the input image may be an identity card, a bank card, a transit card, an assignment, a letter, a book, an invoice, or another common document such as a slide or a poster, which is not specifically limited in this exemplary embodiment.
The number of vertices of the target image can be obtained when the vertex coordinates of the target image are determined; for example, when the target image is an identity card image, the number of vertices is 4, so four labeled vertex coordinates can be obtained. The number of vertices may vary with the target image: for example, when the target image is a poster, its shape may be a triangle, a pentagon, and so on, so the number of vertices may be three, five, etc.
Steps S1220 to S1250 have already been described in detail in the above description of the target image extraction method; for their specific process, reference may be made to that description, so it is not repeated here.
In step S1260, the parameters of the neural network are updated according to the target vertex coordinates and the standard vertex coordinates.
In this exemplary embodiment, after the target vertex coordinates are acquired, a loss function adopted by the neural network may be determined first, then a loss function value is calculated using the target vertex coordinates and the standard vertex coordinates, and finally parameters of the neural network are updated using the loss function value.
In this example embodiment, an MSE loss function may be employed, and in particular, the loss function value may be calculated using the following equation:
in the method, in the process of the invention,the jth vertex coordinate of the document edge predicted for the model, (x) j ,y j ) The j-th vertex coordinate, which is the true document edge, N is the number of samples, i.e., the number of input images.
After the loss function value is calculated, the Momentum algorithm can be adopted to optimize the neural network: the network parameters are adjusted, and the parameters that minimize the loss function value are taken as the final parameter values of the neural network, thereby completing the training of the neural network.
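A minimal training-step sketch under the same assumptions, reusing the VertexRegressor sketch above, with MSE loss on the relative vertex coordinates and SGD with momentum standing in for the Momentum algorithm (hyperparameter values are illustrative):

```python
import torch
import torch.nn as nn

model = VertexRegressor(num_vertices=4)                     # from the earlier sketch
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3, momentum=0.9)
criterion = nn.MSELoss()

def train_step(images, target_coords):
    """images: (B, 3, H, W); target_coords: (B, V, 2) relative vertex coordinates."""
    x_rel, y_rel = model(images)
    pred = torch.stack([x_rel, y_rel], dim=-1)               # (B, V, 2)
    loss = criterion(pred, target_coords)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```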
It is noted that the above-described figures are merely schematic illustrations of processes involved in a method according to exemplary embodiments of the present disclosure, and are not intended to be limiting. It will be readily appreciated that the processes shown in the above figures do not indicate or limit the temporal order of these processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, for example, among a plurality of modules.
Further, referring to fig. 13, in this exemplary embodiment, there is further provided a neural network training device 1300 for target image extraction, including a data acquisition module 1310, a first feature extraction module 1320, a second feature extraction module 1330, a first image processing module 1340, a first coordinate determination module 1350, and a parameter updating module 1360. Wherein:
The data acquisition module 1310 may be configured to acquire sample data, where the sample data includes an input image and standard vertex coordinates of a target image in the input image, and determine a number of vertices of the target image in the input image;
the first feature extraction module 1320 may be configured to perform a first feature extraction on the input image to obtain a reference feature map;
the second feature extraction module 1330 may be configured to perform the second feature extraction on the reference feature map to obtain target feature maps the number of which is the same as the number of vertices of the target image in the input image;
the first image processing module 1340 may be configured to normalize feature values of each of the target feature graphs to obtain probability feature graphs corresponding to each of the target feature graphs one to one;
the first coordinate determining module 1350 may be configured to obtain, according to each of the probability feature maps, a coordinate feature map corresponding to each of the probability feature maps, and determine, using each of the probability feature maps and the corresponding coordinate feature map, a target vertex coordinate of each vertex of the target image;
a parameter update module 1360 may be used to update parameters of the neural network based on the target vertex coordinates and the standard vertex coordinates.
Since each functional module of the neural network training device for target image extraction according to the exemplary embodiment of the present disclosure corresponds to a step of the above exemplary embodiment of the neural network training method for target image extraction, for details not disclosed in this device embodiment, please refer to the above embodiment of the neural network training method for target image extraction.
Still further, referring to fig. 14, in the embodiment of the present example, there is further provided a target image extraction apparatus 1400, including a third feature extraction module 1410, a fourth feature extraction module 1420, a second image processing module 1430, a second coordinate determination module 1440, and a target image extraction module 1450. Wherein:
the third feature extraction module 1410 may be configured to perform a first feature extraction on the input image to obtain a reference feature map;
the fourth feature extraction module 1420 may be configured to perform the second feature extraction on the reference feature map to obtain target feature maps the number of which is the same as the number of vertices of the target image in the input image;
the second image processing module 1430 may be configured to normalize the feature values of each of the target feature images to obtain probability feature images corresponding to each of the target feature images one to one;
The second coordinate determining module 1440 may be configured to obtain, according to each of the probability feature maps, a coordinate feature map corresponding to each of the probability feature maps, and determine, using each of the probability feature maps and the corresponding coordinate feature map, a target vertex coordinate of each vertex of the target image;
the target image extraction module 1450 may be configured to complete the extraction of the target image according to the target vertex coordinates.
Since each functional module of the target image extraction device according to the exemplary embodiment of the present disclosure corresponds to a step of the above exemplary embodiment of the target image extraction method, for details not disclosed in this device embodiment, please refer to the above embodiment of the target image extraction method.
It should be noted that although in the above detailed description several modules or units of a device for action execution are mentioned, such a division is not mandatory. Indeed, the features and functionality of two or more modules or units described above may be embodied in one module or unit in accordance with embodiments of the present disclosure. Conversely, the features and functions of one module or unit described above may be further divided into a plurality of modules or units to be embodied.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.

Claims (13)

1. A target image extraction method, characterized by comprising:
performing first feature extraction on an input image to obtain a reference feature map;
performing second feature extraction on the reference feature map to obtain target feature maps, the number of which is the same as that of the vertexes of the target image in the input image;
normalizing the feature values of each target feature map to obtain probability feature maps which are in one-to-one correspondence with each target feature map;
Acquiring a coordinate feature map corresponding to each probability feature map according to each probability feature map, and determining the target vertex coordinates of each vertex of the target image by using each probability feature map and the corresponding coordinate feature map;
completing the extraction of the target image according to the target vertex coordinates;
the obtaining, according to each probability feature map, a coordinate feature map corresponding to each probability feature map includes:
acquiring size information of the probability feature map in a first coordinate direction and size information of the probability feature map in a second coordinate direction;
and acquiring, by using a coordinate mapping relation and according to the size information of the first coordinate direction and the size information of the second coordinate direction of each probability feature map, a coordinate feature map in the first coordinate direction and a coordinate feature map in the second coordinate direction corresponding to each probability feature map.
2. The method of claim 1, wherein performing the first feature extraction on the input image to obtain a reference feature map comprises:
acquiring an input characteristic image of the input image, and performing characteristic dimension-lifting expansion on the input characteristic image by using a first convolution layer to obtain a first characteristic image, wherein the first convolution layer is configured with a nonlinear activation function;
Performing feature extraction on the first feature map by using a depth-separable convolution layer to obtain a second feature map, wherein the depth-separable convolution layer is configured with a nonlinear activation function;
and performing dimension reduction processing on the second feature map by using a second convolution layer to obtain the reference feature map, wherein the second convolution layer is configured with a linear activation function.
3. The method according to claim 1, wherein the performing the second feature extraction on the reference feature map to obtain the target feature maps with the same number as the vertices includes:
and carrying out the second feature extraction on the reference feature map by using a target convolution layer having the same number of channels as the number of vertices, to obtain target feature maps the number of which is the same as the number of vertices of the target image in the input image.
4. The method of claim 1, wherein the coordinate mapping relationship comprises a first coordinate direction coordinate mapping relationship and a second coordinate direction coordinate mapping relationship;
the first coordinate direction coordinate mapping relation is as follows:
the second coordinate direction coordinate mapping relation is as follows:
where i = 1, 2, …, n and j = 1, 2, …, m, and n and m represent the size information of the probability feature map in the first coordinate direction and in the second coordinate direction, respectively.
5. The method of claim 1, wherein determining target vertex coordinates for each vertex of the target image using each of the probability feature maps and the corresponding coordinate feature map comprises:
carrying out coordinate weighting by utilizing each probability feature image and the corresponding coordinate feature image to obtain the vertex relative coordinates of each vertex of the target image;
and obtaining the coordinates of the target vertex by utilizing a pixel mapping relation according to the relative coordinates of the vertex, the size information of the input image in the first coordinate direction and the size information of the input image in the second coordinate direction.
6. The method of claim 5, wherein said weighting the coordinates using each of the probability feature maps and the corresponding coordinate feature map to obtain the vertex relative coordinates of each vertex of the target image comprises:
and performing matrix point multiplication by using the matrix of each probability feature map and the matrix of the corresponding coordinate feature map to obtain the vertex relative coordinates.
7. The method of claim 5, wherein the pixel map comprises a first coordinate direction pixel map and a second coordinate direction pixel map;
The first coordinate direction pixel mapping relation is as follows:
the second coordinate direction pixel mapping relation is as follows:
wherein x is the value in the first coordinate direction of the vertex relative coordinates, y is the value in the second coordinate direction of the vertex relative coordinates, p is the pixel coordinate value of the target vertex coordinates in the first coordinate direction, q is the pixel coordinate value of the target vertex coordinates in the second coordinate direction, a is the size information of the input image in the first coordinate direction, and b is the size information of the input image in the second coordinate direction;
and forming the target vertex coordinate by using the pixel coordinate value in the first coordinate direction and the pixel coordinate value obtained in the second coordinate direction.
8. The method of claim 1, wherein extracting the target image is accomplished according to the target vertex coordinates, comprising:
determining the outline of the target image according to the target vertex coordinates;
and determining the position of the target image in the input image according to the outline, and completing the extraction of the target image.
9. A neural network training method for target image extraction, comprising:
acquiring sample data, wherein the sample data comprises an input image and standard vertex coordinates of a target image in the input image;
Extracting first features of the input image to obtain a reference feature map;
performing second feature extraction on the reference feature map to obtain target feature maps, the number of which is the same as that of the vertexes of the target image in the input image;
normalizing the feature values of each target feature map to obtain probability feature maps which are in one-to-one correspondence with each target feature map;
acquiring a coordinate feature map corresponding to each probability feature map according to each probability feature map, and determining the target vertex coordinates of each vertex of the target image by using each probability feature map and the corresponding coordinate feature map;
updating parameters of the neural network according to the target vertex coordinates and the standard vertex coordinates;
the obtaining, according to each probability feature map, a coordinate feature map corresponding to each probability feature map includes:
acquiring size information of the probability feature map in a first coordinate direction and size information of the probability feature map in a second coordinate direction;
and acquiring, by using a coordinate mapping relation and according to the size information of the first coordinate direction and the size information of the second coordinate direction of each probability feature map, a coordinate feature map in the first coordinate direction and a coordinate feature map in the second coordinate direction corresponding to each probability feature map.
10. A neural network training device for target image extraction, comprising:
the data acquisition module is used for acquiring sample data, wherein the sample data comprises an input image and standard vertex coordinates of a target image in the input image;
the first feature extraction module is used for carrying out first feature extraction on the input image to obtain a reference feature map;
the second feature extraction module is used for carrying out the second feature extraction on the reference feature map to obtain target feature maps the number of which is the same as the number of vertices of the target image in the input image;
The first image processing module is used for carrying out normalization processing on the characteristic values of the target characteristic images to obtain probability characteristic images corresponding to the target characteristic images one by one;
the first coordinate determining module is used for acquiring coordinate feature graphs corresponding to the probability feature graphs according to the probability feature graphs and determining the target vertex coordinates of each vertex of the target image by utilizing the probability feature graphs and the corresponding coordinate feature graphs;
the parameter updating module is used for updating parameters of the neural network according to the target vertex coordinates and the standard vertex coordinates;
The first coordinate determining module obtains coordinate feature maps corresponding to the probability feature maps according to the probability feature maps, and the first coordinate determining module comprises:
acquiring size information of the probability feature map in a first coordinate direction and size information of the probability feature map in a second coordinate direction;
and acquiring, by using a coordinate mapping relation and according to the size information of the first coordinate direction and the size information of the second coordinate direction of each probability feature map, a coordinate feature map in the first coordinate direction and a coordinate feature map in the second coordinate direction corresponding to each probability feature map.
11. A target image extraction apparatus, characterized by comprising:
the third feature extraction module is used for carrying out first feature extraction on the input image to obtain a reference feature map;
the fourth feature extraction module is used for carrying out second feature extraction on the reference feature image to obtain target feature images, the number of which is the same as that of the vertexes of the target image in the input image;
the second image processing module is used for carrying out normalization processing on the characteristic values of the target characteristic images to obtain probability characteristic images corresponding to the target characteristic images one by one;
the second coordinate determining module is used for acquiring coordinate feature graphs corresponding to the probability feature graphs according to the probability feature graphs and determining the target vertex coordinates of each vertex of the target image by utilizing the probability feature graphs and the corresponding coordinate feature graphs;
The target image extraction module is used for completing the extraction of the target image according to the target vertex coordinates;
the second coordinate determining module obtains coordinate feature maps corresponding to the probability feature maps according to the probability feature maps, and the second coordinate determining module comprises:
acquiring size information of the probability feature map in a first coordinate direction and size information of the probability feature map in a second coordinate direction;
and acquiring, by using a coordinate mapping relation and according to the size information of the first coordinate direction and the size information of the second coordinate direction of each probability feature map, a coordinate feature map in the first coordinate direction and a coordinate feature map in the second coordinate direction corresponding to each probability feature map.
12. A computer readable medium on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the method as claimed in any one of claims 1 to 8 or in claim 9.
13. An electronic device, comprising:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the method of any one of claims 1 to 8 or claim 9 via execution of the executable instructions.
CN202010871161.3A 2020-08-26 2020-08-26 Target image extraction method, neural network training method and device Active CN111950570B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010871161.3A CN111950570B (en) 2020-08-26 2020-08-26 Target image extraction method, neural network training method and device
PCT/CN2021/106582 WO2022042120A1 (en) 2020-08-26 2021-07-15 Target image extracting method, neural network training method, and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010871161.3A CN111950570B (en) 2020-08-26 2020-08-26 Target image extraction method, neural network training method and device

Publications (2)

Publication Number Publication Date
CN111950570A CN111950570A (en) 2020-11-17
CN111950570B true CN111950570B (en) 2023-11-21

Family

ID=73366455

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010871161.3A Active CN111950570B (en) 2020-08-26 2020-08-26 Target image extraction method, neural network training method and device

Country Status (2)

Country Link
CN (1) CN111950570B (en)
WO (1) WO2022042120A1 (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111950570B (en) * 2020-08-26 2023-11-21 Oppo广东移动通信有限公司 Target image extraction method, neural network training method and device
CN114155546B (en) * 2022-02-07 2022-05-20 北京世纪好未来教育科技有限公司 Image correction method and device, electronic equipment and storage medium
CN115116027A (en) * 2022-06-01 2022-09-27 合众新能源汽车有限公司 Target object positioning method and device, readable storage medium and electronic equipment
CN115631112B (en) * 2022-11-18 2023-03-14 北京飞渡科技有限公司 Building contour correction method and device based on deep learning

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109670503A (en) * 2018-12-19 2019-04-23 北京旷视科技有限公司 Label detection method, apparatus and electronic system
CN109829501A (en) * 2019-02-01 2019-05-31 北京市商汤科技开发有限公司 Image processing method and device, electronic equipment and storage medium
CN110059680A (en) * 2019-04-24 2019-07-26 杭州智趣智能信息技术有限公司 A kind of detection method of ID Card Image, device and equipment

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101405410B1 (en) * 2010-10-20 2014-06-24 고려대학교 산학협력단 Object detection device and system
CN111950570B (en) * 2020-08-26 2023-11-21 Oppo广东移动通信有限公司 Target image extraction method, neural network training method and device

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109670503A (en) * 2018-12-19 2019-04-23 北京旷视科技有限公司 Label detection method, apparatus and electronic system
CN109829501A (en) * 2019-02-01 2019-05-31 北京市商汤科技开发有限公司 Image processing method and device, electronic equipment and storage medium
CN110059680A (en) * 2019-04-24 2019-07-26 杭州智趣智能信息技术有限公司 A kind of detection method of ID Card Image, device and equipment

Also Published As

Publication number Publication date
WO2022042120A1 (en) 2022-03-03
CN111950570A (en) 2020-11-17

Similar Documents

Publication Publication Date Title
CN109816589B (en) Method and apparatus for generating cartoon style conversion model
CN111950570B (en) Target image extraction method, neural network training method and device
CN109800732B (en) Method and device for generating cartoon head portrait generation model
CN109902659B (en) Method and apparatus for processing human body image
JP2022515620A (en) Image area recognition method by artificial intelligence, model training method, image processing equipment, terminal equipment, server, computer equipment and computer program
CN111369427B (en) Image processing method, image processing device, readable medium and electronic equipment
CN112562019A (en) Image color adjusting method and device, computer readable medium and electronic equipment
WO2022033111A1 (en) Image information extraction method, training method and apparatus, medium, and electronic device
CN111062981A (en) Image processing method, device and storage medium
CN115699082A (en) Defect detection method and device, storage medium and electronic equipment
CN111860485A (en) Training method of image recognition model, and image recognition method, device and equipment
WO2018120082A1 (en) Apparatus, method and computer program product for deep learning
CN111950700A (en) Neural network optimization method and related equipment
CN112598780A (en) Instance object model construction method and device, readable medium and electronic equipment
CN113744286A (en) Virtual hair generation method and device, computer readable medium and electronic equipment
CN114330565A (en) Face recognition method and device
CN112037305B (en) Method, device and storage medium for reconstructing tree-like organization in image
CN114049674A (en) Three-dimensional face reconstruction method, device and storage medium
CN113902636A (en) Image deblurring method and device, computer readable medium and electronic equipment
CN113284206A (en) Information acquisition method and device, computer readable storage medium and electronic equipment
CN113821658A (en) Method, device and equipment for training encoder and storage medium
CN110765304A (en) Image processing method, image processing device, electronic equipment and computer readable medium
CN110942033B (en) Method, device, electronic equipment and computer medium for pushing information
CN111582208B (en) Method and device for generating organism posture key point information
CN111310701B (en) Gesture recognition method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant