CN112634174A - Image representation learning method and system
- Publication number: CN112634174A
- Application number: CN202011632703.8A
- Authority
- CN
- China
- Prior art keywords
- image
- frame
- enhanced
- representation learning
- encoder
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06T5/00 — Image enhancement or restoration
- G06N3/045 — Neural networks; architectures; combinations of networks
- G06N3/084 — Learning methods; backpropagation, e.g. using gradient descent
- G06N3/088 — Learning methods; non-supervised learning, e.g. competitive learning
- G06T7/181 — Image analysis; segmentation; edge detection involving edge growing; involving edge linking
Abstract
The application discloses an image representation learning method and system. The image representation learning method includes: an enhanced-image acquisition step: obtaining enhanced images of an original image; a feature-map acquisition step: obtaining, by an encoder, feature maps of the enhanced images; a prediction step: predicting a bounding box for each enhanced image with a box regression network to obtain predicted boxes; and a calculation step: computing the final loss between the ground-truth box and the predicted boxes, and updating the box regression network and the encoder according to the final loss. Training with bounding-box regression improves the expressive power of the model, so that the model captures more position information. Applying two augmentations to each original picture strengthens the robustness of the model, suppresses noise better, and at the same time improves the accuracy of detection and segmentation tasks.
Description
Technical Field
The present application relates to the field of image representation technology, and in particular, to an image representation learning method and system.
Background
In deep learning, whether for classification, detection, or segmentation, optimizing a specific task typically starts from a model pre-trained for classification on ImageNet, which is then transferred to the downstream task; however, this training scheme blurs position information. Besides ImageNet-classification pre-training, unsupervised methods are now widely studied, which improve the expressive power of pre-trained models through contrastive learning. This application provides another way to improve expressive power: learning representations on an ImageNet-scale dataset with bounding-box regression. Whereas the classification task blurs position information, box regression lets the model learn more of it, which benefits downstream tasks that are sensitive to position.
Therefore, in view of the above, the present invention provides an image representation learning method and system. By training with bounding-box regression instead of classification, the method improves the expressive power of the model, in particular its sensitivity to position and detail, so that the model captures more position information. Two augmentations are applied to each original picture, a box is regressed for each augmented view separately, and the two loss terms are summed; this strengthens the robustness of the model, suppresses noise better, and at the same time improves the accuracy of detection and segmentation tasks.
Disclosure of Invention
The embodiments of the application provide an image representation learning method and system, intended at least to address the loss of position information caused by classification-based pre-training in the related art.
The invention provides an image representation learning method, comprising the following steps:
an enhanced-image acquisition step: obtaining enhanced images of an original image;
a feature-map acquisition step: obtaining, by an encoder, feature maps of the enhanced images;
a prediction step: predicting a bounding box for each enhanced image with a box regression network to obtain predicted boxes;
a calculation step: computing the final loss between the ground-truth box and the predicted boxes, and updating the box regression network and the encoder according to the final loss.
In the above image representation learning method, the enhanced-image acquisition step may acquire, for each original image, at least two enhanced images of that original image using a data augmentation method.
In the above image representation learning method, the feature-map acquisition step uses an encoder composed of a deep-learning feature-extraction backbone network and a multilayer perceptron, and obtains the feature maps from this encoder.
In the above image representation learning method, the prediction step includes predicting the bounding box of each enhanced image with the box regression network to obtain the predicted boxes.
In the above image representation learning method, the calculation step includes computing, with an intersection-over-union (IoU) loss, the loss between the ground-truth box of the original image and the predicted box of each enhanced image, adding the at least two losses to obtain the final loss, and updating the encoder and the box regression network by back-propagating the final loss.
The present invention provides an image representation learning system to which the above image representation learning method is applied, the image representation learning system comprising:
an enhanced-image acquisition unit: obtaining enhanced images of an original image;
a feature-map acquisition unit: obtaining, by an encoder, feature maps of the enhanced images;
a prediction unit: predicting a bounding box for each enhanced image with a box regression network to obtain predicted boxes;
a calculation unit: computing the final loss between the ground-truth box and the predicted boxes, and updating the box regression network and the encoder according to the final loss.
In the above image representation learning system, for each original image, the enhanced-image acquisition unit acquires at least two enhanced images of that original image using a data augmentation method.
In the above image representation learning system, the feature-map acquisition unit uses an encoder composed of a deep-learning feature-extraction backbone network and a multilayer perceptron, and obtains the feature maps from this encoder.
In the above image representation learning system, the prediction unit predicts the bounding box of each enhanced image with the box regression network to obtain the predicted boxes.
In the above image representation learning system, the calculation unit computes, with an intersection-over-union (IoU) loss, the loss between the ground-truth box of the original image and the predicted box of each enhanced image, adds the at least two losses to obtain the final loss, and updates the encoder and the box regression network by back-propagating the final loss.
Compared with the related art, the present invention provides an image representation learning method and system. By training with bounding-box regression instead of classification, the invention improves the expressive power of the model, in particular its sensitivity to position and detail, so that the model captures more position information. Two augmentations are applied to each original picture, a box is regressed for each augmented view separately, and the two loss terms are summed; this strengthens the robustness of the model, suppresses noise better, and at the same time improves the accuracy of detection and segmentation tasks.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the application.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flow diagram of a method of image representation learning according to an embodiment of the present application;
FIG. 2 is a flow chart of an application according to an embodiment of the present application;
FIG. 3 is a schematic diagram of the architecture of the image representation learning system of the present invention;
FIG. 4 is a block diagram of an electronic device according to an embodiment of the present application.
Wherein the reference numerals are:
an enhanced image acquisition unit: 51;
a feature mapping obtaining unit: 52;
a prediction unit: 53;
a calculation unit: 54;
81: a processor;
82: a memory;
83: a communication interface;
80: a bus.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without any inventive step are within the scope of protection of the present application.
It is obvious that the drawings in the following description are only examples or embodiments of the present application, and that it is also possible for a person skilled in the art to apply the present application to other similar contexts on the basis of these drawings without inventive effort. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms used herein shall have the ordinary meaning understood by those of ordinary skill in the art to which this application belongs. The words "a," "an," "the," and similar words in this application do not denote a limitation of quantity and may refer to the singular or the plural. The terms "including," "comprising," "having," and any variations thereof in this application are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that comprises a list of steps or modules (units) is not limited to the listed steps or units, but may include other steps or units not expressly listed or inherent to such process, method, product, or device. Words such as "connected" and "coupled" in this application are not limited to physical or mechanical connections, and may include electrical connections, whether direct or indirect. The term "plurality" herein means two or more. "And/or" describes an association between associated objects and indicates that three relationships may exist; for example, "A and/or B" may mean: A alone, A and B together, or B alone. The character "/" generally indicates an "or" relationship between the preceding and following objects. The terms "first," "second," "third," and the like herein merely distinguish similar objects and do not denote a particular ordering of the objects.
The present invention is based on image representation learning through bounding-box regression, which is briefly described below.
Deep learning is a class of machine learning algorithms that uses multiple layers to progressively extract higher-level features from the raw input. For example, in image processing, lower layers may identify edges, while higher layers may identify parts meaningful to humans, such as digits, letters, or faces. Most modern deep learning models are based on artificial neural networks, in particular convolutional neural networks (CNNs), although they may also include propositional formulas or latent variables organized layer by layer in a deep generative model, such as the nodes in deep belief networks and deep Boltzmann machines. In deep learning, each level learns to convert its input data into a somewhat more abstract and composite representation. In an image recognition application, the raw input may be a matrix of pixels; the first layer may abstract the pixels and encode edges; the second layer may compose and encode arrangements of edges; the third layer may encode a nose and eyes; and the fourth layer may recognize that the image contains a face. Importantly, a deep learning process can learn on its own which features are best placed at which level. (This does not completely eliminate the need for hand-tuning; for example, different numbers of layers and different layer sizes provide different degrees of abstraction.) The "deep" in "deep learning" refers to the number of layers through which the data is transformed. More precisely, deep learning systems have a substantial credit assignment path (CAP) depth. The CAP is the chain of transformations from input to output, and describes the potentially causal relationships between input and output. For a feed-forward neural network, the CAP depth is the depth of the network, equal to the number of hidden layers plus one (since the output layer is also parameterized). For a recurrent neural network, in which a signal may propagate through a layer more than once, the CAP depth is potentially unlimited. There is no universally agreed depth threshold separating shallow from deep learning, but most researchers agree that deep learning involves CAP depth greater than 2. A CAP of depth 2 has been shown to be a universal approximator, in the sense that it can emulate any function; additional layers beyond that do not add to the network's function-approximation capability, but they do help in learning features, since deep models (CAP > 2) can extract better features than shallow models. Deep learning architectures are usually constructed with a greedy layer-by-layer method. Deep learning helps to disentangle these abstractions and to pick out the features that improve performance. For supervised learning tasks, deep learning methods avoid feature engineering by translating the data into compact intermediate representations akin to principal components and deriving layered structures that remove redundancy from the representation. Deep learning algorithms can also be applied to unsupervised learning tasks; this is an important benefit, because unlabeled data is more abundant than labeled data. Examples of deep structures that can be trained in an unsupervised manner are neural history compressors and deep belief networks.
Bounding-box regression is defined as the process, in object detection, of refining a generated candidate box toward the annotated ground-truth box. Since a box in an image is uniquely determined by its center coordinates (Xc, Yc) together with its width W and height H, this refinement can be modeled as a regression problem (a short sketch of this parameterization follows the model discussion below). Applying box regression to candidate boxes brings the final detected locations closer to the true values and improves localization accuracy.
An image, a general term covering both video frames and pictures, is a reproduction of a subject as perceived by human vision. Images can be captured with optical devices such as cameras, mirrors, telescopes, and microscopes, or created by hand, for example as drawings. Images can be recorded and stored on paper, film, or other media sensitive to light. Professionally designed images can develop into a visual language for communication between people, as seen in the vast body of painting, sculpture, and architecture in world art. In remote sensing, "imagery" refers to imaging by non-photographic sensors and is in essence an extension of photographic imaging. A photographic image is typically an optical photograph recorded on photosensitive film, a passive form of remote sensing; imagery, by contrast, can capture visible-light, infrared, thermal-infrared, and microwave information from ground objects through optical-mechanical, optical-electronic, or antenna scanning, recorded on magnetic tape or transferred to photosensitive film through electro-optical conversion. It is broader in content and form than a photograph; the shift from "photograph" to "imagery" reflects the development from aerial photography to remote sensing and from photography (photographics) to imaging.
A model is an object, physical or virtual and not limited to planar or solid forms, constructed through subjective interpretation to describe the morphological structure of something objective for an expressive purpose. A model is not the same as a product: before becoming a product, an object under research and development is a model, and it is presented as a product only once its type and specification are fixed and matched with a corresponding price. In a broad sense, if one thing changes as another thing changes, it is a model of that other thing. Models are used to express the properties of different concepts; one concept can let many models vary to different degrees, while only a few models are needed to express the properties of one concept, so the form in which a concept's properties are expressed can be changed by referring to different models. When a model is associated with an object, a framework is created whose properties determine how the model changes with the object. By their form, models divide into physical models (tangible objects with volume and weight) and virtual models (forms expressed by electronic data or other effective digital representations).
Physical models further divide into static models (a relatively static physical state, with no internal energy-converting power system, maintaining structural integrity under external forces), power-assisted models (built on a static model, not changing their own structure under external kinetic energy, used to probe the connections of an object's structure through physical movement), and dynamic models (able to generate kinetic energy through energy conversion, containing a power-conversion system and exhibiting relatively continuous physical movement during the conversion). Virtual models divide into virtual static models, virtual dynamic models, and virtual fantasy models. A mathematical model is a model described in mathematical language. It may be one equation or a set of algebraic, differential, integral, or statistical equations, or some suitable combination of these, which quantitatively or qualitatively describes the interrelationships or causal relationships among system variables. Besides models described by equations, there are models described by other mathematical tools, such as algebra, geometry, topology, and mathematical logic. Note that a mathematical model describes the behavior and characteristics of a system, not its actual structure. A physical model, also called a material model, can be divided into scale models and analog models. A scale model is a real object built according to similarity theory at a reduced (or enlarged, or identical) scale of the original system, such as an aircraft model in a wind-tunnel experiment, a hydraulic-system test model, an architectural model, or a ship model. An analog model exploits the fact that, in systems from different physical domains (mechanical, electrical, thermal, fluid, and so on), the respective variables sometimes obey the same laws, so analogous models with completely different physical meanings can be constructed. For example, under certain conditions the pressure response of a pneumatic system consisting of a throttle valve and an air capacitor follows a law similar to the output-voltage characteristic of a circuit consisting of a resistor and a capacitor, so the pneumatic system can be simulated by the circuit, which is easier to experiment with.
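Returning to the bounding-box parameterization introduced above: as a concrete illustration (a sketch added for clarity, not part of the original description), a box given by its center (Xc, Yc), width W, and height H converts to and from the corner form (x1, y1, x2, y2) as follows:

    def center_to_corners(xc, yc, w, h):
        # corners of a box given by its center, width, and height
        return xc - w / 2, yc - h / 2, xc + w / 2, yc + h / 2

    def corners_to_center(x1, y1, x2, y2):
        # inverse conversion: center, width, and height from the two corners
        return (x1 + x2) / 2, (y1 + y2) / 2, x2 - x1, y2 - y1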
Image representation refers to how image information is represented and stored in a computer. Image representations together with image operations form the image model, an important component of pattern analysis. An image can be represented at different levels of image information. The most basic physical image is obtained by sampling a continuous image domain on a rectangular grid into a two-dimensional gray-level array (matrix). The two-dimensional gray-level matrix can also be expressed as one long vector by scanning the matrix column by column (or row by row) and joining the head of each column (or row) to the tail of the previous one. An image representation is also used as a textual or graphical symbol system for expressing the model.
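For concreteness, a minimal sketch (not part of the original description) of the column-wise scan just described, using NumPy; the array values are illustrative:

    import numpy as np

    gray = np.array([[1, 2],
                     [3, 4],
                     [5, 6]])                # a 3x2 gray-level matrix
    vector = gray.flatten(order="F")         # column-major scan -> [1 3 5 2 4 6]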
The invention provides an image representation learning method and system. By training with bounding-box regression instead of classification, the method improves the expressive power of the model, in particular its sensitivity to position and detail, so that the model captures more position information. Two augmentations are applied to each original picture, a box is regressed for each augmented view separately, and the two loss terms are summed; this strengthens the robustness of the model, suppresses noise better, and at the same time improves the accuracy of detection and segmentation tasks.
The following describes embodiments of the present application with reference to image representation learning as an example.
Example one
The present embodiment provides an image representation learning method. Referring to fig. 1-2, fig. 1 is a flowchart of an image representation learning method according to an embodiment of the present application. As shown in fig. 1, the image representation learning method includes the following steps:
enhanced-image acquisition step S1: obtaining enhanced images of an original image;
feature-map acquisition step S2: obtaining, by an encoder, feature maps of the enhanced images;
prediction step S3: predicting a bounding box for each enhanced image with a box regression network to obtain predicted boxes;
calculation step S4: computing the final loss between the ground-truth box and the predicted boxes, and updating the box regression network and the encoder according to the final loss.
In an embodiment, the enhanced-image acquisition step S1 includes, for each original image, obtaining at least two enhanced images of that original image using a data augmentation method.
In an embodiment, the feature-map acquisition step S2 uses an encoder composed of a deep-learning feature-extraction backbone network and a multilayer perceptron, and obtains the feature maps from this encoder.
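As an illustration of this step, a minimal PyTorch sketch of such an encoder follows; the ResNet-50 backbone and the layer dimensions are assumptions made for the sketch, since the embodiment does not fix a particular backbone:

    import torch.nn as nn
    import torchvision.models as models

    class Encoder(nn.Module):
        # deep-learning feature-extraction backbone plus a multilayer perceptron
        def __init__(self, proj_dim=256):
            super().__init__()
            backbone = models.resnet50(weights=None)   # assumed backbone choice
            feat_dim = backbone.fc.in_features         # 2048 for ResNet-50
            backbone.fc = nn.Identity()                # keep pooled features only
            self.backbone = backbone
            self.mlp = nn.Sequential(nn.Linear(feat_dim, feat_dim),
                                     nn.ReLU(inplace=True),
                                     nn.Linear(feat_dim, proj_dim))

        def forward(self, x):                          # x: (N, 3, H, W) image batch
            return self.mlp(self.backbone(x))          # feature embedding z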
In an embodiment, the prediction step S3 includes predicting the bounding box of each enhanced image with the box regression network to obtain the predicted boxes.
In an embodiment, the calculation step S4 includes computing, with an intersection-over-union (IoU) loss, the loss between the ground-truth box of the original image and the predicted box of each enhanced image, adding the at least two losses to obtain the final loss, and updating the encoder and the box regression network by back-propagating the final loss.
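An IoU loss is commonly defined as 1 − IoU between the predicted and ground-truth boxes; the minimal sketch below adopts that convention (an assumption, since the embodiment does not spell out the exact form) for axis-aligned boxes given as (x1, y1, x2, y2) rows of tensors of shape (N, 4):

    import torch

    def iou_loss(pred, target, eps=1e-7):
        lt = torch.max(pred[:, :2], target[:, :2])   # top-left of the overlap
        rb = torch.min(pred[:, 2:], target[:, 2:])   # bottom-right of the overlap
        wh = (rb - lt).clamp(min=0)                  # zero width/height if disjoint
        inter = wh[:, 0] * wh[:, 1]                  # intersection area
        area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
        area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
        union = area_p + area_t - inter
        return (1.0 - inter / (union + eps)).mean()  # loss = 1 - IoU, averaged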
Referring to fig. 2, fig. 2 is a flowchart of an application according to an embodiment of the present application. The image representation learning method of the present invention is described below with fig. 2 as an embodiment.
Step 1: for each image, two enhanced images x1 and x2 are obtained using a data augmentation method.
Step 2: the feature maps of x1 and x2 are obtained with an encoder f composed of a deep-learning feature-extraction backbone and a multilayer perceptron (mlp).
Step 3: boxes p1 and p2 are predicted with a box regression network h.
Step 4: the losses of p1 and p2 against the ground truth y are computed with an IoU loss, and the final loss is the sum of the two IoU losses.
Step 5: the encoder f and the box regression network h are updated by back-propagation.
In a specific implementation, the pseudo code is as follows:

    # f: backbone + projection mlp (the encoder)
    # h: box regression network; p1, p2: predicted boxes
    for x, y in loader:                        # load a minibatch x with its true boxes y
        x1, x2 = aug(x), aug(x)                # two random augmentations
        z1, z2 = f(x1), f(x2)                  # encoder features
        p1, p2 = h(z1), h(z2)                  # box regression
        L = IoULoss(p1, y) + IoULoss(p2, y)    # summed loss
        L.backward()                           # back-propagate to update f and h
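For reference, the sketch below expands this pseudo code into a more complete, runnable PyTorch training loop. The optimizer settings, the photometric augmentations (chosen so that the ground-truth box y stays valid; geometric augmentations would also require transforming y), and the structure of the regression head h are illustrative assumptions, not details fixed by this application; Encoder and iou_loss refer to the sketches given earlier, and loader is an assumed DataLoader yielding image batches with their true boxes. In practice the head's raw outputs would also be constrained (for example by predicting positive widths) so that each box satisfies x2 > x1 and y2 > y1.

    import torch
    import torch.nn as nn
    import torchvision.transforms as T

    aug = T.Compose([T.ColorJitter(0.4, 0.4, 0.4, 0.1),   # photometric only,
                     T.GaussianBlur(kernel_size=9)])      # so the boxes y stay valid

    f = Encoder(proj_dim=256)                      # backbone + projection mlp
    h = nn.Sequential(nn.Linear(256, 256),         # box regression network
                      nn.ReLU(inplace=True),
                      nn.Linear(256, 4))           # -> (x1, y1, x2, y2)
    opt = torch.optim.SGD(list(f.parameters()) + list(h.parameters()),
                          lr=0.05, momentum=0.9, weight_decay=1e-4)

    for x, y in loader:                            # images x and true boxes y
        x1, x2 = aug(x), aug(x)                    # two random augmentations
        p1, p2 = h(f(x1)), h(f(x2))                # predicted boxes for each view
        loss = iou_loss(p1, y) + iou_loss(p2, y)   # sum of the two IoU losses
        opt.zero_grad()
        loss.backward()                            # update f and h jointly
        opt.step()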
In summary, the invention provides an image representation learning method and system. By training with bounding-box regression instead of classification, the invention improves the expressive power of the model, in particular its sensitivity to position and detail, so that the model captures more position information. Two augmentations are applied to each original picture, a box is regressed for each augmented view separately, and the two loss terms are summed; this strengthens the robustness of the model, suppresses noise better, and at the same time improves the accuracy of detection and segmentation tasks.
Example two
Referring to fig. 3, fig. 3 is a schematic structural diagram of an image representation learning system according to the present invention. As shown in fig. 3, the image representation learning system of the present invention applies the image representation learning method described above and includes:
an enhanced-image acquisition unit 51: obtaining enhanced images of an original image;
a feature-map acquisition unit 52: obtaining, by an encoder, feature maps of the enhanced images;
a prediction unit 53: predicting a bounding box for each enhanced image with a box regression network to obtain predicted boxes;
a calculation unit 54: computing the final loss between the ground-truth box and the predicted boxes, and updating the box regression network and the encoder according to the final loss.
In the present embodiment, for each original image, the enhanced-image acquisition unit 51 obtains at least two enhanced images of that original image using a data augmentation method.
In this embodiment, the feature-map acquisition unit 52 uses an encoder composed of a deep-learning feature-extraction backbone network and a multilayer perceptron, and obtains the feature maps from this encoder.
In this embodiment, the prediction unit 53 predicts the bounding box of each enhanced image with the box regression network to obtain the predicted boxes.
In this embodiment, the calculation unit 54 computes, with an intersection-over-union (IoU) loss, the loss between the ground-truth box of the original image and the predicted box of each enhanced image, adds the at least two losses to obtain the final loss, and updates the encoder and the box regression network by back-propagating the final loss.
EXAMPLE III
Referring to fig. 4, this embodiment discloses a specific implementation of an electronic device. The electronic device may include a processor 81 and a memory 82 storing computer program instructions.
Specifically, the processor 81 may include a central processing unit (CPU) or an application-specific integrated circuit (ASIC), or may be configured as one or more integrated circuits implementing the embodiments of the present application.
The memory 82 may be used to store or cache various data files for processing and/or communication use, as well as possible computer program instructions executed by the processor 81.
The processor 81 implements any of the image representation learning methods in the above embodiments by reading and executing computer program instructions stored in the memory 82.
In some of these embodiments, the electronic device may also include a communication interface 83 and a bus 80. As shown in fig. 4, the processor 81, the memory 82, and the communication interface 83 are connected via the bus 80 to complete communication therebetween.
The communication interface 83 is used to implement communication between the modules, apparatuses, units, and/or devices in the embodiments of the present application. The communication interface 83 may also exchange data with external components such as external devices, image/data acquisition devices, databases, external storage, and image/data processing workstations.
The bus 80 includes hardware, software, or both that couple the components of the electronic device to one another. The bus 80 includes, but is not limited to, at least one of: a data bus, an address bus, a control bus, an expansion bus, or a local bus. By way of example and not limitation, the bus 80 may include an Accelerated Graphics Port (AGP) or other graphics bus, an Enhanced Industry Standard Architecture (EISA) bus, a Front-Side Bus (FSB), a HyperTransport (HT) interconnect, an Industry Standard Architecture (ISA) bus, an InfiniBand interconnect, a Low Pin Count (LPC) bus, a memory bus, a Micro Channel Architecture (MCA) bus, a Peripheral Component Interconnect (PCI) bus, a PCI-Express (PCIe) bus, a Serial Advanced Technology Attachment (SATA) bus, a Video Electronics Standards Association Local Bus (VLB), another suitable bus, or a combination of two or more of these. The bus 80 may include one or more buses, where appropriate. Although specific buses are described and shown in the embodiments of the application, any suitable buses or interconnects are contemplated.
The electronic device may be connected to an image representation learning system to implement the methods described in connection with fig. 1-2.
The technical features of the embodiments described above may be arbitrarily combined, and for the sake of brevity, all possible combinations of the technical features in the embodiments described above are not described, but should be considered as being within the scope of the present specification as long as there is no contradiction between the combinations of the technical features.
The above-mentioned embodiments only express several embodiments of the present application, and the description thereof is more specific and detailed, but not construed as limiting the scope of the invention. It should be noted that, for a person skilled in the art, several variations and modifications can be made without departing from the concept of the present application, which falls within the scope of protection of the present application. Therefore, the protection scope of the present patent shall be subject to the appended claims.
Claims (10)
1. An image representation learning method, comprising:
an enhanced-image acquisition step: obtaining enhanced images of an original image;
a feature-map acquisition step: obtaining, by an encoder, feature maps of the enhanced images;
a prediction step: predicting a bounding box for each enhanced image with a box regression network to obtain predicted boxes;
a calculation step: computing a final loss between a ground-truth box and the predicted boxes, and updating the box regression network and the encoder according to the final loss.
2. The image representation learning method according to claim 1, wherein the enhanced-image acquisition step includes obtaining, for each original image, at least two enhanced images of that original image using a data augmentation method.
3. The image representation learning method according to claim 1, wherein the feature-map acquisition step includes using an encoder composed of a deep-learning feature-extraction backbone network and a multilayer perceptron, and obtaining the feature maps from the encoder.
4. The image representation learning method according to claim 1, wherein the prediction step includes predicting the bounding box of each enhanced image with the box regression network to obtain the predicted boxes.
5. The image representation learning method according to claim 4, wherein the calculation step includes computing, with an intersection-over-union (IoU) loss, the loss between the ground-truth box of the original image and the predicted box of each enhanced image, adding the at least two losses to obtain the final loss, and updating the encoder and the box regression network by back-propagating the final loss.
6. An image representation learning system applying the image representation learning method according to any one of claims 1 to 5, the image representation learning system comprising:
an enhanced-image acquisition unit: obtaining enhanced images of an original image;
a feature-map acquisition unit: obtaining, by an encoder, feature maps of the enhanced images;
a prediction unit: predicting a bounding box for each enhanced image with a box regression network to obtain predicted boxes;
a calculation unit: computing a final loss between a ground-truth box and the predicted boxes, and updating the box regression network and the encoder according to the final loss.
7. The image representation learning system according to claim 6, wherein the enhanced-image acquisition unit obtains, for each original image, at least two enhanced images of that original image using a data augmentation method.
8. The image representation learning system according to claim 7, wherein the feature-map acquisition unit uses an encoder composed of a deep-learning feature-extraction backbone network and a multilayer perceptron, and obtains the feature maps from the encoder.
9. The image representation learning system according to claim 8, wherein the prediction unit predicts the bounding box of each enhanced image with the box regression network to obtain the predicted boxes.
10. The image representation learning system according to claim 9, wherein the calculation unit computes, with an intersection-over-union (IoU) loss, the loss between the ground-truth box of the original image and the predicted box of each enhanced image, adds the at least two losses to obtain the final loss, and updates the encoder and the box regression network by back-propagating the final loss.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202011632703.8A | 2020-12-31 | 2020-12-31 | Image representation learning method and system
Publications (2)
Publication Number | Publication Date
---|---
CN112634174A | 2021-04-09
CN112634174B | 2023-12-12
Family ID: 75289953
Family Applications (1)
Application Number | Title | Priority Date | Filing Date
---|---|---|---
CN202011632703.8A (granted as CN112634174B, Active) | Image representation learning method and system | 2020-12-31 | 2020-12-31
Country Status (1)
Country | Link
---|---
CN | CN112634174B (en)
Patent Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110287927A (en) * | 2019-07-01 | 2019-09-27 | 西安电子科技大学 | Based on the multiple dimensioned remote sensing image object detection method with context study of depth |
CN110503112A (en) * | 2019-08-27 | 2019-11-26 | 电子科技大学 | A kind of small target deteection of Enhanced feature study and recognition methods |
CN110782430A (en) * | 2019-09-29 | 2020-02-11 | 郑州金惠计算机系统工程有限公司 | Small target detection method and device, electronic equipment and storage medium |
CN111062953A (en) * | 2019-12-17 | 2020-04-24 | 北京化工大学 | Method for identifying parathyroid hyperplasia in ultrasonic image |
CN111652140A (en) * | 2020-06-03 | 2020-09-11 | 广东小天才科技有限公司 | Method, device, equipment and medium for accurately segmenting questions based on deep learning |
CN111612017A (en) * | 2020-07-07 | 2020-09-01 | 中国人民解放军国防科技大学 | Target detection method based on information enhancement |
CN111931617A (en) * | 2020-07-29 | 2020-11-13 | 中国工商银行股份有限公司 | Human eye image recognition method and device based on image processing and self-service terminal |
CN111738231A (en) * | 2020-08-06 | 2020-10-02 | 腾讯科技(深圳)有限公司 | Target object detection method and device, computer equipment and storage medium |
CN112132832A (en) * | 2020-08-21 | 2020-12-25 | 苏州浪潮智能科技有限公司 | Method, system, device and medium for enhancing image instance segmentation |
Non-Patent Citations (2)
Title
---
KEUNDONG LEE ET AL.: "An Intensive Study of Backbone and Architectures with Test Image Augmentation and Box Refinement for Object Detection and Segmentation", ICTC, pages 673-677
XU Guobiao et al.: "Moving-target detection for remote towers based on an improved YOLO algorithm" (in Chinese), Science Technology and Engineering, vol. 19, no. 14, pages 377-383
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113569934A (en) * | 2021-07-20 | 2021-10-29 | 上海明略人工智能(集团)有限公司 | LOGO classification model construction method and system, electronic device and storage medium |
CN113569934B (en) * | 2021-07-20 | 2024-01-23 | 上海明略人工智能(集团)有限公司 | LOGO classification model construction method, LOGO classification model construction system, electronic equipment and storage medium |
WO2024080699A1 (en) * | 2022-10-10 | 2024-04-18 | Samsung Electronics Co., Ltd. | Electronic device and method of low latency speech enhancement using autoregressive conditioning-based neural network model |
Also Published As
Publication number | Publication date |
---|---|
CN112634174B (en) | 2023-12-12 |
Legal Events
Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant