WO2019088657A1 - Method and device for compressing image and neural network using hidden variable - Google Patents

Method and device for compressing image and neural network using hidden variable

Info

Publication number
WO2019088657A1
Authority
WO
WIPO (PCT)
Prior art keywords
components
hidden variable
neural network
variable
hidden
Prior art date
Application number
PCT/KR2018/013013
Other languages
English (en)
Korean (ko)
Inventor
이주영
조승현
김연희
석진욱
임웅
김종호
이대열
정세윤
김휘용
최진수
Original Assignee
한국전자통신연구원
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 한국전자통신연구원 filed Critical 한국전자통신연구원
Priority to US16/760,609 priority Critical patent/US11477468B2/en
Priority claimed from KR1020180130680A external-priority patent/KR102285064B1/ko
Publication of WO2019088657A1 publication Critical patent/WO2019088657A1/fr

Links

Images

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/124Quantisation
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/102Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or selection affected or controlled by the adaptive coding
    • H04N19/13Adaptive entropy coding, e.g. adaptive variable length coding [AVLC] or context adaptive binary arithmetic coding [CABAC]
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N19/00Methods or arrangements for coding, decoding, compressing or decompressing digital video signals
    • H04N19/10Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding
    • H04N19/134Methods or arrangements for coding, decoding, compressing or decompressing digital video signals using adaptive coding characterised by the element, parameter or criterion affecting or controlling the adaptive coding
    • H04N19/136Incoming video signal characteristics or properties

Definitions

  • the following embodiments relate to a method and apparatus for image compression, and more particularly, to a method and apparatus for image and neural network compression using hidden variables.
  • High-efficiency video encoding and/or decoding techniques are required to solve the problems that arise as images become ultra-high resolution and high quality.
  • the neural network for image compression based on the auto encoder consists of an encoder neural network and a decoder neural network.
  • the encoder neural network receives the image as input and generates a lower-dimensional hidden variable called a bottleneck.
  • the decoder neural network receives the hidden variable as an input and reconstructs the image using the received hidden variable.
  • Neural network pruning is a neural network compression method that reduces the size of the parameters in order to speed up the execution of a network. It is a technique that maintains the performance of the neural network as much as possible even when the size of the parameters is reduced.
  • the complexity and performance of the neural network have a trade-off relationship with each other, and the ratio of required performance to complexity may vary depending on the application purpose and environment of the neural network.
  • the rate and the distortion, that is, the required bit amount and the image quality of the reconstructed image, are in a trade-off relationship with each other.
  • the neural network pruning method can be used to derive neural networks of various performance and complexity from one network.
  • One embodiment may provide an apparatus and method for performing encoding and / or decoding on an image using an ordered hidden variable.
  • One embodiment may provide an apparatus and method for performing neural network pruning using ordered hidden variables.
  • a processing method includes obtaining a hidden variable; and performing a selection or processing associated with the hidden variable, wherein a plurality of components of the hidden variable are arranged in a specified order.
  • the plurality of components may be arranged in order of importance.
  • the hidden variable may be obtained through a pre-learned encoder neural network.
  • losses in the learning of the neural network may include a reconstruction loss and a sparsity penalty.
  • the sparsity penalty may be a weighted sum of the values of a plurality of components of the hidden variable.
  • the weight given to each component of the plurality of components may be related to the frequency with which feature information represented by each component is used in input images used for learning.
  • only the elements of some upper-index components among the plurality of components of the hidden variable may be activated in each learning step of the plurality of learning steps.
  • the number of activated upper-index components may be determined at random.
  • the loss in learning of the neural network may be reconstruction loss.
  • the probability that each component influences the magnitude of the reconstruction loss can be adjusted by adjusting the activation of each component of the plurality of components of the hidden variable.
  • the selection may be to remove, from the hidden variable, information of low importance in the reconstruction of the image.
  • the processing may be to reduce the amount of data for the hidden variable by removing a specified component of the plurality of components of the hidden variable.
  • the specified component may be one or more components having the lowest importance.
  • the processing may be a quantization for the hidden variable.
  • the plurality of components may be arranged in order of importance.
  • the processing method may further include generating the hidden variable using the hidden variable generation information.
  • the reconstructed image may be generated using only feature information of some of the plurality of components of the hidden variable.
  • the ranking of each component of the plurality of components may be determined based on the component index of each component.
  • a computer-readable recording medium storing a bitstream is provided.
  • the bitstream may include hidden variable generation information.
  • a hidden variable may be generated using the hidden variable generation information.
  • the reconstructed image can be generated using the decoder neural network to which the hidden variable is applied.
  • the plurality of components of the hidden variable may be arranged in a specified order.
  • An apparatus and method for performing encoding and / or decoding on an image using an ordered hidden variable are provided.
  • An apparatus and method for performing neural network pruning using ordered hidden variables are provided.
  • FIG. 1 illustrates a relationship between an input image, a reconstructed image, and a hidden variable according to an exemplary embodiment of the present invention.
  • FIG. 2 is a structural diagram of an encoder according to an embodiment.
  • FIG. 3 is a structural diagram of a decoder according to an embodiment.
  • FIG. 4 is a flow chart of an encoding method according to one embodiment.
  • FIG. 5 is a flow chart of a decoding method according to one embodiment.
  • FIG. 6 shows a learning method for sorting hidden variables according to an embodiment in order of importance.
  • FIG. 8 shows a learning method for sorting hidden variables according to an embodiment in order of importance.
  • FIG. 9 illustrates an electronic device implementing an encoder in accordance with one embodiment.
  • FIG. 10 illustrates an electronic device implementing a decoder in accordance with one embodiment.
  • FIG. 11 shows a neural network pruning method according to an embodiment.
  • FIG. 12 illustrates a method of providing various performance neural networks based on a learned neural network to which a neural network pruning method according to an exemplary embodiment is applied.
  • first and second may be used to describe various components, but the components should not be limited by the terms above. The above terms are used to distinguish one component from another.
  • the first component may be referred to as a second component, and similarly, the second component may also be referred to as a first component.
  • each component is listed separately for convenience of explanation. For example, at least two of the components may be combined into a single component. Also, one component can be divided into a plurality of components. Embodiments in which the components are integrated and embodiments in which a component is separated are also included in the scope of rights, as long as they do not depart from the essence.
  • some components may not be essential components that perform essential functions, but may be optional components that merely improve performance.
  • Embodiments may be implemented with only the components essential to the essence of the embodiments, and structures from which optional components, such as components used only for performance enhancement, are excluded are also included in the scope of rights.
  • DCT may denote a discrete cosine transform.
  • Embodiments describe a transformation method that uses artificial neural networks based on convolutional autoencoders to provide better energy compaction performance than the conventional 2D DCT. This transformation method can be called a transformative autoencoder.
  • FIG. 1 illustrates a relationship between an input image, a reconstructed image, and a hidden variable according to an exemplary embodiment of the present invention.
  • the autoencoder may be an artificial neural network that transforms the input image data into a hidden-variable space. Because the coefficients of the transformed hidden variable have an energy compaction property, the image information can be expressed with fewer components.
  • the component of the hidden variable may be one value constituting the hidden variable.
  • the components of the hidden variable may be a set of values obtained from the same feature filter. If the component of the hidden variable is a set, the index identifying the component may be an index to identify a particular set from the sets. For example, if the neural network is constructed using a convolution layer, the component may denote the channel of the hidden variable, and the component index may denote the channel index.
  • in FIG. 1, the hidden variables of the autoencoder are shown. Hidden variables can be sorted in order of importance.
  • the encoder neural network can receive the input image and extract feature information of the input image.
  • the encoder neural network can construct hidden variables based on extracted feature information.
  • the decoder neural network can receive the hidden variable and reconstruct the image using the received hidden variable.
  • the decoder neural network can output the reconstructed image.
  • the hidden variable can be composed of N components.
  • each of the N components constituting the hidden variable can represent unique feature information on the input image.
  • Feature information can represent various characteristics of the input image such as brightness, color and shape learned through the encoder neural network.
  • the lower the index of a component, the more important its feature information is in reconstructing the image.
  • the characteristic information of a low frequency component considered to be important in a human visual system (HVS) can be represented by a component having a low index.
  • the higher the index of a component, the more it can express feature information of high-frequency components. That is to say, a component having a low index can be regarded as a higher-order component.
  • the autoencoder can concentrate the information that greatly influences the reconstruction of the image in the higher-order components, based on the indices of the hidden variable.
  • the sorted feature information can be learned variously according to the importance.
  • FIG. 2 is a structural diagram of an encoder according to an embodiment.
  • the encoder 200 can receive an input image and can generate a bitstream.
  • the encoder 200 may include a feature extraction unit 210, a hidden variable selection unit 220, and an encoding unit 230.
  • FIG. 3 is a structural diagram of a decoder according to an embodiment.
  • the decoder 300 can receive the bitstream and generate reconstructed images.
  • the decoder 300 may include a decoding unit 310, a hidden variable restoring unit 320, and an image reconstructing unit 330.
  • functions and operations of the decoding unit 310, the hidden variable restoring unit 320, and the image reconstructing unit 330 will be described below with reference to FIG. 5 and the like.
  • FIG. 4 is a flow chart of an encoding method according to one embodiment.
  • the encoder 200 can receive an input image, and can perform encoding on the input image to generate an encoded bitstream.
  • the feature extraction unit 210 may obtain the hidden variable.
  • the encoder 200 may include an encoder neural network. Alternatively, the encoder 200 may perform learning of the encoder neural network using the input image. The hidden variable of the encoder neural network can be constructed according to this learning.
  • the feature extraction unit 210 may acquire the hidden variable through the previously learned encoder neural network.
  • the obtained hidden variables may be arranged in a specified order.
  • the hidden variable may have N components.
  • a hidden variable arranged in a specified order may mean that the N components of the hidden variable are arranged in the specified order.
  • sorting a hidden variable in a specified order may mean sorting the N components of the hidden variable in the specified order.
  • the specified order may be an order of importance.
  • the feature extraction unit 210 may obtain hidden variables arranged in order of importance through the previously-learned encoder neural network.
  • the hidden variable selector 220 may perform a selection or processing associated with the hidden variable.
  • the hidden variable selecting unit 220 may generate hidden variable generating information by performing selection or processing related to the hidden variable.
  • the hidden variable generation information may be information used to generate the hidden variable to which the above selection or processing has been applied.
  • the format of the hidden variable generation information may be different from that of the hidden variable.
  • the hidden variable selecting unit 220 may remove information having low importance in reconstructing an image from the hidden variables.
  • information of high importance can be selectively extracted from the hidden variables.
  • the hidden variable selector 220 may reduce the amount of data for the hidden variable by removing a specified component among the N components of the hidden variable.
  • the specified component may be a component with low importance.
  • the hidden variable selecting unit 220 may remove the M least important components among the N components of the hidden variable. M may be an integer less than N.
  • the hidden variable selector 220 may remove one or more components having the lowest significance among the N components of the hidden variable.
  • hidden variables arranged in order of importance may have an energy compaction property.
  • the hidden variable selecting unit 220 can remove low-energy values from the hidden variable by performing quantization on the hidden variable.
  • the encoding unit 230 may perform entropy encoding.
  • the encoding unit 230 may generate an encoded bitstream by performing entropy encoding on the output from the hidden variable selecting unit 220.
  • the output from the hidden variable selecting unit 220 may include hidden variable generating information.
  • the encoded bitstream may include information of an encoded image.
  • Step 430 may be optional.
  • FIG. 5 is a flow chart of a decoding method according to one embodiment.
  • the decoder 300 may receive the encoded bit stream and may perform decoding using the encoded bit stream to generate a reconstructed image.
  • the decoding unit 310 may generate an input to the hidden variable restoring unit 320 by performing entropy decoding on the encoded bitstream.
  • the input to the hidden variable restoring unit 320 may include hidden variable generating information.
  • Step 510 may be optional.
  • the hidden variable restoring unit 320 may generate the hidden variable using the hidden variable generating information.
  • the generated hidden variables may be hidden variables arranged in a specified order.
  • in step 420, by the selection or processing associated with the hidden variable, hidden variable generation information having a format different from that of the hidden variable may be generated.
  • the selection or processing may include the removal of components and the process of quantization.
  • the generation of a hidden variable using the hidden variable generation information may include initializing or recovering the removed elements of the hidden variable.
  • the generation of the hidden variable may include returning the quantized hidden variable to its pre-quantization form.
  • the hidden variable restoring unit 320 can generate hidden variables that can be interpreted by the decoder neural network of the decoder 300 using the hidden variable generation information. For example, the hidden variable generation information may be changed into the hidden-variable form by the hidden variable restoring unit 320.
  • the image reconstructing unit 330 may apply the restored hidden variable to the decoder neural network, and generate the reconstructed image using the decoder neural network to which the restored hidden variable is applied.
  • the reconstructed image can be similar to the input image.
  • FIG. 6 shows a learning method for sorting hidden variables according to an embodiment in order of importance.
  • a conventional auto encoder can perform learning of a neural network based on an error between an input image and a reconstructed image (i.e., reconstruction loss).
  • the feature information included in the components of the hidden variable may not be sorted in order of importance.
  • the components including feature information that has a large influence on the image quality of the reconstructed image and the components including feature information that has a small influence may be randomly arranged.
  • the influence of the sorting of the hidden variables can be clearly revealed when the reconstructed image is generated using only the feature information of some of the components of the hidden variable.
  • the generation of the reconstructed image may be performed using only the upper components among the plurality of components of the hidden variable.
  • the plurality of components may be N pieces.
  • the rank (i.e., upper and lower) of each component of the plurality of components can be determined based on the component index of the component. For example, the lower the component index of a component, the higher the component.
  • the quality of the reconstructed image generated using the sorted hidden variable and that of the reconstructed image generated using the unsorted hidden variable can be almost the same.
  • alternatively, the quality of the reconstructed image generated using the sorted hidden variable may be higher than that of the reconstructed image generated using the unsorted hidden variable.
  • as fewer components are used, the difference between the quality of the reconstructed image generated using the sorted hidden variable and the quality of the reconstructed image generated using the unsorted hidden variable can become larger.
  • the quality of the reconstructed image may appear to increase linearly with the number of components used in the reconstruction. At this time, in order to obtain a reconstructed image of good quality, the entirety of the components of the hidden variable may have to be used.
  • in FIG. 6, a learning method for sorting the plurality of components of the hidden variable in order of importance is shown.
  • the loss in learning of the neural network may include a reconstruction loss and a differential sparsity penalty.
  • the reconstruction loss may be an error between the input image and the reconstructed image.
  • the sparsity penalty may be a weighted sum of the values of the plurality of components of the hidden variable Z. That is to say, different weights can be assigned to the plurality of components of the hidden variable Z, and values can be obtained by multiplying the value of each component by its weight. The sum of the obtained values can be used as the sparsity penalty.
  • the weight assigned to each component of the plurality of components may be related to the frequency with which the feature information represented by each component is used in the input images used for learning.
  • the weight assigned to each component of the plurality of components may correspond to the frequency with which the feature information represented by each component is used in the input images used for learning.
  • the weight assigned to each component of the plurality of components may be proportional to the frequency with which the feature information represented by each component is used in the input images used for the learning.
  • as the assigned weight is lower, the neural network can be trained so that the component to which the weight is applied expresses feature information frequently used across various input images. Also, as the value of the differential weight is higher, the neural network can be trained so that the component to which the weight is applied expresses feature information rarely used across various input images.
  • FIG. 8 shows a learning method for sorting hidden variables according to an embodiment in order of importance.
  • the loss in the learning of the neural network of the embodiment may be a reconstruction loss.
  • the auto encoder can perform learning.
  • the auto-encoder may be at least a part of the encoder 200 or at least a part of the decoder 300.
  • the auto-encoder may be included in the feature extraction unit 210 or the decoding unit 310.
  • the auto encoder may be a device separate from the encoder 200 and the decoder 300.
  • the autoencoder can transmit the results of the learning to the encoder 200 and the decoder 300. The results of the learning can include hidden variables.
  • the loss function of learning can be calculated using the l1 distance between the patch of the input image and the patch of the reconstructed image.
  • among all the components of the hidden variable Z, only the upper-index components of a certain portion are activated, and the remaining components can be deactivated. Also, when one step of learning is performed, learning can be performed only on the neural network weight values connected to the activated components.
  • learning is performed by N learning steps, and it is shown that only a certain number of upper index components are activated in each step.
  • in the figure, a component located lower is shown as a component of the upper index.
  • higher-order components can participate in learning at a higher frequency.
  • lower-order components can participate in learning at a lower frequency.
  • the probability that each component influences the magnitude of the reconstruction loss can be adjusted by adjusting the activation of each component of the plurality of components of the hidden variable.
  • the vector elements of an upper-index component of the hidden variable can influence the magnitude of the reconstruction loss with high probability. Therefore, the vector elements of an upper-index component of the hidden variable can express relatively more important feature information. On the other hand, the vector elements of a lower-index component of the hidden variable influence the magnitude of the reconstruction loss with low probability, and can express relatively less important feature information.
  • a higher-ranked component among the components of the hidden variable of the autoencoder may contain more important information for the reconstruction of the image.
  • the number of higher-order components involved in the reconstruction of the image among the N components of the hidden variable may be defined using a random variable m. Whichever upper m components are used, those m components may contain the maximum information required for reconstruction.
  • the autoencoder can treat the values of the remaining components, other than the upper m components, as '0' in reconstructing the image. In this case, a component set to '0' does not affect the reconstruction of the image. Therefore, when only the upper m components are used, the autoencoder can be regarded as an autoencoder whose hidden variable has m components in total.
  • the probability distribution of the autoencoder using a hidden variable with m components may be written as p(x | θ, m). The learning objective can be regarded as an expected value of the log-likelihood of this distribution over the input images x and over m, and can be expressed as Equation 1 below.

    [Equation 1] $\mathcal{L}(\theta) = \mathbb{E}_{x \sim p(x)}\left[\mathbb{E}_{m \sim p(m)}\left[\log p(x \mid \theta, m)\right]\right]$

  • in Equation 1, p(x) can mean the distribution of the input image x.
  • p(m) can be a prior probability of the number of upper components used in reconstructing an image.
  • in order to maximize p(x | θ, m) for each m, the learning process described below may be a process of optimizing the parameter θ.
  • the frequency with which each component of the hidden variable participates in the learning can be set differentially.
  • m, the number of components participating in learning, can be sampled according to the prior probability of Equation 2 below for each learning step, and only the top m components among the components of the hidden variable can participate in the learning.
  • the first component, which has to contain the most important information, is always able to participate in learning, and as the index of a component increases, the frequency with which the component participates in the learning process may be lower.
  • to implement this, an element-wise multiplication of the hidden variable and a binary mask consisting of m consecutive ones followed by N-m consecutive zeros can be performed during learning.
  • the total number N of the components of the hidden variable is shown to be 16 for the simplicity of the notation.
  • for each learning step, a new m can be sampled, and the sampled m can be used for that step.
  • the magnitude of the effect of an upper component of the hidden variable on the reconstructed image and the magnitude of the effect of a lower component of the hidden variable on the reconstructed image may be different from each other. That is to say, the higher a component of the hidden variable is ranked, the larger its influence on the reconstructed image can be.
  • the effect of the components of the hidden vector on the reconstructed image can be expressed as Equation 3 below; given the definitions that follow, one form consistent with them averages the absolute influence of each component over the c × w × h elements of the output.

    [Equation 3] $g_i = \frac{1}{c\,w\,h}\sum_{j=1}^{c}\sum_{k=1}^{w}\sum_{l=1}^{h}\left|\frac{\partial \hat{x}_{j,k,l}}{\partial z_i}\right|$

  • in Equation 3, g_i can mean the degree of influence of z_i, the i-th component of the hidden variable z, on the output of the autoencoder.
  • c may be the number of channels, w the width, and h the height of the output.
  • the degree to which each component of the hidden variable affects video reconstruction may be different.
  • the inter-element multiplication of z and g can be used to perform normalization on the hidden variable signal based on these different degrees.
  • the final transformed signal can be generated by element-by-element multiplication.
  • the autoencoder may include a transform neural network and an inverse transform neural network.
  • the transform neural network can transform an 8×8 red-green-blue (RGB) image patch x into a 192-component hidden variable z using convolution layers and a fully connected (FC) layer.
  • the convolutional part can have five layers.
  • the transform neural network can obtain the final transformed signal through the element-wise multiplication of g and z.
  • g can represent the degree to which the components affect image reconstruction.
  • the inverse transform network can generate the reconstructed image patch for the image patch x through the reverse of the transformation process performed by the transform neural network.
  • the process of generating the inter-element multiplication of z and g may be omitted in the learning process.
  • the binary masking operation for the hidden variable z can be performed only in the learning process, and may not be applied to the test for the auto encoder.
  • FIG. 9 illustrates an electronic device implementing an encoder in accordance with one embodiment.
  • the encoder 200 may be implemented as the electronic device 900 shown in FIG. 9.
  • the electronic device 900 may be a general purpose computer system that operates as an encoder 200.
  • the electronic device 900 may include at least some of a processing unit 910, a communication unit 920, a memory 930, a storage 940, and a bus 990.
  • Components of the electronic device 900 such as the processing unit 910, the communication unit 920, the memory 930, and the storage 940, may communicate with each other via the bus 990.
  • the processing unit 910 may be a semiconductor device that executes the processing instructions stored in the memory 930 or the storage 940.
  • the processing unit 910 may be at least one hardware processor.
  • the processing unit 910 can process the jobs required for the operation of the electronic device 900. The processing unit 910 may execute the code of the operations or steps of the processing unit 910 described in the embodiments.
  • the processing unit 910 may perform generation, storage, and output of information, and may perform the operations of the other steps performed in the electronic device 900.
  • the communication unit 920 can be connected to the network 999. It may receive data or information required for the operation of the electronic device 900, and may transmit data or information produced by the operation of the electronic device 900. The communication unit 920 can transmit data to another device via the network 999 and can receive data from another device.
  • the communication unit 920 may be a network chip or a port.
  • Memory 930 and storage 940 may be various types of volatile or non-volatile storage media.
  • memory 930 may include at least one of a ROM 931 and a RAM 932.
  • the storage 940 may include internal storage media such as RAM, flash memory and hard disk, and may include removable storage media such as memory cards and the like.
  • the function or operation of the electronic device 900 may be performed as the processing unit 910 executes at least one program module.
  • the memory 930 and / or the storage 940 may store at least one program module.
  • At least one program module may be configured to be executed by the processing unit 910.
  • At least some of the feature extraction unit 210, the hidden variable selection unit 220 and the encoding unit 230 of the encoder 200 may be at least one program module.
  • the program modules may be included in the electronic device 900 in the form of an operating system, an application module, a library, and other program modules, and may be physically stored on various well-known storage devices. Also, at least some of these program modules may be stored in a remote storage device capable of communicating with the electronic device 900.
  • These program modules may include routines, subroutines, and/or programs that perform specific operations or tasks according to the embodiments or that implement specific abstract data types, and may include, but are not limited to, an object, a component, and a data structure.
  • the electronic device 900 may further include a user interface (UI) input device 950 and a UI output device 960.
  • the UI input device 950 may receive the user's input required for operation of the electronic device 900.
  • the UI output device 960 can output information or data according to the operation of the electronic device 900.
  • FIG. 10 illustrates an electronic device implementing a decoder in accordance with one embodiment.
  • the decoder 300 may be implemented as the electronic device 1000 shown in FIG.
  • the electronic device 1000 may be a general purpose computer system that operates as a decoder 300.
  • the electronic device 1000 may include at least some of a processing unit 1010, a communication unit 1020, a memory 1030, a storage 1040, and a bus 1090. Components of the electronic device 1000, such as the processing unit 1010, the communication unit 1020, the memory 1030, and the storage 1040, can communicate with each other via the bus 1090.
  • the processing unit 1010 may be a semiconductor device that executes the processing instructions stored in the memory 1030 or the storage 1040.
  • the processing unit 1010 may be at least one hardware processor.
  • the processing unit 1010 can process the jobs required for the operation of the electronic device 1000. The processing unit 1010 may execute the code of the operations or steps of the processing unit 1010 described in the embodiments.
  • the processing unit 1010 may perform generation, storage, and output of information, and may perform the operations of the other steps performed in the electronic device 1000.
  • the communication unit 1020 can be connected to the network 1099. It may receive data or information required for the operation of the electronic device 1000, and may transmit data or information produced by the operation of the electronic device 1000. The communication unit 1020 can transmit data to another device via the network 1099 and can receive data from another device.
  • the communication unit 1020 may be a network chip or a port.
  • Memory 1030 and storage 1040 can be various types of volatile or non-volatile storage media.
  • the memory 1030 may include at least one of a ROM 1031 and a RAM 1032.
  • the storage 1040 may include internal storage media such as RAM, flash memory and a hard disk, and may include removable storage media such as memory cards and the like.
  • the function or operation of the electronic device 1000 may be performed as the processing unit 1010 executes at least one program module.
  • Memory 1030 and / or storage 1040 may store at least one program module.
  • the at least one program module may be configured to be executed by the processing unit 1010.
  • At least some of the decoding unit 310, the hidden variable restoring unit 320 and the image reconstructing unit 330 of the decoder 300 may be at least one program module.
  • the program modules may be included in the electronic device 1000 in the form of an operating system, an application module, a library, and other program modules, and may be physically stored on various known storage devices. Also, at least some of these program modules may be stored in a remote storage device capable of communicating with the electronic device 1000. These program modules may include routines, subroutines, and/or programs that perform specific operations or tasks according to the embodiments or that implement specific abstract data types, and may include, but are not limited to, an object, a component, and a data structure.
  • the electronic device 1000 may further include a user interface (UI) input device 1050 and a UI output device 1060.
  • the UI input device 1050 can receive a user's input required for operation of the electronic device 1000.
  • the UI output device 1060 can output information or data according to the operation of the electronic device 1000.
  • FIG. 11 shows a neural network pruning method according to an embodiment.
  • a sorting method of hidden variables based on importance can be used for neural network pruning.
  • the encoder 200 performing the neural network pruning may be named as a neural network pruning device or the like.
  • the feature extraction unit 210 may acquire the hidden variable.
  • the obtained hidden variables can be sorted in order of importance.
  • the processing associated with the hidden variable may be neural network pruning.
  • Neural network pruning may be to remove nodes with low importance on the neural network.
  • the neural network may have an input layer and an output layer, and may have a plurality of hidden layers between the input layer and the output layer.
  • Fig. 11 illustratively two hidden layers are shown.
  • One layer may include a plurality of nodes.
  • the nodes of the neural network may correspond to the components of the above-mentioned hidden variables.
  • the plurality of components of the hidden variable may correspond to the nodes of the layer of the neural network, respectively.
  • the nodes of the neural network with the same indices and the components of the hidden variable may correspond to each other.
  • the alignment of the hidden variables of the embodiment can be applied to a plurality of layers at the same time.
  • the intensity (shading) of each node of the two hidden layers may indicate the importance of the node after the sorting by learning.
  • the hidden variable selecting unit 220 may preferentially remove the nodes having the lowest importance in order to reduce the number of nodes of the neural network. Alternatively, the hidden variable selection unit 220 may remove the M least significant nodes among the N nodes of a hidden layer.
  • This neural network pruning can be applied not only to compression of video and image but also to various kinds of neural networks, and the order of importance of hidden variables can be applied to the neural networks of various purposes as well.
  • FIG. 12 illustrates a method of providing various performance neural networks based on a learned neural network to which a neural network pruning method according to an exemplary embodiment is applied.
  • the hidden variable selection unit 220 may perform neural network pruning.
  • the hidden variable selection unit 220 can derive other neural networks having various performances and complexities from one neural network by defining parameters.
  • the hidden variable selecting unit 220 may derive a plurality of neural networks from one neural network using parameters indicating the pruning degree of the neural network pruning.
  • the plurality of neural networks may differ in performance and complexity.
  • the parameter may be the number or ratio of nodes used in each hidden layer of the plurality of hidden layers of the neural network. That is to say, the hidden variable selection unit 220 can derive other neural networks having various performances and complexities from one neural network by defining the number or the ratio of nodes used in each hidden layer of the plurality of hidden layers of the neural network.
  • in FIG. 12, the value of the parameter p is shown on the right, and the neural network according to the value of the parameter p is shown on the left.
  • the parameter p may be the ratio of the nodes used among the nodes of each hidden layer of the entire hidden layers.
  • the parameter p may be the ratio of the nodes to be activated among the nodes of each hidden layer of the entire hidden layers.
  • the parameter p may be the number of nodes used among the nodes of each hidden layer of the entire hidden layers.
  • the parameter p may be common to all the hidden layers. Alternatively, the parameter p may be used for each of the plurality of hidden layers of the neural network.
  • the parameter p may be transmitted from another device to the neural network pruning device as needed, and may be transmitted between neural network pruning devices.
  • the encoded bit stream may comprise a parameter p.
  • the encoder 200 may include the parameter p in the bitstream.
  • the decoder 300 may obtain the parameter p from the bitstream.
  • the degree of rate-distortion can be controlled by transmitting the information of parameter p as additional information.
  • the parameter p may be utilized as profile information for applying the neural network to the device appropriate to the performance level of the device. For example, with the parameter p, a hidden layer having a high ratio of nodes used in a device having high performance can be used, and a hidden layer having a low ratio of nodes used in a device having low performance can be used.
  • the apparatus described above may be implemented as a hardware component, a software component, and / or a combination of hardware components and software components.
  • the apparatus and components described in the embodiments may be implemented in a computer system, such as, for example, a processor, a controller, an arithmetic logic unit (ALU), a digital signal processor, a microcomputer, a field programmable gate array (FPGA), a programmable logic unit (PLU), a microprocessor, or any other device capable of executing and responding to instructions.
  • the processing device may execute an operating system (OS) and one or more software applications running on the operating system.
  • the processing device may also access, store, manipulate, process, and generate data in response to execution of the software.
  • the processing apparatus may be described as being used singly, but those skilled in the art will recognize that the processing apparatus may include a plurality of processing elements and/or a plurality of types of processing elements.
  • the processing unit may comprise a plurality of processors or one processor and one controller.
  • Other processing configurations are also possible, such as a parallel processor.
  • the software may include a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or may command the processing device independently or collectively.
  • the software and/or data may be embodied in any type of machine, component, physical device, virtual equipment, or computer storage medium or device, or may be permanently or temporarily embodied in a transmitted signal wave.
  • the software may be distributed over a networked computer system and stored or executed in a distributed manner.
  • the software and data may be stored on one or more computer readable recording media.
  • the method according to an embodiment may be implemented in the form of a program command that can be executed through various computer means and recorded in a computer-readable medium.
  • the computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination.
  • the program instructions to be recorded on the medium may be those specially designed and configured for the embodiments or may be available to those skilled in the art of computer software.
  • the computer-readable recording medium may include information used in embodiments according to the present invention.
  • the computer readable recording medium may comprise a bit stream, and the bit stream may comprise the information described in embodiments according to the present invention.
  • the computer-readable recording medium may comprise a non-transitory computer-readable medium.
  • Examples of computer-readable media include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROMs and DVDs; magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as ROM, RAM, and flash memory.
  • Examples of program instructions include machine language code such as those produced by a compiler, as well as high-level language code that can be executed by a computer using an interpreter or the like.
  • the hardware devices described above may be configured to operate as one or more software modules to perform the operations of the embodiments, and vice versa.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Signal Processing (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Compression Or Coding Systems Of Tv Signals (AREA)
  • Image Analysis (AREA)

Abstract

A method and device for compressing an image using a hidden variable are disclosed. A plurality of components of a hidden variable can be arranged in order of importance. When an image is reconstructed using only the feature information of some of the plurality of components on the basis of such an arrangement, the quality of the reconstructed image can be improved. Various types of learning can be performed in order to generate a hidden variable whose components are arranged in order of importance. In addition, information of low importance can be removed from the hidden variable, and processing such as quantization can be applied to the hidden variable. The removal of information and the processing applied to the hidden variable can reduce the amount of data for the hidden variable.
PCT/KR2018/013013 2017-10-30 2018-10-30 Method and device for compressing image and neural network using hidden variable WO2019088657A1 (fr)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US16/760,609 US11477468B2 (en) 2017-10-30 2018-10-30 Method and device for compressing image and neural network using hidden variable

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
KR10-2017-0142066 2017-10-30
KR20170142066 2017-10-30
KR1020180130680A KR102285064B1 (ko) Method and apparatus for image and neural network compression using a hidden variable
KR10-2018-0130680 2018-10-30

Publications (1)

Publication Number Publication Date
WO2019088657A1 true WO2019088657A1 (fr) 2019-05-09

Family

ID=66333282

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2018/013013 WO2019088657A1 (fr) Method and device for compressing image and neural network using hidden variable

Country Status (1)

Country Link
WO (1) WO2019088657A1 (fr)


Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20000031283A (ko) * 1998-11-02 2000-06-05 구자홍 Image encoding apparatus
KR20160001648A (ko) * 2014-06-27 2016-01-06 후지쯔 가부시끼가이샤 Concealed-data matching device, computer-readable recording medium storing a concealed-data update program, and concealed-data update method
KR101791573B1 (ko) * 2016-10-21 2017-10-31 서강대학교산학협력단 Super-resolution apparatus and method for video streams using a convolutional neural network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
A. KRIZHEVSKY ET AL.: "Using very deep autoencoders for content-based image retrieval", EUROPEAN SYMPOSIUM ON ARTIFICIAL NEURAL NETWORKS, 2011, pages 1 - 7, XP055613852 *
GEORGE TODERICI ET AL.: "Variable rate image compression with recurrent neural networks", ICLR, 20 March 2016 (2016-03-20), pages 1 - 12, XP055506355 *
LEE, JOO YOUNG ET AL.: "A study on the transformative artificial neural network for image compression", 30TH WORKSHOP ON IMAGE PROCESSING AND IMAGE UNDERSTANDING, 7 February 2018 (2018-02-07), pages 1 - 5 *

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11734577B2 (en) 2019-06-05 2023-08-22 Samsung Electronics Co., Ltd Electronic apparatus and method of performing operations thereof
CN112188202A (zh) * 2019-07-01 2021-01-05 西安电子科技大学 Self-learning video encoding and decoding technology based on a neural network
JP2023510504A (ja) * 2020-10-08 2023-03-14 テンセント・アメリカ・エルエルシー Pruning method, apparatus, and computer program for neural-network-based video coding
US11765376B2 (en) 2020-10-08 2023-09-19 Tencent America LLC Pruning methods and apparatuses for neural network based video coding
JP7489466B2 (ja) 2020-10-08 2024-05-23 テンセント・アメリカ・エルエルシー Pruning method, apparatus, and computer program for neural-network-based video coding
CN115115721A (zh) * 2022-07-26 2022-09-27 北京大学深圳研究生院 Pruning method and apparatus for a neural network image compression model
CN115115721B (zh) 2022-07-26 2024-03-15 北京大学深圳研究生院 Pruning method and apparatus for a neural network image compression model

Similar Documents

Publication Publication Date Title
WO2019088657A1 (fr) Method and device for compressing image and neural network using hidden variable
WO2016064185A1 (fr) Method and apparatus for performing graph-based prediction using an optimization function
WO2020080665A1 (fr) Methods and apparatuses for performing artificial intelligence encoding and artificial intelligence decoding on an image
WO2019125010A1 (fr) Method and apparatus for managing duplicate points in point cloud compression
WO2020080698A1 (fr) Method and device for evaluating the subjective quality of a video
WO2019009489A1 (fr) Image encoding/decoding method and device therefor
WO2017010850A1 (fr) Method and device for processing a video signal using a separable graph-based transform
WO2016129872A1 (fr) Method and apparatus for processing a video signal using a graph-based transform
WO2012044076A2 (fr) Video encoding method and device, and decoding method and device
WO2019009490A1 (fr) Image encoding/decoding method and device therefor
WO2016190690A1 (fr) Method and device for processing a video signal using a graph-based adaptive separable transform
WO2021002719A1 (fr) Video processing method and apparatus
EP2471265A2 (fr) Method and apparatus for encoding and decoding an image by using rotational transform
WO2010027170A2 (fr) Method and device for encoding/decoding images using prediction-direction conversion and selective encoding
WO2018044125A1 (fr) Method and apparatus for performing a transformation using a layered Givens transform
WO2016195455A1 (fr) Method and device for processing a video signal using a graph-based transform
WO2022071695A1 (fr) Device for processing an image, and operating method thereof
EP3844962A1 (fr) Methods and apparatuses for performing artificial intelligence encoding and artificial intelligence decoding on an image
WO2016104854A1 (fr) Encoding method and apparatus, and decoding method and apparatus
EP4000272A1 (fr) Apparatus and method for using AI metadata related to image quality
WO2016076677A1 (fr) Method and device for entropy encoding or entropy decoding of a video signal for high-capacity parallel processing
EP2510691A2 (fr) Method and apparatus for encoding and decoding an image by using rotational transform
WO2016064242A1 (fr) Method and apparatus for decoding/encoding a video signal using a transform derived from a graph model
WO2021172956A1 (fr) Image encoding/decoding method and apparatus for signaling image feature information, and method for transmitting a bitstream
WO2019225903A1 (fr) Watermark-based image integrity verification method and device

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18872707

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 18872707

Country of ref document: EP

Kind code of ref document: A1