WO2022083599A1 - Sound-wave-based image adversarial sample generation method and system - Google Patents

Sound-wave-based image adversarial sample generation method and system

Info

Publication number
WO2022083599A1
WO2022083599A1 (PCT/CN2021/124791; CN2021124791W)
Authority
WO
WIPO (PCT)
Prior art keywords
image
adversarial
sample
target object
pixel
Prior art date
Application number
PCT/CN2021/124791
Other languages
English (en)
French (fr)
Inventor
冀晓宇
徐文渊
程雨诗
张月鹏
王凯
闫琛
Original Assignee
Zhejiang University (浙江大学)
Priority date
Filing date
Publication date
Application filed by Zhejiang University
Priority to US17/702,662 (published as US20220215652A1)
Publication of WO2022083599A1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N25/00Circuitry of solid-state image sensors [SSIS]; Control thereof
    • H04N25/60Noise processing, e.g. detecting, correcting, reducing or removing noise
    • H04N25/61Noise processing, e.g. detecting, correcting, reducing or removing noise the noise originating only from the lens unit, e.g. flare, shading, vignetting or "cos4"
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00Computing arrangements using knowledge-based models
    • G06N5/01Dynamic search techniques; Heuristics; Dynamic trees; Branch-and-bound
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N7/00Computing arrangements based on specific mathematical models
    • G06N7/01Probabilistic graphical models, e.g. probabilistic networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T11/002D [Two Dimensional] image generation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/10Image acquisition
    • G06V10/12Details of acquisition arrangements; Constructional details thereof
    • G06V10/14Optical characteristics of the device performing the acquisition or on the illumination arrangements
    • G06V10/147Details of sensors, e.g. sensor lenses
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/764Arrangements for image or video recognition or understanding using pattern recognition or machine learning using classification, e.g. of video objects
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/772Determining representative reference patterns, e.g. averaging or distorting patterns; Generating dictionaries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/774Generating sets of training patterns; Bootstrap methods, e.g. bagging or boosting
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/776Validation; Performance evaluation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/778Active pattern-learning, e.g. online learning of image or video features
    • G06V10/7796Active pattern-learning, e.g. online learning of image or video features based on specific statistical tests
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/77Processing image or video features in feature spaces; using data integration or data reduction, e.g. principal component analysis [PCA] or independent component analysis [ICA] or self-organising maps [SOM]; Blind source separation
    • G06V10/80Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level
    • G06V10/809Fusion, i.e. combining data from various sources at the sensor level, preprocessing level, feature extraction level or classification level of classification results, e.g. where the classifiers operate on the same input data
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/82Arrangements for image or video recognition or understanding using pattern recognition or machine learning using neural networks
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/68Control of cameras or camera modules for stable pick-up of the scene, e.g. compensating for camera body vibrations
    • H04N23/681Motion detection
    • H04N23/6812Motion detection based on additional sensors, e.g. acceleration sensors
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/68Control of cameras or camera modules for stable pick-up of the scene, e.g. compensating for camera body vibrations
    • H04N23/682Vibration or motion blur correction
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04NPICTORIAL COMMUNICATION, e.g. TELEVISION
    • H04N23/00Cameras or camera modules comprising electronic image sensors; Control thereof
    • H04N23/60Control of cameras or camera modules
    • H04N23/68Control of cameras or camera modules for stable pick-up of the scene, e.g. compensating for camera body vibrations
    • H04N23/682Vibration or motion blur correction
    • H04N23/684Vibration or motion blur correction performed by controlling the image sensor readout, e.g. by controlling the integration time

Definitions

  • The invention belongs to the field of artificial intelligence and relates to a method and system for generating image adversarial samples based on sound waves.
  • Machine vision is widely used in modern intelligent systems, such as intelligent robots, self-driving cars, etc.
  • Machine vision uses cameras to capture information about the surrounding environment of an intelligent system, and uses deep learning algorithms to detect and recognize objects contained in images, so as to achieve the purpose of perceiving the environment. Since the perceptual results of machine vision are usually used as the information source for the subsequent decision-making of intelligent systems, the security of the perceptual results of machine vision is very important.
  • Image adversarial samples refer to samples that can interfere with the results of machine vision perception. Studying image adversarial samples has important guiding significance for ensuring the security of machine systems and intelligent systems.
  • At present, research on image adversarial samples mainly focuses on the digital domain, that is, directly modifying the pixel values of digital images to construct image adversarial samples.
  • Although the image adversarial samples constructed this way usually have good adversarial effects, they are difficult to apply in practical systems.
  • Methods also exist for constructing image adversarial samples in the physical domain, but they usually need to modify the appearance of the target object or inject light into the camera, so their concealment is poor.
  • A method for generating an image adversarial sample based on sound waves, comprising the following steps:
  • The adversarial parameters obtained above are injected into the inertial sensor of the target machine vision system by means of sound waves, so that they become sensor readings. These readings cause the image stabilization module in the target machine vision system to operate, producing specific blur patterns in the generated real pictures and thereby generating image adversarial samples in the physical world.
  • The construction of the sound-wave adversarial sample simulation model includes the following three steps:
  • The model is constructed from four of the six motion dimensions: the accelerometer x, y, and z axes and the gyroscope roll axis.
  • For the accelerometer x-axis dimension, the false camera motion -a_x causes, for any pixel, a displacement of f·a_x·T²/(2u) in the opposite direction during imaging, where f is the focal length of the camera, u is the object distance of the target object or target scene, and T is the exposure time of the camera; the y-axis dimension is analogous with -a_y.
  • For the accelerometer z-axis dimension, the false camera motion -a_z causes each pixel to move away from the picture center by r_o·a_z·T²/(2u) during imaging, where r_o is the distance between the pixel and the picture center.
  • For the gyroscope roll-axis dimension, the false camera motion -ω_r causes each pixel to move in the opposite direction by ω_r·T·r_c during imaging, where r_c is the distance between the pixel and the rotation center of the angular velocity.
  • Image blur modeling: pixel motion during imaging blurs the image.
  • False camera motion in the accelerometer x- and y-axis dimensions causes linear pixel motion and hence linear image blur; false camera motion in the accelerometer z-axis dimension causes radial pixel motion and hence radial image blur; false camera motion in the gyroscope roll-axis dimension causes rotational pixel motion and hence rotational image blur. A unified image blur model is constructed for these blurs: B(i, j) = (1/n) Σ_{k=1}^{n} X(i′(k), j′(k)), with [i′(k), j′(k)]^T = [u(k), v(k)]^T + [i, j]^T, where [u(k), v(k)] is the k-th discretized pixel displacement along the motion trajectory, and:
  • X is the original image
  • B is the blurred image
  • (i, j) are the pixel coordinates
  • B(i, j) is the pixel with coordinates (i, j) in the blurred image
  • n is the number of discrete points
  • (c_0, c_1) are the coordinates of the image center
  • (o_0, o_1) are the coordinates of the rotation center.
  • Using an adversarial sample optimization method to optimize the generated simulated image samples to obtain the optimal adversarial samples and corresponding adversarial parameters includes the following steps:
  • Optimization function design: different optimization functions are designed for different types of adversarial image samples. Three effects are considered.
  • The first type is adversarial image samples with a hiding effect, which make the deep learning algorithm unable to recognize the target object; the second type is adversarial image samples with a creating effect, which create in the current image a target object detectable by the deep learning algorithm; the third type is adversarial image samples with an altering effect, which make the deep learning algorithm detect the target object as another object.
  • p is the index of the target object
  • w_1 and w_2 are weights balancing the effectiveness of the adversarial image samples against the cost of sample generation
  • ε_1 and ε_2 are the upper bounds of the impact of sound waves on the accelerometer and gyroscope readings;
  • o is the index of the target object to be created
  • the deep learning algorithm outputs a detection confidence for the region of the object to be created and a detection confidence for its category
  • p is the index of an existing object in the image
  • m is the number of existing objects in the image
  • U_op is the intersection-over-union of the region of the object o to be created and the region of the existing object p
  • w_3 and w_4 are weights balancing the effectiveness of the adversarial image samples against the cost of sample generation
  • ε_1 and ε_2 are the upper bounds of the impact of sound waves on the accelerometer and gyroscope readings;
  • p is the index of the target object
  • the deep learning algorithm outputs a detection confidence for the modified target-object region
  • U_pp′ is the intersection-over-union of the region of the target object p before modification and the region of the target object p′ after modification
  • w_5 and w_6 are weights balancing the effectiveness of the adversarial image samples against the cost of sample generation, and ε_1 and ε_2 are the upper bounds of the influence of sound waves on the accelerometer and gyroscope readings.
  • The inertial sensor injection method includes the following steps:
  • a DC component is introduced via the analog-to-digital converter to stabilize the sensor output;
  • the sensor output waveform is shaped so that the sensor readings approximate the adversarial parameters.
  • The optimal adversarial parameters for the target object can be injected into the inertial sensor of the target machine vision system by means of sound waves, making them sensor readings that cause the image stabilization module in the target machine vision system to operate and produce specific blur patterns in the generated real-world pictures, thereby generating image adversarial examples in the physical world.
  • A sound-wave-based image adversarial sample generation system comprising a sound-wave adversarial simulation module, an adversarial sample optimization module, and a sensor reading injection module;
  • the sound-wave adversarial simulation module is used for false camera motion modeling, pixel motion modeling, and image blur modeling;
  • the adversarial sample optimization module is used for optimization function design and optimization function solving;
  • the sensor reading injection module is used for resonant frequency search, false reading stabilization, and false reading shaping;
  • the system uses these modules to implement the above method for generating image adversarial samples based on sound waves.
  • An image adversarial sample generation system based on sound waves, characterized in that the image stabilization module in the target machine vision system operates to produce specific blur patterns in the generated real pictures, thereby generating image adversarial examples in the physical world.
  • An image adversarial sample generation system based on sound waves, characterized in that the system includes: a memory storing instructions; and a processor that executes the instructions stored in the memory to perform the sound-wave-based image adversarial sample generation method described above.
  • A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to execute the above sound-wave-based image adversarial sample generation method.
  • The sound-wave-based image adversarial sample generation method proposed in the present invention has better practicability and concealment, provides a new idea for constructing image adversarial samples, and offers new guidance for machine learning security analysis and protection.
  • FIG. 1 is a flowchart illustrating a method for generating an image adversarial sample based on a sound wave according to an embodiment of the present application.
  • FIG. 2 is a more detailed schematic diagram illustrating a method for generating an image adversarial sample based on a sound wave according to an embodiment of the present application.
  • FIG. 3 is a block diagram illustrating an image adversarial sample generation system based on a sound wave according to an embodiment of the present application.
  • FIG. 4 is a block diagram illustrating a computing device according to an embodiment of the present application.
  • The present application provides a method and system for generating image adversarial samples based on sound waves.
  • The method uses sound waves to affect the inertial sensor readings in a machine vision system, causing the image stabilization module to compensate erroneously and blur the image, thereby constructing three different types of image adversarial samples.
  • The method exploits the vulnerability of inertial sensors and deep learning algorithms in existing machine vision systems, and innovatively proposes using sound waves to construct image adversarial samples suitable for the physical world, thereby deceiving machine vision systems.
  • Compared with existing digital-domain approaches, the image adversarial samples constructed by the method of the present application are more applicable in the real physical world; compared with existing physical-domain construction methods, the method does not need to modify the appearance of the object or inject light into the camera, and is better concealed.
  • FIG. 1 is a flowchart illustrating a method for generating an image adversarial sample based on a sound wave according to an embodiment of the present application.
  • the method 100 for generating an image adversarial sample based on a sound wave according to an embodiment of the present application includes the following steps:
  • The optimal adversarial parameters for the target object can be injected into the inertial sensor of the target machine vision system by means of sound waves so that they become sensor readings; these readings cause the image stabilization module in the target machine vision system to operate, producing a specific blur pattern in the generated real picture and thereby generating an image adversarial example in the physical world.
  • FIG. 2 is a more detailed schematic diagram illustrating a method for generating an image adversarial sample based on a sound wave according to an embodiment of the present application.
  • In an embodiment of the inertial sensor reading injection method, the sensor output waveform is shaped by amplitude modulation so that the sensor readings approximate the adversarial parameters.
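As a sketch of the false-reading shaping idea above (the linear amplitude-to-reading response and all signal parameters are illustrative assumptions, not taken from the patent), an acoustic carrier at the sensor's resonant frequency can be amplitude-modulated so that its envelope tracks the target adversarial readings:

```python
import math

def am_injection_signal(target_readings, f_res, f_sample):
    """Amplitude-modulate a carrier at the sensor's resonant frequency f_res
    so that its envelope follows the desired false sensor readings.

    Assumes (hypothetically) that the sensor responds linearly to the
    acoustic amplitude at its resonant frequency.
    """
    out = []
    for k, r in enumerate(target_readings):
        t = k / f_sample                      # sample time
        out.append(r * math.sin(2 * math.pi * f_res * t))
    return out
```

The returned samples would then be played through a speaker aimed at the inertial sensor; the envelope, not the carrier itself, is what appears in the sensor readings.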
  • FIG. 3 shows an image adversarial sample generation system 300 based on sound waves according to an embodiment of the present application.
  • The sound-wave-based image adversarial sample generation system 300 includes: a sound-wave adversarial simulation module 301, an adversarial sample optimization module 302, and a sensor reading injection module 303;
  • the sound-wave adversarial simulation module 301 is used for false camera motion modeling, pixel motion modeling, and image blur modeling;
  • the adversarial sample optimization module 302 is used for optimization function design and optimization function solving;
  • the sensor reading injection module 303 is used for resonant frequency search, false reading stabilization, and false reading shaping;
  • the system uses modules 301-303 to implement the above method for generating image adversarial samples based on sound waves.
  • FIG. 4 is a block diagram illustrating a computing device 400 according to an embodiment of the present application.
  • The above sound-wave-based image adversarial sample generation system can be implemented by the computing device 400.
  • computing device 400 may include one or more processors or processor cores 401 and memory 402 .
  • processors or processor cores 401 may include any type of processor, such as a central processing unit, microprocessor, and the like.
  • the processor 401 may be implemented as an integrated circuit having multiple cores, eg, a multi-core microprocessor.
  • memory 402 may be system memory.
  • Computing device 400 may include a mass storage device 403 (e.g., magnetic disk, hard drive, volatile memory such as dynamic random-access memory (DRAM), compact disc read-only memory (CD-ROM), digital versatile disk (DVD), etc.).
  • Memory 402 and/or mass storage device 403 may be any type of temporary and/or persistent storage, including but not limited to volatile and non-volatile memory, optical, magnetic, and/or solid-state mass storage, and so on.
  • Volatile memory may include, but is not limited to, static and/or dynamic random access memory.
  • Non-volatile memory may include, but is not limited to, electrically erasable programmable read-only memory, phase-change memory, resistive memory, and the like.
  • Computing device 400 may also include input/output (I/O) devices 404 (eg, displays (eg, touch screen displays), keyboards, cursor controls, remote controls, game controllers, image capture devices, etc.) and communication interfaces 405 (eg, network interface cards, modems, infrared receivers, radio receivers (eg, Bluetooth), etc.).
  • Communication interface 405 may include a communication chip, which may be configured for wired or wireless communication with other devices.
  • system bus 406 represents one or more buses. In the case of multiple buses, they may be bridged by one or more bus bridges (not shown). Each of these elements may perform its conventional functions known in the art. Specifically, memory 402 and mass storage device 403 may be employed to store working and permanent copies of programming instructions for operation of device 400 . Various elements may be implemented by assembly instructions supported by processor(s) 401 or a high-level language that may be compiled into such instructions.
  • A permanent copy of the programming instructions may be placed into mass storage device 403 at the factory or in the field, for example via a distribution medium (not shown) such as a compact disc (CD), or via communication interface 405. That is, one or more distribution media having an implementation of the agent program may be employed to distribute the agent and program various computing devices.
  • Computing device 400 may include a laptop, netbook, notebook, ultrabook, smartphone, tablet, personal digital assistant (PDA), ultra-mobile PC, mobile phone, or digital camera, or one or more components thereof.
  • computing device 400 may be any other electronic device that processes data.
  • Various embodiments may include any suitable combination of the above-described embodiments, including alternative (or) forms of embodiments described above in conjunctive form (and) (e.g., "and" may be "and/or").
  • some embodiments may include one or more articles of manufacture (eg, non-transitory computer-readable media) having stored thereon instructions that, when executed, cause the actions of any of the above-described embodiments.
  • some embodiments may include devices or systems having any suitable means for performing the various operations of the above-described embodiments.


Abstract

The invention discloses a sound-wave-based image adversarial sample generation method and system. The method includes: acquiring a picture containing a target object or target scene; for the acquired picture, using a sound-wave adversarial sample simulation model to generate simulated image samples that have an adversarial effect on the deep learning algorithm in a target machine vision system; optimizing the generated simulated image samples with an adversarial sample optimization method to obtain the optimal adversarial sample and corresponding adversarial parameters; and, using an inertial sensor reading injection method, injecting the adversarial parameters by means of sound waves into the inertial sensor of the target machine vision system so that the adversarial parameters become sensor readings. These readings cause the image stabilization module in the target machine vision system to operate, producing specific blur patterns in the generated real picture and thereby generating image adversarial samples in the physical world.

Description

Sound-wave-based image adversarial sample generation method and system
This application claims priority to Chinese invention patent application No. 202011124293.6, filed on October 20, 2020, which is incorporated herein by reference in its entirety.
Technical Field
The invention belongs to the field of artificial intelligence and relates to a sound-wave-based image adversarial sample generation method and system.
Background
With the continuous development of artificial intelligence technology, machine vision is widely used in modern intelligent systems such as intelligent robots and self-driving cars. Machine vision uses cameras to capture information about the surrounding environment of an intelligent system and uses deep learning algorithms to detect and recognize the objects contained in the images, thereby perceiving the environment. Since the perception results of machine vision usually serve as the information source for the subsequent decision-making of the intelligent system, the security of these perception results is of vital importance.
In recent years, research on image adversarial samples has been increasing. Image adversarial samples are samples that can interfere with machine vision perception results, and studying them has important guiding significance for ensuring the security of machine systems and intelligent systems. At present, research on image adversarial samples mainly focuses on the digital domain, that is, directly modifying the pixel values of digital images to construct image adversarial samples. Although the image adversarial samples constructed this way usually have good adversarial effects, they are difficult to apply in practical systems. In addition, construction methods based on the physical domain also exist, but they usually need to modify the appearance of the target object or inject light into the camera, so their concealment is poor.
Summary
According to one aspect of the present application, a method for generating image adversarial examples based on acoustic waves is provided, comprising the following steps:
acquiring a picture containing a target object or target scene;
for the acquired picture, using an acoustic adversarial-example simulation model to generate simulated image samples adversarial to the deep-learning algorithm in a target machine-vision system;
optimizing the generated simulated image samples with an adversarial-example optimization method to obtain the optimal adversarial example and the corresponding adversarial parameters; and
using an inertial-sensor reading-injection method, injecting the obtained adversarial parameters into the inertial sensor of the target machine-vision system by means of acoustic waves so that they become sensor readings, which trigger the image-stabilization module of the target machine-vision system and produce a specific blur pattern in the captured real picture, thereby generating an image adversarial example in the physical world.
In some embodiments, construction of the acoustic adversarial-example simulation model comprises the following three steps:
(1) Fake camera-motion modeling. Assume the fake inertial-sensor readings caused by the acoustic attack are M_f = {a_x, a_y, a_z, ω_r, ω_p, ω_y}, where a_x, a_y, a_z are the fake acceleration readings on the accelerometer's x, y, z axes and ω_r, ω_p, ω_y are the fake angular-velocity readings on the gyroscope's roll, pitch, yaw axes. Assuming the image-stabilization module compensates completely, the fake camera motion caused by the acoustic attack is M_c = {-a_x, -a_y, -a_z, -ω_r, -ω_p, -ω_y}. The acoustic adversarial-example simulation model is constructed from four of these six dimensions, namely the accelerometer's x, y, z axes and the gyroscope's roll axis.
(2) Pixel-motion modeling. The fake camera motion changes where the target object or target scene is imaged, causing the pixels in the output picture to move.
For the accelerometer x-axis dimension, for any pixel in the picture, the fake camera motion -a_x causes during imaging a pixel displacement of f·a_x·T²/(2u) in the opposite direction, where f is the camera focal length, u is the object distance of the target object or target scene, and T is the camera exposure time.
For the accelerometer y-axis dimension, for any pixel in the picture, the fake camera motion -a_y causes during imaging a pixel displacement of f·a_y·T²/(2u) in the opposite direction.
For the accelerometer z-axis dimension, for any pixel in the picture, the fake camera motion -a_z causes during imaging a displacement of a_z·T²·r_o/(2u) of that pixel away from the picture center, where r_o is the distance between the pixel and the picture center.
For the gyroscope roll-axis dimension, for any pixel in the picture, the fake camera motion -ω_r causes during imaging a displacement of ω_r·T·r_c of that pixel in the opposite direction, where r_c is the distance between the pixel and the center of the angular-velocity rotation.
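The four displacement rules above can be sketched numerically as follows. The roll-axis expression ω_r·T·r_c is given in the text; the x-, y-, and z-axis closed forms follow the standard constant-acceleration pinhole-camera reading of the model (camera displacement ½aT² scaled by f/u), since the original equation images are not reproduced in this extraction. Function and parameter names are illustrative, not from the patent.

```python
def pixel_displacements(f, u, T, a_x, a_y, a_z, w_r, r_o, r_c):
    """Blur extents caused by spoofed IMU readings (illustrative sketch).

    f: camera focal length, u: object distance, T: exposure time,
    a_x / a_y / a_z: spoofed accelerometer readings,
    w_r: spoofed roll-axis angular velocity,
    r_o: distance of the pixel from the picture center,
    r_c: distance of the pixel from the rotation center.
    Returns the four blur extents (linear x, linear y, radial, rotational).
    """
    d_x = f * a_x * T ** 2 / (2 * u)      # linear shift from the x axis
    d_y = f * a_y * T ** 2 / (2 * u)      # linear shift from the y axis
    d_r = a_z * T ** 2 / (2 * u) * r_o    # radial shift, away from the center
    d_rot = w_r * T * r_c                 # rotational arc length (from the text)
    return d_x, d_y, d_r, d_rot
```

Note that the z-axis and roll-axis extents grow with the pixel's distance from the respective center, which is what produces radial and rotational (rather than uniform) blur.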
(3) Image-blur modeling. Pixel motion during imaging blurs the image. In particular, fake camera motion on the accelerometer x and y axes causes linear pixel motion, leading to linear image blur; fake camera motion on the accelerometer z axis causes radial pixel motion, leading to radial image blur; and fake camera motion on the gyroscope roll axis causes rotational pixel motion, leading to rotational image blur. A unified image-blur model for these blurs is constructed as follows:
B(i,j) = (1/n) · Σ_{k=1..n} X(i′(k), j′(k))
[i′(k), j′(k)]^T = [u(k), v(k)]^T + [i, j]^T
[the component equations giving the per-step offsets u(k) and v(k) for the linear, radial, and rotational motions are rendered as images in the original publication and are not reproduced here]
β = ω_r·T
r_c = ‖(i,j) − (c_0,c_1)‖_2
r_o = ‖(i,j) − (o_0,o_1)‖_2
where X is the original image, B is the blurred image, (i,j) are the coordinates of a pixel, B(i,j) is the pixel at coordinates (i,j) in the blurred image, n is the number of discrete sample points, (c_0,c_1) are the coordinates of the image center, and (o_0,o_1) are the coordinates of the rotation center. Using the above fake camera-motion, pixel-motion, and image-blur models, simulated adversarial image samples under different adversarial parameters can be obtained.
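The averaging structure of the blur model above — each blurred pixel is the mean of n samples taken along the motion trajectory — can be sketched with NumPy. Only the linear trajectory (accelerometer x/y case) is implemented here, as the patent's radial and rotational offset equations are image-rendered and not reproduced; the function name and the whole-image `np.roll` shortcut are illustrative choices, not the patent's implementation.

```python
import numpy as np

def linear_motion_blur(X, dx, dy, n=16):
    """Average n copies of X shifted along a linear trajectory.

    X: 2-D grayscale image; dx, dy: total blur extent in pixels.
    Sample k is X shifted by (k/n)*(dx, dy); the blurred image is the
    mean of all samples, mirroring B(i,j) = (1/n) * sum_k X(i'(k), j'(k)).
    """
    acc = np.zeros(X.shape, dtype=float)
    for k in range(n):
        sx = int(round(k / n * dx))   # per-step shift along axis 0
        sy = int(round(k / n * dy))   # per-step shift along axis 1
        acc += np.roll(np.roll(X.astype(float), sx, axis=0), sy, axis=1)
    return acc / n
```

A uniform image stays uniform under this operator (every shifted copy is identical), which is a quick sanity check that the averaging preserves overall intensity.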
In some embodiments, optimizing the generated simulated image samples with the adversarial-example optimization method to obtain the optimal adversarial example and the corresponding adversarial parameters comprises the following steps:
(1) Objective-function design. A different objective function is designed for each type of adversarial image sample. Three kinds of adversarial image samples with different effects are considered. The first kind has a hiding effect: such samples prevent the deep-learning algorithm from recognizing the target object. The second kind has a creation effect: such samples create, in the current image, a target object that the deep-learning algorithm will detect. The third kind has an altering effect: such samples make the deep-learning algorithm detect the target object as some other object.
For adversarial image samples with a hiding effect, the objective function is:
[objective function — rendered as an image in the original publication and not reproduced here]
s.t. |a_x + a_y + a_z| < ε_1
|ω_r| < ε_2
where p is the index of the target object, the two confidence terms (also image-rendered in the original) are the region-detection confidence and the class-detection confidence of the target object output by the deep-learning algorithm, w_1 and w_2 are weights balancing the effectiveness of the adversarial image sample against the cost of generating it, and ε_1 and ε_2 are upper bounds on the influence of the acoustic wave on the accelerometer and gyroscope readings.
For adversarial image samples with a creation effect, the objective function is:
[objective function — rendered as an image in the original publication and not reproduced here]
s.t. |a_x + a_y + a_z| < ε_1
|ω_r| < ε_2
where o is the index of the object to be created, C_o = T is the class of the object to be created, the two confidence terms (image-rendered in the original) are the region-detection and class-detection confidences for the object to be created output by the deep-learning algorithm, p is the index of an object already present in the image, m is the number of objects already present, U_op is the intersection-over-union between the region of the object o to be created and the region of the existing object p, w_3 and w_4 are weights balancing the effectiveness of the adversarial image sample against the cost of generating it, and ε_1 and ε_2 are upper bounds on the influence of the acoustic wave on the accelerometer and gyroscope readings.
For adversarial image samples with an altering effect, the objective function is:
[objective function — rendered as an image in the original publication and not reproduced here]
s.t. |a_x + a_y + a_z| < ε_1
|ω_r| < ε_2
where p is the index of the target object, the two confidence terms (image-rendered in the original) are the region-detection and class-detection confidences of the altered target object output by the deep-learning algorithm, C_p′ = T is the class of the altered target object, U_pp′ is the intersection-over-union between the region of the target object p before alteration and the region of the target object p′ after alteration, w_5 and w_6 are weights balancing the effectiveness of the adversarial image sample against the cost of generating it, and ε_1 and ε_2 are upper bounds on the influence of the acoustic wave on the accelerometer and gyroscope readings.
(2) Objective-function solving. For the above objective functions, Bayesian optimization is used to solve for the best adversarial parameters.
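The constrained search in step (2) might be sketched as follows. Plain random search stands in here for the Bayesian optimizer the patent uses, and `confidence` is a caller-supplied stand-in for the detector pipeline (render the blurred image, run the detector, return its confidence); all names and the constraint bounds are illustrative assumptions.

```python
import random

def optimize_hiding_attack(confidence, eps1=5.0, eps2=2.0, iters=200, seed=0):
    """Search for attack parameters that minimize detection confidence.

    confidence(a_x, a_y, a_z, w_r) -> detector confidence in [0, 1].
    Candidates must satisfy |a_x + a_y + a_z| < eps1 and |w_r| < eps2,
    matching the constraints of the hiding-effect objective.
    """
    rng = random.Random(seed)
    best, best_val = None, float("inf")
    for _ in range(iters):
        a = [rng.uniform(-eps1, eps1) for _ in range(3)]
        if abs(sum(a)) >= eps1:
            continue  # reject candidates violating the accelerometer bound
        w_r = rng.uniform(-eps2, eps2)
        val = confidence(a[0], a[1], a[2], w_r)
        if val < best_val:
            best, best_val = (a[0], a[1], a[2], w_r), val
    return best, best_val
```

A real implementation would replace the random sampler with a Gaussian-process surrogate (as in Bayesian optimization) so that far fewer detector evaluations are needed; the constraint handling and the minimize-confidence goal stay the same.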
In some embodiments, the inertial-sensor injection method comprises the following steps:
finding, by frequency sweeping, the resonant frequency of the inertial sensor in the target machine-vision system;
adjusting the acoustic resonant frequency so as to introduce a DC component into the analog-to-digital converter, stabilizing the sensor output; and
shaping the sensor output waveform by amplitude modulation so that the sensor readings approach the adversarial parameters.
Through the above steps, the optimal adversarial parameters for the target object can be injected into the inertial sensor of the target machine-vision system by means of acoustic waves, becoming sensor readings that trigger the image-stabilization module of the target machine-vision system and produce a specific blur pattern in the captured real picture, thereby generating an image adversarial example in the physical world.
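The amplitude-modulation step can be sketched as generating a carrier tone at the sensor's resonant frequency whose envelope tracks the desired spoofed reading. The resonant frequency, sample rate, and the normalized `target` envelope are all illustrative assumptions; real hardware would additionally need the fine frequency adjustment described above to keep the digitized output stable.

```python
import math

def injection_waveform(f_res, target, fs=192_000, duration=0.05):
    """Amplitude-modulated tone at the sensor's resonant frequency.

    f_res: resonant frequency found by the sweep (Hz),
    target(t) -> desired spoofed reading at time t, normalized to [-1, 1],
    fs: output sample rate (Hz), duration: signal length (s).
    The envelope target(t) shapes the sensor output toward the
    adversarial parameters; returns a list of samples in [-1, 1].
    """
    n = int(fs * duration)
    return [target(k / fs) * math.sin(2 * math.pi * f_res * k / fs)
            for k in range(n)]
```

The sample rate is chosen well above twice the (typically ultrasonic) resonant frequency so the carrier is representable.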
According to another aspect of the present application, there is also provided a system for generating image adversarial examples based on acoustic waves, comprising an acoustic adversarial simulation module, an adversarial-example optimization module, and a sensor-reading injection module;
the acoustic adversarial simulation module is used for fake camera-motion modeling, pixel-motion modeling, and image-blur modeling;
the adversarial-example optimization module is used for objective-function design and objective-function solving;
the sensor-reading injection module is used for resonant-frequency search, fake-reading stabilization, and fake-reading shaping;
the system uses the above modules to implement the above method for generating image adversarial examples based on acoustic waves.
According to another aspect of the present application, there is also provided a system for generating image adversarial examples based on acoustic waves, characterized in that the system comprises:
means for acquiring a picture containing a target object or target scene;
means for generating, from the acquired picture and using an acoustic adversarial-example simulation model, simulated image samples adversarial to the deep-learning algorithm in a target machine-vision system;
means for optimizing the generated simulated image samples with an adversarial-example optimization method to obtain the optimal adversarial example and the corresponding adversarial parameters; and
means for injecting, using an inertial-sensor reading-injection method, the adversarial parameters into the inertial sensor of the target machine-vision system by means of acoustic waves so that the adversarial parameters become sensor readings, which trigger the image-stabilization module of the target machine-vision system and produce a specific blur pattern in the captured real picture, thereby generating an image adversarial example in the physical world.
According to another aspect of the present application, there is also provided a system for generating image adversarial examples based on acoustic waves, characterized in that the system comprises:
a memory storing instructions; and
a processor that executes the instructions stored in the memory to perform the method for generating image adversarial examples based on acoustic waves described above.
According to another aspect of the present application, there is also provided a non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the method for generating image adversarial examples based on acoustic waves described above.
Compared with existing methods for constructing adversarial examples, the acoustic-wave-based method for generating image adversarial examples proposed by the present invention is more practical and more stealthy; it offers a new approach to constructing image adversarial examples and new guidance for machine-learning security analysis and defense.
Brief Description of the Drawings
Fig. 1 is a flowchart of a method for generating image adversarial examples based on acoustic waves according to an embodiment of the present application.
Fig. 2 is a more detailed schematic diagram of the method for generating image adversarial examples based on acoustic waves according to an embodiment of the present application.
Fig. 3 is a block diagram of a system for generating image adversarial examples based on acoustic waves according to an embodiment of the present application.
Fig. 4 is a block diagram of a computing device according to an embodiment of the present application.
Detailed Description
The present application provides a method and system for generating image adversarial examples based on acoustic waves. The method uses acoustic waves to influence the inertial-sensor readings in a machine-vision system, causing the image-stabilization module to miscompensate and blur the image, and thereby constructs three different types of image adversarial examples.
The method exploits the vulnerability of the inertial sensors and deep-learning algorithms in existing machine-vision systems and, innovatively, uses acoustic waves to construct image adversarial examples applicable in the physical world, thereby deceiving machine-vision systems. Compared with existing digital-domain construction methods, the image adversarial examples constructed by the method of the embodiments of the present application are more applicable in the real physical world; compared with existing physical-domain construction methods, the method needs neither to modify the object's appearance nor to inject light into the camera, and is therefore more stealthy.
The present invention is further described below with reference to the embodiments and the accompanying drawings.
Fig. 1 is a flowchart of a method for generating image adversarial examples based on acoustic waves according to an embodiment of the present application. As shown in Fig. 1, the method 100 according to the embodiment comprises the following steps:
S101, acquiring a picture containing a target object or target scene;
S102, generating, for the picture and using an acoustic adversarial-example simulation model, simulated image samples adversarial to the deep-learning algorithm in a target machine-vision system;
S103, optimizing the simulated image samples with an adversarial-example optimization method to obtain the optimal adversarial example and the corresponding adversarial parameters; and
S104, injecting, using an inertial-sensor reading-injection method, the obtained adversarial parameters into the inertial sensor of the target machine-vision system by means of acoustic waves so that they become sensor readings, which trigger the image-stabilization module of the target machine-vision system and produce a specific blur pattern in the captured real picture, thereby generating an image adversarial example in the physical world.
Through the above steps of the method according to the embodiment of the present application, the optimal adversarial parameters for the target object can be injected into the inertial sensor of the target machine-vision system by means of acoustic waves, becoming sensor readings that trigger the image-stabilization module and produce a specific blur pattern in the captured real picture, thereby generating an image adversarial example in the physical world.
Fig. 2 is a more detailed schematic diagram of the method for generating image adversarial examples based on acoustic waves according to an embodiment of the present application.
As shown in Fig. 2, in some embodiments the acoustic adversarial-example simulation model is constructed through the following steps:
(1) Fake camera-motion modeling. A modern machine-vision system photographs the target object or scene with a camera and then processes the picture with a deep-learning algorithm, thereby perceiving the surrounding environment, detecting targets, and so on. To improve the accuracy of perception and detection, modern machine-vision systems compensate with image stabilization, reducing the image blur caused by camera shake. Because image stabilization usually estimates the camera's motion with inertial sensors, i.e., an accelerometer and a gyroscope, and an acoustic attack can affect the inertial sensors and change their readings, for a stationary camera an acoustic attack can cause the image stabilization to miscompensate and thus, conversely, blur the picture. Assume the fake inertial-sensor readings caused by the acoustic attack are M_f = {a_x, a_y, a_z, ω_r, ω_p, ω_y}, where a_x, a_y, a_z are the fake acceleration readings on the accelerometer's x, y, z axes and ω_r, ω_p, ω_y are the fake angular-velocity readings on the gyroscope's roll, pitch, yaw axes. Assuming the image-stabilization module compensates completely, the fake camera motion caused by the acoustic attack is M_c = {-a_x, -a_y, -a_z, -ω_r, -ω_p, -ω_y}. The present invention mainly considers four of these six dimensions, namely the accelerometer's x, y, z axes and the gyroscope's roll axis, when constructing the acoustic adversarial-example simulation model.
(2) Pixel-motion modeling. The fake camera motion changes where the target object or scene is imaged, causing the pixels in the output picture to move.
For the accelerometer x-axis dimension, for any pixel in the picture, the fake camera motion -a_x causes during imaging a pixel displacement of f·a_x·T²/(2u) in the opposite direction, where f is the camera focal length, u is the object distance of the target object or target scene, and T is the camera exposure time.
For the accelerometer y-axis dimension, for any pixel in the picture, the fake camera motion -a_y causes during imaging a pixel displacement of f·a_y·T²/(2u) in the opposite direction.
For the accelerometer z-axis dimension, for any pixel in the picture, the fake camera motion -a_z causes during imaging a displacement of a_z·T²·r_o/(2u) of that pixel away from the picture center, where r_o is the distance between the pixel and the picture center.
For the gyroscope roll-axis dimension, for any pixel in the picture, the fake camera motion -ω_r causes during imaging a displacement of ω_r·T·r_c of that pixel in the opposite direction, where r_c is the distance between the pixel and the center of the angular-velocity rotation.
(3) Image-blur modeling. Pixel motion during imaging blurs the image. In particular, fake camera motion on the accelerometer x and y axes causes linear pixel motion, leading to linear image blur; fake camera motion on the accelerometer z axis causes radial pixel motion, leading to radial image blur; and fake camera motion on the gyroscope roll axis causes rotational pixel motion, leading to rotational image blur. A unified image-blur model for these blurs is constructed as follows:
B(i,j) = (1/n) · Σ_{k=1..n} X(i′(k), j′(k))
[i′(k), j′(k)]^T = [u(k), v(k)]^T + [i, j]^T
[the component equations giving the per-step offsets u(k) and v(k) for the linear, radial, and rotational motions are rendered as images in the original publication and are not reproduced here]
β = ω_r·T
r_c = ‖(i,j) − (c_0,c_1)‖_2
r_o = ‖(i,j) − (o_0,o_1)‖_2
where X is the original image, B is the blurred image, (i,j) are the coordinates of a pixel, B(i,j) is the pixel at coordinates (i,j) in the blurred image, n is the number of discrete sample points, (c_0,c_1) are the coordinates of the image center, and (o_0,o_1) are the coordinates of the rotation center. Using the above models, simulated adversarial image samples under different adversarial parameters can be obtained.
As shown in Fig. 2, in some embodiments the adversarial-example optimization method mainly comprises the following steps:
(1) Objective-function design. The present invention designs a different objective function for each type of adversarial image sample and considers three kinds of adversarial image samples with different effects. The first kind has a hiding effect: such samples prevent the deep-learning algorithm from recognizing the target object. The second kind has a creation effect: such samples create, in the current image, a target object that the deep-learning algorithm will detect. The third kind has an altering effect: such samples make the deep-learning algorithm detect the target object as some other object.
For adversarial image samples with a hiding effect, the objective function is:
[objective function — rendered as an image in the original publication and not reproduced here]
s.t. |a_x + a_y + a_z| < ε_1
|ω_r| < ε_2
where p is the index of the target object, the two confidence terms (also image-rendered in the original) are the region-detection confidence and the class-detection confidence of the target object output by the deep-learning algorithm, w_1 and w_2 are weights balancing the effectiveness of the adversarial image sample against the cost of generating it, and ε_1 and ε_2 are upper bounds on the influence of the acoustic wave on the accelerometer and gyroscope readings.
For adversarial image samples with a creation effect, the objective function is:
[objective function — rendered as an image in the original publication and not reproduced here]
s.t. |a_x + a_y + a_z| < ε_1
|ω_r| < ε_2
where o is the index of the object to be created, C_o = T is the class of the object to be created, the two confidence terms (image-rendered in the original) are the region-detection and class-detection confidences for the object to be created output by the deep-learning algorithm, p is the index of an object already present in the image, m is the number of objects already present, U_op is the intersection-over-union between the region of the object o to be created and the region of the existing object p, w_3 and w_4 are weights balancing the effectiveness of the adversarial image sample against the cost of generating it, and ε_1 and ε_2 are upper bounds on the influence of the acoustic wave on the accelerometer and gyroscope readings.
For adversarial image samples with an altering effect, the objective function is:
[objective function — rendered as an image in the original publication and not reproduced here]
s.t. |a_x + a_y + a_z| < ε_1
|ω_r| < ε_2
where p is the index of the target object, the two confidence terms (image-rendered in the original) are the region-detection and class-detection confidences of the altered target object output by the deep-learning algorithm, C_p′ = T is the class of the altered target object, U_pp′ is the intersection-over-union between the region of the target object p before alteration and the region of the target object p′ after alteration, w_5 and w_6 are weights balancing the effectiveness of the adversarial image sample against the cost of generating it, and ε_1 and ε_2 are upper bounds on the influence of the acoustic wave on the accelerometer and gyroscope readings.
(2) Objective-function solving. For the above objective functions, Bayesian optimization is used to solve for the best adversarial parameters.
As shown in Fig. 2, in some embodiments the inertial-sensor reading-injection method comprises the following steps:
(1) finding, by frequency sweeping, the resonant frequency of the inertial sensor in the target machine-vision system;
(2) adjusting the acoustic resonant frequency so as to introduce a DC component into the analog-to-digital converter, stabilizing the sensor output; and
(3) shaping the sensor output waveform by amplitude modulation so that the sensor readings approach the adversarial parameters.
The above illustrates the method for generating image adversarial examples based on acoustic waves according to embodiments of the present application. The following describes the corresponding system. Fig. 3 shows a system 300 for generating image adversarial examples based on acoustic waves according to an embodiment of the present application.
The system 300 comprises: an acoustic adversarial simulation module 301, an adversarial-example optimization module 302, and a sensor-reading injection module 303;
the acoustic adversarial simulation module 301 is used for fake camera-motion modeling, pixel-motion modeling, and image-blur modeling;
the adversarial-example optimization module 302 is used for objective-function design and objective-function solving;
the sensor-reading injection module 303 is used for resonant-frequency search, fake-reading stabilization, and fake-reading shaping;
the system uses the modules 301-303 to implement the method for generating image adversarial examples based on acoustic waves described above.
Fig. 4 is a block diagram of a computing device 400 according to an embodiment of the present application. The above method for generating image adversarial examples based on acoustic waves can be implemented by the computing device 400.
As shown, the computing device 400 may include one or more processors or processor cores 401 and a memory 402. For the purposes of this application, including the claims, the terms "processor" and "processor core" may be considered synonymous unless the context clearly requires otherwise. The processor 401 may include any type of processor, such as a central processing unit, a microprocessor, and so on. The processor 401 may be implemented as an integrated circuit having multiple cores, e.g., a multi-core microprocessor. In embodiments, the memory 402 may be a system memory. The computing device 400 may include mass-storage devices 403 (e.g., magnetic disks, hard drives, volatile memory (e.g., dynamic random-access memory (DRAM)), compact disc read-only memory (CD-ROM), digital versatile disks (DVD), etc.). In general, the memory 402 and/or the mass-storage devices 403 may be temporary and/or persistent storage of any type, including but not limited to volatile and non-volatile memory, optical, magnetic, and/or solid-state mass storage, and so on. Volatile memory may include, but is not limited to, static and/or dynamic random-access memory. Non-volatile memory may include, but is not limited to, electrically erasable programmable read-only memory, phase-change memory, resistive memory, and so on.
The computing device 400 may also include input/output (I/O) devices 404 (e.g., a display (e.g., a touch-screen display), keyboard, cursor control, remote control, game controller, image-capture device, etc.) and communication interfaces 405 (e.g., network interface cards, modems, infrared receivers, radio receivers (e.g., Bluetooth), etc.). The communication interfaces 405 may include communication chips that may be configured to communicate with other devices by wire or wirelessly.
The above elements of the computing device 400 may be coupled to each other via a system bus 406, which represents one or more buses. In the case of multiple buses, they may be bridged by one or more bus bridges (not shown). Each of these elements may perform its conventional functions known in the art. In particular, the memory 402 and the mass-storage devices 403 may be employed to store a working copy and a permanent copy of the programming instructions for the operation of the device 400. The various elements may be implemented by assembler instructions supported by the processor(s) 401 or by high-level languages that can be compiled into such instructions.
A permanent copy of the programming instructions may be placed into the mass-storage devices 403 in the factory, or in the field through, for example, a distribution medium (not shown), such as a compact disc (CD), or through the communication interfaces 405. That is, one or more distribution media having an implementation of the agent program may be employed to distribute the agent and to program various computing devices.
In various implementations, the computing device 400 may include one or more components of a laptop, a netbook, a notebook, an ultrabook, a smartphone, a tablet, a personal digital assistant (PDA), an ultra-mobile PC, a mobile phone, or a digital camera. In further implementations, the computing device 400 may be any other electronic device that processes data. Various embodiments may include any suitable combination of the above-described embodiments, including alternative (or) embodiments of embodiments described above in conjunctive form (and) (e.g., "and" may be "and/or"). Furthermore, some embodiments may include one or more articles of manufacture (e.g., non-transitory computer-readable media) having stored thereon instructions that, when executed, cause the actions of any of the above-described embodiments. In addition, some embodiments may include devices or systems having any suitable means for performing the various operations of the above-described embodiments.
The above description of illustrated implementations, including those described in the Abstract, is not intended to be exhaustive or to limit the embodiments of the present disclosure to the precise forms disclosed. While specific implementations and examples are described herein for illustrative purposes, various equivalent modifications are possible within the scope of the present disclosure, as those skilled in the relevant art will recognize.
These modifications can be made to the embodiments of the present disclosure in light of the above detailed description. The terms used in the following claims should not be construed to limit the various embodiments of the present disclosure to the specific implementations disclosed in the specification and the claims. Rather, the scope is to be determined entirely by the following claims, which are to be construed in accordance with established doctrines of claim interpretation.

Claims (8)

  1. A method for generating image adversarial examples based on acoustic waves, characterized in that the method comprises the following steps:
    acquiring a picture containing a target object or target scene;
    for the acquired picture, using an acoustic adversarial-example simulation model to generate simulated image samples adversarial to the deep-learning algorithm in a target machine-vision system;
    optimizing the generated simulated image samples with an adversarial-example optimization method to obtain the optimal adversarial example and the corresponding adversarial parameters;
    using an inertial-sensor reading-injection method, injecting the adversarial parameters into the inertial sensor of the target machine-vision system by means of acoustic waves so that the adversarial parameters become sensor readings, which trigger the image-stabilization module of the target machine-vision system and produce a specific blur pattern in the captured real picture, thereby generating an image adversarial example in the physical world.
  2. The method for generating image adversarial examples based on acoustic waves according to claim 1, characterized in that construction of the acoustic adversarial-example simulation model comprises the following steps:
    (1) fake camera-motion modeling: assume the fake inertial-sensor readings caused by the acoustic attack are M_f = {a_x, a_y, a_z, ω_r, ω_p, ω_y}, where a_x, a_y, a_z are the fake acceleration readings on the accelerometer's x, y, z axes and ω_r, ω_p, ω_y are the fake angular-velocity readings on the gyroscope's roll, pitch, yaw axes; assuming the image-stabilization module compensates completely, the fake camera motion caused by the acoustic attack is M_c = {-a_x, -a_y, -a_z, -ω_r, -ω_p, -ω_y}; the acoustic adversarial-example simulation model is constructed from four of these six dimensions, namely the accelerometer's x, y, z axes and the gyroscope's roll axis;
    (2) pixel-motion modeling: the fake camera motion changes where the target object or target scene is imaged, causing the pixels in the output picture to move;
    for the accelerometer x-axis dimension, for any pixel in the picture, the fake camera motion -a_x causes during imaging a pixel displacement of f·a_x·T²/(2u) in the opposite direction, where f is the camera focal length, u is the object distance of the target object or target scene, and T is the camera exposure time;
    for the accelerometer y-axis dimension, for any pixel in the picture, the fake camera motion -a_y causes during imaging a pixel displacement of f·a_y·T²/(2u) in the opposite direction;
    for the accelerometer z-axis dimension, for any pixel in the picture, the fake camera motion -a_z causes during imaging a displacement of a_z·T²·r_o/(2u) of that pixel away from the picture center, where r_o is the distance between the pixel and the picture center;
    for the gyroscope roll-axis dimension, for any pixel in the picture, the fake camera motion -ω_r causes during imaging a displacement of ω_r·T·r_c of that pixel in the opposite direction, where r_c is the distance between the pixel and the center of the angular-velocity rotation;
    (3) image-blur modeling: pixel motion during imaging blurs the image; fake camera motion on the accelerometer x and y axes causes linear pixel motion, leading to linear image blur; fake camera motion on the accelerometer z axis causes radial pixel motion, leading to radial image blur; fake camera motion on the gyroscope roll axis causes rotational pixel motion, leading to rotational image blur; a unified image-blur model for these blurs is constructed as follows:
    B(i,j) = (1/n) · Σ_{k=1..n} X(i′(k), j′(k))
    [i′(k), j′(k)]^T = [u(k), v(k)]^T + [i, j]^T
    [the component equations giving the per-step offsets u(k) and v(k) for the linear, radial, and rotational motions are rendered as images in the original publication and are not reproduced here]
    β = ω_r·T
    r_c = ‖(i,j) − (c_0,c_1)‖_2
    r_o = ‖(i,j) − (o_0,o_1)‖_2
    where X is the original image, B is the blurred image, (i,j) are the coordinates of a pixel, B(i,j) is the pixel at coordinates (i,j) in the blurred image, n is the number of discrete sample points, (c_0,c_1) are the coordinates of the image center, and (o_0,o_1) are the coordinates of the rotation center;
    using the above fake camera-motion, pixel-motion, and image-blur models, the simulated image samples under different adversarial parameters can be obtained.
  3. The method for generating image adversarial examples based on acoustic waves according to claim 2, characterized in that optimizing the generated simulated image samples with the adversarial-example optimization method comprises the following steps:
    (1) objective-function design: a different objective function is designed for each type of adversarial image sample; three kinds of adversarial image samples with different effects are considered: the first kind has a hiding effect, such samples preventing the deep-learning algorithm from recognizing the target object; the second kind has a creation effect, such samples creating in the current image a target object that the deep-learning algorithm will detect; the third kind has an altering effect, such samples making the deep-learning algorithm detect the target object as some other object;
    for adversarial image samples with a hiding effect, the objective function is:
    [objective function — rendered as an image in the original publication and not reproduced here]
    s.t. |a_x + a_y + a_z| < ε_1
    |ω_r| < ε_2
    where p is the index of the target object, the two confidence terms (also image-rendered in the original) are the region-detection confidence and the class-detection confidence of the target object output by the deep-learning algorithm, w_1 and w_2 are weights balancing the effectiveness of the adversarial image sample against the cost of generating it, and ε_1 and ε_2 are upper bounds on the influence of the acoustic wave on the accelerometer and gyroscope readings;
    for adversarial image samples with a creation effect, the objective function is:
    [objective function — rendered as an image in the original publication and not reproduced here]
    s.t. |a_x + a_y + a_z| < ε_1
    |ω_r| < ε_2
    where o is the index of the object to be created, C_o = T is the class of the object to be created, the two confidence terms (image-rendered in the original) are the region-detection and class-detection confidences for the object to be created output by the deep-learning algorithm, p is the index of an object already present in the image, m is the number of objects already present, U_op is the intersection-over-union between the region of the object o to be created and the region of the existing object p, w_3 and w_4 are weights balancing the effectiveness of the adversarial image sample against the cost of generating it, and ε_1 and ε_2 are upper bounds on the influence of the acoustic wave on the accelerometer and gyroscope readings;
    for adversarial image samples with an altering effect, the objective function is:
    [objective function — rendered as an image in the original publication and not reproduced here]
    s.t. |a_x + a_y + a_z| < ε_1
    |ω_r| < ε_2
    where p is the index of the target object, the two confidence terms (image-rendered in the original) are the region-detection and class-detection confidences of the altered target object output by the deep-learning algorithm, C_p′ = T is the class of the altered target object, U_pp′ is the intersection-over-union between the region of the target object p before alteration and the region of the target object p′ after alteration, w_5 and w_6 are weights balancing the effectiveness of the adversarial image sample against the cost of generating it, and ε_1 and ε_2 are upper bounds on the influence of the acoustic wave on the accelerometer and gyroscope readings;
    (2) objective-function solving: for the above objective functions, Bayesian optimization is used to solve for the best adversarial parameters.
  4. The method for generating image adversarial examples based on acoustic waves according to claim 1, characterized in that the inertial-sensor injection method comprises the following steps:
    finding, by frequency sweeping, the resonant frequency of the inertial sensor in the target machine-vision system;
    adjusting the acoustic resonant frequency so as to introduce a DC component into the analog-to-digital converter, stabilizing the sensor output; and
    shaping the sensor output waveform by amplitude modulation so that the sensor readings approach the adversarial parameters.
  5. A system for generating image adversarial examples based on acoustic waves, characterized in that the system comprises an acoustic adversarial simulation module, an adversarial-example optimization module, and a sensor-reading injection module;
    the acoustic adversarial simulation module is used for fake camera-motion modeling, pixel-motion modeling, and image-blur modeling;
    the adversarial-example optimization module is used for objective-function design and objective-function solving;
    the sensor-reading injection module is used for resonant-frequency search, fake-reading stabilization, and fake-reading shaping;
    the system uses the above modules to implement the method for generating image adversarial examples based on acoustic waves according to any one of claims 1-4.
  6. A system for generating image adversarial examples based on acoustic waves, characterized in that the system comprises:
    means for acquiring a picture containing a target object or target scene;
    means for generating, from the acquired picture and using an acoustic adversarial-example simulation model, simulated image samples adversarial to the deep-learning algorithm in a target machine-vision system;
    means for optimizing the generated simulated image samples with an adversarial-example optimization method to obtain the optimal adversarial example and the corresponding adversarial parameters; and
    means for injecting, using an inertial-sensor reading-injection method, the adversarial parameters into the inertial sensor of the target machine-vision system by means of acoustic waves so that the adversarial parameters become sensor readings, which trigger the image-stabilization module of the target machine-vision system and produce a specific blur pattern in the captured real picture, thereby generating an image adversarial example in the physical world.
  7. A system for generating image adversarial examples based on acoustic waves, characterized in that the system comprises:
    a memory storing instructions; and
    a processor that executes the instructions stored in the memory to perform the method for generating image adversarial examples based on acoustic waves according to any one of claims 1-4.
  8. A non-transitory computer-readable storage medium storing instructions that, when executed by a processor, cause the processor to perform the method for generating image adversarial examples based on acoustic waves according to any one of claims 1-4.
PCT/CN2021/124791 2020-10-20 2021-10-19 Method and system for generating image adversarial examples based on acoustic waves WO2022083599A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/702,662 US20220215652A1 (en) 2020-10-20 2022-03-23 Method and system for generating image adversarial examples based on an acoustic wave

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011124293.6A 2020-10-20 Method and system for generating image adversarial examples based on acoustic waves
CN202011124293.6 2020-10-20

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/702,662 Continuation US20220215652A1 (en) 2020-10-20 2022-03-23 Method and system for generating image adversarial examples based on an acoustic wave

Publications (1)

Publication Number Publication Date
WO2022083599A1 true WO2022083599A1 (zh) 2022-04-28

Family

ID=74310708

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/124791 WO2022083599A1 (zh) 2020-10-20 2021-10-19 基于声波的图像对抗样本生成方法及系统

Country Status (3)

Country Link
US (1) US20220215652A1 (zh)
CN (1) CN112333402B (zh)
WO (1) WO2022083599A1 (zh)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112333402B (zh) * 2020-10-20 2021-10-22 Zhejiang University Method and system for generating image adversarial examples based on acoustic waves
CN114363509B (zh) * 2021-12-07 2022-09-20 Zhejiang University Method for generating triggerable adversarial patches triggered by acoustic waves
DE102022001241A1 (de) 2022-04-12 2023-10-12 Mercedes-Benz Group AG Method for operating a vehicle
DE102022001731B4 (de) 2022-05-17 2024-01-18 Mercedes-Benz Group AG Sensor device with an optical sensor, an acceleration sensor, and a resonator, and motor vehicle with such a sensor device
CN115081643B (zh) * 2022-07-20 2022-11-08 Beijing RealAI Technology Co., Ltd. Adversarial example generation method, related apparatus, and storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109902709A * 2019-01-07 2019-06-18 Zhejiang University Method for generating malicious samples for industrial control systems based on adversarial learning
US20200272726A1 (en) * 2019-02-25 2020-08-27 Advanced Micro Devices, Inc. Method and apparatus for generating artificial intelligence resistant verification images
US20200285952A1 (en) * 2019-03-08 2020-09-10 International Business Machines Corporation Quantifying Vulnerabilities of Deep Learning Computing Systems to Adversarial Perturbations
US20200300883A1 (en) * 2016-05-20 2020-09-24 The Regents Of The University Of Michigan Protecting motion sensors from acoustic injection attack
CN112333402A (zh) * 2020-10-20 2021-02-05 Zhejiang University Method and system for generating image adversarial examples based on acoustic waves

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109471112A (zh) * 2018-10-10 2019-03-15 Zhejiang University Ultrasonic ranging sensor resistant to acoustic interference and ranging method thereof
EP3894907B1 (en) * 2018-12-11 2024-01-24 ExxonMobil Technology and Engineering Company Machine learning-augmented geophysical inversion
CN111488895B (zh) * 2019-01-28 2024-01-30 Beijing Dajia Internet Information Technology Co., Ltd. Adversarial data generation method, apparatus, device, and storage medium
CN109784424B (zh) * 2019-03-26 2021-02-09 Tencent Technology (Shenzhen) Co., Ltd. Method for training an image classification model, and image processing method and apparatus
US10929719B2 (en) * 2019-03-28 2021-02-23 GM Global Technology Operations LLC Adversarial attack on black box object detection algorithm
CN110210573B (zh) * 2019-06-11 2023-01-06 Tencent Technology (Shenzhen) Co., Ltd. Adversarial image generation method, apparatus, terminal, and storage medium
CN110767216B (zh) * 2019-09-10 2021-12-07 Zhejiang University of Technology Speech-recognition attack defense method based on the PSO algorithm
CN111143873A (zh) * 2019-12-13 2020-05-12 Alipay (Hangzhou) Information Technology Co., Ltd. Privacy data processing method, apparatus, and terminal device
CN111680292B (zh) * 2020-06-10 2023-05-16 Beijing Institute of Computer Technology and Application Adversarial example generation method based on highly stealthy universal perturbations

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20200300883A1 (en) * 2016-05-20 2020-09-24 The Regents Of The University Of Michigan Protecting motion sensors from acoustic injection attack
CN109902709A * 2019-01-07 2019-06-18 Zhejiang University Method for generating malicious samples for industrial control systems based on adversarial learning
US20200272726A1 (en) * 2019-02-25 2020-08-27 Advanced Micro Devices, Inc. Method and apparatus for generating artificial intelligence resistant verification images
US20200285952A1 (en) * 2019-03-08 2020-09-10 International Business Machines Corporation Quantifying Vulnerabilities of Deep Learning Computing Systems to Adversarial Perturbations
CN112333402A (zh) * 2020-10-20 2021-02-05 Zhejiang University Method and system for generating image adversarial examples based on acoustic waves

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHENG SHUAI; GUPTA CHETAN: "Trace Norm Generative Adversarial Networks for Sensor Generation and Feature Extraction", ICASSP 2020 - 2020 IEEE INTERNATIONAL CONFERENCE ON ACOUSTICS, SPEECH AND SIGNAL PROCESSING (ICASSP), IEEE, 4 May 2020 (2020-05-04), pages 3187 - 3191, XP033793558, DOI: 10.1109/ICASSP40776.2020.9053863 *

Also Published As

Publication number Publication date
CN112333402A (zh) 2021-02-05
US20220215652A1 (en) 2022-07-07
CN112333402B (zh) 2021-10-22

Similar Documents

Publication Publication Date Title
WO2022083599A1 (zh) Method and system for generating image adversarial examples based on acoustic waves
US11481923B2 (en) Relocalization method and apparatus in camera pose tracking process, device, and storage medium
US10755425B2 (en) Automatic tuning of image signal processors using reference images in image processing environments
WO2019205842A1 (zh) Relocalization method and apparatus in a camera pose tracking process, and storage medium
US8660362B2 (en) Combined depth filtering and super resolution
CN106164982B (zh) 基于影像的电子设备定位
CN107077548B (zh) 虚拟可穿戴物
US10250800B2 (en) Computing device having an interactive method for sharing events
CN110249626B (zh) Method and apparatus for implementing augmented-reality images, terminal device, and storage medium
US8964040B2 (en) High dynamic range image registration using motion sensor data
CN105635588B (zh) Image stabilization method and apparatus
JP2021111380A (ja) Method for generating data for estimating the three-dimensional pose of an object contained in an input video, computer system, and method for constructing an inference model
WO2019212749A1 (en) Stabilizing video to reduce camera and face movement
CN103875004A (zh) Dynamically selecting real-world surfaces onto which to project information
CN103985103A (zh) Method and apparatus for generating panoramic pictures
US20170374256A1 (en) Method and apparatus for rolling shutter compensation
CN109255749A (zh) Map-construction optimization in autonomous and non-autonomous platforms
US11475636B2 (en) Augmented reality and virtual reality engine for virtual desktop infrastucture
CN109618103A (zh) Anti-shake method for video transmitted from an unmanned aerial vehicle, and unmanned aerial vehicle
CN110427849B (zh) Face-pose determination method and apparatus, storage medium, and electronic device
CN113556464B (zh) Photographing method and apparatus, and electronic device
CN114338994A (zh) Optical image stabilization method and apparatus, electronic device, and computer-readable storage medium
Ning et al. Moirépose: ultra high precision camera-to-screen pose estimation based on moiré pattern
Rajakaruna et al. Image deblurring for navigation systems of vision impaired people using sensor fusion data
Cheng et al. Adversarial Computer Vision via Acoustic Manipulation of Camera Sensors

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21882015

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 21882015

Country of ref document: EP

Kind code of ref document: A1