CN113282266B - Sound coding method of auxiliary perception system based on sensory substitution


Info

Publication number
CN113282266B
Authority
CN
China
Prior art keywords
target object
sound
attribute
information
priority
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110450106.1A
Other languages
Chinese (zh)
Other versions
CN113282266A (en)
Inventor
王璐
黄勇志
肖洁
金惠童
陈凯鑫
伍楷舜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen University
Original Assignee
Shenzhen University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen University
Priority to CN202110450106.1A
Publication of CN113282266A
Application granted
Publication of CN113282266B
Legal status: Active (current)
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L13/00 Speech synthesis; Text to speech systems
    • G10L13/02 Methods for producing synthetic speech; Speech synthesisers
    • G10L19/00 Speech or audio signals analysis-synthesis techniques for redundancy reduction, e.g. in vocoders; Coding or decoding of speech or audio signals, using source filter models or psychoacoustic analysis
    • G10L19/02 Speech or audio signals analysis-synthesis techniques for redundancy reduction, using spectral analysis, e.g. transform vocoders or subband vocoders

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Signal Processing (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Stereophonic System (AREA)

Abstract

The invention discloses a sound coding method for an auxiliary perception system based on sensory substitution. The method comprises the following steps: identifying attribute information that characterizes the target object, the attribute information comprising a category attribute, a size attribute, a speed attribute and a moving-direction attribute, and selecting a baseband-signal frequency-domain waveform according to the category attribute; identifying the scene information of the target object, setting the priority of the target object according to the scene information, and selecting the center frequency of the corresponding baseband waveform; selecting the width of the baseband frequency-domain signal according to the size of the target object; setting the sound loop count (repetitions per second) according to the speed attribute of the target object; determining the position change in three-dimensional space according to the moving-direction attribute of the target object, reconstructing a three-dimensional sound field, and placing the reconstructed sound field into a ready sound library; and, according to the priority of the target object, selecting the sounds to place on the audio track from the ready sound library for playback. The invention enables visually impaired people to perceive the surrounding environment accurately by recognizing these sounds.

Description

Sound coding method of auxiliary perception system based on sensory substitution
Technical Field
The invention relates to the technical field of information processing, in particular to a sound coding method of an auxiliary perception system based on sensory substitution.
Background
With advances in science, technology and medicine, blindness caused by retinal damage can now be treated, for example by surgical retinal transplantation; for visually impaired people whose optic-pathway nerves are damaged, however, no corresponding treatment exists. Although genetic screening has reduced the proportion of infants born with disabilities in recent years, the absolute number of such births still grows year by year as the population base and life expectancy increase, and the number of blind people grows with it. Technology that helps blind people perceive the outside world is therefore of great significance.
At present, visually impaired people rely on common assistive aids such as canes, guide dogs and Braille documents to move about in their environment. While these aids help them avoid obstacles, guide them while walking and let them acquire knowledge, making daily life more convenient, they do not help them perceive and mentally reconstruct the world.
With the advancement of brain science, non-invasive devices can now scan the brain and record electroencephalogram (EEG) signals from its active regions. These signals show that when a blind person is trained to use sound as visual information, vision-related areas of the brain, such as the occipital lobe, produce corresponding electrical activity. Such experiments demonstrate that if blind people are trained to use sound as an input of visual information, they can experience a visual effect comparable to seeing with both eyes. This approach is known as visual (sensory) substitution. Accordingly, many research efforts have defined methods for converting visual information into sound. Most of these methods convert individual picture pixels, however, which introduces substantial information redundancy for the blind user and weakens the sensory-substitution effect.
A blind auxiliary perception system based on sensory substitution uses machine vision and sound coding to make auditory sensory substitution easier for blind users to adopt. Sensory substitution is an effective non-surgical option for the blind, yet related products remain very scarce. For example, patent application CN201911210888.0 (a method and system for assisted perception based on sensory substitution) proposes a method using auditory sensory substitution; however, the coding of sound still requires a reasonable coding standard formulated from experimental results.
Disclosure of Invention
The present invention is directed to overcoming the above-mentioned drawbacks of the prior art and providing a sound coding method for an auxiliary perception system based on sensory substitution.
The technical scheme of the invention is to provide a sound coding method of an auxiliary perception system based on sensory substitution. The method comprises the following steps:
step S1, identifying attribute information characterizing the target object, wherein the attribute information comprises a category attribute, a size attribute, a speed attribute and a moving-direction attribute, and selecting a corresponding baseband-signal frequency-domain waveform according to the category attribute;
step S2, identifying the scene information of the target object, setting the priority of the target object according to the scene information, and selecting the center frequency of the corresponding baseband waveform;
step S3, selecting the corresponding baseband frequency-domain signal width by comparing the size attribute of the target object against the object information stored for the current scene in the database;
step S4, setting the corresponding sound loop count according to the speed attribute of the target object;
step S5, determining the position change in three-dimensional space according to the moving-direction attribute of the target object, reconstructing a three-dimensional sound field, and putting the reconstructed sound field into a ready sound library;
step S6, selecting, according to the priority of the target object, the sounds to place on the audio track from the ready sound library for playback.
Compared with the prior art, the advantage of the invention is that it builds a visual-substitution scheme on recent brain-science research into sensory substitution, and uses pattern recognition to avoid the excessive information redundancy of converting a raw visual image directly into a sound image, so that a visually impaired person can perceive the surrounding environment by recognizing the sounds.
Other features of the present invention and advantages thereof will become apparent from the following detailed description of exemplary embodiments thereof, which proceeds with reference to the accompanying drawings.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description, serve to explain the principles of the invention.
Fig. 1 is a flow chart of the sound coding method of an auxiliary perception system based on sensory substitution.
Detailed Description
Various exemplary embodiments of the present invention will now be described in detail with reference to the accompanying drawings. It should be noted that: the relative arrangement of the components and steps, the numerical expressions and numerical values set forth in these embodiments do not limit the scope of the present invention unless specifically stated otherwise.
The following description of at least one exemplary embodiment is merely illustrative in nature and is in no way intended to limit the invention, its application, or uses.
Techniques, methods, and apparatus known to one of ordinary skill in the relevant art may not be discussed in detail but are intended to be part of the specification where appropriate.
In all examples shown and discussed herein, any particular value should be construed as merely illustrative, and not limiting. Thus, other examples of the exemplary embodiments may have different values.
It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, further discussion thereof is not required in subsequent figures.
In the invention, based on the attribute characteristics of the target object and the surrounding-environment information, the characteristics of the sound signal to be played are set according to the sensitivity of the human ear to sound; these characteristics include the baseband-signal frequency-domain waveform, the baseband signal frequency, the harmonic spacing, the number of harmonics, the number of sound repetitions per second, and so on.
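For orientation, these per-object sound features can be collected in a single structure. The following Python sketch is illustrative only; the field names are ours, not the patent's.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class SoundCode:
    """Illustrative container for the per-object sound features listed above."""
    category: str                  # selects the baseband frequency-domain waveform
    center_hz: float               # baseband center frequency, set by priority
    width_hz: float                # baseband frequency-domain width, set by object size
    harmonic_gap_hz: float         # spacing between harmonics
    enhanced_harmonics: List[int] = field(default_factory=list)  # set by object height
    loops_per_second: int = 2      # repetition rate, set by object speed
```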
Specifically, referring to Fig. 1, the provided sound coding method of the auxiliary perception system based on sensory substitution includes the following steps.
In step S10, a sound coding method corresponding to the category attribute of the target object is established.
In one embodiment, step S10 includes the following sub-steps:
step S101, calibrating the attributes of the target object, for example selecting four attribute types: the category of the object, the actual size of the object, the movement speed of the object in space, and the movement direction of the object in space;
step S102, using a different baseband-signal frequency-domain waveform for each object category, for example any non-repeating continuous curve whose bandwidth lies in the range of 30-100 Hz.
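As a minimal illustration (not the patented implementation), the sketch below generates one such category-specific signal: a smooth, non-repeating random magnitude curve confined to a narrow band around an assumed center frequency, converted to a time-domain frame by inverse FFT. All names and default values are ours.

```python
import numpy as np

def make_baseband_waveform(category_seed: int,
                           bandwidth_hz: float = 60.0,   # within the 30-100 Hz range
                           center_hz: float = 1000.0,
                           fs: int = 16000,
                           n_fft: int = 16000) -> np.ndarray:
    """Build a category-specific baseband frame from a random, non-repeating
    continuous frequency-domain curve, then synthesize it by inverse FFT."""
    rng = np.random.default_rng(category_seed)      # one fixed curve per category
    half = n_fft // 2 + 1
    spectrum = np.zeros(half, dtype=complex)
    bins = np.arange(half) * fs / n_fft             # bin index -> frequency in Hz
    band = (bins >= center_hz - bandwidth_hz / 2) & (bins <= center_hz + bandwidth_hz / 2)
    # Smooth random magnitude curve over the band: integrated noise, lightly
    # averaged, so the curve is continuous and does not repeat.
    raw = rng.standard_normal(band.sum()).cumsum()
    raw = np.convolve(raw, np.ones(5) / 5, mode="same")
    mag = (raw - raw.min()) / (raw.max() - raw.min() + 1e-12) + 0.1
    spectrum[band] = mag * np.exp(1j * rng.uniform(0, 2 * np.pi, band.sum()))
    frame = np.fft.irfft(spectrum, n=n_fft)
    return frame / (np.abs(frame).max() + 1e-12)    # normalize to [-1, 1]
```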
Step S20, the sound coding method corresponding to the target priority attribute is established.
In one embodiment, step S20 includes the following sub-steps:
step S201, obtaining the object information in the environment using an image recognition algorithm such as YOLOv4;
step S202, computing statistics over the surrounding-object information and identifying the current scene by referring to a database;
step S203, obtaining object priorities according to the current scene information;
step S204, selecting the center frequency of each object's baseband waveform according to the object's priority: a higher-priority object is assigned a higher baseband signal frequency and a lower-priority object a lower one, for example choosing the signal frequencies in the range of 500 Hz-8000 Hz, with a baseband frequency spacing of 50 Hz-500 Hz between any two objects.
Herein, the scene information includes work, home, commuting, and the like. The object priority (also called the danger level) characterizes how strongly or how urgently an object affects the visually impaired user; setting the corresponding baseband signal frequency according to this priority prompts the user to take timely measures to avoid obstacles.
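A minimal sketch of this priority-to-frequency mapping follows; spreading the assignments evenly across the band is our assumption, with only the 500-8000 Hz range and the 50-500 Hz spacing taken from the text.

```python
def center_frequencies(priorities, f_min=500.0, f_max=8000.0,
                       min_gap=50.0, max_gap=500.0):
    """Assign each object's baseband center frequency from its priority:
    higher priority -> higher frequency, with adjacent assignments kept
    50-500 Hz apart."""
    n = len(priorities)
    if n == 0:
        return []
    # Spread the objects across the band, clamping the spacing to [50, 500] Hz.
    gap = min(max((f_max - f_min) / max(n - 1, 1), min_gap), max_gap)
    order = sorted(range(n), key=lambda i: priorities[i])   # low -> high priority
    freqs = [0.0] * n
    for rank, idx in enumerate(order):
        freqs[idx] = min(f_min + rank * gap, f_max)
    return freqs

# Example: three objects with priorities 1, 3 and 2
# center_frequencies([1, 3, 2]) -> [500.0, 1500.0, 1000.0]
```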
Step S30, a sound coding method corresponding to the target size attribute is established.
In one embodiment, step S30 includes the following sub-steps:
step S301, for the current scene, obtaining from the database the maximum width and maximum height of objects that can appear in that scene, and performing min-max normalization;
step S302, obtaining the width and height of the current target object;
step S303, comparing against the width distribution of the object: when the object lies at the center of the normal distribution, the baseband frequency-domain signal width is set to the normal width (determined by the customized signal waveform); when the normalized object width exceeds the distribution center by 0.1, the signal width is increased by 20-30 Hz, and so on;
step S304, comparing against the height distribution of the object: when the object lies at the center of the normal distribution, harmonics of normal amplitude are selected (the harmonic spacing is determined by the customized signal waveform); when the normalized object height exceeds the distribution center by 0.1, the amplitude of the first harmonic is enhanced; when it exceeds the center by 0.2, the amplitudes of the first and second harmonics are enhanced, and so on.
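The following sketch condenses steps S301-S304 into one mapping. It assumes that after min-max normalization the class distribution is centered at 0.5; the 0.1-per-step rule and the 20-30 Hz width increments come from the text, while the remaining constants are ours.

```python
def size_to_code(width, height, w_max, h_max,
                 base_width_hz=60.0, width_step_hz=25.0):
    """Map a normalized object size to a baseband signal width and a list
    of harmonics to enhance (1 = first harmonic, 2 = second, ...)."""
    w_norm = min(width / w_max, 1.0)
    h_norm = min(height / h_max, 1.0)
    # Each 0.1 above the distribution center widens the band by ~20-30 Hz.
    width_steps = max(int((w_norm - 0.5) / 0.1 + 1e-9), 0)
    signal_width = base_width_hz + width_steps * width_step_hz
    # Each 0.1 above the center enhances one more harmonic (first, then second, ...).
    n_enhanced = max(int((h_norm - 0.5) / 0.1 + 1e-9), 0)
    enhanced_harmonics = list(range(1, n_enhanced + 1))
    return signal_width, enhanced_harmonics
```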
Step S40, a sound coding method corresponding to the target speed attribute is established.
In one embodiment, step S40 includes the following sub-steps:
step S401, defining different speed levels, such as low speed (0-0.5 m/s), medium speed, safe (0.5-1 m/s), medium speed, unsafe (1-3 m/s) and high speed (above 3 m/s);
step S402, for the three-dimensional space in which the camera-captured target object is located, calculating the target's displacement and speed in space every 25 ms;
step S403, judging whether the target is within the unsafe range, the judgment distance being taken as speed × 5 s; targets outside this range are not used as speed feedback information;
step S404, detecting whether the target speed is positive, i.e. whether the target is approaching the user; if so, continuing to step S405, otherwise classifying it directly as low speed;
step S405, selecting the sound loop count according to the measured movement speed of the object, for example 16, 8, 4 and 2 repetitions per second for the levels from high speed to low speed defined in step S401.
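A direct transcription of this speed-to-repetition mapping, assuming the level boundaries of step S401:

```python
def loop_count_per_second(speed_mps: float, approaching: bool) -> int:
    """Map the target's speed to how many times per second its sound
    repeats; a receding target is treated as low speed (step S404)."""
    if not approaching:
        return 2        # judged as low speed
    if speed_mps > 3.0:
        return 16       # high speed
    if speed_mps > 1.0:
        return 8        # medium speed, unsafe
    if speed_mps > 0.5:
        return 4        # medium speed, safe
    return 2            # low speed
```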
Step S50 is to establish a voice coding method corresponding to the target position attribute.
In one embodiment, step S50 includes the following sub-steps:
step S501, for the three-dimensional space in which the camera-captured target object is located, calculating its spatial position every 200 ms;
step S502, substituting the calculated spatial position and the sounds from steps S10-S40 into the HRTFs (head-related transfer functions) to reconstruct a three-dimensional sound field;
and step S503, putting the reconstructed sound field into the ready sound library.
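In time-domain terms, applying HRTFs means convolving the encoded mono sound with the head-related impulse response pair for the target's current direction. A minimal sketch, assuming the HRIRs are already loaded from a measured or personalized set:

```python
import numpy as np

def spatialize(mono: np.ndarray, hrir_left: np.ndarray,
               hrir_right: np.ndarray) -> np.ndarray:
    """Reconstruct a binaural (3D) version of an encoded sound by
    convolving it with the left/right head-related impulse responses."""
    left = np.convolve(mono, hrir_left)
    right = np.convolve(mono, hrir_right)
    out = np.stack([left, right], axis=1)        # shape (samples, 2)
    return out / (np.abs(out).max() + 1e-12)     # normalize to avoid clipping
```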
In step S60, sound synthesis is performed based on the sound of the target object.
In one embodiment, step S60 includes the following sub-steps:
step S601, according to the priority of the target objects, taking from the ready sound library the 10 highest-priority objects currently appearing in the camera view;
step S602, placing the target object sounds on an audio track, with a playing interval of 20 ms between successive target object sounds;
in step S603, the composite track is played.
For example, relative to the background sound, the loudness of the synthesized sound is chosen to be no more than 90% of the human-ear tolerance threshold and greater than the ambient noise.
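A sketch of this selection-and-mixing step under the stated 10-object limit and 20 ms stagger; the input format and normalization are our assumptions.

```python
import numpy as np

def mix_tracks(sounds, interval_ms=20, fs=16000, top_n=10):
    """Lay the top-N priority sounds onto one stereo track, each starting
    20 ms after the previous one, then sum and normalize. `sounds` is a
    list of (priority, stereo_array) pairs."""
    chosen = sorted(sounds, key=lambda s: -s[0])[:top_n]
    if not chosen:
        return np.zeros((0, 2))
    hop = int(fs * interval_ms / 1000)                    # 20 ms in samples
    total = max(i * hop + len(snd) for i, (_, snd) in enumerate(chosen))
    track = np.zeros((total, 2))
    for i, (_, snd) in enumerate(chosen):
        track[i * hop: i * hop + len(snd)] += snd
    return track / (np.abs(track).max() + 1e-12)
```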
Correspondingly, the invention further provides a sound coding system of the blind auxiliary perception system based on sensory substitution, implementing one or more aspects of the method described above. For example, the system includes:
the preprocessing module, used for calibrating the camera, calibrating the nine-axis inertial measurement unit, completing the personalized setting of the HRTFs response functions from the user's ear and upper-body characteristics, establishing the sound coding library, and setting the initial state of the head;
the recognition module, used for taking in the external visual environment, detecting the objects in the input visual information, and cropping those objects out of the visual information;
the perception (three-dimensional visual reconstruction) module, used for taking in visual information, establishing a distinct hash value for each object in the visual information, finding the same object as represented in different visual input units, finding the corner points of the same object, pairing and screening those corner points, and computing the three-dimensional parameters of the visual object;
the perception (three-dimensional auditory reconstruction) module, used for taking in visual and motion information, selecting the sound codes, reconstructing the sound in three dimensions using the HRTFs response functions, and applying attenuation processing to objects that have disappeared from the visual frame;
the scene recognition module, used for counting the number of each object, recognizing the scene, measuring object sizes, and determining object priorities;
the speed calculation module, used for computing the movement speed of the target object in space and synthesizing the loop count of the object's sound;
the target object sound module, used for storing the baseband frequency-domain waveform of each target object together with its center frequency, width and harmonic positions;
the output module, used for outputting the auditorily reconstructed sound in real time and adjusting the volume.
In one embodiment, the preprocessing module comprises the following units:
a camera calibration unit for calibrating the camera;
the nine-axis inertia measurement calibration unit is used for adjusting the output value of the nine-axis inertia measurement unit during calibration;
the human ear picture input unit is used for establishing personalized HRTFs response functions;
the upper body parameter input unit is used for establishing personalized HRTFs response functions;
the sound pre-coding unit, used for establishing the personalized sound coding;
a head initial state setting unit for setting an initial state of the head.
In one embodiment, the identification module comprises the following elements:
the visual input unit is used for receiving external visual information and inputting the visual information into the system;
a visual object detection unit for detecting objects and categories in visual information input into the system;
the visual object cutting unit is used for cutting the object detected in the visual information;
in one embodiment, the perceptual-three-dimensional visual reconstruction module comprises the following units:
the visual information input unit is used for receiving picture information and object type information which are cut out from the visual information according to the object;
the visual information storage unit is used for storing picture information and object type information which are cut from the visual information according to the object;
an object abstraction unit for abstracting the cut object into a hash value;
the object distinguishing unit is used for matching the same object in different visual input units by using the abstracted hash value;
the same object identification unit is used for identifying the same object as the previous frame;
the lost object detection unit, used for comparing the objects in the current frame with those in the previous detection frame, marking undetected objects as lost and recording a storage time, clearing an object once its storage time exceeds a set limit, and re-marking as lost any object that is detected and then lost again within the storage time;
the object displacement vector detection unit is used for detecting the displacement vector generated by the same object of the previous frame and the current frame;
the corner detection unit is used for detecting the corners of the same object in the image of the matched different visual input units;
the corner abstraction unit is used for abstracting the detected corner information into a hash value;
the corner distinguishing unit is used for searching similar corners in the same object by using the abstracted hash value;
the angular point screening unit is used for screening angular points by utilizing the information of the nine-axis inertia measuring unit;
the distance detection unit is used for calculating the distance between the object and the visual input unit by utilizing the positions of the same angular points in different visual input units in the image;
the vertical offset detection unit is used for calculating the vertical offset to obtain the vertical offset height of the object relative to the plane of the visual input unit by utilizing the distance between the object and the visual input unit;
and the horizontal offset detection unit is used for calculating the horizontal offset to obtain the distance of the object which is offset left and right relative to the center of the visual input unit by utilizing the distance of the object relative to the visual input unit.
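Taken together, the corner and distance units implement stereo triangulation across the two visual input units. A minimal sketch under standard rectified pinhole-camera assumptions (the function and parameter names are ours):

```python
def stereo_depth(x_left_px: float, x_right_px: float,
                 focal_px: float, baseline_m: float) -> float:
    """Classic stereo triangulation: depth = focal * baseline / disparity.
    Assumes the same corner matched in rectified left/right images."""
    disparity = x_left_px - x_right_px
    if disparity <= 0:
        raise ValueError("non-positive disparity: corner match is invalid")
    return focal_px * baseline_m / disparity
```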
In one embodiment, the perceptual-three dimensional auditory reconstruction module comprises the following elements:
the three-dimensional information input unit is used for receiving the three-dimensional space position of each object relative to the visual input unit and the class information of the object;
a three-dimensional information storage unit for storing and updating the three-dimensional spatial position of each object relative to the visual input unit and the category information of the object;
the three-dimensional sound field response function unit is used for storing HRTFs response functions in the personalized three-dimensional space;
a voice encoding unit for storing default and personalized voice codes;
a three-dimensional sound field selection unit for selecting a three-dimensional sound field response function according to three-dimensional spatial position information of each object, and selecting sound encoding of the object according to class information of the object;
the reconstruction unit of the three-dimensional sound field is used for convolving the response function of each object with the sound code to obtain the three-dimensional sound reconstructed by each object;
the motion detection unit is used for detecting whether the change of the current nine-axis inertia measurement unit relative to the last detection time exceeds a threshold value or not, and recording the change of the vertical deviation angle, the horizontal course angle, the horizontal rolling angle and the motion direction of the visual input unit at the moment when the change exceeds the threshold value;
the update detection unit, used for updating the three-dimensional spatial position of obstacle objects labeled as disappeared according to the change detected by the motion detection unit, and updating the three-dimensional spatial position of non-obstacle objects labeled as disappeared according to the displacement vector from the object displacement vector detection unit;
and the attenuation processing unit is used for carrying out attenuation processing on the sound codes of the non-obstacle objects in the disappeared objects.
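The attenuation processing could be as simple as an exponential fade on the disappeared object's sound; the decay rate below is an assumed value, not taken from the text.

```python
import numpy as np

def attenuate(sound: np.ndarray, decay_db_per_s: float = 6.0,
              fs: int = 16000) -> np.ndarray:
    """Apply an exponential fade to the sound of a non-obstacle object
    that has disappeared from the visual frame."""
    t = np.arange(len(sound)) / fs
    gain = 10.0 ** (-decay_db_per_s * t / 20.0)
    return sound * gain[:, None] if sound.ndim == 2 else sound * gain
```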
In one embodiment, the scene recognition module comprises the following elements:
the object quantity calculation unit, used for counting how often each object appears in the surrounding field of view captured by the camera;
the scene recognition unit, used for recognizing the scene from these object-occurrence statistics and from illumination information obtained from the camera;
the object size measuring unit is used for measuring the size and the distribution of the target object;
and the object priority measuring unit is used for measuring the priority of the target object in the scene.
In one embodiment, the velocity calculation module includes the following units:
the speed calculation unit is used for calculating the speed of the target object from the position information of the target object obtained in the three-dimensional reconstruction module;
the safety distance judging unit is used for judging whether the object is in a safety range or not;
the movement direction calculation unit, used for judging whether the movement speed of the object is positive, i.e. whether the object is moving toward the user;
and the speed feedback unit is used for feeding back the calculated speed.
In one embodiment, the target object sound module includes the following:
the object type feedback unit is used for storing the baseband frequency domain waveform of each type of target object;
the object priority feedback unit is used for formulating the central frequency of the baseband frequency domain waveforms of the objects with different priorities in different scenes;
the object width feedback unit is used for feeding back the object width and formulating the baseband frequency domain waveform width of the object in different scenes;
and the object height feedback unit, used for feeding back the object height and setting the harmonic enhancement or reduction of the baseband frequency-domain waveform of the object in different scenes.
In one embodiment, the output module comprises the following elements:
the volume adjusting unit, used for adjusting the volume of the output sound;
and an output unit for outputting the sound after the auditory sense reconstruction.
In conclusion, the invention innovatively designs a visual-substitution method, grounded in sensory-substitution research, for helping blind people perceive the world. The adopted sound coding scheme takes into account the surrounding-environment information, the danger level of the target object and the sensitivity of the human ear to sound, and has been verified to be well suited to assisted perception for the blind. The invention can feed back the category, size, movement speed and movement direction of objects in daily life, enabling blind users to travel safely and take on low-load work, which benefits their mobility and employment.
The present invention may be a system, method and/or computer program product. The computer program product may include a computer-readable storage medium having computer-readable program instructions embodied therewith for causing a processor to implement various aspects of the present invention.
The computer readable storage medium may be a tangible device that can hold and store the instructions for use by the instruction execution device. The computer readable storage medium may be, for example, but not limited to, an electronic memory device, a magnetic memory device, an optical memory device, an electromagnetic memory device, a semiconductor memory device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a Static Random Access Memory (SRAM), a portable compact disc read-only memory (CD-ROM), a Digital Versatile Disc (DVD), a memory stick, a floppy disk, a mechanical coding device, such as punch cards or in-groove projection structures having instructions stored thereon, and any suitable combination of the foregoing. Computer-readable storage media as used herein is not to be construed as transitory signals per se, such as radio waves or other freely propagating electromagnetic waves, electromagnetic waves propagating through a waveguide or other transmission medium (e.g., optical pulses through a fiber optic cable), or electrical signals transmitted through electrical wires.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to a respective computing/processing device, or to an external computer or external storage device over a network, such as the internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, fiber optic transmission, wireless transmission, routers, firewalls, switches, gateway computers and/or edge servers. The network adapter card or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards the computer-readable program instructions for storage in a computer-readable storage medium in the respective computing/processing device.
The computer program instructions for carrying out operations of the present invention may be assembler instructions, Instruction Set Architecture (ISA) instructions, machine-related instructions, microcode, firmware instructions, state setting data, or source code or object code written in any combination of one or more programming languages, including an object oriented programming language such as Smalltalk, C++, Python, or the like, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, aspects of the present invention are implemented by personalizing an electronic circuit, such as a programmable logic circuit, a Field Programmable Gate Array (FPGA), or a Programmable Logic Array (PLA), with state information of computer-readable program instructions, which can execute the computer-readable program instructions.
Aspects of the present invention are described herein with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer-readable program instructions.
These computer-readable program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks. These computer-readable program instructions may also be stored in a computer-readable storage medium that can direct a computer, programmable data processing apparatus, and/or other devices to function in a particular manner, such that the computer-readable medium storing the instructions comprises an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.
The computer readable program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer, other programmable apparatus or other devices implement the functions/acts specified in the flowchart and/or block diagram block or blocks.
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of instructions, which comprises one or more executable instructions for implementing the specified logical function(s). In some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems that perform the specified functions or acts, or combinations of special purpose hardware and computer instructions. It is well known to those skilled in the art that implementation by hardware, by software, and by a combination of software and hardware are equivalent.
Having described embodiments of the present invention, the foregoing description is intended to be exemplary, not exhaustive, and not limited to the embodiments disclosed. Many modifications and variations will be apparent to those of ordinary skill in the art without departing from the scope and spirit of the described embodiments. The terminology used herein is chosen in order to best explain the principles of the embodiments, the practical application, or improvements made to the technology in the marketplace, or to enable others of ordinary skill in the art to understand the embodiments disclosed herein. The scope of the invention is defined by the appended claims.

Claims (9)

1. A sound coding method based on an auxiliary perception system of sensory substitution comprises the following steps:
step S1, identifying attribute information for characterizing object characteristics, wherein the attribute information comprises a category attribute, a size attribute, a speed attribute and a moving direction attribute, and selecting a corresponding baseband signal frequency domain waveform according to the category attribute;
step S2, identifying the scene information of the target object, setting the priority of the target object according to the scene information and selecting the center frequency of the corresponding baseband waveform;
step S3, selecting the width of the corresponding baseband frequency domain signal according to the comparison between the size attribute of the target object and the object information in the current scene in the database;
step S4, setting the corresponding sound loop count according to the speed attribute of the target object;
step S5, determining the position change in the three-dimensional space according to the moving direction attribute of the target object, reconstructing a three-dimensional sound field, and putting the reconstructed sound field into a ready sound library;
step S6, selecting, according to the priority of the target object, the sounds to place on the audio track from the ready sound library for playback;
wherein, step S3 includes the following substeps:
acquiring the maximum width and the maximum height of an object in a corresponding scene from a database according to the scene information, and performing normalization processing;
acquiring the width and the height of a current target object;
comparing against the width distribution of the target object, setting the baseband frequency-domain signal width to the normal width when the target object lies at the center of the normal distribution, and increasing the signal width as the target object's width deviates further from the center of the normal distribution;
comparing against the height distribution of the target object, selecting harmonics of normal amplitude when the target object lies at the center of the normal distribution, and enhancing the amplitudes of correspondingly more harmonics as the target object's height deviates further from the center of the normal distribution.
2. The method according to claim 1, wherein in step S1, for each category of target object the baseband frequency-domain waveform is a distinct non-repeating continuous curve with a bandwidth in the range of 30 Hz-100 Hz.
3. The method according to claim 1, wherein step S2 comprises the sub-steps of:
obtaining object information in the environment by using an image recognition algorithm;
counting according to the information of surrounding objects, and identifying the current scene information by referring to a database;
setting the priority of the target object according to the current scene information, wherein the priority represents the danger degree of the target object to the user;
the center frequency of each object baseband waveform is selected according to the priority of the target object.
4. A method according to claim 3, wherein a higher baseband signal frequency is selected for a high-priority target object and a lower baseband signal frequency for a low-priority target object, the selected frequencies lying in the range of 500 Hz-8000 Hz and the baseband frequency spacing between two objects lying in the range of 50 Hz-500 Hz.
5. The method according to claim 1, wherein step S4 comprises the sub-steps of:
defining different speed levels that reflect safety for the user;
for the three-dimensional space in which the captured target object is located, calculating at intervals the displacement and speed of the target object in space;
judging whether the target object is in a non-safety range;
detecting, from the speed of the target object, whether it is approaching the user;
and, when the target object is judged to be approaching the user, selecting the sound loop count according to the measured movement speed of the object.
6. The method according to claim 1, wherein step S5 comprises the sub-steps of:
for the three-dimensional space in which the captured target object is located, calculating its spatial position at intervals;
substituting the calculated spatial position change information and the sounds from steps S1 to S4 into the HRTFs (head-related transfer functions) to reconstruct a three-dimensional sound field;
and putting the reconstructed sound field into the ready sound library.
7. The method according to claim 1, wherein step S6 comprises the sub-steps of:
according to the priority of the target object, taking from the ready sound library the N highest-priority objects appearing in the camera view, where N is a set integer;
and placing the target object sound on the audio track, setting the playing time interval of each target object sound, and playing the synthesized audio track.
8. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 7.
9. A computer device comprising a memory and a processor, on which memory a computer program is stored which is executable on the processor, characterized in that the steps of the method of any of claims 1 to 7 are implemented when the processor executes the program.
CN202110450106.1A 2021-04-25 2021-04-25 Sound coding method of auxiliary perception system based on sensory substitution Active CN113282266B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110450106.1A CN113282266B (en) 2021-04-25 2021-04-25 Sound coding method of auxiliary perception system based on sensory substitution

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110450106.1A CN113282266B (en) 2021-04-25 2021-04-25 Sound coding method of auxiliary perception system based on sensory substitution

Publications (2)

Publication Number Publication Date
CN113282266A (en) 2021-08-20
CN113282266B (en) 2022-08-23

Family

ID=77277401

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110450106.1A Active CN113282266B (en) 2021-04-25 2021-04-25 Sound coding method of auxiliary perception system based on sensory substitution

Country Status (1)

Country Link
CN (1) CN113282266B (en)

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100544687C (en) * 2007-04-19 2009-09-30 上海交通大学 Vision alternative method based on cognitive and target identification
CN103083134A (en) * 2011-11-01 2013-05-08 上海勤日电子科技有限公司 Environmental perception technology capable of substituting for vision of blind person
US10449445B2 (en) * 2014-12-11 2019-10-22 Elwha Llc Feedback for enhanced situational awareness
US20160342398A1 (en) * 2015-05-22 2016-11-24 Alan A. Yelsey Dynamic Semiotic Systemic Knowledge Compiler System and Methods
CN204744865U (en) * 2015-06-08 2015-11-11 深圳市中科微光医疗器械技术有限公司 Device for environmental information around reception and registration of visual disability personage based on sense of hearing
US10843299B2 (en) * 2016-08-08 2020-11-24 The Johns Hopkins University Object recognition and presentation for the visually impaired
CN110991336B (en) * 2019-12-02 2023-04-28 深圳大学 Auxiliary sensing method and system based on sensory substitution

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Automatic guidance system for the blind based on an SVM image segmentation method; Tian Yanan et al.; Journal of Northeastern University (Natural Science); 2010-12-15 (No. 12); full text *
Design research on visually-impaired products with multi-sensory channels from a universal design perspective; Feng Yutao et al.; Packaging Engineering; 2020-03-20 (No. 06); full text *

Also Published As

Publication number Publication date
CN113282266A (en) 2021-08-20

Similar Documents

Publication Publication Date Title
JP7317115B2 (en) Generating a modified audio experience for your audio system
Hoang et al. Obstacle detection and warning system for visually impaired people based on electrode matrix and mobile Kinect
US20190045317A1 (en) Personalized head related transfer function (hrtf) based on video capture
KR20190133204A (en) Accumulation and Reliability Allocation of Iris Codes
WO2020247150A1 (en) Audio profile for personalized audio enhancement
CN113767648A (en) Personalization of head-related transfer function templates for audio content presentation
US11523240B2 (en) Selecting spatial locations for audio personalization
CN113692750A (en) Sound transfer function personalization using sound scene analysis and beamforming
CN113366863B (en) Compensating for head-related transfer function effects of a headset
US20210081047A1 (en) Head-Mounted Display With Haptic Output
JP2018524135A (en) A portable system that allows blind or visually impaired people to interpret the surrounding environment by voice or touch
US11134349B1 (en) Hearing assistance device with smart audio focus control
US20190333496A1 (en) Spatialized verbalization of visual scenes
KR20170132055A (en) Apparatus for generating somesthesis, and method thereof and computer-readable recording media using the same
CN103971701A (en) Hearing loss compensation apparatus and method using 3D equal loudness contour
CN110991336B (en) Auxiliary sensing method and system based on sensory substitution
EP3058926A1 (en) Method of transforming visual data into acoustic signals and aid device for visually impaired or blind persons
CN111372167A (en) Sound effect optimization method and device, electronic equipment and storage medium
CN113282266B (en) Sound coding method of auxiliary perception system based on sensory substitution
CN117981347A (en) Audio system for spatialization of virtual sound sources
Scalvini et al. Visual-auditory substitution device for indoor navigation based on fast visual marker detection
US20230130524A1 (en) Personalized vertigo rehabilitation
US20230030004A1 (en) Method and Apparatus for Mobile Stroke Self-Detection
US20230200711A1 (en) Information providing device, information providing method, and computer-readable storage medium
US20230421983A1 (en) Systems and methods for orientation-responsive audio enhancement

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant