CN110248197A - Sound enhancement method and device - Google Patents
Sound enhancement method and device
- Publication number
- CN110248197A (application number CN201810185895.9A)
- Authority
- CN
- China
- Prior art keywords
- space
- region
- image
- voice signal
- terminal
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06T—IMAGE DATA PROCESSING OR GENERATION, IN GENERAL
- G06T3/00—Geometric image transformations in the plane of the image
- G06T3/40—Scaling of whole images or parts thereof, e.g. expanding or contracting
- G06T3/4038—Image mosaicing, e.g. composing plane images from plane sub-images
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/21—Server components or server architectures
- H04N21/218—Source of audio or video content, e.g. local disk arrays
- H04N21/21805—Source of audio or video content, e.g. local disk arrays enabling multiple viewpoints, e.g. using a plurality of cameras
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/233—Processing of audio elementary streams
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04N—PICTORIAL COMMUNICATION, e.g. TELEVISION
- H04N21/00—Selective content distribution, e.g. interactive television or video on demand [VOD]
- H04N21/20—Servers specifically adapted for the distribution of content, e.g. VOD servers; Operations thereof
- H04N21/23—Processing of content or additional data; Elementary server operations; Server middleware
- H04N21/234—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs
- H04N21/23412—Processing of video elementary streams, e.g. splicing of video streams or manipulating encoded video stream scene graphs for generating or manipulating the scene composition of objects, e.g. MPEG-4 objects
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
- G10L21/0216—Noise filtering characterised by the method used for estimating noise
- G10L2021/02161—Number of inputs available containing the signal or the noise to be suppressed
- G10L2021/02166—Microphone arrays; Beamforming
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Physics & Mathematics (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Health & Medical Sciences (AREA)
- Quality & Reliability (AREA)
- Human Computer Interaction (AREA)
- Computational Linguistics (AREA)
- Acoustics & Sound (AREA)
- General Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- Databases & Information Systems (AREA)
- Studio Devices (AREA)
- Circuit For Audible Band Transducer (AREA)
Abstract
This application discloses a sound enhancement method and device, belonging to the field of multimedia processing. The method includes: obtaining a target image that includes N image regions; when a preset operation on a target image region among the N image regions is received, determining the target spatial direction corresponding to the target image region, and performing speech enhancement processing on the voice signal corresponding to that target spatial direction. Because the speech enhancement system performs sound source localization according to the image region specified by the user's preset operation, the localized target spatial direction is the direction in which the user actually needs the speech enhanced. This improves the accuracy of sound source localization and the quality of the enhanced voice signal, significantly improving the performance of the speech enhancement system.
Description
Technical field
Embodiments of this application relate to the field of multimedia processing, and in particular to a sound enhancement method and device.
Background
A sound enhancement method is a method of extracting a useful voice signal from background noise in order to reduce noise interference.
Currently, taking a microphone-array-based sound enhancement method as an example, the method works as follows: a video camera performs spatial filtering on the sound signals collected by the individual microphones, according to the spatial phase information contained in the collected voice signals, so as to form a spatial beam with a pointing direction and thereby enhance the voice signal in the specified direction.
However, in the above method, when multiple voice signals or strong background noise are present in the usage environment, the video camera generally selects the loudest voice signal for enhancement. This is likely to result in a mismatch between the enhanced voice signal and the voice signal the user actually needs enhanced.
Summary of the invention
In order to solve the problems, such as that auditory localization inaccuracy during Speech enhancement in the related technology, the embodiment of the present application provide
A kind of sound enhancement method and device.The technical solution is as follows:
In a first aspect, a sound enhancement method is provided. The method includes:
obtaining a target image of a video collection area, the target image including N image regions, where N is a positive integer greater than 1;
when a preset operation on a target image region among the N image regions is received, determining a target spatial direction corresponding to the target image region, the target spatial direction being used to indicate the spatial direction in which speech enhancement processing needs to be performed;
performing speech enhancement processing on the voice signal corresponding to the target spatial direction.
Optionally, the determining, when the preset operation on the target image region in the target image is received, the target spatial direction corresponding to the target image region includes:
when the preset operation in the target image is received, determining the image region corresponding to the preset operation as the target image region;
according to a first preset correspondence, determining the spatial direction corresponding to the target image region as the target spatial direction, the first preset correspondence including the correspondence between the image regions and the spatial directions.
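The first preset correspondence above can be sketched as a simple lookup table from image-region index to azimuth. The region count (24) and the uniform 360° split are taken from the M*K example later in the text; the function names and the choice of returning the sector's center azimuth are illustrative assumptions, not the patent's specification.

```python
# Hypothetical sketch of the "first preset correspondence": a lookup table
# from image-region index to a spatial direction (an azimuth range in
# degrees). The 24-region count and uniform split are illustrative.

def build_first_correspondence(n_regions: int, fov_degrees: float = 360.0):
    """Map each of n_regions image regions to an azimuth interval."""
    width = fov_degrees / n_regions
    return {i: (i * width, (i + 1) * width) for i in range(n_regions)}

def target_direction(region_index: int, correspondence) -> float:
    """Return the center azimuth of the region chosen by the preset operation."""
    lo, hi = correspondence[region_index]
    return (lo + hi) / 2.0

corr = build_first_correspondence(n_regions=24)  # 24 regions, as in the M*K example
print(target_direction(0, corr))   # 7.5
print(target_direction(12, corr))  # 187.5
```

A real implementation would persist such a table on the terminal and consult it whenever a preset operation selects a region.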
Optionally, the performing speech enhancement processing on the voice signal corresponding to the target spatial direction includes:
performing speech enhancement processing on the voice signal from the target spatial direction, and performing voice suppression processing on the voice signals from the non-target spatial directions;
the non-target spatial directions being the spatial directions in the video collection area other than the target spatial direction.
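A minimal sketch of the enhance/suppress step above, assuming per-direction signal buffers are already available. In a real system the separation and the gains would come from microphone-array beamforming rather than a fixed gain table; the function names and gain values here are assumptions.

```python
# Toy illustration of enhancement vs. suppression: apply a gain > 1 to the
# signal attributed to the target spatial direction and a gain < 1 to the
# signals from all other directions. Assumes signals are already separated
# per direction, which a beamformer would normally provide.
import numpy as np

def enhance_target(signals_by_direction, target, boost=2.0, suppress=0.1):
    """signals_by_direction: dict mapping direction (degrees) -> 1-D samples."""
    out = {}
    for direction, samples in signals_by_direction.items():
        gain = boost if direction == target else suppress
        out[direction] = np.asarray(samples, dtype=float) * gain
    return out

sigs = {0.0: np.ones(4), 90.0: np.ones(4), 180.0: np.ones(4)}
result = enhance_target(sigs, target=90.0)
print(result[90.0][0], result[0.0][0])  # 2.0 0.1
```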
Optionally, the performing speech enhancement processing on the voice signal corresponding to the target spatial direction includes:
according to a second preset correspondence, determining a target local space corresponding to the target spatial direction, the second preset correspondence including the correspondence between the spatial directions and the local spaces;
performing speech enhancement processing on the voice signal from the target local space, and performing voice suppression processing on the voice signals from the non-target local spaces;
the non-target local spaces being the spaces in the video collection area other than the target local space.
Optionally, the video collection area includes M different shooting areas, M being a positive integer greater than 1, and the obtaining a target image of a video collection area includes:
obtaining the shot images corresponding to the M shooting areas;
stitching the M shot images to obtain the target image.
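For the case where the M shooting areas do not overlap, the stitching step can be sketched as a plain side-by-side concatenation of the M shot images. Overlapping shooting areas would instead require feature-based registration and blending; this simplification, and all names here, are assumptions rather than the patent's method.

```python
# Toy stitching sketch: concatenate the M shot images horizontally to form
# the target (panoramic) image. Assumes non-overlapping shooting areas and
# equal image heights.
import numpy as np

def stitch_shots(shots):
    """shots: list of H x W x C arrays, one per shooting area, same height."""
    return np.concatenate(shots, axis=1)

m_shots = [np.zeros((4, 6, 3)), np.ones((4, 6, 3)), np.zeros((4, 6, 3))]
target_image = stitch_shots(m_shots)
print(target_image.shape)  # (4, 18, 3)
```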
In a second aspect, a speech enhancement device is provided. The device includes:
an obtaining module, configured to obtain a target image of a video collection area, the target image including N image regions, where N is a positive integer greater than 1;
a determining module, configured to determine, when a preset operation on a target image region among the N image regions is received, a target spatial direction corresponding to the target image region, the target spatial direction being used to indicate the spatial direction in which speech enhancement processing needs to be performed;
an enhancing module, configured to perform speech enhancement processing on the voice signal corresponding to the target spatial direction.
Optionally, the determining module is further configured to: when the preset operation in the N image regions is received, determine the image region corresponding to the preset operation as the target image region; and, according to a first preset correspondence, determine the spatial direction corresponding to the target image region as the target spatial direction, the first preset correspondence including the correspondence between the image regions and the spatial directions.
Optionally, the enhancing module is further configured to perform speech enhancement processing on the voice signal from the target spatial direction, and perform voice suppression processing on the voice signals from the non-target spatial directions; the non-target spatial directions being the spatial directions other than the target spatial direction.
Optionally, the enhancing module is further configured to: according to a second preset correspondence, determine a target local space corresponding to the target spatial direction, the second preset correspondence including the correspondence between the spatial directions and the local spaces; perform speech enhancement processing on the voice signal from the target local space, and perform voice suppression processing on the voice signals from the non-target local spaces; the non-target local spaces being the spaces in the video collection area other than the target local space.
Optionally, the video collection area includes M different shooting areas, M being a positive integer greater than 1, and the obtaining module is further configured to obtain the shot images corresponding to the M shooting areas, and stitch the M shot images to obtain the target image.
In a third aspect, a video camera is provided. The video camera includes a processor and a memory, the memory storing at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by the processor to implement the sound enhancement method provided by the first aspect or any possible implementation of the first aspect.
In a fourth aspect, a terminal is provided. The terminal includes a processor and a memory, the memory storing at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by the processor to implement the sound enhancement method provided by the first aspect or any possible implementation of the first aspect.
In a fifth aspect, a speech enhancement system is provided. The system includes a video camera and a terminal, the video camera being connected to the terminal, and the video camera including at least three cameras and at least six microphones.
The terminal is configured to obtain a target image of a video collection area, the target image including N image regions, where N is a positive integer greater than 1.
The terminal is further configured to determine, when a preset operation on a target image region among the N image regions is received, a target spatial direction corresponding to the target image region, the target spatial direction being used to indicate the spatial direction in which speech enhancement processing needs to be performed.
The terminal or the video camera is configured to perform speech enhancement processing on the voice signal corresponding to the target spatial direction.
In a sixth aspect, a computer-readable storage medium is provided. The storage medium stores at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the sound enhancement method provided by the first aspect or any possible implementation of the first aspect.
The technical solutions provided by the embodiments of this application have the following benefit:
The speech enhancement system obtains a target image that includes N image regions; when a preset operation on a target image region among the N image regions is received, it determines the target spatial direction corresponding to the target image region and performs speech enhancement processing on the voice signal corresponding to that target spatial direction. Because the speech enhancement system performs sound source localization according to the image region specified by the user's preset operation, the localized target spatial direction is the direction in which the user needs the speech enhanced. This improves the accuracy of sound source localization and the quality of the enhanced voice signal, significantly improving the performance of the speech enhancement system.
Brief description of the drawings
Fig. 1 is a structural schematic diagram of a speech enhancement system provided by an exemplary embodiment of this application;
Fig. 2 is a structural schematic diagram of the video camera in the speech enhancement system provided by an exemplary embodiment of this application;
Fig. 3 is a flow chart of a sound enhancement method provided by an exemplary embodiment of this application;
Fig. 4 is a flow chart of a sound enhancement method provided by another exemplary embodiment of this application;
Fig. 5 is a schematic diagram of a way of dividing the video collection area involved in the sound enhancement method provided by an exemplary embodiment of this application;
Fig. 6 is a schematic diagram of a way of dividing the target image involved in the sound enhancement method provided by an exemplary embodiment of this application;
Fig. 7 is a schematic diagram of the principle of the sound enhancement method provided by an exemplary embodiment of this application;
Fig. 8 is a structural diagram of a speech enhancement device provided by an exemplary embodiment of this application;
Fig. 9 is a structural block diagram of a terminal provided by an exemplary embodiment of this application.
Detailed description
To make the purposes, technical solutions, and advantages of this application clearer, the embodiments of this application are described in further detail below with reference to the accompanying drawings.
Referring to Fig. 1, it shows a structural schematic diagram of the speech enhancement system provided by an exemplary embodiment of this application. The speech enhancement system includes a video camera 120 and a terminal 140.
The video camera 120 includes at least one camera and a microphone array. The video camera 120 is configured to obtain the target image of the video collection area through the at least one camera, and to collect voice signals through the microphone array.
Optionally, M cameras are provided in the video camera 120, and correspondingly the video collection area is divided into M different shooting areas, with a one-to-one correspondence between cameras and shooting areas. The video camera 120 is configured to collect the shot image of each corresponding shooting area through the M cameras, and to stitch the M shot images to obtain the target image. That is, the target image includes the shot images corresponding to the M shooting areas, M being a positive integer greater than 1. The target image may be regarded as a panoramic image or a wide-angle image.
The M different shooting areas either do not intersect, or at least two of them intersect.
Optionally, the video collection area is a circular area, and at least one of the M shooting areas is a sector, or all M shooting areas are sectors.
Optionally, the microphone array is an annular microphone array that includes at least six microphones.
In the following, the description takes as an example a video camera 120 that includes three cameras and eight microphones. Schematically, refer to the structural schematic diagram of the video camera 120 shown in Fig. 2. The video camera 120 includes three cameras 122 and eight microphones 124.
The three cameras 122 are a first camera 122, a second camera 122, and a third camera 122, scattered around an origin. The origin is the position of the central point of the video camera 120, and the video camera 120 establishes a coordinate system based on this origin.
Optionally, one way of establishing the coordinate system is as follows: take the central point of the video camera as the origin, take the direction from the central point toward a preset direction as the positive y-axis, and take the direction perpendicular to the y-axis and pointing to the right as the positive x-axis. This embodiment is described with reference to this coordinate system and Fig. 2; the way of establishing the coordinate system is not limited in this embodiment.
Each of the three cameras 122 corresponds to one shooting area, and each camera 122 is configured to collect the shot image of its corresponding shooting area. Optionally, the first camera 122 collects the shot image of a first shooting area, the first shooting area being the region from 0° to 120° relative to the positive y-axis; the second camera 122 collects the shot image of a second shooting area, the second shooting area being the region from 120° to 240° relative to the positive y-axis; and the third camera 122 collects the shot image of a third shooting area, the third shooting area being the region from 240° to 360° relative to the positive y-axis.
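The three 120° sectors above can be sketched as a function that maps an azimuth (measured from the positive y-axis) to the camera responsible for it. The sector width follows the example in the text; the function name and camera numbering are assumptions.

```python
# Sketch of the sector layout: three cameras cover 0-120°, 120-240°, and
# 240-360° measured from the positive y-axis. Maps an azimuth to the
# 1-based index of the camera whose sector contains it.

def camera_for_azimuth(azimuth_degrees: float, sector=120.0) -> int:
    """Return 1, 2, or 3 for the camera whose sector contains the azimuth."""
    return int((azimuth_degrees % 360.0) // sector) + 1

print(camera_for_azimuth(30))   # 1  (first shooting area, 0° to 120°)
print(camera_for_azimuth(200))  # 2
print(camera_for_azimuth(359))  # 3
```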
The value ranges of the first preset angle and the second preset angle are not limited in this embodiment; the following description takes both the first preset angle and the second preset angle as 120 degrees as an example.
Optionally, the eight microphones 124 are scattered around the origin. Among the eight microphones 124, the distances between adjacent microphones are all equal, or all different, or at least two pairs of adjacent microphones are equally spaced.
Optionally, among the eight microphones 124, any four microphones 124 lie in the same plane, or there exist at least four microphones 124 that lie in the same plane, or there exist at least four microphones 124 that do not lie in the same plane.
The cameras and microphones in the video camera 120 may be fixed or rotatable. It should be noted that this embodiment does not limit the positions and types of the cameras and microphones.
The video camera 120 is configured to obtain the target image of the video collection area and send the obtained target image to the terminal 140; correspondingly, the terminal 140 receives the target image.
Optionally, the video camera 120 establishes a communication connection with the terminal 140 through a wireless network or a wired network.
The terminal 140 is a terminal with a display screen, for example a mobile phone, a tablet computer, an e-book reader, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop computer, or a desktop computer.
Optionally, the display screen is a liquid crystal display or an OLED display. Schematically, the liquid crystal display includes at least one of an STN (Super Twisted Nematic) screen, a UFB (Ultra Fine Bright) screen, a TFD (Thin Film Diode) screen, and a TFT (Thin Film Transistor) screen.
In general, the terminal 140 receives the target image sent by the video camera 120 and displays the target image on the display screen. When the terminal 140 receives a preset operation on a target image region in the target image, it determines the target spatial direction corresponding to the target image region and performs speech enhancement processing on the voice signal corresponding to that target spatial direction.
Optionally, the above wireless network or wired network uses standard communication technologies and/or protocols. The network is usually the Internet, but it may also be any network, including but not limited to any combination of a local area network (LAN), a metropolitan area network (MAN), a wide area network (WAN), a mobile network, a wired or wireless network, a dedicated network, or a virtual private network. In some embodiments, technologies and/or formats such as Hyper Text Markup Language (HTML) and Extensible Markup Language (XML) are used to represent the data exchanged over the network. In addition, conventional encryption techniques such as Secure Sockets Layer (SSL), Transport Layer Security (TLS), Virtual Private Network (VPN), and Internet Protocol Security (IPsec) may be used to encrypt all or some of the links. In other embodiments, customized and/or proprietary data communication technologies may also be used in place of, or in addition to, the above data communication technologies.
Referring to Fig. 3, it shows a flow chart of the sound enhancement method provided by an exemplary embodiment of this application. This embodiment is described taking the application of the sound enhancement method to the speech enhancement system shown in Fig. 1 as an example. The sound enhancement method includes:
Step 301: obtain a target image of a video collection area, the target image including N image regions, where N is a positive integer greater than 1.
Optionally, the speech enhancement system obtains the target image of the video collection area as follows: the video camera collects the target image of the video collection area and sends the collected target image to the terminal; correspondingly, the terminal receives the target image.
The video camera collects the target image of the video collection area in real time, or collects the target image of the video collection area at preset intervals; the target image is used to indicate the surroundings of the video camera.
Optionally, the video collection area is a preset region used for collecting the target image; the video collection area includes the whole region of the scene where the video camera is located, or a preset local region.
When the video collection area is the whole region, the target image is a panoramic image of the whole region; when the video collection area is a preset local region, the target image is a partial image of the preset local region. The following description takes the target image as a panoramic image as an example.
Optionally, after obtaining the target image, the terminal divides the target image into N image regions according to a preset division rule, which indicates the number of image regions to divide and the size of each image region.
Among the N image regions, the sizes of at least two image regions are the same, or the sizes of at least two image regions are different, or the sizes of any two image regions are the same. The following description takes the case where the N image regions all have the same size as an example.
The terminal divides the target image into N image regions according to the preset division rule in, including but not limited to, the following two possible ways:
In the first possible division way, the terminal divides the target image into N image regions according to the number of shooting areas, each image region corresponding to one shooting area.
In the second possible division way, the terminal divides the target image into M local regions, each local region corresponding to one shooting area; then, for each divided local region, the terminal further divides that local region into K image regions, so that the target image is divided into M*K image regions in total, K being a positive integer greater than 1. The number of image regions is not limited in this embodiment. The following description takes the second possible division way with M = 3 and K = 8, i.e. a target image that includes 24 image regions, as an example. The specific division way can refer to the related description in the embodiments below and is not introduced here.
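The second division way can be sketched as follows, using the M = 3, K = 8 example: the image width is first split into M local regions (one per shooting area) and each local region into K sub-regions. The left-to-right ordering and the pixel-bound representation are assumptions for illustration.

```python
# Sketch of the second division mode: split the target image into M local
# regions, then each local region into K image regions, for M*K in total.

def divide_image(width: int, m: int, k: int):
    """Return (left, right) pixel bounds for each of the m*k image regions."""
    regions = []
    local_w = width / m
    for i in range(m):          # one local region per shooting area
        for j in range(k):      # k sub-regions per local region
            left = i * local_w + j * local_w / k
            regions.append((left, left + local_w / k))
    return regions

regions = divide_image(width=2400, m=3, k=8)
print(len(regions))  # 24
print(regions[0])    # (0.0, 100.0)
```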
Step 302: when a preset operation on a target image region among the N image regions is received, determine a target spatial direction corresponding to the target image region, the target spatial direction being used to indicate the spatial direction in which speech enhancement processing needs to be performed.
Optionally, when the speech enhancement system receives the preset operation on the target image region in the target image, determining the target spatial direction corresponding to the target image region includes: after the terminal obtains the target image, the terminal displays the target image on a display screen; when the terminal receives the preset operation on the target image, the terminal determines the image region corresponding to the preset operation as the target image region, and determines the target spatial direction corresponding to the target image region.
The preset operation is a user operation for determining the target image region among the N image regions. Schematically, the preset operation includes any one or a combination of a click operation, a slide operation, a press operation and a long-press operation.
In other possible implementations, the preset operation may also be implemented in speech form. For example, the user inputs, in speech form on the terminal, preset information corresponding to the target image region; after the terminal obtains the speech signal, it parses the speech signal to obtain the speech content, and when the speech content contains a keyword matching the preset information corresponding to a target image region, the terminal determines the target image region corresponding to that preset information.
The terminal determines, according to the determined target image region and a first preset correspondence, the target spatial direction corresponding to the target image region, the first preset correspondence including correspondences between image regions and spatial directions. The process by which the terminal determines the target spatial direction may refer to the related description in the examples below and is not introduced here.
A spatial direction may be represented by a space angle or a space angle interval. The space angle is the angle with the positive y-axis direction of the coordinate system established above.
Optionally, an angle formed with the positive y-axis direction in the clockwise direction is a negative angle, and an angle formed with the positive y-axis direction in the counterclockwise direction is a positive angle. This embodiment does not limit the representation of spatial directions.
Schematically, the target image includes 24 image regions. When the terminal receives the preset operation on the target image, it determines the target image region A corresponding to the preset operation among the 24 image regions, and determines, according to the first preset correspondence, that the target spatial direction corresponding to the target image region A is 30°.
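Resolving a click or press to its image region is a point-in-box lookup over the divided regions; the coordinates and the box layout below are hypothetical, intended only to illustrate the step:

```python
def region_at(x, y, boxes):
    """Return the index of the region box containing (x, y), or None."""
    for idx, (left, top, right, bottom) in enumerate(boxes):
        if left <= x < right and top <= y < bottom:
            return idx
    return None

# Hypothetical 24-region layout: 100-pixel-wide vertical slots.
boxes = [(i * 100, 0, (i + 1) * 100, 600) for i in range(24)]
print(region_at(250, 300, boxes))  # -> 2, i.e. the third image region
```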
Step 303: perform speech enhancement processing on the speech signal corresponding to the target spatial direction.
The speech enhancement system performs speech enhancement processing on the speech signal corresponding to the target spatial direction in, but not limited to, the following two possible implementations:
In a first possible implementation, the terminal obtains the speech signal set of the video capture area and performs speech enhancement processing on the speech signal in the set that corresponds to the target spatial direction.
Optionally, the video camera collects the speech signal set of the video capture area through a microphone array and sends the speech signal set to the terminal; correspondingly, the terminal receives the speech signal set sent by the video camera and performs speech enhancement processing on the speech signal coming from the target spatial direction.
In a second possible implementation, the video camera receives the target spatial direction sent by the terminal and performs speech enhancement processing on the collected speech signal corresponding to the target spatial direction.
Optionally, when the terminal determines the target spatial direction, it sends the target spatial direction to the video camera; correspondingly, the video camera receives the target spatial direction and performs speech enhancement processing on the speech signal coming from the target spatial direction. The process by which the video camera performs speech enhancement processing on the speech signal coming from the target spatial direction may refer to the related details in the examples below and is not introduced here.
Schematically, when the terminal determines that the target spatial direction is 30°, the video camera performs speech enhancement processing on the speech signal coming from the direction at 30° to the positive y-axis direction.
It should be noted that step 301 and step 302 can be implemented separately as a sound source localization method, which is usually performed by the terminal and is used to determine the target spatial direction in which speech enhancement processing needs to be performed; step 303 can be implemented separately as a speech enhancement method, which is usually performed by the terminal or the video camera and is used to perform speech enhancement processing on the speech signal coming from the target spatial direction determined in steps 301 and 302. The following description takes the case where the terminal performs the sound source localization method and the video camera performs the speech enhancement method as an example.
In conclusion the embodiment of the present application obtains target image by speech-enhancement system, target image includes N number of image
Region;When receiving the predetermined registration operation in N number of image-region on object region, determination is corresponding with object region
Object space direction, and speech enhan-cement processing is carried out to the corresponding voice signal in object space direction;So that speech-enhancement system
Object region specified by predetermined registration operation capable of being passed through according to user carries out auditory localization, so that the target oriented
Direction in space is the direction of enhancing voice required for user, thus improve auditory localization accuracy and enhanced sound
The quality of signal provides the performance of speech-enhancement system significantly.
Referring to FIG. 4, it shows a flowchart of a speech enhancement method provided by another exemplary embodiment of the present application. This embodiment is described by taking the application of the speech enhancement method to the speech enhancement system shown in FIG. 1 as an example. The method includes:
Step 401: the video camera obtains the shot images corresponding to the M shooting areas.
The terminal stores the preset angle interval of the video capture area and the angle intervals corresponding to the M shooting areas included in the video capture area. For each shooting area, the video camera collects the shot image of that shooting area through one camera.
Schematically, as shown in FIG. 5, the angle interval of the video capture area is [-180, 180], and the video capture area includes three shooting areas: shooting area 11 (angle interval (0, 120]), shooting area 12 (angle intervals (-180, -120] and (120, 180]) and shooting area 13 (angle interval (-120, 0]). The video camera includes a first camera, a second camera and a third camera, the three cameras corresponding one-to-one to the three shooting areas. At the same moment, the video camera collects shot image 1 of shooting area 11 through the first camera, shot image 2 of shooting area 12 through the second camera, and shot image 3 of shooting area 13 through the third camera.
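The stored correspondence between space angles and shooting areas in the FIG. 5 example can be expressed as a simple interval lookup; this is only a sketch, with the interval endpoints taken from the example above:

```python
def shooting_area(angle):
    """Map a space angle in (-180, 180] to its shooting area per the FIG. 5 example."""
    if 0 < angle <= 120:
        return 11
    if -120 < angle <= 0:
        return 13
    return 12  # covers the two intervals (-180, -120] and (120, 180]

print(shooting_area(30))   # -> 11
print(shooting_area(150))  # -> 12
```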
Step 402: the video camera stitches the M shot images to obtain the target image.
Optionally, the video camera stitches the shot images corresponding to the M shooting areas according to the position order of the shooting areas to obtain the target image.
Schematically, based on the video capture area shown in FIG. 5, the video camera stitches shot image 1 of shooting area 11, shot image 2 of shooting area 12 and shot image 3 of shooting area 13 in sequence to obtain the target image.
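Stitching in the position order of the shooting areas amounts to horizontal concatenation when the shot images share a height; a minimal sketch with toy images represented as lists of pixel rows (a real implementation would also align and blend overlapping edges):

```python
def stitch(images):
    """Horizontally concatenate same-height images, each a list of pixel rows."""
    height = len(images[0])
    return [sum((img[row] for img in images), []) for row in range(height)]

img1 = [[1, 1], [1, 1]]   # toy 2x2 shot images for areas 11, 12, 13
img2 = [[2, 2], [2, 2]]
img3 = [[3, 3], [3, 3]]
target = stitch([img1, img2, img3])
print(target[0])  # -> [1, 1, 2, 2, 3, 3]
```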
Step 403: the video camera sends the target image to the terminal.
The video camera sends the stitched target image to the terminal; correspondingly, the terminal receives the target image.
Step 404: the terminal receives and displays the target image.
The manner in which the terminal displays the target image includes, but is not limited to, the following two possible implementations:
In a first possible implementation, when the terminal receives the target image sent by the video camera, it directly displays the target image on the display screen.
In a second possible implementation, when the terminal receives the target image sent by the video camera, it divides the target image according to the number of shooting areas to obtain the shot images corresponding to the M shooting areas, and displays the M shot images on the display screen simultaneously or in sequence.
For convenience of viewing by the user, the following description takes only the first possible implementation as an example.
Step 405: when the terminal receives the preset operation on the target image, it determines the image region corresponding to the preset operation as the target image region.
Optionally, the terminal divides the target image into N image regions according to the second possible division manner described above; when the terminal receives the preset operation on the target image, it determines, among the N image regions, the image region corresponding to the preset operation as the target image region.
Schematically, as shown in FIG. 6, the terminal divides the target image into three local areas: a first local area, a second local area and a third local area, each local area corresponding to one shooting area. For each local area obtained by the division, the terminal further divides the local area into 8 image regions, that is, the first local area includes image regions A1 to H1, the second local area includes image regions A2 to H2, and the third local area includes image regions A3 to H3, so that the target image is divided into 24 image regions in total. When the terminal receives a click operation on image region A1, it determines image region A1 as the target image region.
Step 406: the terminal determines, according to the first preset correspondence, the spatial direction corresponding to the target image region as the target spatial direction, the first preset correspondence including correspondences between image regions and spatial directions.
Optionally, the terminal stores the first preset correspondence between image regions and spatial directions. When the terminal determines the target image region, it determines the target spatial direction corresponding to the target image region according to the first preset correspondence.
A spatial direction may be represented by a space angle or a space angle interval. To reduce the amount of stored data, the following description takes the case where a spatial direction is represented by a space angle as an example.
Schematically, based on the division manner of the target image provided in FIG. 6, the first preset correspondence between image regions and spatial directions is shown in Table 1.
Table 1
Image-region | Direction in space | Image-region | Direction in space | Image-region | Direction in space |
A1 | 15° | A2 | 135° | A3 | -120° |
B1 | 30° | B2 | 150° | B3 | -105° |
C1 | 45° | C2 | 165° | C3 | -90° |
D1 | 60° | D2 | 180° | D3 | -75° |
E1 | 75° | E2 | -180° | E3 | -60° |
F1 | 90° | F2 | -165° | F3 | -45° |
G1 | 105° | G2 | -150° | G3 | -30° |
H1 | 120° | H2 | -135° | H3 | -15° |
For example, after the terminal determines image region A1 as the target image region, it determines, according to the first preset correspondence provided in Table 1 above, that the target spatial direction corresponding to the target image region A1 is 15°.
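The first preset correspondence is naturally stored as a mapping from image region to space angle; the dictionary below simply transcribes Table 1 (the variable names are illustrative):

```python
# First preset correspondence between image regions and space angles (Table 1).
REGION_TO_ANGLE = {
    "A1": 15,   "B1": 30,   "C1": 45,   "D1": 60,
    "E1": 75,   "F1": 90,   "G1": 105,  "H1": 120,
    "A2": 135,  "B2": 150,  "C2": 165,  "D2": 180,
    "E2": -180, "F2": -165, "G2": -150, "H2": -135,
    "A3": -120, "B3": -105, "C3": -90,  "D3": -75,
    "E3": -60,  "F3": -45,  "G3": -30,  "H3": -15,
}

target_region = "A1"
print(REGION_TO_ANGLE[target_region])  # -> 15, the target spatial direction in degrees
```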
Step 407: the terminal sends the target spatial direction to the video camera.
The terminal sends the determined target spatial direction to the video camera; correspondingly, the video camera receives the target spatial direction sent by the terminal.
Step 408: the video camera performs speech enhancement processing on the speech signal corresponding to the target spatial direction.
The video camera collects the sound signal set through a built-in microphone array and performs speech enhancement processing on the sound signal corresponding to the target spatial direction in, but not limited to, the following two possible implementations:
In a first possible implementation, the video camera performs speech enhancement processing on the speech signal coming from the target spatial direction, and performs speech suppression processing on the speech signals coming from non-target spatial directions, where the non-target spatial directions are the spatial directions other than the target spatial direction.
Schematically, when the target spatial direction received by the video camera from the terminal is 15°, the video camera performs speech enhancement processing on the speech signal coming from the 15° direction, and performs speech suppression processing on the speech signals coming from spatial directions other than 15°.
In a second possible implementation, the video camera determines, according to a second preset correspondence, the target local space corresponding to the target spatial direction, the second preset correspondence including correspondences between spatial directions and local spaces; it performs speech enhancement processing on the speech signal coming from the target local space, and speech suppression processing on the speech signals coming from non-target local spaces.
Here, the non-target local spaces are the spaces in the video capture area other than the target local space.
Optionally, the video camera constructs in advance the three-dimensional space corresponding to the video capture area according to at least one camera, divides the three-dimensional space into N local spaces, and stores the second preset correspondence between spatial directions and local spaces. A local space refers to a partial three-dimensional space of the scene where the video camera is located.
Schematically, the three-dimensional space is divided in advance into 24 local spaces, namely local spaces A4 to H4, local spaces A5 to H5 and local spaces A6 to H6. The second preset correspondence between spatial directions and local spaces stored in the video camera is shown in Table 2.
Table 2
Direction in space | Local space | Direction in space | Local space | Direction in space | Local space |
15° | A4 | 135° | A5 | -120° | A6 |
30° | B4 | 150° | B5 | -105° | B6 |
45° | C4 | 165° | C5 | -90° | C6 |
60° | D4 | 180° | D5 | -75° | D6 |
75° | E4 | -180° | E5 | -60° | E6 |
90° | F4 | -165° | F5 | -45° | F6 |
105° | G4 | -150° | G5 | -30° | G6 |
120° | H4 | -135° | H5 | -15° | H6 |
It should be noted that, since the target image is the image corresponding to the video capture area and the three-dimensional space is the space corresponding to the video capture area, the manner in which the video camera divides the three-dimensional space into N local spaces may or may not correspond to the manner in which the terminal divides the target image into N image regions. When the two division manners correspond, image regions and local spaces are in correspondence, and each image region has the same space angle range as the local space corresponding to that image region.
Schematically, as shown in FIG. 7, when the target spatial direction received by the video camera from the terminal is 15°, the video camera determines, according to the second preset correspondence provided in Table 2 above, that the local space A4 corresponding to the target spatial direction 15° is the target local space 71, the space angle range corresponding to the target local space 71 being (0, 15°]. The video camera performs speech enhancement processing on the speech signal coming from the target local space 71, and speech suppression processing on the sound signals coming from local spaces other than the target local space 71.
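The second possible implementation can be sketched as a lookup into Table 2 followed by a per-space gain decision; the gain factors below are illustrative assumptions, not values from the application:

```python
# Subset of the Table 2 correspondence between space angles and local spaces.
DIRECTION_TO_SPACE = {15: "A4", 30: "B4", 45: "C4", -120: "A6", -15: "H6"}

def gains(target_direction, all_spaces):
    """Assign an enhancement gain to the target local space and a
    suppression gain to every other local space (factors are assumed)."""
    target_space = DIRECTION_TO_SPACE[target_direction]
    return {space: (2.0 if space == target_space else 0.1) for space in all_spaces}

g = gains(15, ["A4", "B4", "C4"])
print(g)  # -> {'A4': 2.0, 'B4': 0.1, 'C4': 0.1}
```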
Optionally, the video camera performs speech enhancement processing on the speech signal corresponding to the target spatial direction through an adaptive beamforming algorithm, and outputs the enhanced speech signal.
The adaptive beamforming algorithm includes at least one of Minimum Variance Distortionless Response (MVDR), Generalized Sidelobe Canceller (GSC) and Transfer Function Generalized Sidelobe Canceller (TF-GSC).
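As a concrete reference point, the sketch below implements a fixed narrowband delay-and-sum beamformer for a uniform linear microphone array, the non-adaptive ancestor of the MVDR and GSC beamformers named above (which additionally estimate a noise covariance or blocking matrix from the data). The array geometry, frequency and angle convention are all illustrative assumptions:

```python
import cmath
import math

def steering_weights(n_mics, spacing_m, angle_deg, freq_hz, c=343.0):
    """Delay-and-sum weights steering a uniform linear array toward
    angle_deg (measured from broadside) at one frequency."""
    tau = spacing_m * math.sin(math.radians(angle_deg)) / c  # inter-mic delay (s)
    return [cmath.exp(-2j * math.pi * freq_hz * tau * m) / n_mics
            for m in range(n_mics)]

def beamform(snapshot, weights):
    """Combine one narrowband snapshot of microphone samples: y = w^H x."""
    return sum(w.conjugate() * x for w, x in zip(weights, snapshot))

# A unit plane wave arriving from 30 degrees passes with gain ~1 when the
# beam is steered to 30 degrees, and is attenuated when steered elsewhere.
tau = 0.05 * math.sin(math.radians(30)) / 343.0
snapshot = [cmath.exp(-2j * math.pi * 1000 * tau * m) for m in range(6)]
on_target = abs(beamform(snapshot, steering_weights(6, 0.05, 30, 1000)))
off_target = abs(beamform(snapshot, steering_weights(6, 0.05, -60, 1000)))
print(round(on_target, 3), off_target < on_target)  # -> 1.0 True
```

In MVDR the uniform 1/n_mics weights would be replaced by weights minimizing output noise power subject to the same distortionless constraint in the target direction.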
In conclusion the embodiment of the present application will be also by that will preset when terminal receives the predetermined registration operation in target image
It operates corresponding image-region and is determined as object region, it is according to the first default corresponding relationship, object region is corresponding
Direction in space be determined as object space direction;It enables the terminal to according to specified object region, it is pre- by first
If corresponding relationship determines corresponding object space direction, avoid when, there are when multi-acoustical, generally selecting sound most in environment
Strong voice signal leads to the situation of auditory localization mistake as object space direction, ensure that the accuracy of auditory localization.
The embodiment of the present application is and right also by carrying out speech enhan-cement processing to the voice signal from object space direction
Voice signal from non-targeted direction in space carries out voice suppression processing, effectively reduces the influence of ambient noise, greatly
The noise robustness of speech-enhancement system is improved greatly.
The following are apparatus embodiments of the present application, which can be used to perform the method embodiments of the present application. For details not disclosed in the apparatus embodiments, please refer to the method embodiments of the present application.
Referring to FIG. 8, it shows a structural schematic diagram of a speech enhancement apparatus provided by an exemplary embodiment of the present application. The speech enhancement apparatus can be implemented as all or part of the speech enhancement system in FIG. 1 through a dedicated hardware circuit or a combination of software and hardware. The speech enhancement apparatus includes: an obtaining module 810, a determining module 820 and an enhancement module 830.
The obtaining module 810 is configured to obtain the target image of the video capture area, the target image including N image regions, N being a positive integer greater than 1.
The determining module 820 is configured to, when a preset operation on a target image region among the N image regions is received, determine the target spatial direction corresponding to the target image region, the target spatial direction being used to indicate the spatial direction in which speech enhancement processing needs to be performed.
The enhancement module 830 is configured to perform speech enhancement processing on the speech signal corresponding to the target spatial direction.
Optionally, the determining module 820 is further configured to, when the preset operation on the N image regions is received, determine the image region corresponding to the preset operation as the target image region, and determine, according to the first preset correspondence, the spatial direction corresponding to the target image region as the target spatial direction, the first preset correspondence including correspondences between image regions and spatial directions.
Optionally, the enhancement module 830 is further configured to perform speech enhancement processing on the speech signal coming from the target spatial direction, and speech suppression processing on the speech signals coming from non-target spatial directions, where the non-target spatial directions are the spatial directions other than the target spatial direction.
Optionally, the enhancement module 830 is further configured to determine, according to the second preset correspondence, the target local space corresponding to the target spatial direction, the second preset correspondence including correspondences between spatial directions and local spaces; and to perform speech enhancement processing on the speech signal coming from the target local space, and speech suppression processing on the speech signals coming from non-target local spaces, where the non-target local spaces are the spaces in the video capture area other than the target local space.
Optionally, the video capture area includes M different shooting areas, M being a positive integer greater than 1, and the obtaining module 810 is further configured to obtain the shot images corresponding to the M shooting areas and stitch the M shot images to obtain the target image.
Optionally, the apparatus includes a video camera and a terminal, the video camera being connected to the terminal, and the video camera includes at least three cameras and at least six microphones.
The embodiment of the present application also provides a video camera, the video camera including a processor and a memory, the memory storing at least one instruction, at least one program, a code set or an instruction set, which is loaded and executed by the processor to implement the speech enhancement method provided in each of the above method embodiments.
The embodiment of the present application also provides a terminal, the terminal including a processor and a memory, the memory storing at least one instruction, at least one program, a code set or an instruction set, which is loaded and executed by the processor to implement the speech enhancement method provided in each of the above method embodiments.
The embodiment of the present application also provides a computer-readable storage medium, the storage medium storing at least one instruction, at least one program, a code set or an instruction set, which is loaded and executed by a processor to implement the speech enhancement method provided in each of the above method embodiments.
FIG. 9 shows a structural block diagram of a terminal 900 provided by an exemplary embodiment of the present application. The terminal 900 is the terminal connected to the video camera in the above speech enhancement system. For example, the terminal 900 is a smartphone, a tablet computer, an MP3 (Moving Picture Experts Group Audio Layer III) player, an MP4 (Moving Picture Experts Group Audio Layer IV) player, a laptop computer or a desktop computer. The terminal 900 may also be called a user equipment, a portable terminal, a laptop terminal, a desktop terminal or other names.
Generally, the terminal 900 includes a processor 901 and a memory 902.
The processor 901 may include one or more processing cores, such as a 4-core processor or an 8-core processor. The processor 901 may be implemented in at least one hardware form of DSP (Digital Signal Processing), FPGA (Field-Programmable Gate Array) and PLA (Programmable Logic Array). The processor 901 may also include a main processor and a coprocessor; the main processor is a processor for processing data in the awake state, also called a CPU (Central Processing Unit), and the coprocessor is a low-power processor for processing data in the standby state. In some embodiments, the processor 901 may be integrated with a GPU (Graphics Processing Unit), which is responsible for rendering and drawing the content to be displayed on the display screen. In some embodiments, the processor 901 may also include an AI (Artificial Intelligence) processor for processing computing operations related to machine learning.
The memory 902 may include one or more computer-readable storage media, which may be non-transitory. The memory 902 may also include a high-speed random access memory and a non-volatile memory, such as one or more disk storage devices or flash storage devices. In some embodiments, the non-transitory computer-readable storage medium in the memory 902 is used to store at least one instruction, which is executed by the processor 901 to implement the speech enhancement method provided in each method embodiment of the present application.
In some embodiments, the terminal 900 may optionally also include a peripheral interface 903 and at least one peripheral. The processor 901, the memory 902 and the peripheral interface 903 may be connected through a bus or a signal line. Each peripheral may be connected to the peripheral interface 903 through a bus, a signal line or a circuit board. Specifically, the peripherals include at least one of a radio frequency circuit 904, a touch display screen 905, a camera 906, an audio circuit 907, a positioning component 908 and a power supply 909.
The peripheral interface 903 may be used to connect at least one I/O (Input/Output) related peripheral to the processor 901 and the memory 902. In some embodiments, the processor 901, the memory 902 and the peripheral interface 903 are integrated on the same chip or circuit board; in some other embodiments, any one or two of the processor 901, the memory 902 and the peripheral interface 903 may be implemented on a separate chip or circuit board, which is not limited in this embodiment.
The radio frequency circuit 904 is used to receive and transmit RF (Radio Frequency) signals, also called electromagnetic signals. The radio frequency circuit 904 communicates with a communication network and other communication devices through electromagnetic signals. The radio frequency circuit 904 converts an electrical signal into an electromagnetic signal for transmission, or converts a received electromagnetic signal into an electrical signal. Optionally, the radio frequency circuit 904 includes an antenna system, an RF transceiver, one or more amplifiers, a tuner, an oscillator, a digital signal processor, a codec chipset, a subscriber identity module card, and so on. The radio frequency circuit 904 can communicate with other terminals through at least one wireless communication protocol. The wireless communication protocol includes, but is not limited to, the World Wide Web, a metropolitan area network, an intranet, mobile communication networks of each generation (2G, 3G, 4G and 5G), a wireless local area network and/or a WiFi (Wireless Fidelity) network. In some embodiments, the radio frequency circuit 904 may also include an NFC (Near Field Communication) related circuit, which is not limited in this application.
The display screen 905 is used to display a UI (User Interface). The UI may include graphics, text, icons, video and any combination thereof. When the display screen 905 is a touch display screen, the display screen 905 also has the ability to collect touch signals on or above the surface of the display screen 905. A touch signal may be input to the processor 901 as a control signal for processing. In this case, the display screen 905 may also be used to provide virtual buttons and/or a virtual keyboard, also called soft buttons and/or a soft keyboard. In some embodiments, there may be one display screen 905, arranged on the front panel of the terminal 900; in other embodiments, there may be at least two display screens 905, respectively arranged on different surfaces of the terminal 900 or in a folded design; in still other embodiments, the display screen 905 may be a flexible display screen, arranged on a curved surface or a folded surface of the terminal 900. The display screen 905 may even be arranged as a non-rectangular irregular figure, namely a special-shaped screen. The display screen 905 may be made of materials such as LCD (Liquid Crystal Display) and OLED (Organic Light-Emitting Diode).
The camera assembly 906 is used to collect images or video. Optionally, the camera assembly 906 includes a front camera and a rear camera. Generally, the front camera is arranged on the front panel of the terminal and the rear camera is arranged on the back of the terminal. In some embodiments, there are at least two rear cameras, each being any one of a main camera, a depth-of-field camera, a wide-angle camera and a telephoto camera, so as to realize background blurring through fusion of the main camera and the depth-of-field camera, panoramic shooting and VR (Virtual Reality) shooting through fusion of the main camera and the wide-angle camera, or other fusion shooting functions. In some embodiments, the camera assembly 906 may also include a flash. The flash may be a single-color-temperature flash or a dual-color-temperature flash. A dual-color-temperature flash refers to a combination of a warm-light flash and a cold-light flash, and can be used for light compensation under different color temperatures.
The audio circuit 907 may include a microphone and a speaker. The microphone is used to collect sound waves of the user and the environment, and convert the sound waves into electrical signals that are input to the processor 901 for processing, or input to the radio frequency circuit 904 for voice communication. For the purpose of stereo collection or noise reduction, there may be multiple microphones, respectively arranged at different parts of the terminal 900. The microphone may also be an array microphone or an omnidirectional collection microphone. The speaker is used to convert electrical signals from the processor 901 or the radio frequency circuit 904 into sound waves. The speaker may be a traditional film speaker or a piezoelectric ceramic speaker. When the speaker is a piezoelectric ceramic speaker, it can not only convert electrical signals into sound waves audible to humans, but also convert electrical signals into sound waves inaudible to humans for purposes such as ranging. In some embodiments, the audio circuit 907 may also include a headphone jack.
The positioning component 908 is used to locate the current geographic position of the terminal 900 to implement navigation or LBS (Location Based Service). The positioning component 908 may be a positioning component based on the GPS (Global Positioning System) of the United States, the BeiDou system of China or the GLONASS system of Russia.
The power supply 909 is used to supply power to the components in the terminal 900. The power supply 909 may be an alternating current supply, a direct current supply, a disposable battery or a rechargeable battery. When the power supply 909 includes a rechargeable battery, the rechargeable battery may be a wired charging battery or a wireless charging battery. A wired charging battery is a battery charged through a wired line, and a wireless charging battery is a battery charged through a wireless coil. The rechargeable battery may also be used to support fast charging technology.
In some embodiments, the terminal 900 further includes one or more sensors 910, including but not limited to: an acceleration sensor 911, a gyroscope sensor 912, a pressure sensor 913, a fingerprint sensor 914, an optical sensor 915, and a proximity sensor 916.
The acceleration sensor 911 can detect the magnitude of acceleration along the three axes of the coordinate system established with respect to the terminal 900. For example, the acceleration sensor 911 may detect the components of gravitational acceleration along the three axes. Based on the gravitational acceleration signal collected by the acceleration sensor 911, the processor 901 can control the touch display screen 905 to display the user interface in landscape or portrait view. The acceleration sensor 911 may also be used to collect motion data for games or for the user.
The gyroscope sensor 912 can detect the body orientation and rotation angle of the terminal 900, and can cooperate with the acceleration sensor 911 to capture the user's 3D actions on the terminal 900. From the data collected by the gyroscope sensor 912, the processor 901 can implement functions such as motion sensing (for example, changing the UI according to the user's tilt operation), image stabilization during shooting, game control, and inertial navigation.
The pressure sensor 913 may be disposed on a side frame of the terminal 900 and/or beneath the touch display screen 905. When the pressure sensor 913 is disposed on the side frame, it can detect the user's grip on the terminal 900, and the processor 901 can perform left/right-hand recognition or shortcut operations based on the grip signal collected by the pressure sensor 913. When the pressure sensor 913 is disposed beneath the touch display screen 905, the processor 901 controls the operable controls on the UI according to the user's pressure operations on the touch display screen 905. The operable controls include at least one of a button control, a scroll-bar control, an icon control, and a menu control.
The fingerprint sensor 914 collects the user's fingerprint; either the processor 901 identifies the user from the fingerprint collected by the fingerprint sensor 914, or the fingerprint sensor 914 itself identifies the user from the collected fingerprint. When the user's identity is recognized as trusted, the processor 901 authorizes the user to perform relevant sensitive operations, including unlocking the screen, viewing encrypted information, downloading software, making payments, and changing settings. The fingerprint sensor 914 may be disposed on the front, back, or side of the terminal 900. When the terminal 900 is provided with a physical button or a manufacturer's logo, the fingerprint sensor 914 may be integrated with that physical button or logo.
The optical sensor 915 collects the ambient light intensity. In one embodiment, the processor 901 controls the display brightness of the touch display screen 905 according to the ambient light intensity collected by the optical sensor 915: when the ambient light is strong, the display brightness of the touch display screen 905 is turned up; when the ambient light is weak, the display brightness is turned down. In another embodiment, the processor 901 may also dynamically adjust the shooting parameters of the camera assembly 906 according to the ambient light intensity collected by the optical sensor 915.
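The ambient-light-to-brightness adjustment described above can be sketched as a simple monotone mapping. The lux ceiling and brightness range below are illustrative assumptions, not values from the patent.

```python
def display_brightness(ambient_lux: float,
                       min_brightness: float = 0.1,
                       max_brightness: float = 1.0,
                       max_lux: float = 1000.0) -> float:
    """Map ambient light intensity to a display brightness level.

    Stronger ambient light yields a brighter screen, as in the
    optical-sensor paragraph above; thresholds are illustrative.
    """
    # Clamp the reading to [0, max_lux], then scale linearly.
    ratio = min(max(ambient_lux, 0.0), max_lux) / max_lux
    return min_brightness + ratio * (max_brightness - min_brightness)
```

A real implementation would typically smooth successive readings to avoid flicker; this sketch only shows the monotone mapping itself.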
The proximity sensor 916, also called a distance sensor, is typically disposed on the front panel of the terminal 900 and collects the distance between the user and the front of the terminal 900. In one embodiment, when the proximity sensor 916 detects that the distance between the user and the front of the terminal 900 is gradually shrinking, the processor 901 switches the touch display screen 905 from the bright-screen state to the off-screen state; when the proximity sensor 916 detects that the distance is gradually growing, the processor 901 switches the touch display screen 905 from the off-screen state back to the bright-screen state.
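A minimal sketch of the proximity-sensor screen logic above (distance shrinking switches the screen off, distance growing switches it back on); the class and attribute names are assumptions for illustration only.

```python
class ScreenController:
    """Toggle screen state from successive proximity-sensor readings."""

    def __init__(self) -> None:
        self.screen_on = True       # start in the bright-screen state
        self._last_distance = None  # no previous reading yet

    def update(self, distance_cm: float) -> bool:
        """Consume one distance sample and return the new screen state."""
        if self._last_distance is not None:
            if distance_cm < self._last_distance:    # user approaching
                self.screen_on = False               # off-screen state
            elif distance_cm > self._last_distance:  # user moving away
                self.screen_on = True                # bright-screen state
        self._last_distance = distance_cm
        return self.screen_on
```

For example, readings of 10 cm, then 5 cm, then 8 cm would turn the screen off and then back on.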
Those skilled in the art will understand that the structure shown in Fig. 9 does not limit the terminal 900, which may include more or fewer components than illustrated, combine certain components, or adopt a different component arrangement.
It should be noted that when the speech enhancement apparatus provided by the above embodiments performs speech enhancement, the division into the functional modules described above is merely illustrative; in practical applications, the above functions may be allocated to different functional modules as needed, that is, the internal structure of the device may be divided into different functional modules to complete all or part of the functions described above. In addition, the speech enhancement method and the apparatus embodiments provided above belong to the same inventive concept; for the specific implementation process, refer to the method embodiments, which will not be repeated here.
The serial numbers of the above embodiments of the present application are for description only and do not indicate the relative merits of the embodiments.
Those of ordinary skill in the art will understand that all or part of the steps of the above embodiments may be implemented by hardware, or by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium, which may be a read-only memory, a magnetic disk, an optical disc, or the like.
The foregoing are merely preferred embodiments of the present application and are not intended to limit it; any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within its scope of protection.
Claims (14)
1. A speech enhancement method, characterized in that the method comprises:
acquiring a target image of a video capture region, the target image comprising N image regions, N being a positive integer greater than 1;
when a preset operation on a target image region among the N image regions is received, determining a target spatial direction corresponding to the target image region, the target spatial direction indicating a spatial direction in which speech enhancement processing needs to be performed; and
performing speech enhancement processing on a speech signal corresponding to the target spatial direction.
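The three steps of claim 1 can be sketched as follows. The region-to-direction table (the "first preset correspondence") and the simple gain-based enhancement are illustrative assumptions, since the claim does not fix a particular enhancement algorithm.

```python
# Hypothetical mapping from image-region index to spatial direction
# (degrees); an illustrative "first preset correspondence".
REGION_TO_DIRECTION = {0: 0.0, 1: 90.0, 2: 180.0, 3: 270.0}

def enhance_for_region(target_region: int,
                       signals_by_direction: dict,
                       gain: float = 2.0) -> dict:
    """Boost the speech signal arriving from the direction mapped to
    the touched image region; other directions pass through unchanged."""
    target_direction = REGION_TO_DIRECTION[target_region]
    return {
        direction: [s * gain for s in samples] if direction == target_direction
        else samples
        for direction, samples in signals_by_direction.items()
    }
```

In practice the per-direction signals would come from microphone-array beamforming, and "enhancement" would be a proper speech-enhancement filter rather than a flat gain; this sketch only shows the region-to-direction selection flow.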
2. The method according to claim 1, characterized in that determining the target spatial direction corresponding to the target image region when the preset operation on the target image region among the N image regions is received comprises:
when the preset operation in the N image regions is received, determining the image region corresponding to the preset operation as the target image region; and
determining, according to a first preset correspondence, the spatial direction corresponding to the target image region as the target spatial direction, the first preset correspondence comprising correspondences between the image regions and the spatial directions.
3. The method according to claim 1, characterized in that performing speech enhancement processing on the speech signal corresponding to the target spatial direction comprises:
performing speech enhancement processing on the speech signal coming from the target spatial direction, and performing speech suppression processing on speech signals coming from non-target spatial directions;
wherein the non-target spatial directions are the spatial directions other than the target spatial direction.
4. The method according to claim 1, characterized in that performing speech enhancement processing on the speech signal corresponding to the target spatial direction comprises:
determining, according to a second preset correspondence, a target local space corresponding to the target spatial direction, the second preset correspondence comprising correspondences between spatial directions and local spaces; and
performing speech enhancement processing on the speech signal coming from the target local space, and performing speech suppression processing on speech signals coming from non-target local spaces;
wherein the non-target local spaces are the spaces in the video capture region other than the target local space.
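The enhance/suppress split of claim 4 can be sketched with per-space gains. The space labels and gain values are illustrative assumptions; the claim does not prescribe specific enhancement or suppression algorithms.

```python
def process_by_space(signals: dict,
                     target_space: str,
                     enhance_gain: float = 2.0,
                     suppress_gain: float = 0.25) -> dict:
    """Amplify the signal from the target local space and attenuate
    signals from all other (non-target) local spaces."""
    return {
        space: [s * (enhance_gain if space == target_space else suppress_gain)
                for s in samples]
        for space, samples in signals.items()
    }
```

A production system would replace the flat gains with spatial filtering (for example, beamforming toward the target local space); the sketch only illustrates the target/non-target partition.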
5. The method according to any one of claims 1 to 4, characterized in that the video capture region comprises M different shooting areas, M being a positive integer greater than 1, and acquiring the target image of the video capture region comprises:
acquiring the shot images respectively corresponding to the M shooting areas; and
stitching the M shot images to obtain the target image.
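The stitching step of claim 5 can be sketched as a horizontal concatenation of same-height images; real panoramic stitching would also align and blend the overlapping views, which the claim leaves open. The pixel representation here (nested lists of rows) is an assumption for illustration.

```python
def stitch_images(images: list) -> list:
    """Concatenate M same-height images side by side, row by row.

    Each image is a list of rows of pixel values; a stand-in for
    building the target image from the M shooting areas.
    """
    height = len(images[0])
    assert all(len(img) == height for img in images), "images must share height"
    # Join row i of every image into one wide row of the target image.
    return [sum((img[row] for img in images), []) for row in range(height)]
```

For example, stitching two 2x1 images yields one 2x2 target image.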
6. A speech enhancement apparatus, characterized in that the apparatus comprises:
an acquisition module, configured to acquire a target image of a video capture region, the target image comprising N image regions, N being a positive integer greater than 1;
a determination module, configured to determine, when a preset operation on a target image region among the N image regions is received, a target spatial direction corresponding to the target image region, the target spatial direction indicating a spatial direction in which speech enhancement processing needs to be performed; and
an enhancement module, configured to perform speech enhancement processing on a speech signal corresponding to the target spatial direction.
7. The apparatus according to claim 6, characterized in that the determination module is further configured to: when the preset operation in the N image regions is received, determine the image region corresponding to the preset operation as the target image region; and determine, according to a first preset correspondence, the spatial direction corresponding to the target image region as the target spatial direction, the first preset correspondence comprising correspondences between the image regions and the spatial directions.
8. The apparatus according to claim 6, characterized in that the enhancement module is further configured to perform speech enhancement processing on the speech signal coming from the target spatial direction, and perform speech suppression processing on speech signals coming from non-target spatial directions;
wherein the non-target spatial directions are the spatial directions other than the target spatial direction.
9. The apparatus according to claim 6, characterized in that the enhancement module is further configured to: determine, according to a second preset correspondence, a target local space corresponding to the target spatial direction, the second preset correspondence comprising correspondences between spatial directions and local spaces; and perform speech enhancement processing on the speech signal coming from the target local space, and perform speech suppression processing on speech signals coming from non-target local spaces;
wherein the non-target local spaces are the spaces in the video capture region other than the target local space.
10. The apparatus according to any one of claims 6 to 9, characterized in that the video capture region comprises M different shooting areas, M being a positive integer greater than 1, and the acquisition module is further configured to acquire the shot images respectively corresponding to the M shooting areas, and stitch the M shot images to obtain the target image.
11. A video camera, characterized in that the video camera comprises a processor and a memory, the memory storing at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by the processor to implement the speech enhancement method according to any one of claims 1 to 5.
12. A terminal, characterized in that the terminal comprises a processor and a memory, the memory storing at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by the processor to implement the speech enhancement method according to any one of claims 1 to 5.
13. A speech enhancement system, characterized in that the system comprises a video camera and a terminal, the video camera being connected to the terminal and comprising at least three cameras and at least six microphones;
the terminal is configured to acquire a target image of a video capture region, the target image comprising N image regions, N being a positive integer greater than 1;
the terminal is further configured to determine, when a preset operation on a target image region among the N image regions is received, a target spatial direction corresponding to the target image region, the target spatial direction indicating a spatial direction in which speech enhancement processing needs to be performed; and
the terminal or the video camera is configured to perform speech enhancement processing on a speech signal corresponding to the target spatial direction.
14. A computer-readable storage medium, characterized in that the storage medium stores at least one instruction, at least one program, a code set, or an instruction set, which is loaded and executed by a processor to implement the speech enhancement method according to any one of claims 1 to 5.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810185895.9A CN110248197B (en) | 2018-03-07 | 2018-03-07 | Voice enhancement method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110248197A true CN110248197A (en) | 2019-09-17 |
CN110248197B CN110248197B (en) | 2021-10-22 |
Family
ID=67882419
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810185895.9A Active CN110248197B (en) | 2018-03-07 | 2018-03-07 | Voice enhancement method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110248197B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
TWI708191B (en) * | 2019-11-28 | 2020-10-21 | 睿捷國際股份有限公司 | Sound source distribution visualization method and computer program product thereof |
CN113450769A (en) * | 2020-03-09 | 2021-09-28 | 杭州海康威视数字技术股份有限公司 | Voice extraction method, device, equipment and storage medium |
CN113542466A (en) * | 2021-07-07 | 2021-10-22 | Oppo广东移动通信有限公司 | Audio processing method, electronic device and storage medium |
WO2023231686A1 (en) * | 2022-05-30 | 2023-12-07 | 荣耀终端有限公司 | Video processing method and terminal |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20060291816A1 (en) * | 2005-06-28 | 2006-12-28 | Sony Corporation | Signal processing apparatus, signal processing method, program, and recording medium |
CN105474665A (en) * | 2014-03-31 | 2016-04-06 | 松下知识产权经营株式会社 | Sound processing apparatus, sound processing system, and sound processing method |
CN105474666A (en) * | 2014-04-25 | 2016-04-06 | 松下知识产权经营株式会社 | Audio processing apparatus, audio processing system, and audio processing method |
US20160241818A1 (en) * | 2015-02-18 | 2016-08-18 | Honeywell International Inc. | Automatic alerts for video surveillance systems |
CN107230187A (en) * | 2016-03-25 | 2017-10-03 | 北京三星通信技术研究有限公司 | The method and apparatus of multimedia signal processing |
- 2018-03-07: application CN201810185895.9A filed in China; granted as CN110248197B (status: active)
Also Published As
Publication number | Publication date |
---|---|
CN110248197B (en) | 2021-10-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110992493B (en) | Image processing method, device, electronic equipment and storage medium | |
CN110427110A (en) | A kind of live broadcasting method, device and direct broadcast server | |
CN110248197A (en) | Sound enhancement method and device | |
CN109982102A (en) | The interface display method and system and direct broadcast server of direct broadcasting room and main broadcaster end | |
CN108256505A (en) | Image processing method and device | |
CN109285178A (en) | Image partition method, device and storage medium | |
CN109558837A (en) | Face critical point detection method, apparatus and storage medium | |
CN110121094A (en) | Video is in step with display methods, device, equipment and the storage medium of template | |
CN109302632A (en) | Obtain method, apparatus, terminal and the storage medium of live video picture | |
CN109166150A (en) | Obtain the method, apparatus storage medium of pose | |
CN109862412A (en) | It is in step with the method, apparatus and storage medium of video | |
CN110081902A (en) | Direction indicating method, device and terminal in navigation | |
CN109192218A (en) | The method and apparatus of audio processing | |
CN109859102A (en) | Special display effect method, apparatus, terminal and storage medium | |
CN110163833A (en) | The method and apparatus for determining the folding condition of disconnecting link | |
CN109547843A (en) | The method and apparatus that audio-video is handled | |
CN110225390A (en) | Method, apparatus, terminal and the computer readable storage medium of video preview | |
CN109254775A (en) | Image processing method, terminal and storage medium based on face | |
CN109065068A (en) | Audio-frequency processing method, device and storage medium | |
CN109660876A (en) | The method and apparatus for showing list | |
CN108965769A (en) | Image display method and device | |
CN109117466A (en) | table format conversion method, device, equipment and storage medium | |
CN110147796A (en) | Image matching method and device | |
CN108829582A (en) | The method and apparatus of program compatibility | |
CN109413440A (en) | Virtual objects management method, device and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||