CN109686369A - Audio processing method and device - Google Patents
Audio processing method and device
- Publication number
- CN109686369A (application number CN201811573764.4A)
- Authority
- CN
- China
- Prior art keywords
- information
- audio
- target
- processed
- frequency
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/78—Detection of presence or absence of voice signals
Abstract
The invention discloses an audio processing method and device. The method comprises: determining location information of target information in audio information to be processed, wherein the location information includes at least the time at which the target information occurs in the audio information to be processed; finding target audio in the audio information to be processed according to the location information, wherein the target audio is the audio fragment in the audio information to be processed that contains the target information; and performing predetermined processing on the target audio. The invention solves the technical problem in the prior art that, in the monitoring field, a monitored user's privacy is easily divulged.
Description
Technical field
The present invention relates to the field of audio processing, and in particular to an audio processing method and device.
Background technique
In fields such as monitoring, intelligent vehicles, smart homes, and mobile phone voice assistants, audio is usually recognized in order to extract information from it, complete interaction with the user, or mine information. In this process, part of the information leaves the sound-pickup end and is sent to the cloud of the enterprise providing the service. Since the user's voice carries the user's voiceprint as well as private content generated consciously or unconsciously, there is a risk that identity information and private information will be leaked to the enterprise cloud.
At present, intelligent vehicles and smart homes generally process the received audio information to be processed as follows: the local sound-pickup end reacts only to a wake-up word; the wake-up word itself is not uploaded, and only the first n rounds of speech after waking are uploaded, on the default assumption that these few rounds do not involve privacy. At other times the device is in a closed state and does not receive speech content other than the wake-up word. A mobile phone voice assistant generally processes the received audio information to be processed as follows: before use, an agreement is confirmed by the user, so the issue is legally avoided; the dialogue between the user and the voice assistant is actively initiated and actively ended by the user, the speech is processed entirely in the cloud, and it is assumed by default that no private information is involved in the process. A monitoring system generally processes the received audio information to be processed as follows: the monitored object may not be aware that his or her speech is being recorded, and is therefore more likely to reveal private information.
It can be seen from the above that existing audio processing schemes can only be used in scenarios where the user cooperates actively, such as mobile phone assistants and smart homes, but cannot be used in monitoring scenarios. In a monitoring scenario, the user is not talking to a machine and therefore will not actively avoid privacy when speaking.
For the problem in the prior art that a monitored user's privacy is easily divulged in the monitoring field, no effective solution has yet been proposed.
Summary of the invention
The embodiments of the invention provide an audio processing method and device, so as at least to solve the technical problem in the prior art that, in the monitoring field, a monitored user's privacy is easily divulged.
According to one aspect of the embodiments of the invention, an audio processing method is provided, comprising: determining location information of target information in audio information to be processed, wherein the location information includes at least the time at which the target information occurs in the audio information to be processed; finding target audio in the audio information to be processed according to the location information, wherein the target audio is the audio fragment in the audio information to be processed that contains the target information; and performing predetermined processing on the target audio.
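The three steps above can be sketched as a minimal pipeline. This is a hypothetical illustration only: the names `Location`, `find_target_audio`, and `process_audio`, and the choice of silencing as the predetermined processing, are assumptions, not from the patent.

```python
from dataclasses import dataclass

@dataclass
class Location:
    start: float  # time (seconds) at which the target information occurs
    end: float

def find_target_audio(samples, sample_rate, loc):
    """Step 2: cut out the audio fragment that contains the target information."""
    return samples[int(loc.start * sample_rate):int(loc.end * sample_rate)]

def predetermined_processing(fragment):
    """Step 3: conceal the fragment; silencing is one of several options."""
    return [0.0] * len(fragment)

def process_audio(samples, sample_rate, locations):
    """Apply the predetermined processing to every located target fragment."""
    out = list(samples)
    for loc in locations:
        lo, hi = int(loc.start * sample_rate), int(loc.end * sample_rate)
        out[lo:hi] = predetermined_processing(out[lo:hi])
    return out
```

The positioning step itself (producing the `Location` values) is detailed later by the text-based and feature-based models.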
Further, a preset keyword table is obtained; text information corresponding to the audio information to be processed is obtained, wherein the text information has a first correspondence with the time axis of the audio information to be processed; words in the keyword table are recognized in the text information, and text information hitting any keyword in the keyword table is determined to be the target information; timeline information of the audio fragment corresponding to the target information is obtained according to the first correspondence; and the timeline information is determined to be the location information.
Further, preset first feature information is obtained; second feature information in the audio information to be processed is extracted, wherein the second feature information has a second correspondence with the time axis of the audio information to be processed; the second feature information is matched against the first feature information, and second feature information hitting the first feature information is determined to be the target information; timeline information of the audio fragment corresponding to the target information is obtained according to the second correspondence; and the timeline information is determined to be the location information.
Further, performing predetermined processing on the target audio includes any one of the following: removing the target audio; performing noise reduction on the target audio; replacing the target audio with a first preset audio; and superimposing a second preset audio on the target audio.
Further, after the predetermined processing is performed on the target audio, feature obfuscation is performed on the feature information of the audio information to be processed after the predetermined processing, and the audio information to be processed after feature obfuscation is output.
Further, before the location information of the target information in the audio information to be processed is determined, audio information is obtained, wherein the audio information includes voice information, and denoising is performed on the audio information to obtain the audio information to be processed.
According to another aspect of the embodiments of the invention, an audio processing device is further provided, comprising: a determining module, configured to determine location information of target information in audio information to be processed, wherein the location information includes at least the time at which the target information occurs in the audio information to be processed; a searching module, configured to find target audio in the audio information to be processed according to the location information, wherein the target audio is the audio fragment in the audio information to be processed that contains the target information; and a processing module, configured to perform predetermined processing on the target audio.
Further, the determining module includes: a first acquisition submodule, configured to obtain a preset keyword table; a second acquisition submodule, configured to obtain text information corresponding to the audio information to be processed, wherein the obtained text information has a first correspondence with the time axis of the audio information to be processed; a first determining submodule, configured to recognize words in the keyword table in the text information and determine that text information hitting any keyword in the keyword table is the target information; a third acquisition submodule, configured to obtain timeline information of the audio fragment corresponding to the target information according to the first correspondence; and a second determining submodule, configured to determine that the timeline information is the location information.
According to another aspect of the embodiments of the invention, a storage medium is further provided; the storage medium includes a stored program, wherein, when the program runs, a device where the storage medium is located is controlled to execute the above audio processing method.
According to another aspect of the embodiments of the invention, a processor is further provided; the processor is configured to run a program, wherein the program executes the above audio processing method when running.
In the embodiments of the invention, location information of target information in audio information to be processed is determined, wherein the location information includes at least the time at which the target information occurs in the audio information to be processed; target audio in the audio information to be processed is found according to the location information, wherein the target audio is the audio fragment in the audio information to be processed that contains the target information; and predetermined processing is performed on the target audio. By locating the target information, the above scheme finds the target audio in the audio information to be processed and performs predetermined processing on it, thereby achieving the purpose of performing special processing on the target information in the voice information. In the monitoring field, this allows special processing to be performed on private information so as to protect the user's privacy, thus solving the technical problem in the prior art that a monitored user's privacy is easily divulged in the monitoring field.
Brief description of the drawings
The drawings described herein are used to provide a further understanding of the invention and constitute a part of this application; the illustrative embodiments of the invention and their description are used to explain the invention and do not constitute an improper limitation of the invention. In the drawings:
Fig. 1 is a flowchart of an audio processing method according to an embodiment of the invention;
Fig. 2 is a text-based private-speech location model according to an embodiment of the invention;
Fig. 3 is a feature-based private-speech location model according to an embodiment of the invention;
Fig. 4 is a schematic diagram of performing feature obfuscation according to an embodiment of the invention;
Fig. 5 is a schematic diagram of an audio processing method according to an embodiment of the invention; and
Fig. 6 is a schematic diagram of an audio processing device according to an embodiment of the invention.
Specific embodiments
In order to enable those skilled in the art to better understand the solution of the invention, the technical solutions in the embodiments of the invention are described below clearly and completely with reference to the drawings in the embodiments of the invention. Obviously, the described embodiments are only a part of the embodiments of the invention, not all of them. Based on the embodiments of the invention, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the scope of protection of the invention.
It should be noted that the terms "first", "second", etc. in the description and claims of this specification and in the above drawings are used to distinguish similar objects, not to describe a particular order or sequence. It should be understood that data used in this way are interchangeable under appropriate circumstances, so that the embodiments of the invention described herein can be implemented in an order other than those illustrated or described herein. In addition, the terms "comprising" and "having" and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device containing a series of steps or units is not necessarily limited to the steps or units clearly listed, but may include other steps or units that are not clearly listed or that are inherent to the process, method, product, or device.
Embodiment 1
According to an embodiment of the invention, an embodiment of an audio processing method is provided. It should be noted that the steps illustrated in the flowcharts of the drawings can be executed in a computer system such as a set of computer-executable instructions, and, although a logical order is shown in the flowchart, in some cases the steps shown or described may be executed in an order different from that given here.
Fig. 1 is a flowchart of an audio processing method according to an embodiment of the invention. As shown in Fig. 1, the method includes the following steps:
Step S102: determining location information of target information in audio information to be processed, wherein the location information includes at least the time at which the target information occurs in the audio information to be processed.
Specifically, the audio information to be processed is audio information that includes voice information, and the target information may be private information or information with a special meaning.
In an optional embodiment, in the monitoring field, in order to prevent the privacy of user speech from being leaked, information involving privacy, such as names and ID numbers, may be used as the target information.
In another optional embodiment, in the field of data analysis, in order to determine a user's emotional tendency toward a certain object, information involving that object may be used as the target information, such as evaluation sentences about the object or comment sentences about other similar objects.
The location information includes at least the time at which the target information occurs in the audio information to be processed, so the position at which the target audio occurs in the audio information to be processed can be located according to the location information, and the target audio can then be processed. In an optional embodiment, the time at which the target information occurs in the audio information to be processed may be the duration of the target audio that contains the target information, or the time point at which the target audio starts.
Step S104: finding the target audio in the audio information to be processed according to the location information, wherein the target audio is the audio fragment in the audio information to be processed that contains the target information.
Specifically, in the case where the location information includes the duration of the target audio, the target audio can be found directly in the audio information to be processed according to the location information. In the case where the location information only includes the time point at which the target audio starts, a preset length can be set, and the start time extended backward by the preset duration to obtain the end time of the target audio, so that the target audio is determined from the start and end times.
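This rule can be sketched as follows; the function name and the 3-second default preset length are assumed for illustration and are not specified by the patent:

```python
def segment_from_location(start, duration=None, preset_length=3.0):
    """Return the (start, end) interval of the target audio in seconds.

    If the location information carries a duration, use it directly;
    otherwise extend the start time by a preset length, as described above.
    """
    end = start + (duration if duration is not None else preset_length)
    return (start, end)
```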
Step S106: performing predetermined processing on the target audio.
Specifically, the predetermined processing is used to conceal or highlight the target information in the audio information to be processed.
In an optional embodiment, in the monitoring field, concealment processing may be performed on the target audio to prevent the leakage of user privacy.
It should be noted here that the above steps may be executed by the terminal that collects the audio: after the terminal has processed the audio by the above method, it sends the processed audio to the monitoring server or other servers. For example, when the above method is applied to a mobile phone, the above scheme is executed by the user's mobile phone, which performs the above processing on the collected audio and then sends the processed audio to a remote monitoring server.
In another optional embodiment, in the field of data analysis, a prompt tone may be inserted before the target audio to highlight the target audio, so as to improve data analysis efficiency.
It can be seen from the above that the embodiments of the application determine location information of target information in audio information to be processed, wherein the location information includes at least the time at which the target information occurs in the audio information to be processed; find the target audio in the audio information to be processed according to the location information, wherein the target audio is the audio fragment in the audio information to be processed that contains the target information; and perform predetermined processing on the target audio. By locating the target information, the above scheme finds the target audio in the audio information to be processed and performs predetermined processing on it, thereby achieving the purpose of performing special processing on the target information in the voice information. In the monitoring field, this allows special processing to be performed on private information so as to protect the user's privacy, and solves the technical problem in the prior art that a monitored user's privacy is easily divulged.
As an optional embodiment, determining the location information of the target information in the audio information to be processed comprises: obtaining a preset keyword table; obtaining text information corresponding to the audio information to be processed, wherein the text information has a first correspondence with the time axis of the audio information to be processed; matching the text information against the keyword table, and determining that text information hitting any keyword in the keyword table is the target information; obtaining timeline information of the audio fragment corresponding to the target information according to the first correspondence; and determining that the timeline information is the location information.
The above scheme locates the target information in the audio information to be processed according to the text information corresponding to the voice information. Specifically, the keyword table may include predetermined words and sentences and can be determined empirically, with different keyword tables determined for different scenarios. For example, in the monitoring scenario of a business hall, users often state names, ID numbers, and similar information, so such information and related information can be used as the keyword table. For another example, in the monitoring scenario of a catering environment, users often mention names, places, and similar information during conversation, so such information and related information can be used as keywords.
Still in the above scheme, the voice information in the audio information to be processed is first converted into text information by a speech recognition module, and the words or sentences in the keyword table are then recognized in the text information, so as to obtain the target information contained in the text information.
Taking the case where the target information is private information as an example, the above steps can be implemented by a text-based private-speech location model. Fig. 2 shows a text-based private-speech location model according to an embodiment of the invention. In an optional embodiment, still taking the monitoring field as an example and with reference to Fig. 2, a predetermined keyword table or particular sentences are first obtained; this keyword table or these particular sentences constitute the privacy text decision rule and are determined empirically from the usage scenario. The audio is then converted into text by the speech recognition module, and the words or particular sentences in the keyword table are recognized in the text corresponding to the audio. Since the text contains timeline information of the corresponding speech, the position of the private speech on the time axis, i.e., the above location information, can be determined from the recognition result. Finally, the positions of all private information on the time axis are output, completing the location of the target information.
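The recognition step can be illustrated over a toy word-level transcript. In practice the transcript would come from the speech recognition module and the keyword table from the usage scenario; all names, words, and timestamps below are hypothetical:

```python
# Hypothetical ASR output: (word, start, end) triples preserve the first
# correspondence between the text and the audio time axis.
def locate_private_speech(transcript, keyword_table):
    """Return (start, end) intervals of words that hit the keyword table."""
    return [(start, end)
            for word, start, end in transcript
            if word in keyword_table]

transcript = [("my", 0.0, 0.2), ("name", 0.2, 0.5), ("is", 0.5, 0.6),
              ("alice", 0.6, 1.1), ("hello", 1.2, 1.6)]
keywords = {"name", "alice"}
intervals = locate_private_speech(transcript, keywords)  # hits "name", "alice"
```

The returned intervals are the positions of the private speech on the time axis, i.e., the location information described above.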
As an optional embodiment, determining the location information of the target information in the audio information to be processed comprises: obtaining preset first feature information; extracting second feature information in the audio information to be processed, wherein the second feature information has a second correspondence with the time axis of the audio information to be processed; matching the second feature information against the first feature information, and determining that second feature information hitting the first feature information is the target information; obtaining timeline information of the audio fragment corresponding to the target information according to the second correspondence; and determining that the timeline information is the location information.
Specifically, the first feature information is preset feature information and may be, for example, the audio feature information of a certain mood; audio features may include voiceprint information, timbre information, tone information, and the like. Different first feature information can be set empirically for different scenarios.
The feature matching module matches the preset first feature information against the second feature information in the audio to be processed, and determines that second feature information hitting the first feature information is the target information.
Still taking the case where the target information is private information as an example, the above steps can be implemented by a feature-based private-speech location model. In an optional embodiment, Fig. 3 shows a feature-based private-speech location model according to an embodiment of the invention. With reference to Fig. 3, the rule description of the privacy features (i.e., the preset first feature information) is first determined according to the usage scenario; a feature extraction module is then used to extract second feature information from the audio information to be processed, the second feature information corresponding to the time axis. Finally, a feature matching module performs feature matching between the second feature information and the first feature information and locates the time-axis positions that meet the privacy features; these positions are the above location information. The positions of all private information on the time axis of the audio information to be processed are output at the end.
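One plausible way to realize the matching is cosine similarity between per-frame feature vectors and the preset privacy feature vector. The patent does not specify a matching metric; the 0.9 threshold and the function names here are assumptions for illustration:

```python
import math

def cosine(a, b):
    """Cosine similarity between two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

def locate_by_feature(frames, privacy_feature, threshold=0.9):
    """frames: (start, end, feature_vector) triples extracted from the audio.

    Return the time-axis intervals whose extracted (second) feature hits
    the preset privacy (first) feature.
    """
    return [(s, e) for s, e, f in frames
            if cosine(f, privacy_feature) >= threshold]
```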
As an optional embodiment, performing predetermined processing on the target audio includes any one of the following: removing the target audio; performing noise reduction on the target audio; replacing the target audio with a first preset audio; and superimposing a second preset audio on the target audio.
The above scheme provides four processing modes for the target audio, which are described separately below.
In an optional embodiment, the target audio is removed. In this processing, the target audio is cut out of the audio to be processed and discarded, so that a single piece of audio information may be divided into several segments after processing. For example, if target audio occurs at 00:02-01:00 in the audio information to be processed, the 00:02-01:00 segment is cut out of the audio information to be processed, so that the target audio is removed and the user's privacy is protected.
In another optional embodiment, noise reduction processing is performed on the target audio, so that the target information in the target audio is concealed and the user's privacy is protected.
In an optional embodiment, the target audio is replaced with a first preset audio. The first preset audio may be audio cut from music or audio recorded in advance. After the first preset audio replaces the target audio, the target audio is concealed and the user's privacy is protected.
In another optional embodiment, the second preset audio may likewise be audio cut from music or audio recorded in advance. After the second preset audio is superimposed on the target audio, the second preset audio covers the target audio, so that the information in the target audio is difficult to reveal and the user's privacy is protected.
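Three of the four modes can be sketched directly over sample arrays; noise reduction is omitted because it would require an actual denoising filter. The function and mode names are illustrative, not from the patent:

```python
def apply_predetermined_processing(samples, lo, hi, mode, preset=None):
    """Apply one processing mode to the target fragment samples[lo:hi].

    mode: "remove" cuts the fragment out; "replace" substitutes the first
    preset audio; "overlay" superimposes the second preset audio on it.
    """
    if mode == "remove":
        return samples[:lo] + samples[hi:]
    if mode == "replace":
        return samples[:lo] + preset[:hi - lo] + samples[hi:]
    if mode == "overlay":
        mixed = [t + p for t, p in zip(samples[lo:hi], preset)]
        return samples[:lo] + mixed + samples[hi:]
    raise ValueError(f"unknown mode: {mode}")
```

Note the design consequence stated above: only "remove" changes the length of the audio, splitting one recording into several segments.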
As an optional embodiment, after the predetermined processing is performed on the target audio, the above method further comprises: performing feature obfuscation on the feature information of the audio information to be processed after the predetermined processing; and outputting the audio information to be processed after feature obfuscation.
In the above scheme, predetermined processing is performed on the target audio so that the private information in the audio information to be processed is concealed. The above steps then perform feature obfuscation on the feature information of the audio to be processed in which the target information has been concealed, so as to prevent the identity of the user who uttered the voice information from being obtained from the audio information.
Specifically, the feature information may be voiceprint features, tone features, timbre features, and the like. Obfuscating the feature information may consist of deforming the feature information so as to obscure the features of the audio information itself, making it difficult to identify the user from the audio information.
In an optional embodiment, taking the case where the feature information is a timbre feature as an example, the timbre feature of the audio information can be deformed to obfuscate the feature information of the audio information.
In an alternative embodiment, taking the voiceprint information as an example, the above scheme may be executed by a voiceprint feature obfuscation module. Fig. 4 is a schematic diagram of feature obfuscation according to an embodiment of the present invention. Referring to Fig. 4, empirically determined voiceprint feature types may be obtained in advance and used as the rule for obfuscating the audio information. When processing the audio information, a feature extraction module extracts the voiceprint features from the audio information; a voiceprint feature locating module then determines, among those features, the ones belonging to the predetermined voiceprint feature types; and a voiceprint feature deformation module obfuscates the located features, yielding audio information whose voiceprint features are confused.
It should be noted that a voiceprint is the spectrum of sound waves carrying verbal information that can be displayed with an electroacoustic instrument. A speaker's identity can be accurately determined from the voiceprint, and even adjusting the timbre or pitch of the audio rarely affects that determination. Therefore, in order to protect the speaker's identity, the voiceprint information of the audio information can be obfuscated, protecting the speaker's identity to the greatest extent.
It follows that the above scheme of the present application not only conceals the private content in the audio information to be processed, avoiding leakage of the user's voice privacy in a monitoring scenario, but also obfuscates the feature information of the audio information to be processed, preventing leakage of the user's identity in a monitoring scenario and thus providing the user with a stronger privacy guarantee.
As an optional embodiment, before the location information of the target information in the audio information to be processed is determined, the method further includes: obtaining audio information, where the audio information includes voice information; and performing denoising processing on the audio information to obtain the audio information to be processed.
Specifically, the audio information may be collected by a monitoring device. Since the environment in which the monitoring device collects the audio information may contain other sounds, processing the audio information directly may be disturbed by noise. Therefore, after the audio information is obtained, and before the private information in its voice content is processed, denoising is first performed on the audio information to remove the noise other than the voice information, yielding the audio information to be processed.
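The denoising step can be caricatured with a simple amplitude gate. A minimal sketch, assuming audio as a list of float samples; a production system would use spectral subtraction or a learned noise-reduction model rather than this:

```python
def noise_gate(samples, threshold=0.05):
    """Zero out low-amplitude samples as a toy stand-in for the
    denoising step that precedes privacy processing."""
    return [s if abs(s) >= threshold else 0.0 for s in samples]
```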
Fig. 5 is a schematic diagram of an audio processing method according to an embodiment of the present invention. Referring to Fig. 5, the location information of the private information in the original audio (i.e., the audio information to be processed) may first be determined by a privacy voice locating model. A privacy voice removal module then removes the privacy voice information from the original audio according to that location information, yielding audio from which the privacy voice has been eliminated. Finally, a voiceprint feature obfuscation module obfuscates the voiceprint features of that audio, yielding audio from which the private information has been completely eliminated.
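The Fig. 5 flow (locate, remove, obfuscate) can be sketched end to end. The assumptions here are ours: the locating model is represented only by its output, a list of (start, end) sample indices, removal is modeled as silencing, and any callable may stand in for the voiceprint obfuscation module:

```python
def anonymize(samples, regions, obscure):
    """Silence each located privacy region, then pass the whole clip
    through a voiceprint-obscuring callable (sketch of the Fig. 5 flow)."""
    out = list(samples)
    for start, end in regions:
        for i in range(start, min(end, len(out))):
            out[i] = 0.0          # remove the privacy speech
    return obscure(out)           # then confuse the voiceprint
```

For example, `anonymize(clip, [(100, 200)], deform)` would silence samples 100–199 and then apply whatever deformation `deform` implements.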
Embodiment 2
According to an embodiment of the present invention, an embodiment of an audio processing apparatus is provided. Fig. 6 is a schematic diagram of an audio processing apparatus according to an embodiment of the present invention. As shown in Fig. 6, the apparatus includes:
a determining module 60, configured to determine location information of target information in audio information to be processed, where the location information includes at least the time at which the target information appears in the audio information to be processed;
a searching module 62, configured to find target audio in the audio information to be processed according to the location information, where the target audio is the audio fragment of the audio information to be processed that contains the target information; and
a processing module 64, configured to perform predetermined processing on the target audio.
As an optional embodiment, the determining module includes: a first acquisition submodule, configured to obtain a preset keyword table; a second acquisition submodule, configured to obtain text information corresponding to the audio information to be processed, where the obtained text information has a first correspondence with the time axis of the audio information to be processed; a first determining submodule, configured to identify words from the keyword table in the text information, and to determine the text information that hits any keyword in the keyword table as the target information; a third acquisition submodule, configured to obtain time-axis information of the audio fragment corresponding to the target information according to the first correspondence; and a second determining submodule, configured to determine the time-axis information as the location information.
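The keyword-table branch of the determining module can be illustrated as a lookup over a time-aligned transcript. A sketch under assumed formats (the `(word, start_s, end_s)` alignment tuple is our convention; ASR engines differ in how they expose word timings):

```python
def locate_keywords(transcript, keywords):
    """Return the time spans of transcript words that hit the preset
    keyword table. `transcript` is [(word, start_s, end_s), ...]."""
    hits = set(w.lower() for w in keywords)
    return [(start, end) for word, start, end in transcript
            if word.lower() in hits]
```

The returned spans play the role of the location information, which the searching module would then use to pick out the target audio fragments.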
As an optional embodiment, the determining module includes: a fourth acquisition submodule, configured to obtain preset first feature information; an extracting submodule, configured to extract second feature information from the audio information to be processed, where the second feature information has a second correspondence with the time axis of the audio information to be processed; a third determining submodule, configured to match the second feature information against the first feature information, and to determine the second feature information that hits the first feature information as the target information; a fifth acquisition submodule, configured to obtain time-axis information of the audio fragment corresponding to the target information according to the second correspondence; and a fourth determining submodule, configured to determine the time-axis information as the location information.
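The feature-matching branch can be illustrated with cosine similarity between per-segment feature vectors and the preset first feature information. The segment format and the 0.9 threshold are our assumptions, not the patent's:

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def locate_by_feature(segments, target_feature, threshold=0.9):
    """Return time spans whose extracted feature vector matches the
    preset feature. `segments` is [(start_s, end_s, feature_vec), ...]."""
    return [(s, e) for s, e, vec in segments
            if cosine(vec, target_feature) >= threshold]
```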
As an optional embodiment, the processing module is configured to execute any one of the following: removing the target audio; performing noise reduction on the target audio; replacing the target audio with the first preset audio; or superimposing the second preset audio on the target audio.
As an optional embodiment, the apparatus further includes: an obfuscation module, configured to perform feature obfuscation on the feature information of the audio information to be processed after the predetermined processing has been performed on the target audio; and an output module, configured to output the audio information to be processed after feature obfuscation.
As an optional embodiment, the apparatus further includes: an acquisition module, configured to obtain audio information before the location information of the target information in the audio information to be processed is determined, where the audio information includes voice information; and a denoising module, configured to perform denoising processing on the audio information to obtain the audio information to be processed.
Embodiment 3
According to an embodiment of the present invention, a storage medium is provided. The storage medium includes a stored program, where, when the program runs, the device on which the storage medium resides is controlled to execute the audio processing method described in any one of Embodiment 1.
Embodiment 4
According to an embodiment of the present invention, a processor is provided. The processor is configured to run a program, where, when the program runs, the audio processing method described in any one of Embodiment 1 is executed.
The serial numbers of the above embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis. For parts not detailed in a given embodiment, reference may be made to the related descriptions of the other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed technical content may be implemented in other ways. The apparatus embodiments described above are merely illustrative. For example, the division into units is only a division by logical function; in actual implementation there may be other ways of division, for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. Furthermore, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or of other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute all or part of the steps of the methods of the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash disk, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, or an optical disk.
The above are only preferred embodiments of the present invention. It should be noted that, for those of ordinary skill in the art, various improvements and modifications may be made without departing from the principle of the present invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the present invention.
Claims (10)
1. An audio processing method, comprising:
determining location information of target information in audio information to be processed, wherein the location information includes at least the time at which the target information appears in the audio information to be processed;
finding target audio in the audio information to be processed according to the location information, wherein the target audio is the audio fragment of the audio information to be processed that contains the target information; and
performing predetermined processing on the target audio.
2. The method according to claim 1, wherein determining the location information of the target information in the audio information to be processed comprises:
obtaining a preset keyword table;
obtaining text information corresponding to the audio information to be processed, wherein the text information has a first correspondence with the time axis of the audio information to be processed;
identifying words from the keyword table in the text information, and determining the text information that hits any keyword in the keyword table as the target information;
obtaining time-axis information of the audio fragment corresponding to the target information according to the first correspondence; and
determining the time-axis information as the location information.
3. The method according to claim 1, wherein determining the location information of the target information in the audio information to be processed comprises:
obtaining preset first feature information;
extracting second feature information from the audio information to be processed, wherein the second feature information has a second correspondence with the time axis of the audio information to be processed;
matching the second feature information against the first feature information, and determining the second feature information that hits the first feature information as the target information;
obtaining time-axis information of the audio fragment corresponding to the target information according to the second correspondence; and
determining the time-axis information as the location information.
4. The method according to claim 1, wherein performing the predetermined processing on the target audio comprises any one of the following:
removing the target audio;
performing noise reduction on the target audio;
replacing the target audio with a first preset audio; and
superimposing a second preset audio on the target audio.
5. The method according to claim 1, wherein after the predetermined processing is performed on the target audio, the method further comprises:
performing feature obfuscation on feature information of the audio information to be processed after the predetermined processing; and
outputting the audio information to be processed after feature obfuscation.
6. The method according to claim 1, wherein before the location information of the target information in the audio information to be processed is determined, the method further comprises:
obtaining audio information, wherein the audio information includes voice information; and
performing denoising processing on the audio information to obtain the audio information to be processed.
7. An audio processing apparatus, comprising:
a determining module, configured to determine location information of target information in audio information to be processed, wherein the location information includes at least the time at which the target information appears in the audio information to be processed;
a searching module, configured to find target audio in the audio information to be processed according to the location information, wherein the target audio is the audio fragment of the audio information to be processed that contains the target information; and
a processing module, configured to perform predetermined processing on the target audio.
8. The apparatus according to claim 7, wherein the determining module comprises:
a first acquisition submodule, configured to obtain a preset keyword table;
a second acquisition submodule, configured to obtain text information corresponding to the audio information to be processed, wherein the obtained text information has a first correspondence with the time axis of the audio information to be processed;
a first determining submodule, configured to identify words from the keyword table in the text information, and to determine the text information that hits any keyword in the keyword table as the target information;
a third acquisition submodule, configured to obtain time-axis information of the audio fragment corresponding to the target information according to the first correspondence; and
a second determining submodule, configured to determine the time-axis information as the location information.
9. A storage medium, comprising a stored program, wherein, when the program runs, a device on which the storage medium resides is controlled to execute the audio processing method according to any one of claims 1 to 6.
10. A processor, configured to run a program, wherein, when the program runs, the audio processing method according to any one of claims 1 to 6 is executed.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811573764.4A CN109686369A (en) | 2018-12-21 | 2018-12-21 | Audio-frequency processing method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109686369A true CN109686369A (en) | 2019-04-26 |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102231277A (en) * | 2011-06-29 | 2011-11-02 | 电子科技大学 | Method for protecting mobile terminal privacy based on voiceprint recognition |
CN103945039A (en) * | 2014-04-28 | 2014-07-23 | 焦海宁 | External information source encryption and anti-eavesdrop interference device for voice communication device |
US20170034086A1 (en) * | 2015-07-31 | 2017-02-02 | Fujitsu Limited | Information presentation method, information presentation apparatus, and computer readable storage medium |
CN106504744A (en) * | 2016-10-26 | 2017-03-15 | 科大讯飞股份有限公司 | A kind of method of speech processing and device |
CN107369459A (en) * | 2017-08-29 | 2017-11-21 | 维沃移动通信有限公司 | A kind of audio-frequency processing method and mobile terminal |
CN108171073A (en) * | 2017-12-06 | 2018-06-15 | 复旦大学 | A kind of private data recognition methods based on the parsing driving of code layer semanteme |
Non-Patent Citations (3)
Title |
---|
Li Deyi et al., Introduction to Artificial Intelligence (CAST New-Generation Information Technology Series), 31 August 2018 |
Li Peng et al., "Biometric template protection", Journal of Software |
Huang Lan, "International comparison and implications of personal information protection", Information Science |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112216275A (en) * | 2019-07-10 | 2021-01-12 | 阿里巴巴集团控股有限公司 | Voice information processing method and device and electronic equipment |
CN110660032A (en) * | 2019-09-24 | 2020-01-07 | Oppo广东移动通信有限公司 | Object shielding method, object shielding device and electronic equipment |
CN111524493A (en) * | 2020-05-27 | 2020-08-11 | 珠海格力智能装备有限公司 | Method and device for debugging music score |
CN111984175A (en) * | 2020-08-14 | 2020-11-24 | 维沃移动通信有限公司 | Audio information processing method and device |
CN111984175B (en) * | 2020-08-14 | 2022-02-18 | 维沃移动通信有限公司 | Audio information processing method and device |
CN113033191A (en) * | 2021-03-30 | 2021-06-25 | 上海思必驰信息科技有限公司 | Voice data processing method, electronic device and computer readable storage medium |
CN113051902A (en) * | 2021-03-30 | 2021-06-29 | 上海思必驰信息科技有限公司 | Voice data desensitization method, electronic device and computer-readable storage medium |
CN113051902B (en) * | 2021-03-30 | 2024-10-11 | 上海思必驰信息科技有限公司 | Voice data desensitizing method, electronic equipment and computer readable storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| RJ01 | Rejection of invention patent application after publication | |

Application publication date: 20190426