CN109686369A

CN109686369A - Audio-frequency processing method and device

Info

Publication number: CN109686369A
Application number: CN201811573764.4A
Authority: CN
Inventors: 唐大闰; 徐浩; 吴明辉
Original assignee: Miaozhen Systems Information Technology Co Ltd
Current assignee: Miaozhen Information Technology Co Ltd; Miaozhen Systems Information Technology Co Ltd
Priority date: 2018-12-21
Filing date: 2018-12-21
Publication date: 2019-04-26

Abstract

The invention discloses a kind of audio-frequency processing method and devices.Wherein, this method comprises: determining location information of the target information in audio-frequency information to be processed, wherein location information includes at least the time that target information occurs in audio-frequency information to be processed；The target audio in audio-frequency information to be processed is found according to location information, wherein target audio is the audio fragment in audio-frequency information to be processed including target information；Predetermined process is carried out to target audio.The object that the present invention solves user under monitoring field in the prior art is easy the technical issues of divulging privacy.

Description

Audio-frequency processing method and device

Technical field

The present invention relates to field of audio processing, in particular to a kind of audio-frequency processing method and device.

Background technique

In fields such as monitoring, vehicle intelligent, smart home, mobile phone speech assistants, audio usually can be all identified, with therefrom Extracting Information completes interaction or information excavating with user.In this course, partial information can leave radio reception end, send to mentioning For the cloud of the enterprise of service.Due to carrying that the vocal print of user, part be conscious in the voice of user or unconscious generation Privacy content etc., therefore there are the risks of identity information, leakage of private information to enterprise cloud.

Currently, vehicle intelligent or smart home are generally as follows the processing mode of the audio-frequency information to be processed received: this Only there is reaction at ground radio reception end to wake-up word, wakes up word and does not upload, and only uploads the preceding n after waking up and takes turns voice, defaults this few wheel voice Itself it is not related to privacy.Other when be in close state, to it is non-wake up word voice content do not receive.Mobile phone speech assistant couple The processing mode of the audio-frequency information to be processed received is generally as follows: before use, can be confirmed agreement by user, legally be done Evade.The dialogue of user and voice assistant are that user actively initiates, and user actively terminates.Voice is all handled beyond the clouds, is defaulted In the process, it is not related to privacy information.Monitoring system is generally as follows the processing mode of the audio-frequency information to be processed received: What the voice that monitored object may and be unaware of oneself was being enrolled, therefore be more likely to reveal privacy information.

From the foregoing, it will be observed that only can be used in the scene that user cooperates on one's own initiative in existing audio processing scheme, and such as: hand Machine aided hand, smart home etc., but it is not used in monitoring scene.Since under monitoring scene, user is talked with machine, because This will not active dodge privacy when speaking.

Aiming at the problem that object of user under monitoring field in the prior art is easy to divulge privacy, not yet propose at present effective Solution.

Summary of the invention

The embodiment of the invention provides a kind of audio-frequency processing method and devices, at least to solve monitoring field in the prior art The object of lower user is easy the technical issues of divulging privacy.

According to an aspect of an embodiment of the present invention, a kind of audio-frequency processing method is provided, comprising: determine that target information exists Location information in audio-frequency information to be processed, wherein location information includes at least target information and goes out in audio-frequency information to be processed The existing time；The target audio in audio-frequency information to be processed is found according to location information, wherein target audio is sound to be processed It include the audio fragment of target information in frequency information；Predetermined process is carried out to target audio.

Further, preset antistop list is obtained；Obtain the corresponding text information of audio-frequency information to be processed, wherein text The time shaft of this information and audio-frequency information to be processed has the first corresponding relationship；From the word identified in text information in antistop list Language, and determine that the text information of any keyword in hit antistop list is target information；Mesh is obtained according to the first corresponding relationship Mark the timeline information of the corresponding audio fragment of information；Determine that timeline information is location information.

Further, preset fisrt feature information is obtained；The second feature information in audio-frequency information to be processed is extracted, In, the time shaft of second feature information and audio-frequency information to be processed has the second corresponding relationship；By second feature information and first Characteristic information is matched, and determines that the second feature information of hit fisrt feature information is target information；According to the second corresponding pass System obtains the timeline information of the corresponding audio fragment of target information；Determine that timeline information is location information.

Further, predetermined process carried out to target audio, including it is following any one: target audio is removed；To mesh Mark with phonetic symbols frequency carries out noise reduction processing；Target audio is replaced using the first preset audio；Superposition second is pre- on the basis of target audio If audio.

Further, after carrying out predetermined process to target audio, the audio to be processed after progress predetermined process is believed The characteristic information of breath, which carries out feature, to be obscured；Export the audio-frequency information to be processed after feature is obscured.

Further, determine target information before the location information in audio-frequency information to be processed, obtain audio-frequency information, Wherein, audio-frequency information includes voice messaging；Denoising disposal is carried out to audio-frequency information, obtains audio-frequency information to be processed.

According to another aspect of an embodiment of the present invention, a kind of apparatus for processing audio is additionally provided, comprising: determining module is used In determining location information of the target information in audio-frequency information to be processed, wherein location information include at least target information to The time occurred in processing audio-frequency information；Searching module, for finding the mesh in audio-frequency information to be processed according to location information Mark with phonetic symbols frequency, wherein target audio is the audio fragment in audio-frequency information to be processed including target information；Processing module, for pair Target audio carries out predetermined process.

Further, it is determined that module includes: the first acquisition submodule, for obtaining preset antistop list；Second obtains Submodule, for obtaining the corresponding text information of audio-frequency information to be processed, wherein obtained text information and audio to be processed are believed The time shaft of breath has the first corresponding relationship；First determine submodule, for from text information identify antistop list in word Language, and determine that the text information of any keyword in hit antistop list is target information；Third acquisition submodule is used for basis First corresponding relationship obtains the timeline information of the corresponding audio fragment of target information；Second determines submodule, when for determining Between axis information be location information.

According to another aspect of an embodiment of the present invention, a kind of storage medium is additionally provided, storage medium includes the journey of storage Sequence, wherein equipment where control storage medium executes above-mentioned audio-frequency processing method in program operation.

According to another aspect of an embodiment of the present invention, a kind of processor is additionally provided, processor is used to run program, In, program executes above-mentioned audio-frequency processing method when running.

In embodiments of the present invention, location information of the target information in audio-frequency information to be processed is determined, wherein positioning letter Breath includes at least the time that target information occurs in audio-frequency information to be processed；Audio letter to be processed is found according to location information Target audio in breath, wherein target audio is the audio fragment in audio-frequency information to be processed including target information；To target sound Frequency carries out predetermined process.Above scheme finds out target sound from audio-frequency information to be processed by positioning to target information Frequently, and by carrying out predetermined process to target audio, specially treated is carried out to the target information in voice messaging to reach Purpose can carry out specially treated to privacy information, be protected with the privacy information to user and then in monitoring field, The object for solving user under monitoring field in the prior art is easy the technical issues of divulging privacy.

Detailed description of the invention

The drawings described herein are used to provide a further understanding of the present invention, constitutes part of this application, this hair Bright illustrative embodiments and their description are used to explain the present invention, and are not constituted improper limitations of the present invention.In the accompanying drawings:

Fig. 1 is a kind of flow chart of audio-frequency processing method according to an embodiment of the present invention；

Fig. 2 is a kind of text based privacy voice location model according to an embodiment of the present invention；

Fig. 3 is a kind of privacy voice location model based on feature according to an embodiment of the present invention；

Fig. 4 is a kind of schematic diagram for carrying out feature and obscuring according to an embodiment of the present invention；

Fig. 5 is a kind of schematic diagram of audio-frequency processing method according to an embodiment of the present invention；And

Fig. 6 is a kind of schematic diagram of apparatus for processing audio according to an embodiment of the present invention.

Specific embodiment

In order to enable those skilled in the art to better understand the solution of the present invention, below in conjunction in the embodiment of the present invention Attached drawing, technical scheme in the embodiment of the invention is clearly and completely described, it is clear that described embodiment is only The embodiment of a part of the invention, instead of all the embodiments.Based on the embodiments of the present invention, ordinary skill people The model that the present invention protects all should belong in member's every other embodiment obtained without making creative work It encloses.

It should be noted that description and claims of this specification and term " first " in above-mentioned attached drawing, " Two " etc. be to be used to distinguish similar objects, without being used to describe a particular order or precedence order.It should be understood that using in this way Data be interchangeable under appropriate circumstances, so as to the embodiment of the present invention described herein can in addition to illustrating herein or Sequence other than those of description is implemented.In addition, term " includes " and " having " and their any deformation, it is intended that cover Cover it is non-exclusive include, for example, the process, method, system, product or equipment for containing a series of steps or units are not necessarily limited to Step or unit those of is clearly listed, but may include be not clearly listed or for these process, methods, product Or other step or units that equipment is intrinsic.

Embodiment 1

According to embodiments of the present invention, a kind of embodiment of audio-frequency processing method is provided, it should be noted that in attached drawing The step of process illustrates can execute in a computer system such as a set of computer executable instructions, although also, Logical order is shown in flow chart, but in some cases, it can be to be different from shown by sequence execution herein or retouch The step of stating.

Fig. 1 is a kind of flow chart of audio-frequency processing method according to an embodiment of the present invention, as shown in Figure 1, this method includes Following steps:

Step S102 determines location information of the target information in audio-frequency information to be processed, wherein location information is at least wrapped Include the time that target information occurs in audio-frequency information to be processed.

Specifically, above-mentioned audio-frequency information to be processed is the audio-frequency information for including voice messaging, above-mentioned target information be can be Privacy information, or the information to acquire a special sense.

In an alternative embodiment, in monitoring field, the privacy of user speech is revealed in order to prevent, then can will be related to And the information of privacy is as target information, and such as: name, passport NO. etc..

In an alternative embodiment, in data analysis field, in order to determine that user inclines to the emotion of certain object To, the information of above-mentioned object can be involved in as target information, such as: to the evaluation sentence of the object, to it is similar other Comment sentence of object etc..

Above-mentioned location information includes at least the time that target information occurs in audio-frequency information to be processed, therefore according to positioning Information can position the position occurred to target audio in audio-frequency information to be processed, to handle target audio.One In kind optional embodiment, the time that target information occurs in audio-frequency information to be processed can be the mesh for including target information Mark with phonetic symbols frequency duration is also possible to the time point that target audio starts.

Step S104 finds the target audio in audio-frequency information to be processed according to location information, wherein target audio is It include the audio fragment of target information in audio-frequency information to be processed.

Specifically, can directly be believed according to positioning in the case where location information includes target audio duration Breath finds target audio from audio-frequency information to be processed, only includes the case where the time point that target audio starts in location information Under, preset length can be set, preset time is extended according to the time started backward, as the end time of target audio, thus Target audio is determined according to starting and end time.

Step S106 carries out predetermined process to target audio.

Specifically, above-mentioned predetermined process is for being concealed or being protruded to the target information in audio-frequency information to be processed.

In an alternative embodiment, in monitoring field, target audio can be carried out concealing processing, to prevent from using The leakage of family privacy.

Herein it should be noted that above-mentioned steps can be executed by the terminal of acquisition audio, terminal passes through the above method pair After audio is handled, by the server or other servers of treated audio is sent to monitoring, for example, when the above method is answered When for mobile phone, above scheme is executed by the mobile phone of user, above-mentioned processing carried out to collected audio, then will treated sound Frequency is sent to the monitoring server of distal end.

It, can be to prompt tone be inserted into before target audio, with right in data analysis field in an alternative embodiment Target audio is protruded, to improve data analysis efficiency.

From the foregoing, it will be observed that the above embodiments of the present application determine location information of the target information in audio-frequency information to be processed, In, location information includes at least the time that target information occurs in audio-frequency information to be processed；According to location information find to Handle the target audio in audio-frequency information, wherein target audio is the audio piece in audio-frequency information to be processed including target information Section；Predetermined process is carried out to target audio.Above scheme is looked into from audio-frequency information to be processed by positioning to target information Find out target audio, and by carrying out predetermined process to target audio, thus reached to the target information in voice messaging into The purpose of row specially treated, and then in monitoring field, specially treated can be carried out to privacy information, with the privacy information to user It is protected, the object for solving user under monitoring field in the prior art is easy the technical issues of divulging privacy.

As a kind of optional embodiment, location information of the target information in audio-frequency information to be processed is determined, comprising: obtain Take preset antistop list；Obtain the corresponding text information of audio-frequency information to be processed, wherein text information and audio to be processed are believed The time shaft of breath has the first corresponding relationship；It identifies that antistop list is matched from text information, determines hit antistop list In the text information of any keyword be target information；The corresponding audio fragment of target information is obtained according to the first corresponding relationship Timeline information；Determine that timeline information is location information.

Above scheme determines the target information in audio-frequency information to be processed according to the corresponding text information of voice messaging Position.Specifically, above-mentioned antistop list may include scheduled word and sentence, and can be empirically determined, in different scenes Different antistop lists can be determined, for example, often will appear user in the monitoring scene of business hall and say name, certificate number The information such as code, therefore can be using these information and relevant information as antistop list；For another example in the monitoring scene of catering environment In, user can often say the information such as name, place during talk, therefore can be using these information and relevant information as pass Keyword.

Still in the above scheme, the voice messaging in audio-frequency information to be processed is converted to by speech recognition module first Text information, then from the word or sentence identified in text information in antistop list, to obtain the mesh for including in text information Mark information.

By taking target information is privacy information as an example, above-mentioned steps can be real by text based privacy voice location model Existing, Fig. 2 is a kind of text based privacy voice location model according to an embodiment of the present invention, in a kind of optional embodiment In, still by taking monitoring field as an example, as shown in connection with fig. 2, predetermined antistop list or particular statement are obtained first, the keyword Table or particular statement are privacy text decision rule, which switchs to audio according to the empirically determined of usage scenario Text, then by speech recognition module, in the corresponding text of audio, identify the word or particular statement in antistop list, by Timeline information comprising corresponding voice in text, therefore can be determined privacy voice on a timeline according to recognition result Position, i.e., above-mentioned location information finally export the position of all privacy informations on a timeline, then complete determining for target information Position.

As a kind of optional embodiment, location information of the target information in audio-frequency information to be processed is determined, comprising: obtain Take preset fisrt feature information；Extract the second feature information in audio-frequency information to be processed, wherein second feature information with to The time shaft for handling audio-frequency information has the second corresponding relationship；Second feature information is matched with fisrt feature information, really Surely the second feature information for hitting fisrt feature information is target information；It is corresponding that target information is obtained according to the second corresponding relationship The timeline information of audio fragment；Determine that timeline information is location information.

Specifically, above-mentioned fisrt feature information is preset characteristic information, certain mood subaudio frequency characteristic information can be, Audio frequency characteristics may include voiceprint, timbre information and tone information etc..Under different scenes, it can be set by experience Set different fisrt feature information.

Features described above matching module by the second feature information in preset fisrt feature information and audio to be processed into Row matching determines that the second feature information of hit fisrt feature information is target information.

Still by taking target information is privacy information as an example, above-mentioned steps can pass through the privacy voice location model based on feature It realizes, in an alternative embodiment, Fig. 3 is a kind of privacy voice positioning mould based on feature according to an embodiment of the present invention Type first according to usage scenario, determines rule description (the i.e. above-mentioned preset fisrt feature letter of privacy feature as shown in connection with fig. 3 Breath), it reuses characteristic extracting module and extracts second feature information from audio-frequency information to be processed, the second feature information and time Axis is corresponding.Characteristic matching module is finally used, second feature information and fisrt feature information are subjected to characteristic matching, depending on Position meets the time shaft position of privacy feature, which is above-mentioned location information.All privacy informations of final output Audio-frequency information to be processed time shaft on position.

As a kind of optional embodiment, predetermined process carried out to target audio, including it is following any one: by target sound Frequency is removed；Noise reduction processing is carried out to target audio；Target audio is replaced using the first preset audio；On the basis of target audio It is superimposed the second preset audio；The second preset audio is superimposed on the basis of target audio.

Above scheme provides four kinds of processing modes to target audio, is illustrated separately below.

In an alternative embodiment, target audio is removed, which can be from audio to be processed, by target Audio is intercepted and is abandoned, and treated, and an audio-frequency information may be divided into multistage.For example, in audio-frequency information to be processed, the 00th: There is target audio within 02-01:00 seconds, then intercepted out from audio-frequency information to be processed 00:02-01:00 seconds, thus by target sound Frequency is removed, and then protects the privacy of user.

Also in an alternative embodiment, noise reduction processing is carried out to target audio, so that the mesh in target audio Mark information is concealed, and then protects the privacy of user.

In an alternative embodiment, target audio is replaced using the first preset audio.Above-mentioned first preset audio It can be the audio intercepted from music, be also possible to the audio recorded in advance, replace target audio using the first preset audio Afterwards, target audio is concealed, and then protects the privacy of user.

In another optional embodiment, above-mentioned second preset audio is also possible to the audio intercepted from music, or The audio recorded in advance, after the second preset audio is superimposed upon target audio, the second preset audio covers target audio, thus So that the information of target audio is difficult to reveal, and then protect the privacy of user.

As a kind of optional embodiment, after carrying out predetermined process to target audio, the above method further include: into The characteristic information of audio-frequency information to be processed after row predetermined process, which carries out feature, to be obscured；Export the audio to be processed after feature is obscured Information.

In the above scheme, predetermined process is carried out to target audio, to make the privacy information in audio-frequency information to be processed It is concealed.Above-mentioned steps carry out feature to the characteristic information for the audio to be processed for concealing target information and obscure, to avoid sound is passed through Identity of the frequency acquisition of information to the user for issuing voice messaging.

Specifically, features described above information can be vocal print feature, tonality feature and tamber characteristic etc., to characteristic information into Row is obscured, and can be and deforms to characteristic information, to obscure the feature of audio-frequency information itself, so that being difficult to pass through audio Information goes out the identity of user.

In an alternative embodiment, by taking characteristic information is tamber characteristic as an example, can tone color to audio-frequency information it is special Sign is deformed, and is obscured with the characteristic information to audio-frequency information.

In an alternative embodiment, characteristic information is for voiceprint, above scheme can pass through vocal print spy Sign obscures module execution, and Fig. 4 is that a kind of schematic diagram that progress feature is obscured according to an embodiment of the present invention as shown in connection with fig. 4 can To obtain empirically determined vocal print feature type in advance, using vocal print feature type as carrying out obscuring processing to audio-frequency information Rule.When handling audio-frequency information, using characteristic extracting module, vocal print feature is extracted from audio-frequency information, then By vocal print feature locating module, the vocal print feature for belonging to vocal print feature type is determined from the vocal print feature of audio-frequency information, And determining vocal print is obscured by vocal print feature deformation module, obtain the audio-frequency information that vocal print feature is confused.

It should be noted that vocal print is the sound wave spectrum for the carrying verbal information being display together with electroacoustics, pass through vocal print It can accurately determine the identity of speaker, even if the tone color or tone to audio are adjusted, still be difficult to influence judgement knot Fruit, therefore in order to which the identity to speaker is protected, the voiceprint of audio-frequency information can be obscured, thus maximum journey The identity of degree protection speaker.

It follows that not only having been carried out to the privacy content in audio-frequency information to be processed hidden in the application above scheme It goes, avoids the leakage of the voice privacy of user under monitoring scene, spy also has been carried out to the characteristic information of audio-frequency information to be processed Sign is obscured, ensure that the leakage of user identity under monitoring scene, to provide safer privacy guarantee for user.

As a kind of optional embodiment, determine target information before the location information in audio-frequency information to be processed, Method further include: obtain audio-frequency information, wherein audio-frequency information includes voice messaging；Denoising disposal is carried out to audio-frequency information, is obtained To audio-frequency information to be processed.

Specifically, above-mentioned audio-frequency information can be the audio-frequency information of monitoring device acquisition, since monitoring device acquires audio There may be other sound for the environment of information, so if directly handling audio-frequency information, then may be subjected to noise Interference.Therefore after getting audio-frequency information, in order to which the privacy information to voice messaging in audio-frequency information is handled, first It needs to carry out denoising to audio-frequency information, and then to obtain from the noise information removed in audio-frequency information except voice messaging Above-mentioned audio-frequency information to be processed.

Fig. 5 is a kind of schematic diagram of audio-frequency processing method according to an embodiment of the present invention, first can basis in conjunction with Fig. 5 Privacy voice location model determines the location information of privacy information in original audio (audio-frequency information i.e. to be processed), using hidden Sound of whispering removes module, removes privacy voice messaging from original audio according to the location information of privacy information, obtains privacy language The audio that sound has been eliminated.Module finally is obscured using vocal print feature, and vocal print feature is carried out to the audio that privacy voice has been eliminated Obscure, to obtain the audio that privacy information completely eliminates.

Embodiment 2

According to embodiments of the present invention, a kind of embodiment of apparatus for processing audio is provided, Fig. 6 is according to embodiments of the present invention A kind of apparatus for processing audio schematic diagram, as shown in fig. 6, the device includes:

Determining module 60, for determining location information of the target information in audio-frequency information to be processed, wherein location information The time occurred in audio-frequency information to be processed including at least target information.

Searching module 62, for finding the target audio in audio-frequency information to be processed according to location information, wherein target Audio is the audio fragment in audio-frequency information to be processed including target information.

Processing module 64, for carrying out predetermined process to target audio.

As a kind of optional embodiment, determining module includes: the first acquisition submodule, for obtaining preset keyword Table；Second acquisition submodule, for obtaining the corresponding text information of audio-frequency information to be processed, wherein obtained text information with The time shaft of audio-frequency information to be processed has the first corresponding relationship；First determines submodule, closes for identifying from text information Word in keyword table, and determine that the text information of any keyword in hit antistop list is target information；Third obtains son Module, for obtaining the timeline information of the corresponding audio fragment of target information according to the first corresponding relationship；Second determines submodule Block, for determining that timeline information is location information.

As a kind of optional embodiment, determining module includes: the 4th acquisition submodule, special for obtaining preset first Reference breath；Extracting sub-module, for extracting the second feature information in audio-frequency information to be processed, wherein second feature information with The time shaft of audio-frequency information to be processed has the second corresponding relationship；Third determining module is used for second feature information and first Characteristic information is matched, and determines that the second feature information of hit fisrt feature information is target information；5th acquisition submodule, For obtaining the timeline information of the corresponding audio fragment of target information according to the second corresponding relationship；4th determines submodule, uses In determine timeline information be location information.

As a kind of optional embodiment, processing module is for executing any one following step: target audio is removed； Noise reduction processing is carried out to target audio；Target audio is replaced using the first preset audio；Is superimposed on the basis of target audio Two preset audios.

As a kind of optional embodiment, above-mentioned apparatus further include: module is obscured, for making a reservation for target audio After processing, feature is carried out to the characteristic information of the audio-frequency information to be processed after progress predetermined process and is obscured；Output module is used for Export the audio-frequency information to be processed after feature is obscured.

As a kind of optional embodiment, above-mentioned apparatus further include: module is obtained, for determining target information wait locate Before managing the location information in audio-frequency information, audio-frequency information is obtained, wherein audio-frequency information includes voice messaging；Module is denoised, is used In carrying out Denoising disposal to audio-frequency information, audio-frequency information to be processed is obtained.

Embodiment 3

According to embodiments of the present invention, a kind of storage medium is provided, storage medium includes the program of storage, wherein in institute Equipment where controlling the storage medium when stating program operation executes audio-frequency processing method described in any one of embodiment 1.

Embodiment 4

According to embodiments of the present invention, a kind of processor is provided, processor is for running program, wherein described program fortune Audio-frequency processing method described in any one of embodiment 1 is executed when row.

The serial number of the above embodiments of the invention is only for description, does not represent the advantages or disadvantages of the embodiments.

In the above embodiment of the invention, it all emphasizes particularly on different fields to the description of each embodiment, does not have in some embodiment The part of detailed description, reference can be made to the related descriptions of other embodiments.

In several embodiments provided herein, it should be understood that disclosed technology contents can pass through others Mode is realized.Wherein, the apparatus embodiments described above are merely exemplary, such as the division of the unit, Ke Yiwei A kind of logical function partition, there may be another division manner in actual implementation, for example, multiple units or components can combine or Person is desirably integrated into another system, or some features can be ignored or not executed.Another point, shown or discussed is mutual Between coupling, direct-coupling or communication connection can be through some interfaces, the INDIRECT COUPLING or communication link of unit or module It connects, can be electrical or other forms.

The unit as illustrated by the separation member may or may not be physically separated, aobvious as unit The component shown may or may not be physical unit, it can and it is in one place, or may be distributed over multiple On unit.It can some or all of the units may be selected to achieve the purpose of the solution of this embodiment according to the actual needs.

It, can also be in addition, the functional units in various embodiments of the present invention may be integrated into one processing unit It is that each unit physically exists alone, can also be integrated in one unit with two or more units.Above-mentioned integrated list Member both can take the form of hardware realization, can also realize in the form of software functional units.

If the integrated unit is realized in the form of SFU software functional unit and sells or use as independent product When, it can store in a computer readable storage medium.Based on this understanding, technical solution of the present invention is substantially The all or part of the part that contributes to existing technology or the technical solution can be in the form of software products in other words It embodies, which is stored in a storage medium, including some instructions are used so that a computer Equipment (can for personal computer, server or network equipment etc.) execute each embodiment the method for the present invention whole or Part steps.And storage medium above-mentioned includes: that USB flash disk, read-only memory (ROM, Read-Only Memory), arbitrary access are deposited Reservoir (RAM, Random Access Memory), mobile hard disk, magnetic or disk etc. be various to can store program code Medium.

The above is only a preferred embodiment of the present invention, it is noted that for the ordinary skill people of the art For member, various improvements and modifications may be made without departing from the principle of the present invention, these improvements and modifications are also answered It is considered as protection scope of the present invention.

Claims

1. a kind of audio-frequency processing method characterized by comprising

Determine location information of the target information in audio-frequency information to be processed, wherein the location information includes at least the mesh The time that mark information occurs in the audio-frequency information to be processed；

The target audio in the audio-frequency information to be processed is found according to the location information, wherein the target audio is It include the audio fragment of the target information in the audio-frequency information to be processed；

Predetermined process is carried out to the target audio.

2. the method according to claim 1, wherein determining positioning of the target information in audio-frequency information to be processed Information, comprising:

Obtain preset antistop list；

Obtain the corresponding text information of the audio-frequency information to be processed, wherein the text information and the audio to be processed are believed The time shaft of breath has the first corresponding relationship；

From the word identified in the text information in the antistop list, and determines and hit any key in the antistop list The text information of word is the target information；

The timeline information of the corresponding audio fragment of the target information is obtained according to first corresponding relationship；

Determine that the timeline information is the location information.

3. the method according to claim 1, wherein determining positioning of the target information in audio-frequency information to be processed Information, comprising:

Obtain preset fisrt feature information；

Extract the second feature information in the audio-frequency information to be processed, wherein the second feature information with it is described to be processed The time shaft of audio-frequency information has the second corresponding relationship；

The second feature information is matched with the fisrt feature information, determines and hits the of the fisrt feature information Two characteristic informations are the target information；

The timeline information of the corresponding audio fragment of the target information is obtained according to second corresponding relationship；

Determine that the timeline information is the location information.

4. the method according to claim 1, wherein carrying out predetermined process to the target audio, including as follows Any one:

The target audio is removed；

Noise reduction processing is carried out to the target audio；

The target audio is replaced using the first preset audio；

The second preset audio is superimposed on the basis of the target audio.

5. the method according to claim 1, wherein to the target audio carry out predetermined process after, institute State method further include:

Feature is carried out to the characteristic information of the audio-frequency information to be processed after progress predetermined process to obscure；

Export the audio-frequency information to be processed after feature is obscured.

6. the method according to claim 1, wherein determining that target information determines in audio-frequency information to be processed Before the information of position, the method also includes:

Obtain audio-frequency information, wherein the audio-frequency information includes voice messaging；

Denoising disposal is carried out to the audio-frequency information, obtains the audio-frequency information to be processed.

7. a kind of apparatus for processing audio characterized by comprising

Determining module, for determining location information of the target information in audio-frequency information to be processed, wherein the location information is extremely It less include the time that the target information occurs in the audio-frequency information to be processed；

Searching module, for finding the target audio in the audio-frequency information to be processed according to the location information, wherein institute Stating target audio is the audio fragment in the audio-frequency information to be processed including the target information；

Processing module, for carrying out predetermined process to the target audio.

8. device according to claim 7, which is characterized in that determining module includes:

First acquisition submodule, for obtaining preset antistop list；

Second acquisition submodule, for obtaining the corresponding text information of the audio-frequency information to be processed, wherein the obtained text The time shaft of this information and the audio-frequency information to be processed has the first corresponding relationship；

First determines submodule, for from the word identified in the antistop list in the text information, and determines hit institute The text information for stating any keyword in antistop list is the target information；

Third acquisition submodule, for according to first corresponding relationship obtain the corresponding audio fragment of the target information when Between axis information；

Second determines submodule, for determining that the timeline information is the location information.

9. a kind of storage medium, which is characterized in that the storage medium includes the program of storage, wherein run in described program When control the storage medium where equipment perform claim require any one of 1 to 6 described in audio-frequency processing method.

10. a kind of processor, which is characterized in that the processor is for running program, wherein right of execution when described program is run Benefit require any one of 1 to 6 described in audio-frequency processing method.