CN112102854A - Recording filtering method and device and computer readable storage medium - Google Patents
Recording filtering method and device and computer readable storage medium
- Publication number
- CN112102854A CN112102854A CN202010999917.2A CN202010999917A CN112102854A CN 112102854 A CN112102854 A CN 112102854A CN 202010999917 A CN202010999917 A CN 202010999917A CN 112102854 A CN112102854 A CN 112102854A
- Authority
- CN
- China
- Prior art keywords
- recording
- preset
- voice
- filtering
- speaker
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B20/00—Signal processing not specific to the method of recording or reproducing; Circuits therefor
- G11B20/10—Digital recording or reproducing
- G11B20/10009—Improvement or modification of read or write signals
- G11B20/10046—Improvement or modification of read or write signals filtering or equalising, e.g. setting the tap weights of an FIR filter
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B20/00—Signal processing not specific to the method of recording or reproducing; Circuits therefor
- G11B20/10—Digital recording or reproducing
- G11B20/10527—Audio or video recording; Data buffering arrangements
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B20/00—Signal processing not specific to the method of recording or reproducing; Circuits therefor
- G11B20/10—Digital recording or reproducing
- G11B20/10527—Audio or video recording; Data buffering arrangements
- G11B2020/10537—Audio or video recording
- G11B2020/10546—Audio or video recording specifically adapted for audio data
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Quality & Reliability (AREA)
- Telephonic Communication Services (AREA)
Abstract
The recording filtering method disclosed by the invention performs voice recognition analysis on a first recording, then filters the first recording according to a preset rule to obtain a second recording. The preset rule comprises: retaining or filtering recordings of a preset voice type, the preset voice type comprising human voice, music, and noise; or retaining or filtering recordings that meet a preset condition, the preset condition comprising at least one of a preset age range, a preset gender, and a preset voiceprint characteristic parameter. The recording filtering method provided by the invention can therefore filter a recording according to the preset rule, removing invalid recordings and retaining only valid ones, which reduces the time spent manually playing back and identifying recordings and improves the efficiency of playback recognition.
Description
Technical Field
The invention relates to the technical field of recording processing, and in particular to a recording filtering method and device and a computer-readable storage medium.
Background
With the continuing popularization of electronic products and the development of electronic technology, people in scenarios that require real-time records (such as meetings or monitoring) usually make an audio recording, then manually play back the recording file, identify and screen the valid portions, and manually transcribe them into text.
Because a recording file is usually long and may contain many invalid portions, manually playing back and identifying it consumes considerable time, and the efficiency is low.
Disclosure of Invention
In view of the above, the present invention provides a recording filtering method, an apparatus and a computer-readable storage medium to solve the above technical problems.
Firstly, in order to achieve the above object, the present invention provides a recording filtering method, including:
performing voice recognition analysis on the first sound recording;
filtering the first recording according to a preset rule to obtain a second recording;
wherein the preset rule comprises:
preserving or filtering recordings of a preset voice type, the preset voice type comprising: human voice, music, noise;
or, retaining or filtering recordings that meet a preset condition, wherein the preset condition comprises at least one of a preset age range, a preset gender, and a preset voiceprint characteristic parameter.
Optionally, the performing voice recognition analysis on the first audio recording includes:
performing voice classification on the first sound recording to obtain a voice type, wherein the voice type comprises: human voice, noise, music;
if the voice type is voice, performing voiceprint recognition on the first recording to obtain voiceprint characteristic parameters of the speaker, and/or performing gender judgment on the first recording to obtain gender of the speaker, and/or performing age range judgment on the first recording to obtain age range of the speaker.
Optionally, the preset rule includes retaining or filtering a recording of a preset voice type, and the filtering the first recording according to the preset rule includes:
retaining recordings of a first preset voice type;
and/or filtering recordings of a second preset voice type.
Optionally, the first preset voice type includes a human voice, and/or the second preset voice type includes music and/or noise.
Optionally, the preset condition comprises the preset age range;
the recording which meets the preset condition is reserved or filtered, and the recording comprises the following steps:
judging whether the age range of the speaker in the first recording falls into the preset age range included by the preset condition;
if the age range of the speaker in the first recording does not fall into the preset age range included in the preset conditions, retaining or filtering the recording of the speaker.
Optionally, the preset condition comprises the preset gender;
the voice which meets the preset condition is reserved or filtered, and the voice comprises the following steps:
judging whether the gender of the speaker in the first recording is the same as the preset gender included in the preset condition;
if the gender of the speaker in the first recording is the same as the preset gender included in the preset condition, retaining or filtering the recording of the speaker.
Optionally, the preset condition includes the preset voiceprint characteristic parameter;
the voice which meets the preset condition is reserved or filtered, and the voice comprises the following steps:
judging whether the voiceprint characteristic parameters of the speaker in the first recording are matched with the voiceprint characteristic parameters included in the preset conditions;
if the voiceprint characteristic parameters of the speaker in the first recording are matched with the preset voiceprint characteristic parameters included in the preset conditions, retaining or filtering the recording of the speaker.
Optionally, in the process of performing voice classification on the first recording to obtain the voice type, when noise or music also contains human voice, the voice type is determined to be human voice.
Further, to achieve the above object, the present invention also provides a recording filter device, which includes a memory, at least one processor, and at least one program stored in the memory and executable on the at least one processor, wherein the at least one program, when executed by the at least one processor, implements the steps of the method described above.
Further, to achieve the above object, the present invention provides a computer-readable storage medium storing at least one program executable by a computer, wherein the at least one program, when executed by the computer, causes the computer to perform the steps of any one of the methods described above.
Compared with the prior art, the recording filtering method provided by the invention performs voice recognition analysis on a first recording, then filters the first recording according to a preset rule to obtain a second recording. The preset rule comprises: retaining or filtering recordings of a preset voice type, the preset voice type comprising human voice, music, and noise; or retaining or filtering recordings that meet a preset condition, the preset condition comprising at least one of a preset age range, a preset gender, and a preset voiceprint characteristic parameter. The method can therefore filter a recording according to the preset rule, removing invalid recordings and retaining only valid ones, which reduces the time spent manually playing back and identifying recordings and improves the efficiency of playback recognition.
Drawings
Fig. 1 is a schematic structural diagram of a recording filter device according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a vehicle-mounted locator according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a server according to an embodiment of the present invention;
fig. 4 is a schematic flow chart of a recording filtering method according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only to facilitate the explanation of the present invention and have no specific meaning in themselves. Thus, "module", "component", and "unit" may be used interchangeably.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a recording filter apparatus according to an embodiment of the present invention, as shown in fig. 1, the recording filter apparatus 100 includes a processor 101 and a memory 102, where the memory 102 is used to store related data, such as a program, of the recording filter apparatus 100, and the processor 101 is used to execute the program stored in the memory 102 and implement a corresponding function. In the embodiment of the present invention, the recording filter device 100 may be a vehicle-mounted locator or a server.
Referring to fig. 2, fig. 2 is a schematic structural diagram of a vehicle-mounted locator according to an embodiment of the present invention. As shown in fig. 2, the vehicle-mounted locator 200 includes a processor 201 and a memory 202, where the memory 202 is used to store relevant data of the vehicle-mounted locator 200, for example the data collected by the vehicle-mounted locator 200 and its programs, and the processor 201 is used to execute the programs stored in the memory 202 and implement the corresponding functions.
The vehicle-mounted locator 200 further includes one or more of a positioning module 203, a recording module 204, a wireless communication module 205, a vibration sensor 206, a low-power detection module 207, and a battery module 208. The positioning module 203 is configured to position the vehicle-mounted locator 200 to obtain its position information; it may be a positioning chip such as a GPS or Beidou chip that obtains the longitude and latitude of the vehicle, or a WIFI, Bluetooth, or base-station positioning module that determines position from the address information of nearby WIFI devices, the address information of Bluetooth devices, or the identification information of base stations.
The recording module 204 is configured to record sound around the vehicle-mounted locator 200. The wireless communication module 205 is configured to implement a wireless communication connection between the vehicle-mounted locator 200 and external devices, and may include one or more of a Bluetooth communication module, an infrared communication module, a WIFI communication module, and a mobile cellular network communication module (e.g., a 2G, 3G, 4G, or 5G communication module). It is understood that in some embodiments the vehicle-mounted locator 200 may include a wired communication module for implementing a wired communication connection between the vehicle-mounted locator 200 and a vehicle-mounted terminal, and through the vehicle-mounted terminal, communication connections with external devices. The vibration sensor 206 is configured to detect vibration data of the vehicle, from which the processor 201 may determine the driving state of the vehicle (e.g., moving or stationary). The low-power detection module 207 is configured to detect the battery level of the vehicle-mounted locator 200 and report it to the processor 201, and the battery module 208 is configured to supply power to the vehicle-mounted locator 200.
Referring to fig. 3, fig. 3 is a schematic structural diagram of a server according to an embodiment of the present invention, as shown in fig. 3, a server 300 includes a processor 301 and a memory 302, where the memory 302 is used for storing relevant data, such as a program, of the server 300, and the processor 301 is used for executing the program stored in the memory 302 and implementing a corresponding function.
It should be noted that, when the recording filter apparatus 100 is the vehicle-mounted locator 200 shown in fig. 2, the vehicle-mounted locator 200 may implement a communication connection with a client through the server 300, or may directly establish a communication connection with the client without the server 300. When the recording filter device 100 is the server 300 shown in fig. 3, the server 300 acquires data collected by the vehicle-mounted locator 200, such as position information and sound information, by establishing a communication connection with the vehicle-mounted locator 200.
Based on the schematic structural diagram of the recording filter device 100, various embodiments of the method of the present invention are provided.
Referring to fig. 4, fig. 4 is a flowchart illustrating steps of a recording filtering method according to an embodiment of the present invention, where the method is applied to the recording filtering apparatus 100, and as shown in fig. 4, the method includes:
In the first step, the method performs voice recognition analysis on a first recording, which is a recording captured by a recording device, for example a recording made by a recording pen at a meeting, or sound recorded by a vehicle-mounted locator installed in a vehicle. If the recording is very long, it can be split into multiple segments that are then analyzed one by one.
Performing voice recognition analysis on the first recording, which may specifically include performing voice classification on the first recording to obtain a voice type, where the voice type includes voice, noise, and music; if the voice type is voice, performing voiceprint recognition on the first recording to obtain voiceprint characteristic parameters of the speaker, and/or performing gender judgment on the first recording to obtain gender of the speaker, and/or performing age range judgment on the first recording to obtain age range of the speaker.
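The analysis step above can be sketched as follows. This is an illustrative sketch only: the patent does not specify any particular classifier or voiceprint model, so the stand-in functions here simply read pre-computed fields from each segment.

```python
# Illustrative sketch of the voice recognition analysis step; segment fields
# such as "has_speech" and "voiceprint" are assumed placeholders for the
# outputs of real audio models.

def classify_voice_type(segment):
    # Per the method, a segment that contains human speech is classified as
    # human voice even when music or noise is also present.
    if segment.get("has_speech"):
        return "human voice"
    return "music" if segment.get("has_music") else "noise"

def analyze_recording(segments):
    """Analyze the first recording piece by piece (a long recording is
    assumed to have been split into segments beforehand)."""
    analyses = []
    for seg in segments:
        result = {"voice_type": classify_voice_type(seg)}
        if result["voice_type"] == "human voice":
            # Only human-voice segments get voiceprint, gender, and age analysis.
            result["voiceprint"] = seg.get("voiceprint")
            result["gender"] = seg.get("gender")
            result["age_range"] = seg.get("age_range")
        analyses.append(result)
    return analyses
```

Note that the voiceprint, gender, and age analyses are independent ("and/or" in the claim), so an implementation could run any subset of them.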
It should be noted that a voice recognition device may be disposed inside the recording filter device and used to perform the voice analysis of the first recording; alternatively, the voice analysis may be performed by calling an external voice recognition server, in which case no built-in voice recognition device is needed.
In the second step, the method filters the first recording according to a preset rule to obtain a second recording. The preset rule may filter by voice type, for example retaining or filtering recordings of a preset voice type, where the preset voice type includes human voice, music, and noise; or it may filter by the speaker's characteristics, for example retaining or filtering recordings that meet a preset condition, where the preset condition includes at least one of a preset age range, a preset gender, and a preset voiceprint characteristic parameter.
For example, when the user only needs to recognize human voice, the preset rule may be set to retain recordings of a preset voice type, with the preset voice type being human voice. When the user only needs to recognize female speakers, the preset rule may be set to retain recordings whose speaker is female, or to filter recordings whose speaker is male. When the user only needs to recognize a designated speaker (e.g., the car owner, the driver, or a regular passenger), the preset rule may be set to retain recordings matching a preset voiceprint characteristic parameter, the parameter being the voiceprint characteristic parameter of the designated speaker. Conversely, when the user needs to recognize the voices of everyone except the designated speaker, the preset rule may be set to filter recordings matching that preset voiceprint characteristic parameter.
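The retain-or-filter logic described above can be sketched as follows. The rule representation (a dict with an "action" plus either a set of voice types or a speaker condition) is an assumption made for illustration; the patent does not prescribe a data format.

```python
def meets_condition(analysis, condition):
    # The preset condition may constrain gender, age range, or voiceprint
    # (at least one); a segment must satisfy every constraint present.
    if "gender" in condition and analysis.get("gender") != condition["gender"]:
        return False
    if "age_range" in condition:
        lo, hi = condition["age_range"]
        age = analysis.get("age_range")
        if age is None or age[0] < lo or age[1] > hi:
            return False
    if "voiceprint" in condition and analysis.get("voiceprint") != condition["voiceprint"]:
        return False
    return True

def filter_recording(analyses, rule):
    """Apply a preset rule to analyzed segments, returning the segments
    that make up the second recording."""
    second = []
    for a in analyses:
        if "voice_types" in rule:
            matched = a["voice_type"] in rule["voice_types"]
        else:
            matched = meets_condition(a, rule["condition"])
        # action "keep": retain matching segments; action "filter": drop them.
        keep = matched if rule["action"] == "keep" else not matched
        if keep:
            second.append(a)
    return second
```

For instance, `{"action": "keep", "voice_types": {"human voice"}}` retains only speech, while `{"action": "filter", "condition": {"gender": "male"}}` drops male speakers.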
In some embodiments of the present invention, in the process of filtering the sound record and/or after the sound record is filtered, the method may further receive a modification operation for the preset rule, and update the preset rule according to the modification operation.
In this embodiment, the recording filtering method performs voice recognition analysis on the first recording and filters the first recording according to a preset rule to obtain a second recording, where the preset rule comprises retaining or filtering recordings of a preset voice type (human voice, music, or noise), or retaining or filtering recordings that meet a preset condition (at least one of a preset age range, a preset gender, and a preset voiceprint characteristic parameter). The method thus removes invalid recordings and retains only valid ones, reducing the time needed to manually play back and identify the recording and improving the efficiency of playback recognition.
The following describes the method and process of the present invention in detail by taking the recording filter device as a server and taking the first recording as a recording recorded by a vehicle-mounted locator as an example.
When an administrator needs to play back and identify recordings made in a vehicle, the administrator can start an application on the client and send a recording filtering request to the server through the application. The request carries filtering parameters, which include at least the preset rule and may also include other information, such as at least one of a user account, a vehicle-mounted locator identifier, the vehicle's passenger limit, and user information (such as name, gender, age, and contact details). On receiving the request, the server obtains and stores the filtering parameters carried in it for the subsequent voice recognition analysis, returns a recording filtering start response message to the client to indicate that the request was received and the filtering function is enabled, and sends a recording filtering request to the vehicle-mounted locator identified by the locator identifier to obtain the first recording collected by that locator and perform the subsequent filtering steps on it. It can be understood that, before sending the recording filtering request to the vehicle-mounted locator, the server may first judge whether the locator is online: if it is online, the server sends the request directly; if not, the server waits until the locator comes online and then sends the request.
After receiving the recording filtering request sent by the server, the vehicle-mounted locator stores the filtering parameters in the request and returns a recording filtering response message to the server; in addition, the vehicle-mounted locator reports the collected first recording to the server.
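The request flow above might carry its filtering parameters as a structured payload. The field names below are purely illustrative assumptions; the patent does not fix any wire format.

```python
import json

# Hypothetical recording filtering request from client to server;
# every field name here is an illustrative assumption.
request = {
    "user_account": "admin01",
    "locator_id": "LOC-0001",
    "passenger_limit": 5,
    "user_info": {"name": "example", "gender": "female"},
    "filter_params": {
        # The preset rule is the only mandatory filtering parameter.
        "rule": {"action": "keep", "voice_types": ["human voice"]},
    },
}

payload = json.dumps(request)   # serialized for the communication connection
received = json.loads(payload)  # parsed on the receiving side
```

The server would store `filter_params` for later analysis and forward a corresponding request to the locator named by `locator_id`.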
The method and process provided by the invention are described in detail below by taking the recording filter device as a vehicle-mounted locator and taking the first recording as the recording recorded by the vehicle-mounted locator as an example.
When an administrator needs to play back and identify recordings made in a vehicle, the administrator can start an application on the client and send a recording filtering request to the vehicle-mounted locator through the application. The request carries filtering parameters, which include at least the preset rule and may also include other information, such as at least one of a user account, a vehicle-mounted locator identifier, the vehicle's passenger limit, a preset voiceprint characteristic parameter, and user information (such as name, gender, age, and contact details). The client may establish a communication connection directly with the vehicle-mounted locator and send the recording filtering request to it, or may send the request through a server. After receiving the request from the client, the vehicle-mounted locator obtains and stores the filtering parameters carried in it for the subsequent voice analysis of the first recording, returns a recording filtering response message to the client to indicate that the request was received and the filtering function is enabled, and then obtains the collected first recording and performs the subsequent filtering steps on it.
Optionally, the performing voice recognition analysis on the first audio recording includes:
performing voice classification on the first sound recording to obtain a voice type, wherein the voice type comprises: human voice, noise, music;
if the voice type is voice, performing voiceprint recognition on the first recording to obtain voiceprint characteristic parameters of the speaker, and/or performing gender judgment on the first recording to obtain gender of the speaker, and/or performing age range judgment on the first recording to obtain age range of the speaker.
Optionally, the preset rule includes retaining or filtering a recording of a preset voice type, and the filtering the first recording according to the preset rule includes:
retaining recordings of a first preset voice type;
and/or filtering recordings of a second preset voice type.
Optionally, the first preset voice type includes a human voice, and/or the second preset voice type includes music and/or noise.
Optionally, the preset condition comprises the preset age range;
the retaining or filtering of recordings that meet the preset condition comprises:
judging whether the age range of the speaker in the first recording falls within the preset age range included in the preset condition;
if the age range of the speaker in the first recording does not fall within the preset age range included in the preset condition, retaining or filtering the recording of the speaker.
Optionally, the preset condition comprises the preset gender;
the retaining or filtering of recordings that meet the preset condition comprises:
judging whether the gender of the speaker in the first recording is the same as the preset gender included in the preset condition;
if the gender of the speaker in the first recording is the same as the preset gender included in the preset condition, retaining or filtering the recording of the speaker.
Optionally, the preset condition includes the preset voiceprint characteristic parameter;
the retaining or filtering of recordings that meet the preset condition comprises:
judging whether the voiceprint characteristic parameters of the speaker in the first recording are matched with the voiceprint characteristic parameters included in the preset conditions;
if the voiceprint characteristic parameters of the speaker in the first recording are matched with the preset voiceprint characteristic parameters included in the preset conditions, retaining or filtering the recording of the speaker.
For example, when the user only needs to play back and identify the recordings of a designated speaker (e.g., the driver), the voiceprint characteristic parameter of the designated speaker may be set in advance as the preset voiceprint characteristic parameter, and if the voiceprint characteristic parameter of a speaker in the first recording matches the preset voiceprint characteristic parameter included in the preset condition, that speaker's recording is retained.
In some embodiments of the present invention, the method may further distinguish the voices of different speakers in the first recording and store each speaker's recording centrally, that is, store recordings having the same voiceprint characteristic parameter together. For example, if the first recording contains the voices of three speakers A, B, and C, the method stores A's speech content separately, B's speech content separately, and C's speech content separately.
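The per-speaker centralized storage described above amounts to grouping segments by their voiceprint parameter. A minimal sketch, in which a hashable value stands in for the real voiceprint characteristic parameter:

```python
from collections import defaultdict

def group_by_speaker(segments):
    """Store each speaker's speech centrally: segments sharing the same
    voiceprint characteristic parameter are collected into one group."""
    groups = defaultdict(list)
    for seg in segments:
        # "voiceprint" is a stand-in key; a real system would compare
        # voiceprint parameters with a similarity threshold, not equality.
        groups[seg["voiceprint"]].append(seg["audio"])
    return dict(groups)
```

Each resulting group could then be tagged with a passenger identification code, as the next paragraph describes.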
Furthermore, each centrally stored recording may be identified; for example, a passenger identification code may be assigned to each distinct voiceprint characteristic parameter, so that the speech recordings of different passengers carry different passenger identification codes. When multiple speakers speak simultaneously, the corresponding section of the recording is marked with multiple passenger identification codes to indicate that it contains the speech of several speakers. Alternatively, the gender and/or age range of each speaker's voice may be judged to determine each speaker's gender and/or age range, and each speaker's speech recording may be identified according to that gender and/or age range.
In some embodiments of the present invention, the method stores the second recording obtained after filtering the first recording and identifies it as a normal recording, and also stores the filtered-out third recording and identifies it as a filtered recording. In this way, when the user needs to play back and identify the first recording, the recording identifier indicates which file is the filtered normal recording, making it easy to select the correct file for playback. In some embodiments, the recording filter device further performs speech-to-text processing on the second recording to obtain the text content corresponding to the second recording.
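Keeping both the retained second recording and the removed third recording under distinguishing identifiers, as described above, can be sketched as follows; the label strings are illustrative.

```python
def label_recordings(segments, keep):
    """Split segments into the retained second recording and the removed
    third recording, storing each under an identifying label. `keep` is
    a predicate implementing the preset rule."""
    second, third = [], []
    for seg in segments:
        (second if keep(seg) else third).append(seg)
    # Identifier strings are illustrative stand-ins for recording identifiers.
    return {"normal recording": second, "filtered recording": third}
```

A playback tool can then offer the "normal recording" group by default while still letting the user inspect the "filtered recording" group.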
Optionally, in the process of performing voice classification on the first recording to obtain the voice type, when noise or music contains voice, the voice type is voice.
Those skilled in the art will appreciate that all or part of the steps of the methods of the above embodiments may be implemented by at least one program instructing the relevant hardware. The at least one program may be stored in the memory 102 of the recording filtering apparatus 100 shown in Fig. 1 and executed by the processor 101 of the recording filtering apparatus 100; when executed by the processor, the at least one program implements the following steps:
performing voice recognition analysis on the first sound recording;
filtering the first recording according to a preset rule to obtain a second recording;
wherein the preset rule comprises:
retaining or filtering recordings of a preset voice type, the preset voice type comprising: human voice, music, and noise;
or, retaining or filtering the recording that meets a preset condition, wherein the preset condition comprises at least one of a preset age range, a preset gender, and a preset voiceprint characteristic parameter.
Optionally, the performing voice recognition analysis on the first audio recording includes:
performing voice classification on the first sound recording to obtain a voice type, wherein the voice type comprises: human voice, noise, music;
if the voice type is voice, performing voiceprint recognition on the first recording to obtain voiceprint characteristic parameters of the speaker, and/or performing gender judgment on the first recording to obtain gender of the speaker, and/or performing age range judgment on the first recording to obtain age range of the speaker.
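The analysis order above (classify first, then speaker attributes only for human voice) can be sketched as a small pipeline. The four callables stand in for upstream models and are assumptions, not real APIs.

```python
def analyze(recording, classify, voiceprint, gender, age):
    """Run voice classification first; only when the type is 'voice'
    are the voiceprint, gender, and age-range analyses applied,
    mirroring the order described in the text."""
    voice_type = classify(recording)
    result = {"type": voice_type}
    if voice_type == "voice":
        result["voiceprint"] = voiceprint(recording)
        result["gender"] = gender(recording)
        result["age_range"] = age(recording)
    return result

# Stub models for illustration only:
out = analyze("clip",
              classify=lambda r: "voice",
              voiceprint=lambda r: [0.1, 0.2],
              gender=lambda r: "female",
              age=lambda r: "20-30")
```

For a clip classified as music or noise, the speaker analyses are skipped entirely.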
Optionally, the preset rule includes retaining or filtering a recording of a preset voice type, and the filtering the first recording according to the preset rule includes:
retaining the recording of the first preset voice type;
and/or filtering the recording of the second preset voice type.
Optionally, the first preset voice type includes a human voice, and/or the second preset voice type includes music and/or noise.
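The keep/filter-by-type rule can be sketched as a single pass over classified segments. The default type sets below follow the optional embodiment above (keep human voice; filter music and noise); the function name is hypothetical.

```python
def filter_by_type(segments, keep_types=("voice",), drop_types=("music", "noise")):
    """segments: list of (voice_type, chunk). Returns (kept, filtered):
    kept becomes the second recording, filtered the third."""
    kept, filtered = [], []
    for vtype, chunk in segments:
        if vtype in drop_types:
            filtered.append((vtype, chunk))
        elif vtype in keep_types:
            kept.append((vtype, chunk))
    return kept, filtered

kept, dropped = filter_by_type([("voice", "s1"), ("music", "s2"), ("noise", "s3")])
```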
Optionally, the preset condition comprises the preset age range;
the retaining or filtering of the recording that meets the preset condition comprises:
judging whether the age range of the speaker in the first recording falls within the preset age range included in the preset condition;
if the age range of the speaker in the first recording does not fall within the preset age range included in the preset condition, retaining or filtering the recording of the speaker.
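The age-range condition above triggers an action when the speaker's range falls *outside* the preset range. A sketch, assuming age ranges are represented as `(low, high)` tuples; whether the action is "retain" or "filter" is a deployment choice.

```python
def age_rule(speaker_age_range, preset_range, action="filter"):
    """Return the configured action when the speaker's (low, high) age
    range does NOT fall inside the preset range, as described above;
    return None (no action) when it does."""
    lo, hi = speaker_age_range
    p_lo, p_hi = preset_range
    inside = lo >= p_lo and hi <= p_hi
    return None if inside else action

decision = age_rule((35, 45), (18, 30), action="filter")
```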
Optionally, the preset condition comprises the preset gender;
the retaining or filtering of the recording that meets the preset condition comprises:
judging whether the gender of the speaker in the first recording is the same as the preset gender included in the preset condition;
if the gender of the speaker in the first recording is the same as the preset gender included in the preset condition, retaining or filtering the recording of the speaker.
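The gender condition reduces to an equality check; a minimal sketch, with the action ("retain" vs. "filter") again left as a configuration choice rather than fixed by the text.

```python
def gender_rule(speaker_gender, preset_gender, action="retain"):
    """Apply the configured action when the speaker's gender equals the
    preset gender, per the condition above; None means no action."""
    return action if speaker_gender == preset_gender else None

decision = gender_rule("male", "male", action="filter")
```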
Optionally, the preset condition includes the preset voiceprint characteristic parameter;
the retaining or filtering of the recording that meets the preset condition comprises:
judging whether the voiceprint characteristic parameters of the speaker in the first recording match the voiceprint characteristic parameters included in the preset condition;
if the voiceprint characteristic parameters of the speaker in the first recording match the preset voiceprint characteristic parameters included in the preset condition, retaining or filtering the recording of the speaker.
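The text does not define how voiceprint characteristic parameters are "matched"; one common approach is cosine similarity between embedding vectors with a tuned threshold. Both the vector representation and the threshold below are assumptions.

```python
import math

def voiceprint_match(vec_a, vec_b, threshold=0.8):
    """Cosine-similarity match between two voiceprint vectors.
    The 0.8 threshold is an assumed tuning parameter."""
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    return dot / (norm_a * norm_b) >= threshold

matched = voiceprint_match([1.0, 0.0, 0.5], [0.9, 0.1, 0.4])
```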
Optionally, in the process of performing voice classification on the first recording to obtain the voice type, when noise or music contains voice, the voice type is voice.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be implemented by at least one program instructing the relevant hardware. The at least one program may be stored in a computer-readable storage medium and, when executed, implements the following steps:
performing voice recognition analysis on the first sound recording;
filtering the first recording according to a preset rule to obtain a second recording;
wherein the preset rule comprises:
retaining or filtering recordings of a preset voice type, the preset voice type comprising: human voice, music, and noise;
or, retaining or filtering the recording that meets a preset condition, wherein the preset condition comprises at least one of a preset age range, a preset gender, and a preset voiceprint characteristic parameter.
Optionally, the performing voice recognition analysis on the first audio recording includes:
performing voice classification on the first sound recording to obtain a voice type, wherein the voice type comprises: human voice, noise, music;
if the voice type is voice, performing voiceprint recognition on the first recording to obtain voiceprint characteristic parameters of the speaker, and/or performing gender judgment on the first recording to obtain gender of the speaker, and/or performing age range judgment on the first recording to obtain age range of the speaker.
Optionally, the preset rule includes retaining or filtering a recording of a preset voice type, and the filtering the first recording according to the preset rule includes:
retaining the recording of the first preset voice type;
and/or filtering the recording of the second preset voice type.
Optionally, the first preset voice type includes a human voice, and/or the second preset voice type includes music and/or noise.
Optionally, the preset condition comprises the preset age range;
the retaining or filtering of the recording that meets the preset condition comprises:
judging whether the age range of the speaker in the first recording falls within the preset age range included in the preset condition;
if the age range of the speaker in the first recording does not fall within the preset age range included in the preset condition, retaining or filtering the recording of the speaker.
Optionally, the preset condition comprises the preset gender;
the retaining or filtering of the recording that meets the preset condition comprises:
judging whether the gender of the speaker in the first recording is the same as the preset gender included in the preset condition;
if the gender of the speaker in the first recording is the same as the preset gender included in the preset condition, retaining or filtering the recording of the speaker.
Optionally, the preset condition includes the preset voiceprint characteristic parameter;
the retaining or filtering of the recording that meets the preset condition comprises:
judging whether the voiceprint characteristic parameters of the speaker in the first recording match the voiceprint characteristic parameters included in the preset condition;
if the voiceprint characteristic parameters of the speaker in the first recording match the preset voiceprint characteristic parameters included in the preset condition, retaining or filtering the recording of the speaker.
Optionally, in the process of performing voice classification on the first recording to obtain the voice type, when noise or music contains voice, the voice type is voice.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.
Claims (10)
1. A method of filtering audio recordings, the method comprising:
performing voice recognition analysis on the first sound recording;
filtering the first recording according to a preset rule to obtain a second recording;
wherein the preset rule comprises:
retaining or filtering recordings of a preset voice type, the preset voice type comprising: human voice, music, and noise;
or, retaining or filtering the recording that meets a preset condition, wherein the preset condition comprises at least one of a preset age range, a preset gender, and a preset voiceprint characteristic parameter.
2. The method for filtering audio records according to claim 1, wherein the performing a speech recognition analysis on the first audio record comprises:
performing voice classification on the first sound recording to obtain a voice type, wherein the voice type comprises: human voice, noise, music;
if the voice type is voice, performing voiceprint recognition on the first recording to obtain voiceprint characteristic parameters of the speaker, and/or performing gender judgment on the first recording to obtain gender of the speaker, and/or performing age range judgment on the first recording to obtain age range of the speaker.
3. The audio record filtering method according to claim 1, wherein the preset rule comprises retaining or filtering audio records of a preset voice type, and the filtering the first audio record according to the preset rule comprises:
retaining the recording of the first preset voice type;
and/or filtering the recording of the second preset voice type.
4. The recording filtering method according to claim 3, wherein the first predetermined voice type includes human voice, and/or the second predetermined voice type includes music and/or noise.
5. The audio record filtering method according to claim 2, characterized in that said preset condition comprises said preset age range;
the retaining or filtering of the recording that meets the preset condition comprises:
judging whether the age range of the speaker in the first recording falls within the preset age range included in the preset condition;
if the age range of the speaker in the first recording does not fall within the preset age range included in the preset condition, retaining or filtering the recording of the speaker.
6. The method of claim 2, wherein the predetermined condition comprises the predetermined gender;
the retaining or filtering of the recording that meets the preset condition comprises:
judging whether the gender of the speaker in the first recording is the same as the preset gender included in the preset condition;
if the gender of the speaker in the first recording is the same as the preset gender included in the preset condition, retaining or filtering the recording of the speaker.
7. The audio record filtering method according to claim 2, wherein the preset condition comprises the preset voiceprint characteristic parameter;
the retaining or filtering of the recording that meets the preset condition comprises:
judging whether the voiceprint characteristic parameters of the speaker in the first recording match the voiceprint characteristic parameters included in the preset condition;
if the voiceprint characteristic parameters of the speaker in the first recording match the preset voiceprint characteristic parameters included in the preset condition, retaining or filtering the recording of the speaker.
8. The method of claim 2, wherein in the voice classifying the first recording to obtain the voice type, the voice type is voice when the noise or music contains voice.
9. A sound recording filtering apparatus, characterized in that the sound recording filtering apparatus comprises a memory, at least one processor and at least one program stored on the memory and executable on the at least one processor, the at least one program implementing the steps of the method of any one of the preceding claims 1 to 8 when executed by the at least one processor.
10. A computer-readable storage medium storing at least one program executable by a computer, the at least one program, when executed by the computer, causing the computer to perform the steps of the method of any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010999917.2A CN112102854A (en) | 2020-09-22 | 2020-09-22 | Recording filtering method and device and computer readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010999917.2A CN112102854A (en) | 2020-09-22 | 2020-09-22 | Recording filtering method and device and computer readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112102854A true CN112102854A (en) | 2020-12-18 |
Family
ID=73755742
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010999917.2A Pending CN112102854A (en) | 2020-09-22 | 2020-09-22 | Recording filtering method and device and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112102854A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113014844A (en) * | 2021-02-08 | 2021-06-22 | Oppo广东移动通信有限公司 | Audio processing method and device, storage medium and electronic equipment |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103714817A (en) * | 2013-12-31 | 2014-04-09 | 厦门天聪智能软件有限公司 | Satisfaction survey cheating screening method based on voiceprint recognition technology |
CN108694954A (en) * | 2018-06-13 | 2018-10-23 | 广州势必可赢网络科技有限公司 | A kind of Sex, Age recognition methods, device, equipment and readable storage medium storing program for executing |
CN108831440A (en) * | 2018-04-24 | 2018-11-16 | 中国地质大学(武汉) | A kind of vocal print noise-reduction method and system based on machine learning and deep learning |
CN109448756A (en) * | 2018-11-14 | 2019-03-08 | 北京大生在线科技有限公司 | A kind of voice age recognition methods and system |
CN110473566A (en) * | 2019-07-25 | 2019-11-19 | 深圳壹账通智能科技有限公司 | Audio separation method, device, electronic equipment and computer readable storage medium |
CN111246285A (en) * | 2020-03-24 | 2020-06-05 | 北京奇艺世纪科技有限公司 | Method for separating sound in comment video and method and device for adjusting volume |
CN111640422A (en) * | 2020-05-13 | 2020-09-08 | 广州国音智能科技有限公司 | Voice and human voice separation method and device, terminal and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106209138B (en) | Vehicle cautious emergency response system and method | |
US9646427B2 (en) | System for detecting the operational status of a vehicle using a handheld communication device | |
CN107613144B (en) | Automatic calling method, device, storage medium and mobile terminal | |
US9420431B2 (en) | Vehicle telematics communication for providing hands-free wireless communication | |
CN112086098B (en) | Driver and passenger analysis method and device and computer readable storage medium | |
CN106816149A (en) | The priorization content loading of vehicle automatic speech recognition system | |
WO2014137384A1 (en) | Emergency handling system using informative alarm sound | |
CN112785837A (en) | Method and device for recognizing emotion of user when driving vehicle, storage medium and terminal | |
CN105895132B (en) | vehicle-mounted voice recording method, device and system | |
CN108597524B (en) | Automobile voice recognition prompting device and method | |
CN111028834B (en) | Voice message reminding method and device, server and voice message reminding equipment | |
CN112071309A (en) | Network appointment car safety monitoring device and system | |
CN113094483B (en) | Method and device for processing vehicle feedback information, terminal equipment and storage medium | |
CN106156036B (en) | Vehicle-mounted audio processing method and vehicle-mounted equipment | |
CN112102854A (en) | Recording filtering method and device and computer readable storage medium | |
WO2016165403A1 (en) | Transportation assisting method and system | |
CN110826433B (en) | Emotion analysis data processing method, device and equipment for test driving user and storage medium | |
CN113596247A (en) | Alarm clock information processing method, device, vehicle, storage medium and program product | |
JP2006121270A (en) | Hands-free speech unit | |
CN106296867B (en) | Image recording apparatus and its image mark method | |
CN112116911B (en) | Sound control method and device and computer readable storage medium | |
CN112118536B (en) | Power saving method and device of device and computer readable storage medium | |
CN113306487A (en) | Vehicle prompting method, device, electronic equipment, storage medium and program product | |
CN112261586A (en) | Method for automatically identifying driver to limit driving range of driver by using vehicle-mounted robot | |
CN113392650A (en) | Intelligent memo reminding method and system based on vehicle position |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20201218 |