CN112102854A - Recording filtering method and device and computer readable storage medium - Google Patents
Recording filtering method and device and computer readable storage medium
- Publication number
- CN112102854A CN112102854A CN202010999917.2A CN202010999917A CN112102854A CN 112102854 A CN112102854 A CN 112102854A CN 202010999917 A CN202010999917 A CN 202010999917A CN 112102854 A CN112102854 A CN 112102854A
- Authority
- CN
- China
- Prior art keywords
- recording
- preset
- voice
- filtering
- speaker
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B20/00—Signal processing not specific to the method of recording or reproducing; Circuits therefor
- G11B20/10—Digital recording or reproducing
- G11B20/10009—Improvement or modification of read or write signals
- G11B20/10046—Improvement or modification of read or write signals filtering or equalising, e.g. setting the tap weights of an FIR filter
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/08—Speech classification or search
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L25/00—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
- G10L25/48—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
- G10L25/51—Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B20/00—Signal processing not specific to the method of recording or reproducing; Circuits therefor
- G11B20/10—Digital recording or reproducing
- G11B20/10527—Audio or video recording; Data buffering arrangements
-
- G—PHYSICS
- G11—INFORMATION STORAGE
- G11B—INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
- G11B20/00—Signal processing not specific to the method of recording or reproducing; Circuits therefor
- G11B20/10—Digital recording or reproducing
- G11B20/10527—Audio or video recording; Data buffering arrangements
- G11B2020/10537—Audio or video recording
- G11B2020/10546—Audio or video recording specifically adapted for audio data
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Signal Processing (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Quality & Reliability (AREA)
- Telephonic Communication Services (AREA)
Abstract
The recording filtering method disclosed by the invention performs voice recognition analysis on a first recording, then filters the first recording according to a preset rule to obtain a second recording. The preset rule comprises: retaining or filtering recordings of a preset voice type, the preset voice type comprising human voice, music, and noise; or retaining or filtering recordings that meet a preset condition, the preset condition comprising at least one of a preset age range, a preset gender, and a preset voiceprint characteristic parameter. The recording filtering method provided by the invention can therefore filter a recording according to the preset rule, removing invalid recordings and retaining only valid ones, which reduces the time spent manually playing back and identifying recordings and improves the efficiency of playback recognition.
Description
Technical Field
The invention relates to the technical field of recording processing, and in particular to a recording filtering method and device and a computer-readable storage medium.
Background
With the continuing popularization of electronic products and the development of electronic technology, people in scenarios that require real-time records (such as meetings or monitoring) usually make an audio recording, then manually play back the recording file, identify and screen the valid portions, and manually transcribe them into text.
Because a recording file is usually long and may contain many invalid portions, manually playing back and identifying it consumes considerable time, and the efficiency is low.
Disclosure of Invention
In view of the above, the present invention provides a recording filtering method, an apparatus and a computer-readable storage medium to solve the above technical problems.
Firstly, in order to achieve the above object, the present invention provides a recording filtering method, including:
performing voice recognition analysis on the first sound recording;
filtering the first recording according to a preset rule to obtain a second recording;
wherein the preset rule comprises:
preserving or filtering recordings of a preset voice type, the preset voice type comprising: human voice, music, noise;
or, retaining or filtering recordings that meet a preset condition, wherein the preset condition comprises at least one of a preset age range, a preset gender, and a preset voiceprint characteristic parameter.
Optionally, the performing voice recognition analysis on the first audio recording includes:
performing voice classification on the first sound recording to obtain a voice type, wherein the voice type comprises: human voice, noise, music;
if the voice type is voice, performing voiceprint recognition on the first recording to obtain voiceprint characteristic parameters of the speaker, and/or performing gender judgment on the first recording to obtain gender of the speaker, and/or performing age range judgment on the first recording to obtain age range of the speaker.
Optionally, the preset rule includes retaining or filtering a recording of a preset voice type, and the filtering the first recording according to the preset rule includes:
retaining recordings of a first preset voice type;
and/or filtering recordings of a second preset voice type.
Optionally, the first preset voice type includes a human voice, and/or the second preset voice type includes music and/or noise.
Optionally, the preset condition comprises the preset age range;
the recording which meets the preset condition is reserved or filtered, and the recording comprises the following steps:
judging whether the age range of the speaker in the first recording falls into the preset age range included by the preset condition;
if the age range of the speaker in the first recording does not fall into the preset age range included in the preset conditions, retaining or filtering the recording of the speaker.
Optionally, the preset condition comprises the preset gender;
the voice which meets the preset condition is reserved or filtered, and the voice comprises the following steps:
judging whether the gender of the speaker in the first recording is the same as the preset gender included in the preset condition;
if the gender of the speaker in the first recording is the same as the preset gender included in the preset condition, retaining or filtering the recording of the speaker.
Optionally, the preset condition includes the preset voiceprint characteristic parameter;
the voice which meets the preset condition is reserved or filtered, and the voice comprises the following steps:
judging whether the voiceprint characteristic parameters of the speaker in the first recording are matched with the voiceprint characteristic parameters included in the preset conditions;
if the voiceprint characteristic parameters of the speaker in the first recording are matched with the preset voiceprint characteristic parameters included in the preset conditions, retaining or filtering the recording of the speaker.
Optionally, in the process of performing voice classification on the first recording to obtain the voice type, when noise or music also contains human voice, the voice type is determined to be human voice.
Further, to achieve the above object, the present invention also provides a recording filter device, which includes a memory, at least one processor, and at least one program stored in the memory and executable on the at least one processor, wherein the at least one program, when executed by the at least one processor, implements the steps of the method described above.
Further, to achieve the above object, the present invention provides a computer-readable storage medium storing at least one program executable by a computer, wherein the at least one program, when executed by the computer, causes the computer to perform the steps of any one of the methods described above.
Compared with the prior art, the recording filtering method provided by the invention performs voice recognition analysis on a first recording, then filters the first recording according to a preset rule to obtain a second recording. The preset rule comprises: retaining or filtering recordings of a preset voice type, the preset voice type comprising human voice, music, and noise; or retaining or filtering recordings that meet a preset condition, the preset condition comprising at least one of a preset age range, a preset gender, and a preset voiceprint characteristic parameter. The method can therefore filter a recording according to the preset rule, removing invalid recordings and retaining only valid ones, which reduces the time spent manually playing back and identifying recordings and improves the efficiency of playback recognition.
Drawings
Fig. 1 is a schematic structural diagram of a recording filter device according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a vehicle-mounted locator according to an embodiment of the present invention;
fig. 3 is a schematic structural diagram of a server according to an embodiment of the present invention;
fig. 4 is a schematic flow chart of a recording filtering method according to an embodiment of the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only to facilitate the explanation of the present invention and have no specific meaning in themselves. Thus, "module", "component", and "unit" may be used interchangeably.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a recording filter apparatus according to an embodiment of the present invention, as shown in fig. 1, the recording filter apparatus 100 includes a processor 101 and a memory 102, where the memory 102 is used to store related data, such as a program, of the recording filter apparatus 100, and the processor 101 is used to execute the program stored in the memory 102 and implement a corresponding function. In the embodiment of the present invention, the recording filter device 100 may be a vehicle-mounted locator or a server.
Referring to fig. 2, fig. 2 is a schematic structural diagram of a vehicle-mounted locator according to an embodiment of the present invention. As shown in fig. 2, the vehicle-mounted locator 200 includes a processor 201 and a memory 202, where the memory 202 is used to store relevant data of the vehicle-mounted locator 200, for example the data collected by the vehicle-mounted locator 200 and its programs, and the processor 201 is used to execute the programs stored in the memory 202 and implement the corresponding functions.
The vehicle-mounted locator 200 further includes one or more of a positioning module 203, a recording module 204, a wireless communication module 205, a vibration sensor 206, a low-power detection module 207, and a battery module 208. The positioning module 203 is configured to position the vehicle-mounted locator 200 to obtain its position information; it may be a positioning chip such as a GPS or Beidou chip that obtains the longitude and latitude of the vehicle, or a WIFI, Bluetooth, or base-station positioning module that determines position from the address information of nearby WIFI devices, the address information of Bluetooth devices, or the identification information of base stations.
The recording module 204 is configured to record sound around the vehicle-mounted locator 200. The wireless communication module 205 is configured to implement a wireless communication connection between the vehicle-mounted locator 200 and external devices, and may include one or more of a Bluetooth communication module, an infrared communication module, a WIFI communication module, and a mobile cellular network communication module (e.g., a 2G, 3G, 4G, or 5G communication module). It is understood that in some embodiments the vehicle-mounted locator 200 may include a wired communication module for implementing a wired communication connection between the vehicle-mounted locator 200 and a vehicle-mounted terminal, and through the vehicle-mounted terminal, communication connections with external devices. The vibration sensor 206 is configured to detect vibration data of the vehicle, from which the processor 201 may determine the driving state of the vehicle (e.g., moving or stationary). The low-power detection module 207 is configured to detect the battery level of the vehicle-mounted locator 200 and report it to the processor 201, and the battery module 208 is configured to supply power to the vehicle-mounted locator 200.
Referring to fig. 3, fig. 3 is a schematic structural diagram of a server according to an embodiment of the present invention, as shown in fig. 3, a server 300 includes a processor 301 and a memory 302, where the memory 302 is used for storing relevant data, such as a program, of the server 300, and the processor 301 is used for executing the program stored in the memory 302 and implementing a corresponding function.
It should be noted that, when the recording filter apparatus 100 is the vehicle-mounted locator 200 shown in fig. 2, the vehicle-mounted locator 200 may implement a communication connection with a client through the server 300, or may directly establish a communication connection with the client without the server 300. When the recording filter device 100 is the server 300 shown in fig. 3, the server 300 acquires data collected by the vehicle-mounted locator 200, such as position information and sound information, by establishing a communication connection with the vehicle-mounted locator 200.
Based on the schematic structural diagram of the recording filter device 100, various embodiments of the method of the present invention are provided.
Referring to fig. 4, fig. 4 is a flowchart illustrating steps of a recording filtering method according to an embodiment of the present invention, where the method is applied to the recording filtering apparatus 100, and as shown in fig. 4, the method includes:
In the first step, the method performs voice recognition analysis on a first recording, which is a recording captured by a recording device, for example a recording made by a recording pen at a meeting, or sound recorded by a vehicle-mounted locator installed in a vehicle. If the recording is very long, it can be split into multiple segments that are then analyzed one by one.
Performing voice recognition analysis on the first recording, which may specifically include performing voice classification on the first recording to obtain a voice type, where the voice type includes voice, noise, and music; if the voice type is voice, performing voiceprint recognition on the first recording to obtain voiceprint characteristic parameters of the speaker, and/or performing gender judgment on the first recording to obtain gender of the speaker, and/or performing age range judgment on the first recording to obtain age range of the speaker.
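The analysis step above can be sketched as follows. This is an illustrative sketch only: the patent does not specify any particular classifier or voiceprint model, so the stand-in functions here simply read pre-computed fields from each segment.

```python
# Illustrative sketch of the voice recognition analysis step; segment fields
# such as "has_speech" and "voiceprint" are assumed placeholders for the
# outputs of real audio models.

def classify_voice_type(segment):
    # Per the method, a segment that contains human speech is classified as
    # human voice even when music or noise is also present.
    if segment.get("has_speech"):
        return "human voice"
    return "music" if segment.get("has_music") else "noise"

def analyze_recording(segments):
    """Analyze the first recording piece by piece (a long recording is
    assumed to have been split into segments beforehand)."""
    analyses = []
    for seg in segments:
        result = {"voice_type": classify_voice_type(seg)}
        if result["voice_type"] == "human voice":
            # Only human-voice segments get voiceprint, gender, and age analysis.
            result["voiceprint"] = seg.get("voiceprint")
            result["gender"] = seg.get("gender")
            result["age_range"] = seg.get("age_range")
        analyses.append(result)
    return analyses
```

Note that the voiceprint, gender, and age analyses are independent ("and/or" in the claim), so an implementation could run any subset of them.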
It should be noted that a voice recognition device may be disposed inside the recording filter device and used to perform the voice analysis of the first recording; alternatively, the voice analysis may be performed by calling an external voice recognition server, in which case no built-in voice recognition device is needed.
In the second step, the method filters the first recording according to a preset rule to obtain a second recording. The preset rule may filter by voice type, for example retaining or filtering recordings of a preset voice type, where the preset voice type includes human voice, music, and noise; or it may filter by the speaker's characteristics, for example retaining or filtering recordings that meet a preset condition, where the preset condition includes at least one of a preset age range, a preset gender, and a preset voiceprint characteristic parameter.
For example, when the user only needs to recognize human voice, the preset rule may be set to retain recordings of a preset voice type, with the preset voice type being human voice. When the user only needs to recognize female speakers, the preset rule may be set to retain recordings whose speaker is female, or to filter recordings whose speaker is male. When the user only needs to recognize a designated speaker (e.g., the car owner, the driver, or a regular passenger), the preset rule may be set to retain recordings matching a preset voiceprint characteristic parameter, the parameter being the voiceprint characteristic parameter of the designated speaker. Conversely, when the user needs to recognize the voices of everyone except the designated speaker, the preset rule may be set to filter recordings matching that preset voiceprint characteristic parameter.
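The retain-or-filter logic described above can be sketched as follows. The rule representation (a dict with an "action" plus either a set of voice types or a speaker condition) is an assumption made for illustration; the patent does not prescribe a data format.

```python
def meets_condition(analysis, condition):
    # The preset condition may constrain gender, age range, or voiceprint
    # (at least one); a segment must satisfy every constraint present.
    if "gender" in condition and analysis.get("gender") != condition["gender"]:
        return False
    if "age_range" in condition:
        lo, hi = condition["age_range"]
        age = analysis.get("age_range")
        if age is None or age[0] < lo or age[1] > hi:
            return False
    if "voiceprint" in condition and analysis.get("voiceprint") != condition["voiceprint"]:
        return False
    return True

def filter_recording(analyses, rule):
    """Apply a preset rule to analyzed segments, returning the segments
    that make up the second recording."""
    second = []
    for a in analyses:
        if "voice_types" in rule:
            matched = a["voice_type"] in rule["voice_types"]
        else:
            matched = meets_condition(a, rule["condition"])
        # action "keep": retain matching segments; action "filter": drop them.
        keep = matched if rule["action"] == "keep" else not matched
        if keep:
            second.append(a)
    return second
```

For instance, `{"action": "keep", "voice_types": {"human voice"}}` retains only speech, while `{"action": "filter", "condition": {"gender": "male"}}` drops male speakers.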
In some embodiments of the present invention, in the process of filtering the sound record and/or after the sound record is filtered, the method may further receive a modification operation for the preset rule, and update the preset rule according to the modification operation.
In this embodiment, the recording filtering method performs voice recognition analysis on the first recording and filters the first recording according to a preset rule to obtain a second recording, where the preset rule comprises retaining or filtering recordings of a preset voice type (human voice, music, or noise), or retaining or filtering recordings that meet a preset condition (at least one of a preset age range, a preset gender, and a preset voiceprint characteristic parameter). The method thus removes invalid recordings and retains only valid ones, reducing the time needed to manually play back and identify the recording and improving the efficiency of playback recognition.
The following describes the method and process of the present invention in detail by taking the recording filter device as a server and taking the first recording as a recording recorded by a vehicle-mounted locator as an example.
When an administrator needs to play back and identify recordings made in a vehicle, the administrator can start an application on the client and send a recording filtering request to the server through the application. The request carries filtering parameters, which include at least the preset rule and may also include other information, such as at least one of a user account, a vehicle-mounted locator identifier, the vehicle's passenger limit, and user information (such as name, gender, age, and contact details). On receiving the request, the server obtains and stores the filtering parameters carried in it for the subsequent voice recognition analysis, returns a recording filtering start response message to the client to indicate that the request was received and the filtering function is enabled, and sends a recording filtering request to the vehicle-mounted locator identified by the locator identifier to obtain the first recording collected by that locator and perform the subsequent filtering steps on it. It can be understood that, before sending the recording filtering request to the vehicle-mounted locator, the server may first judge whether the locator is online: if it is online, the server sends the request directly; if not, the server waits until the locator comes online and then sends the request.
After receiving the recording filtering request sent by the server, the vehicle-mounted locator stores the filtering parameters in the request and returns a recording filtering response message to the server; in addition, the vehicle-mounted locator reports the collected first recording to the server.
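The request flow above might carry its filtering parameters as a structured payload. The field names below are purely illustrative assumptions; the patent does not fix any wire format.

```python
import json

# Hypothetical recording filtering request from client to server;
# every field name here is an illustrative assumption.
request = {
    "user_account": "admin01",
    "locator_id": "LOC-0001",
    "passenger_limit": 5,
    "user_info": {"name": "example", "gender": "female"},
    "filter_params": {
        # The preset rule is the only mandatory filtering parameter.
        "rule": {"action": "keep", "voice_types": ["human voice"]},
    },
}

payload = json.dumps(request)   # serialized for the communication connection
received = json.loads(payload)  # parsed on the receiving side
```

The server would store `filter_params` for later analysis and forward a corresponding request to the locator named by `locator_id`.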
The method and process provided by the invention are described in detail below by taking the recording filter device as a vehicle-mounted locator and taking the first recording as the recording recorded by the vehicle-mounted locator as an example.
When an administrator needs to play back and identify recordings made in a vehicle, the administrator can start an application on the client and send a recording filtering request to the vehicle-mounted locator through the application. The request carries filtering parameters, which include at least the preset rule and may also include other information, such as at least one of a user account, a vehicle-mounted locator identifier, the vehicle's passenger limit, a preset voiceprint characteristic parameter, and user information (such as name, gender, age, and contact details). The client may establish a communication connection directly with the vehicle-mounted locator and send the recording filtering request to it, or may send the request through a server. After receiving the request from the client, the vehicle-mounted locator obtains and stores the filtering parameters carried in it for the subsequent voice analysis of the first recording, returns a recording filtering response message to the client to indicate that the request was received and the filtering function is enabled, and then obtains the collected first recording and performs the subsequent filtering steps on it.
Optionally, the performing voice recognition analysis on the first audio recording includes:
performing voice classification on the first sound recording to obtain a voice type, wherein the voice type comprises: human voice, noise, music;
if the voice type is voice, performing voiceprint recognition on the first recording to obtain voiceprint characteristic parameters of the speaker, and/or performing gender judgment on the first recording to obtain gender of the speaker, and/or performing age range judgment on the first recording to obtain age range of the speaker.
Optionally, the preset rule includes retaining or filtering a recording of a preset voice type, and the filtering the first recording according to the preset rule includes:
retaining recordings of a first preset voice type;
and/or filtering recordings of a second preset voice type.
Optionally, the first preset voice type includes a human voice, and/or the second preset voice type includes music and/or noise.
Optionally, the preset condition comprises the preset age range;
the retaining or filtering of recordings that meet the preset condition comprises:
judging whether the age range of the speaker in the first recording falls within the preset age range included in the preset condition;
if the age range of the speaker in the first recording does not fall within the preset age range included in the preset condition, retaining or filtering the recording of the speaker.
Optionally, the preset condition comprises the preset gender;
the retaining or filtering of recordings that meet the preset condition comprises:
judging whether the gender of the speaker in the first recording is the same as the preset gender included in the preset condition;
if the gender of the speaker in the first recording is the same as the preset gender included in the preset condition, retaining or filtering the recording of the speaker.
Optionally, the preset condition includes the preset voiceprint characteristic parameter;
the retaining or filtering of recordings that meet the preset condition comprises:
judging whether the voiceprint characteristic parameters of the speaker in the first recording are matched with the voiceprint characteristic parameters included in the preset conditions;
if the voiceprint characteristic parameters of the speaker in the first recording are matched with the preset voiceprint characteristic parameters included in the preset conditions, retaining or filtering the recording of the speaker.
For example, when the user only needs to play back and identify the recordings of a designated speaker (e.g., the driver), the voiceprint characteristic parameter of the designated speaker may be set in advance as the preset voiceprint characteristic parameter, and if the voiceprint characteristic parameter of a speaker in the first recording matches the preset voiceprint characteristic parameter included in the preset condition, that speaker's recording is retained.
In some embodiments of the present invention, the method may further distinguish the voices of different speakers in the first recording and store each speaker's recording centrally, that is, store recordings having the same voiceprint characteristic parameter together. For example, if the first recording contains the voices of three speakers A, B, and C, the method stores A's speech content separately, B's speech content separately, and C's speech content separately.
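The per-speaker centralized storage described above amounts to grouping segments by their voiceprint parameter. A minimal sketch, in which a hashable value stands in for the real voiceprint characteristic parameter:

```python
from collections import defaultdict

def group_by_speaker(segments):
    """Store each speaker's speech centrally: segments sharing the same
    voiceprint characteristic parameter are collected into one group."""
    groups = defaultdict(list)
    for seg in segments:
        # "voiceprint" is a stand-in key; a real system would compare
        # voiceprint parameters with a similarity threshold, not equality.
        groups[seg["voiceprint"]].append(seg["audio"])
    return dict(groups)
```

Each resulting group could then be tagged with a passenger identification code, as the next paragraph describes.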
Furthermore, each centrally stored recording may be identified; for example, a passenger identification code may be assigned to each distinct voiceprint characteristic parameter, so that the speech recordings of different passengers carry different passenger identification codes. When multiple speakers speak simultaneously, the corresponding section of the recording is marked with multiple passenger identification codes to indicate that it contains the speech of several speakers. Alternatively, the gender and/or age range of each speaker's voice may be judged to determine each speaker's gender and/or age range, and each speaker's speech recording may be identified according to that gender and/or age range.
In some embodiments of the present invention, the method stores the second recording obtained after filtering the first recording and identifies it as a normal recording, and also stores the filtered-out third recording and identifies it as a filtered recording. In this way, when the user needs to play back and identify the first recording, the recording identifier indicates which file is the filtered normal recording, making it easy to select the correct file for playback. In some embodiments, the recording filter device further performs speech-to-text processing on the second recording to obtain the text content corresponding to the second recording.
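Keeping both the retained second recording and the removed third recording under distinguishing identifiers, as described above, can be sketched as follows; the label strings are illustrative.

```python
def label_recordings(segments, keep):
    """Split segments into the retained second recording and the removed
    third recording, storing each under an identifying label. `keep` is
    a predicate implementing the preset rule."""
    second, third = [], []
    for seg in segments:
        (second if keep(seg) else third).append(seg)
    # Identifier strings are illustrative stand-ins for recording identifiers.
    return {"normal recording": second, "filtered recording": third}
```

A playback tool can then offer the "normal recording" group by default while still letting the user inspect the "filtered recording" group.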
Optionally, in the process of performing voice classification on the first recording to obtain the voice type, when noise or music contains voice, the voice type is voice.
Those skilled in the art will appreciate that all or part of the steps of the methods of the above embodiments may be implemented by at least one program instructing the relevant hardware. The at least one program may be stored in the memory 102 of the recording filtering apparatus 100 shown in Fig. 1 and executed by the processor 101 of the recording filtering apparatus 100; when executed by the processor, the at least one program implements the following steps:
performing voice recognition analysis on the first sound recording;
filtering the first recording according to a preset rule to obtain a second recording;
wherein the preset rule comprises:
retaining or filtering recordings of a preset voice type, the preset voice type comprising: human voice, music, and noise;
or, retaining or filtering the recording that meets a preset condition, wherein the preset condition comprises at least one of a preset age range, a preset gender, and a preset voiceprint characteristic parameter.
Optionally, the performing voice recognition analysis on the first audio recording includes:
performing voice classification on the first sound recording to obtain a voice type, wherein the voice type comprises: human voice, noise, music;
if the voice type is voice, performing voiceprint recognition on the first recording to obtain voiceprint characteristic parameters of the speaker, and/or performing gender judgment on the first recording to obtain gender of the speaker, and/or performing age range judgment on the first recording to obtain age range of the speaker.
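The analysis order above (classify first, then speaker attributes only for human voice) can be sketched as a small pipeline. The four callables stand in for upstream models and are assumptions, not real APIs.

```python
def analyze(recording, classify, voiceprint, gender, age):
    """Run voice classification first; only when the type is 'voice'
    are the voiceprint, gender, and age-range analyses applied,
    mirroring the order described in the text."""
    voice_type = classify(recording)
    result = {"type": voice_type}
    if voice_type == "voice":
        result["voiceprint"] = voiceprint(recording)
        result["gender"] = gender(recording)
        result["age_range"] = age(recording)
    return result

# Stub models for illustration only:
out = analyze("clip",
              classify=lambda r: "voice",
              voiceprint=lambda r: [0.1, 0.2],
              gender=lambda r: "female",
              age=lambda r: "20-30")
```

For a clip classified as music or noise, the speaker analyses are skipped entirely.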
Optionally, the preset rule includes retaining or filtering a recording of a preset voice type, and the filtering the first recording according to the preset rule includes:
retaining the recording of the first preset voice type;
and/or filtering the recording of the second preset voice type.
Optionally, the first preset voice type includes a human voice, and/or the second preset voice type includes music and/or noise.
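The keep/filter-by-type rule can be sketched as a single pass over classified segments. The default type sets below follow the optional embodiment above (keep human voice; filter music and noise); the function name is hypothetical.

```python
def filter_by_type(segments, keep_types=("voice",), drop_types=("music", "noise")):
    """segments: list of (voice_type, chunk). Returns (kept, filtered):
    kept becomes the second recording, filtered the third."""
    kept, filtered = [], []
    for vtype, chunk in segments:
        if vtype in drop_types:
            filtered.append((vtype, chunk))
        elif vtype in keep_types:
            kept.append((vtype, chunk))
    return kept, filtered

kept, dropped = filter_by_type([("voice", "s1"), ("music", "s2"), ("noise", "s3")])
```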
Optionally, the preset condition comprises the preset age range;
the retaining or filtering of the recording that meets the preset condition comprises:
judging whether the age range of the speaker in the first recording falls within the preset age range included in the preset condition;
if the age range of the speaker in the first recording does not fall within the preset age range included in the preset condition, retaining or filtering the recording of the speaker.
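The age-range condition above triggers an action when the speaker's range falls *outside* the preset range. A sketch, assuming age ranges are represented as `(low, high)` tuples; whether the action is "retain" or "filter" is a deployment choice.

```python
def age_rule(speaker_age_range, preset_range, action="filter"):
    """Return the configured action when the speaker's (low, high) age
    range does NOT fall inside the preset range, as described above;
    return None (no action) when it does."""
    lo, hi = speaker_age_range
    p_lo, p_hi = preset_range
    inside = lo >= p_lo and hi <= p_hi
    return None if inside else action

decision = age_rule((35, 45), (18, 30), action="filter")
```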
Optionally, the preset condition comprises the preset gender;
the retaining or filtering of the recording that meets the preset condition comprises:
judging whether the gender of the speaker in the first recording is the same as the preset gender included in the preset condition;
if the gender of the speaker in the first recording is the same as the preset gender included in the preset condition, retaining or filtering the recording of the speaker.
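The gender condition reduces to an equality check; a minimal sketch, with the action ("retain" vs. "filter") again left as a configuration choice rather than fixed by the text.

```python
def gender_rule(speaker_gender, preset_gender, action="retain"):
    """Apply the configured action when the speaker's gender equals the
    preset gender, per the condition above; None means no action."""
    return action if speaker_gender == preset_gender else None

decision = gender_rule("male", "male", action="filter")
```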
Optionally, the preset condition includes the preset voiceprint characteristic parameter;
the retaining or filtering of the recording that meets the preset condition comprises:
judging whether the voiceprint characteristic parameters of the speaker in the first recording match the voiceprint characteristic parameters included in the preset condition;
if the voiceprint characteristic parameters of the speaker in the first recording match the preset voiceprint characteristic parameters included in the preset condition, retaining or filtering the recording of the speaker.
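The text does not define how voiceprint characteristic parameters are "matched"; one common approach is cosine similarity between embedding vectors with a tuned threshold. Both the vector representation and the threshold below are assumptions.

```python
import math

def voiceprint_match(vec_a, vec_b, threshold=0.8):
    """Cosine-similarity match between two voiceprint vectors.
    The 0.8 threshold is an assumed tuning parameter."""
    dot = sum(a * b for a, b in zip(vec_a, vec_b))
    norm_a = math.sqrt(sum(a * a for a in vec_a))
    norm_b = math.sqrt(sum(b * b for b in vec_b))
    return dot / (norm_a * norm_b) >= threshold

matched = voiceprint_match([1.0, 0.0, 0.5], [0.9, 0.1, 0.4])
```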
Optionally, in the process of performing voice classification on the first recording to obtain the voice type, when noise or music contains voice, the voice type is voice.
It will be understood by those skilled in the art that all or part of the steps of the methods of the above embodiments may be implemented by at least one program instructing the relevant hardware. The at least one program may be stored in a computer-readable storage medium and, when executed, implements the following steps:
performing voice recognition analysis on the first sound recording;
filtering the first recording according to a preset rule to obtain a second recording;
wherein the preset rule comprises:
retaining or filtering recordings of a preset voice type, the preset voice type comprising: human voice, music, and noise;
or, retaining or filtering the recording that meets a preset condition, wherein the preset condition comprises at least one of a preset age range, a preset gender, and a preset voiceprint characteristic parameter.
Optionally, the performing voice recognition analysis on the first audio recording includes:
performing voice classification on the first sound recording to obtain a voice type, wherein the voice type comprises: human voice, noise, music;
if the voice type is voice, performing voiceprint recognition on the first recording to obtain voiceprint characteristic parameters of the speaker, and/or performing gender judgment on the first recording to obtain gender of the speaker, and/or performing age range judgment on the first recording to obtain age range of the speaker.
Optionally, the preset rule includes retaining or filtering a recording of a preset voice type, and the filtering the first recording according to the preset rule includes:
retaining the recording of the first preset voice type;
and/or filtering the recording of the second preset voice type.
Optionally, the first preset voice type includes a human voice, and/or the second preset voice type includes music and/or noise.
Optionally, the preset condition comprises the preset age range;
the retaining or filtering of the recording that meets the preset condition comprises:
judging whether the age range of the speaker in the first recording falls within the preset age range included in the preset condition;
if the age range of the speaker in the first recording does not fall within the preset age range included in the preset condition, retaining or filtering the recording of the speaker.
Optionally, the preset condition comprises the preset gender;
the retaining or filtering of the recording that meets the preset condition comprises:
judging whether the gender of the speaker in the first recording is the same as the preset gender included in the preset condition;
if the gender of the speaker in the first recording is the same as the preset gender included in the preset condition, retaining or filtering the recording of the speaker.
Optionally, the preset condition includes the preset voiceprint characteristic parameter;
the retaining or filtering of the recording that meets the preset condition comprises:
judging whether the voiceprint characteristic parameters of the speaker in the first recording match the voiceprint characteristic parameters included in the preset condition;
if the voiceprint characteristic parameters of the speaker in the first recording match the preset voiceprint characteristic parameters included in the preset condition, retaining or filtering the recording of the speaker.
Optionally, in the process of performing voice classification on the first recording to obtain the voice type, when noise or music contains voice, the voice type is voice.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present invention may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (such as a mobile phone, a computer, a server, an air conditioner, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.
Claims (10)
1. A method of filtering audio recordings, the method comprising:
performing voice recognition analysis on the first sound recording;
filtering the first recording according to a preset rule to obtain a second recording;
wherein the preset rule comprises:
retaining or filtering recordings of a preset voice type, the preset voice type comprising: human voice, music, and noise;
or, retaining or filtering the recording that meets a preset condition, wherein the preset condition comprises at least one of a preset age range, a preset gender, and a preset voiceprint characteristic parameter.
2. The method for filtering audio records according to claim 1, wherein the performing a speech recognition analysis on the first audio record comprises:
performing voice classification on the first sound recording to obtain a voice type, wherein the voice type comprises: human voice, noise, music;
if the voice type is voice, performing voiceprint recognition on the first recording to obtain voiceprint characteristic parameters of the speaker, and/or performing gender judgment on the first recording to obtain gender of the speaker, and/or performing age range judgment on the first recording to obtain age range of the speaker.
3. The audio record filtering method according to claim 1, wherein the preset rule comprises retaining or filtering audio records of a preset voice type, and the filtering the first audio record according to the preset rule comprises:
retaining the recording of the first preset voice type;
and/or filtering the recording of the second preset voice type.
4. The recording filtering method according to claim 3, wherein the first predetermined voice type includes human voice, and/or the second predetermined voice type includes music and/or noise.
5. The audio record filtering method according to claim 2, characterized in that said preset condition comprises said preset age range;
the retaining or filtering of the recording that meets the preset condition comprises:
judging whether the age range of the speaker in the first recording falls within the preset age range included in the preset condition;
if the age range of the speaker in the first recording does not fall within the preset age range included in the preset condition, retaining or filtering the recording of the speaker.
6. The method of claim 2, wherein the predetermined condition comprises the predetermined gender;
the retaining or filtering of the recording that meets the preset condition comprises:
judging whether the gender of the speaker in the first recording is the same as the preset gender included in the preset condition;
if the gender of the speaker in the first recording is the same as the preset gender included in the preset condition, retaining or filtering the recording of the speaker.
7. The audio record filtering method according to claim 2, wherein the preset condition comprises the preset voiceprint characteristic parameter;
the retaining or filtering of the recording that meets the preset condition comprises:
judging whether the voiceprint characteristic parameters of the speaker in the first recording match the voiceprint characteristic parameters included in the preset condition;
if the voiceprint characteristic parameters of the speaker in the first recording match the preset voiceprint characteristic parameters included in the preset condition, retaining or filtering the recording of the speaker.
8. The method of claim 2, wherein in the voice classifying the first recording to obtain the voice type, the voice type is voice when the noise or music contains voice.
9. A sound recording filtering apparatus, characterized in that the sound recording filtering apparatus comprises a memory, at least one processor and at least one program stored on the memory and executable on the at least one processor, the at least one program implementing the steps of the method of any one of the preceding claims 1 to 8 when executed by the at least one processor.
10. A computer-readable storage medium storing at least one program executable by a computer, the at least one program, when executed by the computer, causing the computer to perform the steps of the method of any one of claims 1 to 8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010999917.2A CN112102854A (en) | 2020-09-22 | 2020-09-22 | Recording filtering method and device and computer readable storage medium |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202010999917.2A CN112102854A (en) | 2020-09-22 | 2020-09-22 | Recording filtering method and device and computer readable storage medium |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112102854A true CN112102854A (en) | 2020-12-18 |
Family
ID=73755742
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202010999917.2A Pending CN112102854A (en) | 2020-09-22 | 2020-09-22 | Recording filtering method and device and computer readable storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112102854A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113014844A (en) * | 2021-02-08 | 2021-06-22 | Oppo广东移动通信有限公司 | Audio processing method and device, storage medium and electronic equipment |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103714817A (en) * | 2013-12-31 | 2014-04-09 | 厦门天聪智能软件有限公司 | Satisfaction survey cheating screening method based on voiceprint recognition technology |
CN108694954A (en) * | 2018-06-13 | 2018-10-23 | 广州势必可赢网络科技有限公司 | A kind of Sex, Age recognition methods, device, equipment and readable storage medium storing program for executing |
CN108831440A (en) * | 2018-04-24 | 2018-11-16 | 中国地质大学(武汉) | A kind of vocal print noise-reduction method and system based on machine learning and deep learning |
CN109448756A (en) * | 2018-11-14 | 2019-03-08 | 北京大生在线科技有限公司 | A kind of voice age recognition methods and system |
CN110473566A (en) * | 2019-07-25 | 2019-11-19 | 深圳壹账通智能科技有限公司 | Audio separation method, device, electronic equipment and computer readable storage medium |
CN111246285A (en) * | 2020-03-24 | 2020-06-05 | 北京奇艺世纪科技有限公司 | Method for separating sound in comment video and method and device for adjusting volume |
CN111640422A (en) * | 2020-05-13 | 2020-09-08 | 广州国音智能科技有限公司 | Voice and human voice separation method and device, terminal and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106209138B (en) | Vehicle cautious emergency response system and method | |
US9646427B2 (en) | System for detecting the operational status of a vehicle using a handheld communication device | |
CN107613144B (en) | Automatic calling method, device, storage medium and mobile terminal | |
US9420431B2 (en) | Vehicle telematics communication for providing hands-free wireless communication | |
CN112086098B (en) | Driver and passenger analysis method and device and computer readable storage medium | |
CN106816149A (en) | The priorization content loading of vehicle automatic speech recognition system | |
WO2014137384A1 (en) | Emergency handling system using informative alarm sound | |
CN112785837A (en) | Method and device for recognizing emotion of user when driving vehicle, storage medium and terminal | |
CN105895132B (en) | vehicle-mounted voice recording method, device and system | |
CN108597524B (en) | Automobile voice recognition prompting device and method | |
CN111028834B (en) | Voice message reminding method and device, server and voice message reminding equipment | |
CN112071309A (en) | Network appointment car safety monitoring device and system | |
CN113094483B (en) | Method and device for processing vehicle feedback information, terminal equipment and storage medium | |
CN106156036B (en) | Vehicle-mounted audio processing method and vehicle-mounted equipment | |
CN112102854A (en) | Recording filtering method and device and computer readable storage medium | |
WO2016165403A1 (en) | Transportation assisting method and system | |
CN110826433B (en) | Emotion analysis data processing method, device and equipment for test driving user and storage medium | |
CN113596247A (en) | Alarm clock information processing method, device, vehicle, storage medium and program product | |
JP2006121270A (en) | Hands-free speech unit | |
CN106296867B (en) | Image recording apparatus and its image mark method | |
CN112116911B (en) | Sound control method and device and computer readable storage medium | |
CN112118536B (en) | Power saving method and device of device and computer readable storage medium | |
CN113306487A (en) | Vehicle prompting method, device, electronic equipment, storage medium and program product | |
CN112261586A (en) | Method for automatically identifying driver to limit driving range of driver by using vehicle-mounted robot | |
CN113392650A (en) | Intelligent memo reminding method and system based on vehicle position |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20201218 |