CN111341301B - Recording processing method - Google Patents


Info

Publication number
CN111341301B
CN111341301B (application CN202010421975.7A)
Authority
CN
China
Prior art keywords
recording
voice processing
local
voice
npu
Prior art date
Legal status
Active
Application number
CN202010421975.7A
Other languages
Chinese (zh)
Other versions
CN111341301A (en)
Inventor
马强
Current Assignee
Beijing Xiaomi Mobile Software Co Ltd
Original Assignee
Beijing Xiaomi Mobile Software Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Xiaomi Mobile Software Co Ltd filed Critical Beijing Xiaomi Mobile Software Co Ltd
Priority to CN202010421975.7A priority Critical patent/CN111341301B/en
Publication of CN111341301A publication Critical patent/CN111341301A/en
Application granted granted Critical
Publication of CN111341301B publication Critical patent/CN111341301B/en
Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/18 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
    • G01S5/22 Position of source determined by co-ordinating a plurality of position lines defined by path-difference measurements
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering

Abstract

The invention provides a recording processing method, which comprises the following steps: acquiring the user requirements of a target user; performing voice processing on a pre-recorded local recording file based on a voice processing algorithm according to the user requirements, and outputting the required content, wherein a CPU (central processing unit) of the target device issues a voice processing command to an NPU (neural processing unit) of the target device according to the user requirements; after the NPU receives the voice processing command, the NPU performs voice processing on the local recording file based on a voice processing algorithm built into the NPU; and outputting the required content after the voice processing. By processing the recording with the local NPU, the user can very conveniently obtain various desired functions, and efficiency is improved.

Description

Recording processing method
Technical Field
The invention relates to the technical field of sound recording, in particular to a sound recording processing method.
Background
In daily life, recording functions are often used, for example to record conversations or meetings. The recorded content generally needs to be organized into notes, or required information needs to be retrieved from the stored recording. However, if the recording is long, listening to it again takes a long time; and while the content can be searched by time stamp, the search becomes complicated and troublesome when the exact time stamp of the required information is uncertain. The invention therefore provides a recording processing method.
Disclosure of Invention
The invention provides a recording processing method which is used for outputting effective required content according to user requirements and a voice processing algorithm and meeting the requirements of users on recording processing.
The invention provides a recording processing method, which comprises the following steps:
acquiring user requirements of a target user;
and performing voice processing on the pre-recorded local recording file based on a voice processing algorithm according to the user requirement, and outputting the required content.
Preferably, the step of performing voice processing on the pre-recorded recording file according to the user requirement and based on a voice processing algorithm and outputting the required content comprises:
the CPU of the target equipment issues a voice processing command to the NPU of the target equipment according to the user requirement;
after receiving the voice processing command, the NPU performs voice processing on the local recording file based on a voice processing algorithm built in the NPU;
and outputting the required content after the voice processing.
Preferably, the process of performing voice processing on the local sound recording file based on a built-in voice processing algorithm in the NPU includes:
the NPU extracts the voice characteristics of each user in the local recording file according to a voice processing algorithm, and extracts and outputs all the voices of each user independently according to the voice characteristics;
and according to the user requirements, the target user can listen to all the voice of the designated person that has been separately extracted and output.
Preferably, the process of performing voice processing on the local sound recording file based on a built-in voice processing algorithm in the NPU includes:
receiving a keyword input by a target user;
the NPU determines the word characteristics of the keywords according to the voice processing algorithm, retrieves the local recording file according to the word characteristics and outputs all voices related to the keywords;
and simultaneously, storing all the output voices related to the keywords.
Preferably, the process of performing voice processing on the local sound recording file based on a built-in voice processing algorithm in the NPU includes:
the NPU judges whether a voice-free interval exists in the local recording file or not according to a voice processing algorithm;
and if so, deleting the voice-free interval and outputting the local recording file without idle waiting time.
Preferably, the process of performing voice processing on the local sound recording file based on a built-in voice processing algorithm in the NPU includes:
the NPU processes all voice recognition of each user corresponding to the local recording file into text information according to a voice processing algorithm and outputs the text information;
and simultaneously, correspondingly marking the user characteristics on the output text information.
Preferably, the voice processing function of the NPU based on the voice processing algorithm includes: any one or more of feature extraction of user voice based on the local sound recording file, intelligent noise reduction of voice based on the local sound recording file, feature matching of keywords based on the local sound recording file, and intelligent interception of voice-free intervals based on the local sound recording file.
Preferably, the output required content includes: outputting all the voice of a designated person based on the local recording file; outputting all the voice related to a keyword input by the target user based on the local recording file; outputting text information marked per individual in the local recording file; or outputting, based on the local recording file, the remaining content with the conversation-gap portions removed.
The invention has the beneficial effects that:
1. the local NPU has high processing speed and is not limited by the network speed.
2. Through the processing of the local NPU to the recording, the user can obtain various desired functions very conveniently, and the efficiency is improved.
3. The voice recognition is processed through a local NPU operation algorithm, and is not required to be uploaded to a network server side, so that the privacy safety is guaranteed, and the safety of a local recording file is ensured.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a flowchart illustrating a recording processing method according to an embodiment of the present invention;
FIG. 2 is a block diagram of a recording method according to an embodiment of the present invention;
fig. 3 is a diagram of a sixth embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.
The first embodiment is as follows:
the invention provides a recording processing method, as shown in fig. 1-2, comprising:
step 1: acquiring user requirements of a target user;
step 2: and performing voice processing on the pre-recorded local recording file based on a voice processing algorithm according to the user requirement, and outputting the required content.
Preferably, the step of performing voice processing on the pre-recorded recording file according to the user requirement and based on a voice processing algorithm and outputting the required content comprises:
the CPU of the target equipment issues a voice processing command to the NPU of the target equipment according to the user requirement;
after receiving the voice processing command, the NPU performs voice processing on the local recording file based on a voice processing algorithm built in the NPU;
and outputting the required content after the voice processing.
Preferably, the voice processing function of the NPU based on the voice processing algorithm includes: any one or more of feature extraction of user voice based on the local sound recording file, intelligent noise reduction of voice based on the local sound recording file, feature matching of keywords based on the local sound recording file, and intelligent interception of voice-free intervals based on the local sound recording file.
Preferably, the output required content includes: outputting all the voice of a designated person based on the local recording file; outputting all the voice related to a keyword input by the target user based on the local recording file; outputting text information marked per individual in the local recording file; or outputting, based on the local recording file, the remaining content with the conversation-gap portions removed.
In this embodiment, a CPU (central processing unit); NPU (neural-processing units).
In this embodiment, the target device may be a mobile phone, for example, a CPU in the mobile phone may issue a command to an NPU according to various requirements of a user, where various voice processing algorithms are built in the NPU, and the NPU processes a sound recording file and outputs various contents, where the various contents are required contents.
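The CPU-to-NPU command flow described above can be sketched as follows. This is an illustrative Python sketch only, not the patent's implementation: the `NpuBackend` class, the command names, and the handler signatures are hypothetical stand-ins for the device's real CPU/NPU interface.

```python
# Hypothetical sketch of the CPU issuing a voice processing command to the NPU.
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class VoiceCommand:
    name: str    # e.g. "extract_speaker", "keyword_search" (illustrative names)
    params: dict


class NpuBackend:
    """Stands in for the on-device NPU running built-in voice algorithms."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[dict], str]] = {}

    def register(self, name: str, handler: Callable[[dict], str]) -> None:
        self._handlers[name] = handler

    def execute(self, cmd: VoiceCommand) -> str:
        if cmd.name not in self._handlers:
            raise ValueError(f"unsupported voice command: {cmd.name}")
        return self._handlers[cmd.name](cmd.params)


def cpu_dispatch(npu: NpuBackend, user_requirement: str) -> str:
    """CPU side: translate a user requirement into an NPU command."""
    if user_requirement.startswith("keyword:"):
        cmd = VoiceCommand("keyword_search", {"keyword": user_requirement[8:]})
    else:
        cmd = VoiceCommand("extract_speaker", {"speaker": user_requirement})
    return npu.execute(cmd)
```

A requirement such as "acquire the voice related to the keyword 'mobile phone'" would then map to a `keyword_search` command, while "extract the voice of user A" maps to a speaker-extraction command.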
The user requirement in this embodiment is, for example, extracting the voice of the user a in the local recording file, and for example, acquiring the voice related to the keyword "mobile phone".
The beneficial effects of the embodiment are as follows:
1. the local NPU has high processing speed and is not limited by the network speed.
2. Through the processing of the local NPU to the recording, the user can obtain various desired functions very conveniently, and the efficiency is improved.
Example two:
on the basis of the first embodiment, the method comprises the following steps of,
preferably, the process of performing voice processing on the local sound recording file based on a built-in voice processing algorithm in the NPU includes:
the NPU extracts the voice characteristics of each user in the local recording file according to a voice processing algorithm, and extracts and outputs all the voices of each user independently according to the voice characteristics;
and according to the user requirements, the target user can listen to all the voice of the designated person that has been separately extracted and output.
In this embodiment, because each person's voice in the local recording file is different, the NPU can extract each person's voice characteristics from the recording file through the algorithm, extract all of each person's voice, and output it separately, so the user can listen to the speech content of whichever person they want according to their own requirements.
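The per-speaker grouping described above can be illustrated with a minimal sketch. Here each segment carries a toy feature vector in place of a real voice embedding computed by the NPU, and the cosine-similarity threshold clustering is an assumed, simplified stand-in for the patent's unspecified voice processing algorithm.

```python
import math


def cosine(a, b):
    """Cosine similarity of two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


def group_by_speaker(segments, threshold=0.95):
    """Assign each (label, feature) segment to a speaker cluster by comparing
    its feature vector with the first segment of each existing cluster."""
    clusters = []  # list of (reference feature, [segment labels])
    for label, feat in segments:
        for ref, members in clusters:
            if cosine(feat, ref) >= threshold:
                members.append(label)
                break
        else:  # no cluster matched: this voice starts a new speaker
            clusters.append((feat, [label]))
    return [members for _, members in clusters]
```

Given four segments whose toy features point in two distinct directions, the sketch yields two speaker groups, mimicking the "extract all of each person's voice separately" behaviour.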
The beneficial effects of the embodiment are as follows: the requirement content can be extracted conveniently according to the feature extraction, and then the requirement of the user can be met.
Example three:
on the basis of the first embodiment, the method comprises the following steps of,
preferably, the process of performing voice processing on the local sound recording file based on a built-in voice processing algorithm in the NPU includes:
receiving a keyword input by a target user;
the NPU determines the word characteristics of the keywords according to the voice processing algorithm, retrieves the local recording file according to the word characteristics and outputs all voices related to the keywords;
and simultaneously, storing all the output voices related to the keywords.
In this embodiment, the NPU can search the entire recording file according to the features of the keyword and retrieve every segment matching the specified keyword; for example, all the voice related to "mobile phone" can be obtained.
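Assuming the keyword retrieval operates over time-stamped recognition results, a minimal sketch might look like this; the `(start, end, text)` transcript format is an assumption for illustration, not something the patent specifies.

```python
def find_keyword_clips(transcript, keyword):
    """transcript: list of (start_sec, end_sec, text) tuples produced by
    on-device speech recognition. Returns the time ranges whose recognized
    text mentions the keyword, i.e. the clips to output and store."""
    return [(start, end)
            for start, end, text in transcript
            if keyword in text]
```

The returned time ranges would then be used to cut and save the matching voice segments, so the user can conveniently revisit them later.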
The beneficial effects of the embodiment are as follows: and the related voice of the key words is obtained through the characteristic matching retrieval, so that the user can conveniently search in the future.
Example four:
on the basis of the first embodiment, the method comprises the following steps of,
preferably, the process of performing voice processing on the local sound recording file based on a built-in voice processing algorithm in the NPU includes:
the NPU judges whether a voice-free interval exists in the local recording file or not according to a voice processing algorithm;
and if so, deleting the voice-free interval and outputting the local recording file without idle waiting time.
In this embodiment, a local recording file may contain a large number of intervals without speech, which makes the file highly redundant and wastes a large amount of waiting time during playback.
At this time, the NPU may determine the interval portion without voice according to the voice processing algorithm, and delete the interval portion without voice, so that the whole course of the recording file is the voice content without idle waiting time.
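The deletion of voice-free intervals can be illustrated with a crude amplitude-threshold sketch; a real implementation on the NPU would use proper voice-activity detection, and the threshold and frame size here are arbitrary illustrative values.

```python
def remove_silence(samples, threshold=0.01, frame=4):
    """Drop frames whose peak amplitude is below the threshold — a crude
    stand-in for the NPU judging which intervals contain no voice.
    samples: normalized audio samples in [-1.0, 1.0]."""
    kept = []
    for i in range(0, len(samples), frame):
        chunk = samples[i:i + frame]
        if max(abs(s) for s in chunk) >= threshold:
            kept.extend(chunk)  # keep frames that contain voice energy
    return kept
```

The output is the recording with the idle waiting time removed, so playback runs through voice content only.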
The beneficial effects of the embodiment are as follows: through the no voice part of intelligence intercepting, improve pronunciation playback efficiency, improve user experience effect.
Example five:
on the basis of the first embodiment, the method comprises the following steps of,
preferably, the process of performing voice processing on the local sound recording file based on a built-in voice processing algorithm in the NPU includes:
the NPU processes all voice recognition of each user corresponding to the local recording file into text information according to a voice processing algorithm and outputs the text information;
and simultaneously, correspondingly marking the user characteristics on the output text information.
Current speech-to-text output is usually realized by a network server: the recording file must be uploaded to the server, processed there, and returned to the user. This scheme carries a network security risk, since the user's recording file is often private and the user may not want it uploaded to a network server; it also depends on the network, so a slow connection means a long wait for results. The scheme provided by this embodiment effectively solves these technical problems, as follows:
by utilizing the powerful AI (Artificial Intelligence) operational capability of the local NPU, the voice of each person in the recording file can be recognized and processed into characters to be output, and marks can be made, so that the user can see who says each sentence.
The beneficial effects of the embodiment are as follows: the voice recognition is processed through a local NPU operation algorithm, and is not required to be uploaded to a network server side, so that the privacy safety is guaranteed, and the safety of a local recording file is ensured.
Example six:
on the basis of the first embodiment, the method further comprises the following steps: recording the target scene to obtain recording information, and storing the recording information to form a local recording file, wherein the recording of the target scene comprises the following steps:
testing whether the recording equipment for recording the target scene can work normally, wherein the testing step comprises the following steps:
activating a recording program arranged in the recording equipment so that the recording equipment collects and stores a target audio, and simultaneously activating a playing program arranged in the recording equipment so that the recording equipment outputs an electric signal related to the target audio for playback;
extracting first audio features (a first audio frequency and a first audio amplitude) of each first frequency node of the target audio, and simultaneously extracting second audio features (a second audio frequency and a second audio amplitude) of each second frequency node of the playing audio;
establishing an incidence relation between the first frequency node and the second frequency node;
comparing and analyzing the audio numerical values between the first audio frequency and the second audio frequency and between the first audio amplitude and the second audio amplitude one by one on the basis of an audio comparison algorithm through the incidence relation;
determining whether the recording equipment is normal or not according to the comparison and analysis result;
if the target scene is normal, controlling the recording equipment to record the target scene;
otherwise, determining an abnormal audio segment according to the comparison and analysis result, acquiring an abnormal log of the abnormal audio segment based on the log database, acquiring an abnormal solution according to the abnormal log, and outputting the abnormal solution to the target device for displaying.
The working principle of this embodiment is: the first frequency node of the target audio and the second frequency node playing the audio are subjected to frequency and amplitude one-to-one comparison analysis, when the comparison analysis result shows that the first audio frequency is consistent with the second audio frequency and the first audio amplitude is consistent with the second audio amplitude, the recording equipment is normal, otherwise, an abnormal audio segment in the second frequency node is obtained, a related abnormal solution is obtained, the abnormal solution is output to the target equipment to be displayed, and the abnormal problem is solved conveniently in time.
In this embodiment, as shown in fig. 3, for example, the first frequency nodes of the target audio are a1, a2, a3, a4, and the second frequency nodes of the played audio are b1, b2, b3, b4;
at this time, a1 and b1, a2 and b2, a3 and b3, and a4 and b4 correspond one to one, and their frequencies and amplitudes are likewise compared one to one.
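The one-to-one frequency/amplitude comparison of the paired nodes (a1 with b1, a2 with b2, and so on) can be sketched as below; the tolerance values are illustrative assumptions, since the patent does not state how consistency between the two node sets is decided.

```python
def check_device(recorded, played, freq_tol=1.0, amp_tol=0.05):
    """recorded / played: equal-length lists of (frequency_hz, amplitude)
    pairs, paired one-to-one (a1 with b1, a2 with b2, ...). Returns the
    indices of abnormal audio segments, i.e. nodes whose frequency or
    amplitude deviates beyond tolerance; an empty list means the recording
    device is judged normal."""
    abnormal = []
    for i, ((f1, a1), (f2, a2)) in enumerate(zip(recorded, played)):
        if abs(f1 - f2) > freq_tol or abs(a1 - a2) > amp_tol:
            abnormal.append(i)
    return abnormal
```

A non-empty result would then drive the lookup of an abnormality log and solution in the log database, as the embodiment describes.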
In this embodiment, the target scene may be a scene recorded in a meeting room.
The beneficial effects of the embodiment are as follows: the frequency and amplitude of the first frequency node of the target audio and the second frequency node of the played audio are compared and analyzed one by one, whether the recording equipment is normal or not is convenient to determine, and the bronze drum displays an abnormal solution, so that the abnormal problem is convenient to solve in time.
Example seven:
based on the sixth embodiment, the method further comprises the following steps: in a target scene, rotating the m microphones in the recording equipment by interval angles, and acquiring, for each rotation angle, the sound source directions of the sound channels corresponding to the m microphones [the formulas defining the direction terms and the interval angle were rendered as images in the source and are not reproduced here]; wherein n1 represents the number of rotations of the rotation angle, and the direction term denotes the sound source direction of the sound channel corresponding to the ith microphone at its jth rotation, with i = 1, 2, 3, ..., m;
configuring a predetermined sound frequency range for each sound channel, and calculating the phase differences between adjacent sound channels [the phase-difference formulas were rendered as images in the source]; the quantities involved are: the phase difference function of the ith sound channel and the (i-1)th sound channel; the phase difference function of the (i+1)th sound channel and the ith sound channel; the phase difference of the ith sound channel and the (i-1)th sound channel; and the phase difference of the (i+1)th sound channel and the ith sound channel;
the sound channels are arranged in one-to-one correspondence with the microphones;
estimating the actual distances between the current positions of the m microphones at the different rotation angles and the sound source position of the sound source point in the sound source direction, and determining a sound source estimation value of the sound source point [the distance and estimation formulas were rendered as images in the source]; the quantities involved are: the actual distance between the current position of the ith microphone after its jth rotation and the sound source position of the sound source point in the sound source direction; the current position coordinates of the ith microphone after the jth rotation; the sound source position coordinates of the sound source point in the sound source direction corresponding to the ith microphone after the jth rotation; v, the sound transmission speed; the sound frequency normalization result of the ith microphone after the jth rotation; and the inertia entropy value of the ith microphone during its jth rotation;
detecting, according to the phase differences and the sound source estimation value, whether the corresponding microphone is qualified [the formula for the qualification value was rendered as an image in the source], wherein D represents the sound source qualification value of the microphone;
when D is within a preset qualified range, judging that the microphone is qualified;
when voice needs to be recorded based on the qualified microphones among the m microphones, establishing a Bluetooth communication connection between the recording devices;
when a microphone is judged to be unqualified, an alarm warning is issued;
after Bluetooth communication connection between the sound recording devices is established, monitoring all first users in the target scene, and determining the current position and the current angle of each first user;
determining a corresponding microphone to be started based on the current position and the current angle;
meanwhile, controlling the microphone to be started to start working based on the Bluetooth communication technology;
when the current position or the current angle of a second user in the first user changes, determining a first microphone corresponding to the second user, determining whether the first microphone is consistent with a microphone to be started, and if so, keeping the original microphone to be started to continue working;
otherwise, adjusting and controlling a second microphone in the original microphones to be started;
during recording, acquiring the audio information collected by the recording equipment, wherein the audio information comprises a first audio and a second audio, the first audio being the sound information of the target scene and the second audio being the device information of the recording equipment;
after the audio information is collected, the audio signal is clipped to remove idle audio, and meanwhile the sound information and the device information are separated based on an audio separation algorithm to obtain the final audio.
The beneficial effects of the embodiment are as follows: and adjusting qualified microphones by inspecting the microphones, wherein in the process of inspecting the microphones, firstly, rotating the microphones and obtaining the sound source directions of the sound channels corresponding to the same microphone at different rotation angles, secondly, configuring frequency for each channel, calculating phase difference between adjacent channels, then calculating actual distance between current position of microphone and sound source position of sound source point in sound source direction to determine sound source estimation value of sound source point, finally determining whether microphone is qualified according to sound source estimation value and phase difference to effectively ensure validity of microphone in use, in the process of adjusting and controlling the microphones, the opening of the corresponding microphones is flexibly controlled according to the direction and angle changes of the user, the efficiency of collecting audio information is improved, and the recording reliability is ensured.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (8)

1. A method for processing audio recordings, comprising:
acquiring user requirements of a target user;
performing voice processing on the pre-recorded local recording file based on a voice processing algorithm according to the user requirement, and outputting required content;
the recording processing method further comprises the following steps: recording the target scene to obtain recording information, and storing the recording information to form a local recording file, wherein the recording of the target scene comprises the following steps:
testing whether the recording equipment for recording the target scene can work normally, wherein the testing step comprises the following steps:
activating a recording program arranged in the recording equipment to enable the recording equipment to collect a target audio frequency and store the target audio frequency, and simultaneously activating a playing program arranged in the recording equipment to enable the recording equipment to outwards output an electric signal related to the target audio frequency for playing;
extracting a first audio characteristic of each first frequency node of the target audio, and simultaneously extracting a second audio characteristic of each second frequency node of the playing audio;
establishing an incidence relation between the first frequency node and the second frequency node;
comparing and analyzing the audio numerical values between the first audio frequency and the second audio frequency and between the first audio amplitude and the second audio amplitude one by one on the basis of an audio comparison algorithm through the incidence relation;
determining whether the recording equipment is normal or not according to the comparison and analysis result;
if the target scene is normal, controlling the recording equipment to record the target scene;
otherwise, determining an abnormal audio segment according to the comparison and analysis result, acquiring an abnormal log of the abnormal audio segment based on the log database, acquiring an abnormal solution according to the abnormal log, and outputting the abnormal solution to the target device for displaying.
2. The recording processing method of claim 1, wherein performing voice processing on the pre-recorded recording file based on a voice processing algorithm according to the user requirement and outputting the required content comprises:
the CPU of the target device issuing a voice processing command to the NPU of the target device according to the user requirement;
the NPU, after receiving the voice processing command, performing voice processing on the local recording file based on a voice processing algorithm built into the NPU;
and outputting the required content after the voice processing.
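The CPU-to-NPU command flow can be illustrated with a minimal dispatch sketch. The command names and the string results are hypothetical stand-ins; the claim does not specify the command set or the NPU interface.

```python
from enum import Enum, auto

class VoiceCommand(Enum):
    """Commands the CPU may issue to the NPU (names are illustrative)."""
    EXTRACT_SPEAKER = auto()
    SEARCH_KEYWORD = auto()
    REMOVE_SILENCE = auto()
    TRANSCRIBE = auto()

class NpuStub:
    """Stand-in for the NPU side: each received command is dispatched to the
    matching built-in voice-processing routine."""
    def __init__(self):
        self._handlers = {
            VoiceCommand.EXTRACT_SPEAKER: lambda f: f"speaker tracks of {f}",
            VoiceCommand.SEARCH_KEYWORD: lambda f: f"keyword hits in {f}",
            VoiceCommand.REMOVE_SILENCE: lambda f: f"{f} without silent gaps",
            VoiceCommand.TRANSCRIBE: lambda f: f"transcript of {f}",
        }

    def process(self, command, recording_file):
        # The NPU performs the requested voice processing and
        # returns the required content to be output.
        return self._handlers[command](recording_file)

npu = NpuStub()
result = npu.process(VoiceCommand.TRANSCRIBE, "meeting.wav")
print(result)  # transcript of meeting.wav
```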
3. The recording processing method of claim 2, wherein performing voice processing on the local recording file based on the voice processing algorithm built into the NPU comprises:
the NPU extracting the voice features of each user in the local recording file according to the voice processing algorithm, and separately extracting and outputting all of each user's speech according to those voice features;
and, according to the user requirement, allowing the target user to listen to all the separately extracted and output speech of a designated person.
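Grouping segments by per-user voice features can be sketched as a greedy clustering over feature vectors. The feature vectors, the Euclidean distance metric, and the `threshold` are assumptions for illustration; the claim does not specify how voice features are compared.

```python
import numpy as np

def group_by_speaker(segment_features, threshold=1.0):
    """Greedy clustering of per-segment voice feature vectors: a segment joins
    the first speaker whose reference feature lies within `threshold`,
    otherwise it founds a new speaker.  Returns {speaker_id: [segment indices]}."""
    references = []   # one reference feature per discovered speaker
    assignment = {}
    for i, feat in enumerate(segment_features):
        feat = np.asarray(feat, dtype=float)
        for sid, ref in enumerate(references):
            if np.linalg.norm(feat - ref) < threshold:
                assignment.setdefault(sid, []).append(i)
                break
        else:
            references.append(feat)
            assignment.setdefault(len(references) - 1, []).append(i)
    return assignment

# Two speakers with clearly separated (made-up) feature vectors.
feats = [[0.1, 0.2], [5.0, 5.1], [0.15, 0.25], [5.1, 5.0]]
groups = group_by_speaker(feats)
print(groups)  # {0: [0, 2], 1: [1, 3]}
```

Once segments are grouped, all of a designated speaker's segments can be concatenated and played back on request.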
4. The recording processing method of claim 2, wherein performing voice processing on the local recording file based on the voice processing algorithm built into the NPU comprises:
receiving a keyword input by the target user;
the NPU determining the word features of the keyword according to the voice processing algorithm, searching the local recording file according to the word features, and outputting all speech related to the keyword;
and simultaneously storing all the output speech related to the keyword.
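The keyword retrieval step can be sketched over timestamped transcript segments. Plain substring matching here is a simplified stand-in for the acoustic word-feature matching the claim describes, and the meeting data is invented for the example.

```python
def find_keyword_segments(segments, keyword):
    """segments: list of (start_s, end_s, text) tuples.  Returns every segment
    whose text contains the keyword, preserving order and timestamps."""
    key = keyword.lower()
    return [seg for seg in segments if key in seg[2].lower()]

meeting = [
    (0.0, 4.2, "welcome to the budget review"),
    (4.2, 9.0, "the budget grew last quarter"),
    (9.0, 12.5, "any other business"),
]
hits = find_keyword_segments(meeting, "budget")
print(len(hits))  # 2
```

The returned timestamp ranges identify which parts of the local recording file to output and store.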
5. The recording processing method of claim 2, wherein performing voice processing on the local recording file based on the voice processing algorithm built into the NPU comprises:
the NPU determining, according to the voice processing algorithm, whether a voice-free interval exists in the local recording file;
and if so, deleting the voice-free interval and outputting the local recording file with the idle waiting time removed.
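Detecting and deleting voice-free intervals can be sketched with a frame-energy threshold, a common voice-activity-detection baseline. The frame length and threshold are assumptions; the claim leaves the detection method open.

```python
import numpy as np

def remove_silence(signal, frame_len=160, threshold=0.01):
    """Drop frames whose RMS energy falls below `threshold` (a simple stand-in
    for the NPU's voice-free-interval detection) and concatenate the rest."""
    frames = [signal[i:i + frame_len] for i in range(0, len(signal), frame_len)]
    kept = [f for f in frames if np.sqrt(np.mean(f ** 2)) >= threshold]
    return np.concatenate(kept) if kept else np.array([])

# 480 samples of "speech", 320 samples of silence, 480 samples of "speech".
speech = 0.5 * np.ones(480)
silence = np.zeros(320)
trimmed = remove_silence(np.concatenate([speech, silence, speech]))
print(len(trimmed))  # 960
```

The output keeps only the voiced frames, so the idle waiting time disappears from the played-back file.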
6. The recording processing method of claim 2, wherein performing voice processing on the local recording file based on the voice processing algorithm built into the NPU comprises:
the NPU recognizing, according to the voice processing algorithm, all the speech of each user in the local recording file into text information and outputting the text information;
and simultaneously marking the output text information with the corresponding user features.
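The final step, marking recognized text with per-user features, can be sketched by joining diarization and recognition output. The `(speaker_id, text)` pairs and the bracketed label format are illustrative assumptions.

```python
def label_transcript(utterances):
    """utterances: list of (speaker_id, text) pairs, e.g. the per-segment
    output of a speaker diarizer plus a speech recognizer.  Returns a
    speaker-tagged transcript, one utterance per line."""
    return "\n".join(f"[{speaker}] {text}" for speaker, text in utterances)

script = label_transcript([("user_1", "Let's start."), ("user_2", "Agreed.")])
print(script)
```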
7. The recording processing method of any one of claims 2-6, wherein the NPU's voice processing functions based on the voice processing algorithm comprise any one or more of: extracting features of user speech from the local recording file, intelligently denoising the speech in the local recording file, matching features of keywords in the local recording file, and intelligently cutting out voice-free intervals in the local recording file.
8. The recording processing method of claim 1 or 2, wherein outputting the required content comprises: outputting all the speech of a designated person based on the local recording file, outputting all the speech related to a keyword input by the target user based on the local recording file, outputting text information marked per individual in the local recording file, or outputting the remaining content with the conversation-gap portions removed, based on the local recording file.
CN202010421975.7A 2020-05-19 2020-05-19 Recording processing method Active CN111341301B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010421975.7A CN111341301B (en) 2020-05-19 2020-05-19 Recording processing method

Publications (2)

Publication Number Publication Date
CN111341301A CN111341301A (en) 2020-06-26
CN111341301B CN111341301B (en) 2020-09-04

Family

ID=71186502

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010421975.7A Active CN111341301B (en) 2020-05-19 2020-05-19 Recording processing method

Country Status (1)

Country Link
CN (1) CN111341301B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113611296A (en) * 2021-08-20 2021-11-05 天津讯飞极智科技有限公司 Speech recognition apparatus and sound pickup device
CN113674744A (en) * 2021-08-20 2021-11-19 天津讯飞极智科技有限公司 Voice transcription method, device, pickup transcription equipment and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104679729B (en) * 2015-02-13 2018-06-26 广州市讯飞樽鸿信息技术有限公司 Message just recorded validity processing method and system
CN208141826U (en) * 2018-05-22 2018-11-23 出门问问信息科技有限公司 A kind of voice identified off-line device
CN108986826A (en) * 2018-08-14 2018-12-11 中国平安人寿保险股份有限公司 Method, electronic device and readable storage medium for automatically generating meeting minutes
US20200075044A1 (en) * 2018-08-31 2020-03-05 CloudMinds Technology, Inc. System and method for performing multi-model automatic speech recognition in challenging acoustic environments
CN110675862A (en) * 2019-09-25 2020-01-10 招商局金融科技有限公司 Corpus acquisition method, electronic device and storage medium
CN111128199A (en) * 2019-12-27 2020-05-08 中国人民解放军陆军工程大学 Sensitive speaker monitoring and recording control method and system based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant