CN111341301B - Recording processing method - Google Patents


Info

Publication number
CN111341301B
CN111341301B (application CN202010421975.7A)
Authority
CN
China
Prior art keywords
recording
voice processing
local
voice
npu
Prior art date
Legal status
Active
Application number
CN202010421975.7A
Other languages
Chinese (zh)
Other versions
CN111341301A (en)
Inventor
马强
Current Assignee
Beijing Xiaomi Mobile Software Co Ltd
Original Assignee
Beijing Xiaomi Mobile Software Co Ltd
Priority date
Filing date
Publication date
Application filed by Beijing Xiaomi Mobile Software Co Ltd filed Critical Beijing Xiaomi Mobile Software Co Ltd
Priority to CN202010421975.7A priority Critical patent/CN111341301B/en
Publication of CN111341301A publication Critical patent/CN111341301A/en
Application granted granted Critical
Publication of CN111341301B publication Critical patent/CN111341301B/en
Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G01 MEASURING; TESTING
    • G01S RADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S5/00 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations
    • G01S5/18 Position-fixing by co-ordinating two or more direction or position line determinations; Position-fixing by co-ordinating two or more distance determinations using ultrasonic, sonic, or infrasonic waves
    • G01S5/22 Position of source determined by co-ordinating a plurality of position lines defined by path-difference measurements
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering

Abstract

The invention provides a recording processing method, which comprises the following steps: acquiring the user requirements of a target user; performing voice processing on a pre-recorded local recording file based on a voice processing algorithm according to the user requirements, and outputting the required content, wherein a CPU (central processing unit) of the target device issues a voice processing command to an NPU (neural processing unit) of the target device according to the user requirements; after the NPU receives the voice processing command, the NPU performs voice processing on the local recording file based on a voice processing algorithm built into the NPU; and outputting the required content after the voice processing. By processing the recording with the local NPU, the user can very conveniently obtain various desired functions, and efficiency is improved.

Description

Recording processing method
Technical Field
The invention relates to the technical field of sound recording, in particular to a sound recording processing method.
Background
In daily life, recording functions are often used, for example to record conversations or meetings. The recorded content generally needs to be organized into notes, or required information needs to be retrieved from the stored recording. However, if the recording is long, listening to it again takes a long time; and while the content can be searched by time stamp, the search becomes complicated and troublesome when the exact time stamp of the required information is uncertain. The invention therefore provides a recording processing method.
Disclosure of Invention
The invention provides a recording processing method which is used for outputting effective required content according to user requirements and a voice processing algorithm and meeting the requirements of users on recording processing.
The invention provides a recording processing method, which comprises the following steps:
acquiring user requirements of a target user;
and performing voice processing on the pre-recorded local recording file based on a voice processing algorithm according to the user requirement, and outputting the required content.
Preferably, the step of performing voice processing on the pre-recorded recording file according to the user requirement and based on a voice processing algorithm and outputting the required content comprises:
the CPU of the target equipment issues a voice processing command to the NPU of the target equipment according to the user requirement;
after receiving the voice processing command, the NPU performs voice processing on the local recording file based on a voice processing algorithm built in the NPU;
and outputting the required content after the voice processing.
Preferably, the process of performing voice processing on the local sound recording file based on a built-in voice processing algorithm in the NPU includes:
the NPU extracts the voice characteristics of each user in the local recording file according to a voice processing algorithm, and extracts and outputs all the voices of each user independently according to the voice characteristics;
and according to the user requirements, the target user can listen to all the voice of the designated person that has been separately extracted and output.
Preferably, the process of performing voice processing on the local sound recording file based on a built-in voice processing algorithm in the NPU includes:
receiving a keyword input by a target user;
the NPU determines the word characteristics of the keywords according to the voice processing algorithm, retrieves the local recording file according to the word characteristics and outputs all voices related to the keywords;
and simultaneously, storing all the output voices related to the keywords.
Preferably, the process of performing voice processing on the local sound recording file based on a built-in voice processing algorithm in the NPU includes:
the NPU judges whether a voice-free interval exists in the local recording file or not according to a voice processing algorithm;
and if so, deleting the voice-free interval and outputting the local recording file without idle waiting time.
Preferably, the process of performing voice processing on the local sound recording file based on a built-in voice processing algorithm in the NPU includes:
the NPU processes all voice recognition of each user corresponding to the local recording file into text information according to a voice processing algorithm and outputs the text information;
and simultaneously, correspondingly marking the user characteristics on the output text information.
Preferably, the voice processing function of the NPU based on the voice processing algorithm includes: any one or more of feature extraction of user voice based on the local sound recording file, intelligent noise reduction of voice based on the local sound recording file, feature matching of keywords based on the local sound recording file, and intelligent interception of voice-free intervals based on the local sound recording file.
Preferably, the output required content includes: outputting all the voice of a designated person based on the local recording file; outputting all the voice related to a keyword input by the target user based on the local recording file; outputting text information marked per individual in the local recording file; or outputting, based on the local recording file, the remaining content with the conversation-gap portions removed.
The invention has the beneficial effects that:
1. the local NPU has high processing speed and is not limited by the network speed.
2. Through the processing of the local NPU to the recording, the user can obtain various desired functions very conveniently, and the efficiency is improved.
3. The voice recognition is processed through a local NPU operation algorithm, and is not required to be uploaded to a network server side, so that the privacy safety is guaranteed, and the safety of a local recording file is ensured.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.
The technical solution of the present invention is further described in detail by the accompanying drawings and embodiments.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
FIG. 1 is a flowchart illustrating a recording processing method according to an embodiment of the present invention;
FIG. 2 is a block diagram of a recording method according to an embodiment of the present invention;
fig. 3 is a diagram of a sixth embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described in conjunction with the accompanying drawings, and it will be understood that they are described herein for the purpose of illustration and explanation and not limitation.
The first embodiment is as follows:
the invention provides a recording processing method, as shown in fig. 1-2, comprising:
step 1: acquiring user requirements of a target user;
step 2: and performing voice processing on the pre-recorded local recording file based on a voice processing algorithm according to the user requirement, and outputting the required content.
Preferably, the step of performing voice processing on the pre-recorded recording file according to the user requirement and based on a voice processing algorithm and outputting the required content comprises:
the CPU of the target equipment issues a voice processing command to the NPU of the target equipment according to the user requirement;
after receiving the voice processing command, the NPU performs voice processing on the local recording file based on a voice processing algorithm built in the NPU;
and outputting the required content after the voice processing.
Preferably, the voice processing function of the NPU based on the voice processing algorithm includes: any one or more of feature extraction of user voice based on the local sound recording file, intelligent noise reduction of voice based on the local sound recording file, feature matching of keywords based on the local sound recording file, and intelligent interception of voice-free intervals based on the local sound recording file.
Preferably, the output required content includes: outputting all the voice of a designated person based on the local recording file; outputting all the voice related to a keyword input by the target user based on the local recording file; outputting text information marked per individual in the local recording file; or outputting, based on the local recording file, the remaining content with the conversation-gap portions removed.
In this embodiment, a CPU (central processing unit); NPU (neural-processing units).
In this embodiment, the target device may be a mobile phone, for example, a CPU in the mobile phone may issue a command to an NPU according to various requirements of a user, where various voice processing algorithms are built in the NPU, and the NPU processes a sound recording file and outputs various contents, where the various contents are required contents.
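The CPU-to-NPU command flow described above can be sketched as follows. This is an illustrative Python sketch only, not the patent's implementation: the `NpuBackend` class, the command names, and the handler signatures are hypothetical stand-ins for the device's real CPU/NPU interface.

```python
# Hypothetical sketch of the CPU issuing a voice processing command to the NPU.
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class VoiceCommand:
    name: str    # e.g. "extract_speaker", "keyword_search" (illustrative names)
    params: dict


class NpuBackend:
    """Stands in for the on-device NPU running built-in voice algorithms."""

    def __init__(self) -> None:
        self._handlers: Dict[str, Callable[[dict], str]] = {}

    def register(self, name: str, handler: Callable[[dict], str]) -> None:
        self._handlers[name] = handler

    def execute(self, cmd: VoiceCommand) -> str:
        if cmd.name not in self._handlers:
            raise ValueError(f"unsupported voice command: {cmd.name}")
        return self._handlers[cmd.name](cmd.params)


def cpu_dispatch(npu: NpuBackend, user_requirement: str) -> str:
    """CPU side: translate a user requirement into an NPU command."""
    if user_requirement.startswith("keyword:"):
        cmd = VoiceCommand("keyword_search", {"keyword": user_requirement[8:]})
    else:
        cmd = VoiceCommand("extract_speaker", {"speaker": user_requirement})
    return npu.execute(cmd)
```

A requirement such as "acquire the voice related to the keyword 'mobile phone'" would then map to a `keyword_search` command, while "extract the voice of user A" maps to a speaker-extraction command.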
The user requirement in this embodiment is, for example, extracting the voice of the user a in the local recording file, and for example, acquiring the voice related to the keyword "mobile phone".
The beneficial effects of the embodiment are as follows:
1. the local NPU has high processing speed and is not limited by the network speed.
2. Through the processing of the local NPU to the recording, the user can obtain various desired functions very conveniently, and the efficiency is improved.
Example two:
on the basis of the first embodiment, the method comprises the following steps of,
preferably, the process of performing voice processing on the local sound recording file based on a built-in voice processing algorithm in the NPU includes:
the NPU extracts the voice characteristics of each user in the local recording file according to a voice processing algorithm, and extracts and outputs all the voices of each user independently according to the voice characteristics;
and according to the user requirements, the target user can listen to all the voice of the designated person that has been separately extracted and output.
In this embodiment, because each person's voice in the local recording file is different, the NPU can extract each person's voice characteristics from the recording file through the algorithm, extract all of each person's voice, and output it separately, so the user can listen to the speech content of whichever person they want according to their own requirements.
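The per-speaker grouping described above can be illustrated with a minimal sketch. Here each segment carries a toy feature vector in place of a real voice embedding computed by the NPU, and the cosine-similarity threshold clustering is an assumed, simplified stand-in for the patent's unspecified voice processing algorithm.

```python
import math


def cosine(a, b):
    """Cosine similarity of two feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)


def group_by_speaker(segments, threshold=0.95):
    """Assign each (label, feature) segment to a speaker cluster by comparing
    its feature vector with the first segment of each existing cluster."""
    clusters = []  # list of (reference feature, [segment labels])
    for label, feat in segments:
        for ref, members in clusters:
            if cosine(feat, ref) >= threshold:
                members.append(label)
                break
        else:  # no cluster matched: this voice starts a new speaker
            clusters.append((feat, [label]))
    return [members for _, members in clusters]
```

Given four segments whose toy features point in two distinct directions, the sketch yields two speaker groups, mimicking the "extract all of each person's voice separately" behaviour.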
The beneficial effects of the embodiment are as follows: the requirement content can be extracted conveniently according to the feature extraction, and then the requirement of the user can be met.
Example three:
on the basis of the first embodiment, the method comprises the following steps of,
preferably, the process of performing voice processing on the local sound recording file based on a built-in voice processing algorithm in the NPU includes:
receiving a keyword input by a target user;
the NPU determines the word characteristics of the keywords according to the voice processing algorithm, retrieves the local recording file according to the word characteristics and outputs all voices related to the keywords;
and simultaneously, storing all the output voices related to the keywords.
In this embodiment, the NPU can search the entire recording file according to the features of the keyword and retrieve every segment matching the specified keyword; for example, all the voice related to "mobile phone" can be obtained.
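Assuming the keyword retrieval operates over time-stamped recognition results, a minimal sketch might look like this; the `(start, end, text)` transcript format is an assumption for illustration, not something the patent specifies.

```python
def find_keyword_clips(transcript, keyword):
    """transcript: list of (start_sec, end_sec, text) tuples produced by
    on-device speech recognition. Returns the time ranges whose recognized
    text mentions the keyword, i.e. the clips to output and store."""
    return [(start, end)
            for start, end, text in transcript
            if keyword in text]
```

The returned time ranges would then be used to cut and save the matching voice segments, so the user can conveniently revisit them later.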
The beneficial effects of the embodiment are as follows: and the related voice of the key words is obtained through the characteristic matching retrieval, so that the user can conveniently search in the future.
Example four:
on the basis of the first embodiment, the method comprises the following steps of,
preferably, the process of performing voice processing on the local sound recording file based on a built-in voice processing algorithm in the NPU includes:
the NPU judges whether a voice-free interval exists in the local recording file or not according to a voice processing algorithm;
and if so, deleting the voice-free interval and outputting the local recording file without idle waiting time.
In this embodiment, a local recording file may contain a large number of intervals without speech, which makes the file highly redundant and wastes a large amount of waiting time during playback.
At this time, the NPU may determine the interval portion without voice according to the voice processing algorithm, and delete the interval portion without voice, so that the whole course of the recording file is the voice content without idle waiting time.
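The deletion of voice-free intervals can be illustrated with a crude amplitude-threshold sketch; a real implementation on the NPU would use proper voice-activity detection, and the threshold and frame size here are arbitrary illustrative values.

```python
def remove_silence(samples, threshold=0.01, frame=4):
    """Drop frames whose peak amplitude is below the threshold — a crude
    stand-in for the NPU judging which intervals contain no voice.
    samples: normalized audio samples in [-1.0, 1.0]."""
    kept = []
    for i in range(0, len(samples), frame):
        chunk = samples[i:i + frame]
        if max(abs(s) for s in chunk) >= threshold:
            kept.extend(chunk)  # keep frames that contain voice energy
    return kept
```

The output is the recording with the idle waiting time removed, so playback runs through voice content only.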
The beneficial effects of the embodiment are as follows: through the no voice part of intelligence intercepting, improve pronunciation playback efficiency, improve user experience effect.
Example five:
on the basis of the first embodiment, the method comprises the following steps of,
preferably, the process of performing voice processing on the local sound recording file based on a built-in voice processing algorithm in the NPU includes:
the NPU processes all voice recognition of each user corresponding to the local recording file into text information according to a voice processing algorithm and outputs the text information;
and simultaneously, correspondingly marking the user characteristics on the output text information.
Current speech-to-text output is usually realized by a network server: the recording file must be uploaded to the server, processed there, and returned to the user. This scheme carries a network security risk, since the user's recording file is often private and the user may not want it uploaded to a network server; it also depends on the network, so a slow connection means a long wait for results. The scheme provided by this embodiment effectively solves these technical problems, as follows:
by utilizing the powerful AI (Artificial Intelligence) operational capability of the local NPU, the voice of each person in the recording file can be recognized and processed into characters to be output, and marks can be made, so that the user can see who says each sentence.
The beneficial effects of the embodiment are as follows: the voice recognition is processed through a local NPU operation algorithm, and is not required to be uploaded to a network server side, so that the privacy safety is guaranteed, and the safety of a local recording file is ensured.
Example six:
on the basis of the first embodiment, the method further comprises the following steps: recording the target scene to obtain recording information, and storing the recording information to form a local recording file, wherein the recording of the target scene comprises the following steps:
testing whether the recording equipment for recording the target scene can work normally, wherein the testing step comprises the following steps:
activating a recording program arranged in the recording equipment so that the recording equipment collects and stores a target audio, and simultaneously activating a playing program arranged in the recording equipment so that the recording equipment outputs an electric signal related to the target audio for playback;
extracting first audio features (a first audio frequency and a first audio amplitude) of each first frequency node of the target audio, and simultaneously extracting second audio features (a second audio frequency and a second audio amplitude) of each second frequency node of the playing audio;
establishing an incidence relation between the first frequency node and the second frequency node;
comparing and analyzing the audio numerical values between the first audio frequency and the second audio frequency and between the first audio amplitude and the second audio amplitude one by one on the basis of an audio comparison algorithm through the incidence relation;
determining whether the recording equipment is normal or not according to the comparison and analysis result;
if the target scene is normal, controlling the recording equipment to record the target scene;
otherwise, determining an abnormal audio segment according to the comparison and analysis result, acquiring an abnormal log of the abnormal audio segment based on the log database, acquiring an abnormal solution according to the abnormal log, and outputting the abnormal solution to the target device for displaying.
The working principle of this embodiment is: the first frequency node of the target audio and the second frequency node playing the audio are subjected to frequency and amplitude one-to-one comparison analysis, when the comparison analysis result shows that the first audio frequency is consistent with the second audio frequency and the first audio amplitude is consistent with the second audio amplitude, the recording equipment is normal, otherwise, an abnormal audio segment in the second frequency node is obtained, a related abnormal solution is obtained, the abnormal solution is output to the target equipment to be displayed, and the abnormal problem is solved conveniently in time.
In this embodiment, as shown in fig. 3, for example, the first frequency nodes of the target audio are a1, a2, a3, a4, and the second frequency nodes of the played audio are b1, b2, b3, b4;
at this time, a1 and b1, a2 and b2, a3 and b3, and a4 and b4 correspond one to one, and their frequencies and amplitudes are likewise compared one to one.
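The one-to-one frequency/amplitude comparison of the paired nodes (a1 with b1, a2 with b2, and so on) can be sketched as below; the tolerance values are illustrative assumptions, since the patent does not state how consistency between the two node sets is decided.

```python
def check_device(recorded, played, freq_tol=1.0, amp_tol=0.05):
    """recorded / played: equal-length lists of (frequency_hz, amplitude)
    pairs, paired one-to-one (a1 with b1, a2 with b2, ...). Returns the
    indices of abnormal audio segments, i.e. nodes whose frequency or
    amplitude deviates beyond tolerance; an empty list means the recording
    device is judged normal."""
    abnormal = []
    for i, ((f1, a1), (f2, a2)) in enumerate(zip(recorded, played)):
        if abs(f1 - f2) > freq_tol or abs(a1 - a2) > amp_tol:
            abnormal.append(i)
    return abnormal
```

A non-empty result would then drive the lookup of an abnormality log and solution in the log database, as the embodiment describes.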
In this embodiment, the target scene may be a scene recorded in a meeting room.
The beneficial effects of the embodiment are as follows: the frequency and amplitude of the first frequency node of the target audio and the second frequency node of the played audio are compared and analyzed one by one, whether the recording equipment is normal or not is convenient to determine, and the bronze drum displays an abnormal solution, so that the abnormal problem is convenient to solve in time.
Example seven:
based on the sixth embodiment, the method further comprises the following steps: in a target scene, rotating the m microphones in the recording equipment by interval angles, and acquiring, for each rotation angle, the sound source directions of the sound channels corresponding to the m microphones [the formulas defining the direction terms and the interval angle were rendered as images in the source and are not reproduced here]; wherein n1 represents the number of rotations of the rotation angle, and the direction term denotes the sound source direction of the sound channel corresponding to the ith microphone at its jth rotation, with i = 1, 2, 3, ..., m;
configuring a predetermined sound frequency range for each sound channel, and calculating the phase differences between adjacent sound channels [the phase-difference formulas were rendered as images in the source]; the quantities involved are: the phase difference function of the ith sound channel and the (i-1)th sound channel; the phase difference function of the (i+1)th sound channel and the ith sound channel; the phase difference of the ith sound channel and the (i-1)th sound channel; and the phase difference of the (i+1)th sound channel and the ith sound channel;
the sound channels are arranged in one-to-one correspondence with the microphones;
estimating the actual distances between the current positions of the m microphones at the different rotation angles and the sound source position of the sound source point in the sound source direction, and determining a sound source estimation value of the sound source point [the distance and estimation formulas were rendered as images in the source]; the quantities involved are: the actual distance between the current position of the ith microphone after its jth rotation and the sound source position of the sound source point in the sound source direction; the current position coordinates of the ith microphone after the jth rotation; the sound source position coordinates of the sound source point in the sound source direction corresponding to the ith microphone after the jth rotation; v, the sound transmission speed; the sound frequency normalization result of the ith microphone after the jth rotation; and the inertia entropy value of the ith microphone during its jth rotation;
detecting, according to the phase differences and the sound source estimation value, whether the corresponding microphone is qualified [the formula for the qualification value was rendered as an image in the source], wherein D represents the sound source qualification value of the microphone;
when D is within a preset qualified range, judging that the microphone is qualified;
when voice needs to be recorded based on the qualified microphones among the m microphones, establishing a Bluetooth communication connection between the recording devices;
when a microphone is judged to be unqualified, an alarm warning is issued;
after Bluetooth communication connection between the sound recording devices is established, monitoring all first users in the target scene, and determining the current position and the current angle of each first user;
determining a corresponding microphone to be started based on the current position and the current angle;
meanwhile, controlling the microphone to be started to start working based on the Bluetooth communication technology;
when the current position or the current angle of a second user in the first user changes, determining a first microphone corresponding to the second user, determining whether the first microphone is consistent with a microphone to be started, and if so, keeping the original microphone to be started to continue working;
otherwise, adjusting and controlling a second microphone in the original microphones to be started;
during recording, acquiring the audio information collected by the recording equipment, wherein the audio information comprises a first audio and a second audio, the first audio being the sound information of the target scene and the second audio being the device information of the recording equipment;
after the audio information is collected, the audio signal is clipped to remove idle audio, and meanwhile the sound information and the device information are separated based on an audio separation algorithm to obtain the final audio.
The beneficial effects of the embodiment are as follows: and adjusting qualified microphones by inspecting the microphones, wherein in the process of inspecting the microphones, firstly, rotating the microphones and obtaining the sound source directions of the sound channels corresponding to the same microphone at different rotation angles, secondly, configuring frequency for each channel, calculating phase difference between adjacent channels, then calculating actual distance between current position of microphone and sound source position of sound source point in sound source direction to determine sound source estimation value of sound source point, finally determining whether microphone is qualified according to sound source estimation value and phase difference to effectively ensure validity of microphone in use, in the process of adjusting and controlling the microphones, the opening of the corresponding microphones is flexibly controlled according to the direction and angle changes of the user, the efficiency of collecting audio information is improved, and the recording reliability is ensured.
It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

Claims (8)

1. A method for processing audio recordings, comprising:
acquiring user requirements of a target user;
performing voice processing on the pre-recorded local recording file based on a voice processing algorithm according to the user requirement, and outputting required content;
the recording processing method further comprises the following steps: recording the target scene to obtain recording information, and storing the recording information to form a local recording file, wherein the recording of the target scene comprises the following steps:
testing whether the recording equipment for recording the target scene can work normally, wherein the testing step comprises the following steps:
activating a recording program arranged in the recording equipment to enable the recording equipment to collect a target audio frequency and store the target audio frequency, and simultaneously activating a playing program arranged in the recording equipment to enable the recording equipment to outwards output an electric signal related to the target audio frequency for playing;
extracting a first audio characteristic of each first frequency node of the target audio, and simultaneously extracting a second audio characteristic of each second frequency node of the playing audio;
establishing an incidence relation between the first frequency node and the second frequency node;
comparing and analyzing the audio numerical values between the first audio frequency and the second audio frequency and between the first audio amplitude and the second audio amplitude one by one on the basis of an audio comparison algorithm through the incidence relation;
determining whether the recording equipment is normal or not according to the comparison and analysis result;
if the target scene is normal, controlling the recording equipment to record the target scene;
otherwise, determining an abnormal audio segment according to the comparison and analysis result, acquiring an abnormal log of the abnormal audio segment based on the log database, acquiring an abnormal solution according to the abnormal log, and outputting the abnormal solution to the target device for displaying.
2. The recording processing method of claim 1, wherein performing voice processing on the pre-recorded recording file based on a voice processing algorithm according to the user requirement and outputting the required content comprises:
the CPU of the target device issuing a voice processing command to the NPU of the target device according to the user requirement;
the NPU, after receiving the voice processing command, performing voice processing on the local recording file based on a voice processing algorithm built into the NPU;
and outputting the required content after the voice processing.
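The CPU-to-NPU command flow can be illustrated with a minimal dispatch sketch. The command names and the string results are hypothetical stand-ins; the claim does not specify the command set or the NPU interface.

```python
from enum import Enum, auto

class VoiceCommand(Enum):
    """Commands the CPU may issue to the NPU (names are illustrative)."""
    EXTRACT_SPEAKER = auto()
    SEARCH_KEYWORD = auto()
    REMOVE_SILENCE = auto()
    TRANSCRIBE = auto()

class NpuStub:
    """Stand-in for the NPU side: each received command is dispatched to the
    matching built-in voice-processing routine."""
    def __init__(self):
        self._handlers = {
            VoiceCommand.EXTRACT_SPEAKER: lambda f: f"speaker tracks of {f}",
            VoiceCommand.SEARCH_KEYWORD: lambda f: f"keyword hits in {f}",
            VoiceCommand.REMOVE_SILENCE: lambda f: f"{f} without silent gaps",
            VoiceCommand.TRANSCRIBE: lambda f: f"transcript of {f}",
        }

    def process(self, command, recording_file):
        # The NPU performs the requested voice processing and
        # returns the required content to be output.
        return self._handlers[command](recording_file)

npu = NpuStub()
result = npu.process(VoiceCommand.TRANSCRIBE, "meeting.wav")
print(result)  # transcript of meeting.wav
```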
3. The recording processing method of claim 2, wherein performing voice processing on the local recording file based on the voice processing algorithm built into the NPU comprises:
the NPU extracting the voice features of each user in the local recording file according to the voice processing algorithm, and separately extracting and outputting all of each user's speech according to those voice features;
and, according to the user requirement, allowing the target user to listen to all the separately extracted and output speech of a designated person.
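Grouping segments by per-user voice features can be sketched as a greedy clustering over feature vectors. The feature vectors, the Euclidean distance metric, and the `threshold` are assumptions for illustration; the claim does not specify how voice features are compared.

```python
import numpy as np

def group_by_speaker(segment_features, threshold=1.0):
    """Greedy clustering of per-segment voice feature vectors: a segment joins
    the first speaker whose reference feature lies within `threshold`,
    otherwise it founds a new speaker.  Returns {speaker_id: [segment indices]}."""
    references = []   # one reference feature per discovered speaker
    assignment = {}
    for i, feat in enumerate(segment_features):
        feat = np.asarray(feat, dtype=float)
        for sid, ref in enumerate(references):
            if np.linalg.norm(feat - ref) < threshold:
                assignment.setdefault(sid, []).append(i)
                break
        else:
            references.append(feat)
            assignment.setdefault(len(references) - 1, []).append(i)
    return assignment

# Two speakers with clearly separated (made-up) feature vectors.
feats = [[0.1, 0.2], [5.0, 5.1], [0.15, 0.25], [5.1, 5.0]]
groups = group_by_speaker(feats)
print(groups)  # {0: [0, 2], 1: [1, 3]}
```

Once segments are grouped, all of a designated speaker's segments can be concatenated and played back on request.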
4. The recording processing method of claim 2, wherein performing voice processing on the local recording file based on the voice processing algorithm built into the NPU comprises:
receiving a keyword input by the target user;
the NPU determining the word features of the keyword according to the voice processing algorithm, searching the local recording file according to the word features, and outputting all speech related to the keyword;
and simultaneously storing all the output speech related to the keyword.
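The keyword retrieval step can be sketched over timestamped transcript segments. Plain substring matching here is a simplified stand-in for the acoustic word-feature matching the claim describes, and the meeting data is invented for the example.

```python
def find_keyword_segments(segments, keyword):
    """segments: list of (start_s, end_s, text) tuples.  Returns every segment
    whose text contains the keyword, preserving order and timestamps."""
    key = keyword.lower()
    return [seg for seg in segments if key in seg[2].lower()]

meeting = [
    (0.0, 4.2, "welcome to the budget review"),
    (4.2, 9.0, "the budget grew last quarter"),
    (9.0, 12.5, "any other business"),
]
hits = find_keyword_segments(meeting, "budget")
print(len(hits))  # 2
```

The returned timestamp ranges identify which parts of the local recording file to output and store.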
5. The recording processing method of claim 2, wherein performing voice processing on the local recording file based on the voice processing algorithm built into the NPU comprises:
the NPU determining, according to the voice processing algorithm, whether a voice-free interval exists in the local recording file;
and if so, deleting the voice-free interval and outputting the local recording file with the idle waiting time removed.
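Detecting and deleting voice-free intervals can be sketched with a frame-energy threshold, a common voice-activity-detection baseline. The frame length and threshold are assumptions; the claim leaves the detection method open.

```python
import numpy as np

def remove_silence(signal, frame_len=160, threshold=0.01):
    """Drop frames whose RMS energy falls below `threshold` (a simple stand-in
    for the NPU's voice-free-interval detection) and concatenate the rest."""
    frames = [signal[i:i + frame_len] for i in range(0, len(signal), frame_len)]
    kept = [f for f in frames if np.sqrt(np.mean(f ** 2)) >= threshold]
    return np.concatenate(kept) if kept else np.array([])

# 480 samples of "speech", 320 samples of silence, 480 samples of "speech".
speech = 0.5 * np.ones(480)
silence = np.zeros(320)
trimmed = remove_silence(np.concatenate([speech, silence, speech]))
print(len(trimmed))  # 960
```

The output keeps only the voiced frames, so the idle waiting time disappears from the played-back file.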
6. The recording processing method of claim 2, wherein performing voice processing on the local recording file based on the voice processing algorithm built into the NPU comprises:
the NPU recognizing, according to the voice processing algorithm, all the speech of each user in the local recording file into text information and outputting the text information;
and simultaneously marking the output text information with the corresponding user features.
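The final step, marking recognized text with per-user features, can be sketched by joining diarization and recognition output. The `(speaker_id, text)` pairs and the bracketed label format are illustrative assumptions.

```python
def label_transcript(utterances):
    """utterances: list of (speaker_id, text) pairs, e.g. the per-segment
    output of a speaker diarizer plus a speech recognizer.  Returns a
    speaker-tagged transcript, one utterance per line."""
    return "\n".join(f"[{speaker}] {text}" for speaker, text in utterances)

script = label_transcript([("user_1", "Let's start."), ("user_2", "Agreed.")])
print(script)
```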
7. The recording processing method of any one of claims 2-6, wherein the NPU's voice processing functions based on the voice processing algorithm comprise any one or more of: extracting features of user speech from the local recording file, intelligently denoising the speech in the local recording file, matching features of keywords in the local recording file, and intelligently cutting out voice-free intervals in the local recording file.
8. The recording processing method of claim 1 or 2, wherein outputting the required content comprises: outputting all the speech of a designated person based on the local recording file, outputting all the speech related to a keyword input by the target user based on the local recording file, outputting text information marked per individual in the local recording file, or outputting the remaining content with the conversation-gap portions removed, based on the local recording file.
CN202010421975.7A 2020-05-19 2020-05-19 Recording processing method Active CN111341301B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010421975.7A CN111341301B (en) 2020-05-19 2020-05-19 Recording processing method

Publications (2)

Publication Number Publication Date
CN111341301A CN111341301A (en) 2020-06-26
CN111341301B CN111341301B (en) 2020-09-04

Family

ID=71186502

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010421975.7A Active CN111341301B (en) 2020-05-19 2020-05-19 Recording processing method

Country Status (1)

Country Link
CN (1) CN111341301B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113611296A (en) * 2021-08-20 2021-11-05 天津讯飞极智科技有限公司 Speech recognition apparatus and sound pickup device
CN113674744A (en) * 2021-08-20 2021-11-19 天津讯飞极智科技有限公司 Voice transcription method, device, pickup transcription equipment and storage medium

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104679729B (en) * 2015-02-13 2018-06-26 广州市讯飞樽鸿信息技术有限公司 Message just recorded validity processing method and system
CN208141826U (en) * 2018-05-22 2018-11-23 出门问问信息科技有限公司 A kind of voice identified off-line device
CN108986826A (en) * 2018-08-14 2018-12-11 中国平安人寿保险股份有限公司 Method, electronic device and readable storage medium for automatically generating meeting minutes
US20200075044A1 (en) * 2018-08-31 2020-03-05 CloudMinds Technology, Inc. System and method for performing multi-model automatic speech recognition in challenging acoustic environments
CN110675862A (en) * 2019-09-25 2020-01-10 招商局金融科技有限公司 Corpus acquisition method, electronic device and storage medium
CN111128199A (en) * 2019-12-27 2020-05-08 中国人民解放军陆军工程大学 Sensitive speaker monitoring and recording control method and system based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant