CN110010130A - An intelligent method for synchronous speech-to-text transcription for meeting participants
- Publication number: CN110010130A
- Application number: CN201910263845.2A
- Authority: CN (China)
- Prior art keywords: participant, control centre, spectrogram, information, voice
- Legal status: Pending (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/20—Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
- G10L15/26—Speech to text systems
- G10L17/00—Speaker identification or verification
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
Abstract
The invention discloses an intelligent method for synchronous speech-to-text transcription for meeting participants, comprising the following steps: each participant uses a sign-in microphone to enter their position and name into the control centre by voice, and the control centre stores the voice input in order of entry and pre-processes it; the pre-processed voice information is converted into a spectrogram and text information, both of which are stored in that participant's information bank; the control centre divides the spectrogram into several groups of frame spectra; the control centre extracts features from the frame spectra; the participant's spectral energy difference DelNd is stored; the speaker's identity is identified by calculating the spectral energy difference DelNf of the speech and comparing it with the stored value; and a minutes document is formed. The invention can record the entire meeting and can identify each speaker.
Description
Technical field
The invention belongs to the field of intelligent speech technology, and in particular relates to an intelligent method for synchronous speech-to-text transcription for meeting participants.
Background art
For some important meetings, the full content of the meeting needs to be recorded, and recording it manually is labour-intensive. Existing techniques for automatically recording meeting content usually convert the participants' speech signals directly into text characters for storage.
In the course of realising the invention, the inventor found at least the following problem in the related art: minutes formed by directly converting participants' speech signals into text are long and unwieldy, and it is difficult to tell what each speaker said.
Summary of the invention
The object of the invention is to overcome the above deficiencies of the prior art by providing an intelligent method for synchronous speech-to-text transcription for meeting participants.
An intelligent method for synchronous speech-to-text transcription for meeting participants, comprising the following steps:
1) each participant uses a sign-in microphone to enter their position and name into the control centre by voice; the control centre stores the voice input in order of entry and pre-processes the recorded voice information;
2) the pre-processed voice information is converted into a spectrogram and text information, and both are stored in that participant's information bank;
3) the control centre frames each participant's spectrogram at a fixed time interval, dividing the spectrogram into several groups of frame spectra;
4) the control centre extracts features from the frame spectra, the extracted features including: the frame spectral centroid Ci (i = 1, 2, ..., n), the spectral energy value Ni (i = 1, 2, ..., n), and the spectral energy difference DelNd;
5) the participant's spectral energy difference DelNd is stored in that participant's information bank;
6) when a participant speaks, the speech is input to the control centre through the microphone at the participant's seat; the control centre pre-processes the speech, converts it into a speech spectrogram and speech text, and stores the speech text in the minutes document;
7) the control centre calculates the spectral energy difference DelNf of the speech according to steps 3) and 4);
8) the control centre compares DelNf with the DelNd values stored in the participants' information banks; when the difference between DelNf and a stored DelNd is less than a set value, the speaker's identity is determined and the speaker's name is added in front of the corresponding text in the minutes document;
9) after the meeting ends, the minutes document is finalised, printed, and signed.
Preferably, in step 4) the spectral energy value Ni = [formula not reproduced], and DelNd = Ni - N(i-1).
Preferably, the pre-processing consists of noise removal and signal amplification.
Preferably, the fixed time interval is 10-20 ms.
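Steps 3) and 4) can be sketched roughly as follows. This is an illustration under stated assumptions, not the patented implementation: the source does not reproduce the formula for Ni, so the sketch assumes Ni is the sum of squared spectral magnitudes of frame i, and it assumes the centroid is the magnitude-weighted mean frequency. The function names and the naive DFT are hypothetical choices made to keep the example dependency-free.

```python
import cmath
import math

def split_into_frames(samples, sample_rate, frame_ms=10):
    """Cut samples into consecutive, non-overlapping frames of frame_ms milliseconds."""
    frame_len = int(sample_rate * frame_ms / 1000)
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, frame_len)]

def magnitude_spectrum(frame):
    """Naive DFT magnitudes for bins 0..len(frame)//2 (adequate for short frames)."""
    n = len(frame)
    return [abs(sum(x * cmath.exp(-2j * math.pi * k * t / n)
                    for t, x in enumerate(frame)))
            for k in range(n // 2 + 1)]

def frame_features(samples, sample_rate, frame_ms=10):
    """Return per-frame centroids Ci, energies Ni, and differences Ni - N(i-1)."""
    centroids, energies = [], []
    for frame in split_into_frames(samples, sample_rate, frame_ms):
        mags = magnitude_spectrum(frame)
        bin_hz = sample_rate / len(frame)
        total = sum(mags)
        # Assumed centroid: magnitude-weighted mean frequency of the frame.
        centroid = (sum(k * bin_hz * m for k, m in enumerate(mags)) / total
                    if total > 0 else 0.0)
        # Assumed energy: sum of squared magnitudes (the source formula is missing).
        energy = sum(m * m for m in mags)
        centroids.append(centroid)
        energies.append(energy)
    # Energy difference between each frame and the one before it.
    deltas = [energies[i] - energies[i - 1] for i in range(1, len(energies))]
    return centroids, energies, deltas
```

With a 10 ms interval at an assumed 8 kHz sample rate, each frame holds 80 samples; the resulting list of differences would play the role of DelNd at enrolment and DelNf during a speech.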
Compared with the prior art, the beneficial effects of the invention are as follows:
In use, the invention confirms the identity of each participant by timbre recognition, so that meeting content can be attributed to each participant, avoiding the defect that minutes make it hard to tell who was speaking. Pre-processing denoises and amplifies the signal, ensuring signal accuracy. Participants' timbre is identified automatically and intelligently, which gives good accuracy. Each participant needs to enrol their identity only once and does not need to enrol again for later use.
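The pre-processing mentioned above (noise removal and signal amplification) might look something like the sketch below. The source does not say which denoising or amplification technique is used, so this example substitutes a simple noise gate followed by peak normalisation; the function name and the `noise_floor` and `target_peak` parameters are hypothetical.

```python
def preprocess(samples, noise_floor=0.02, target_peak=0.9):
    """Remove the DC offset, zero samples below a noise floor, then scale to target_peak."""
    if not samples:
        return []
    mean = sum(samples) / len(samples)
    centered = [x - mean for x in samples]
    # Crude noise removal (an assumption): silence anything quieter than noise_floor.
    gated = [x if abs(x) >= noise_floor else 0.0 for x in centered]
    peak = max(abs(x) for x in gated)
    if peak == 0.0:
        return gated  # nothing above the noise floor, so there is nothing to amplify
    # Amplification: scale so the loudest remaining sample reaches target_peak.
    gain = target_peak / peak
    return [x * gain for x in gated]
```

A real system would more likely use spectral noise suppression, but the gate keeps the two stages named in the text visible as separate steps.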
Specific embodiment
An intelligent method for synchronous speech-to-text transcription for meeting participants, comprising the following steps:
1) each participant uses a sign-in microphone to enter their position and name into the control centre by voice; the control centre stores the voice input in order of entry and pre-processes the recorded voice information;
2) the pre-processed voice information is converted into a spectrogram and text information, and both are stored in that participant's information bank;
3) the control centre frames each participant's spectrogram at a fixed time interval, dividing the spectrogram into several groups of frame spectra;
4) the control centre extracts features from the frame spectra, the extracted features including: the frame spectral centroid Ci (i = 1, 2, ..., n), the spectral energy value Ni (i = 1, 2, ..., n), and the spectral energy difference DelNd;
5) the participant's spectral energy difference DelNd is stored in that participant's information bank;
6) when a participant speaks, the speech is input to the control centre through the microphone at the participant's seat; the control centre pre-processes the speech, converts it into a speech spectrogram and speech text, and stores the speech text in the minutes document;
7) the control centre calculates the spectral energy difference DelNf of the speech according to steps 3) and 4);
8) the control centre compares DelNf with the DelNd values stored in the participants' information banks; when the difference between DelNf and a stored DelNd is less than a set value, the speaker's identity is determined and the speaker's name is added in front of the corresponding text in the minutes document;
9) after the meeting ends, the minutes document is finalised, printed, and signed.
Preferably, in step 4) the spectral energy value Ni = [formula not reproduced], and DelNd = Ni - N(i-1).
Preferably, the pre-processing consists of noise removal and signal amplification.
Preferably, the fixed time interval is 10-20 ms.
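The comparison in step 8) and the labelling of the minutes can be sketched as below. The source only says that DelNf is compared with the stored DelNd against a set value; how two sequences of differences are reduced to one number is not specified, so the mean absolute gap used here is an assumption, as are the function names, the dictionary layout of the information bank, and the example participants.

```python
def mean_abs_gap(a, b):
    """Mean absolute difference over the overlapping prefix of two sequences."""
    m = min(len(a), len(b))
    if m == 0:
        return float("inf")
    return sum(abs(x - y) for x, y in zip(a, b)) / m

def identify_speaker(del_nf, info_bank, threshold):
    """Return the enrolled name whose stored DelNd is closest to del_nf,
    or None when even the closest gap is not below the threshold."""
    best_name, best_gap = None, float("inf")
    for name, record in info_bank.items():
        gap = mean_abs_gap(del_nf, record["del_nd"])
        if gap < best_gap:
            best_name, best_gap = name, gap
    return best_name if best_gap < threshold else None

def append_to_minutes(minutes, speaker, text):
    """Add one utterance to the minutes, prefixed with the speaker's name."""
    label = speaker if speaker is not None else "Unknown speaker"
    minutes.append(f"{label}: {text}")
    return minutes

# Hypothetical information bank built at sign-in (steps 1 to 5).
info_bank = {
    "Zhang Wei": {"position": "Chair", "del_nd": [1.0, -2.0, 0.5]},
    "Li Na": {"position": "Secretary", "del_nd": [4.0, 3.0, -1.0]},
}
minutes = []
speaker = identify_speaker([1.1, -1.9, 0.4], info_bank, threshold=0.5)
append_to_minutes(minutes, speaker, "Let us begin.")
```

Picking the closest enrolled participant before applying the threshold avoids ambiguity when more than one stored DelNd falls under the set value; the source does not say how such ties are handled.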
The working principle of the invention is as follows:
In use, the invention identifies the speaker by calculating spectral energy differences.
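The principle can be written out as a short worked formula. Only the difference DelNd = Ni - N(i-1) and the comparison against a set value come from the text; the definition of Ni is an assumption (the source formula is not reproduced), and the symbols X_i(k) for the spectrum of frame i, d for the comparison measure, p for an enrolled participant, and ε for the set value are hypothetical names introduced for illustration.

```latex
\[
  N_i = \sum_{k} \lvert X_i(k) \rvert^{2}
  \quad \text{(assumed)}, \qquad
  \mathrm{DelN}_i = N_i - N_{i-1}, \quad i = 2, \dots, n
\]
\[
  \text{speaker} = p
  \quad \text{if} \quad
  d\bigl(\mathrm{DelNf},\, \mathrm{DelNd}^{(p)}\bigr) < \varepsilon
\]
```

Here DelNd^{(p)} is the sequence stored for participant p at sign-in and DelNf is the sequence computed from the current speech.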
It should be noted that the specific implementation of the invention is not limited to the above. Any insubstantial improvement made using the method concept and technical solution of the invention, or any direct application of that concept and technical solution to other situations without improvement, falls within the scope of protection of the invention.
Claims (4)
1. An intelligent method for synchronous speech-to-text transcription for meeting participants, characterised in that it comprises the following steps:
1) each participant uses a sign-in microphone to enter their position and name into the control centre by voice; the control centre stores the voice input in order of entry and pre-processes the recorded voice information;
2) the pre-processed voice information is converted into a spectrogram and text information, and both are stored in that participant's information bank;
3) the control centre frames each participant's spectrogram at a fixed time interval, dividing the spectrogram into several groups of frame spectra;
4) the control centre extracts features from the frame spectra, the extracted features including: the frame spectral centroid Ci (i = 1, 2, ..., n), the spectral energy value Ni (i = 1, 2, ..., n), and the spectral energy difference DelNd;
5) the participant's spectral energy difference DelNd is stored in that participant's information bank;
6) when a participant speaks, the speech is input to the control centre through the microphone at the participant's seat; the control centre pre-processes the speech, converts it into a speech spectrogram and speech text, and stores the speech text in the minutes document;
7) the control centre calculates the spectral energy difference DelNf of the speech according to steps 3) and 4);
8) the control centre compares DelNf with the DelNd values stored in the participants' information banks; when the difference between DelNf and a stored DelNd is less than a set value, the speaker's identity is determined and the speaker's name is added in front of the corresponding text in the minutes document;
9) after the meeting ends, the minutes document is finalised, printed, and signed.
2. The intelligent method for synchronous speech-to-text transcription for meeting participants according to claim 1, characterised in that in step 4) the spectral energy value Ni = [formula not reproduced], and DelNd = Ni - N(i-1).
3. The intelligent method for synchronous speech-to-text transcription for meeting participants according to claim 1, characterised in that the pre-processing consists of noise removal and signal amplification.
4. The intelligent method for synchronous speech-to-text transcription for meeting participants according to claim 1, characterised in that the fixed time interval is 10-20 ms.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910263845.2A CN110010130A (en) | 2019-04-03 | 2019-04-03 | An intelligent method for synchronous speech-to-text transcription for meeting participants |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110010130A (en) | 2019-07-12 |
Family
ID=67169523
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910263845.2A (Pending) CN110010130A (en) | 2019-04-03 | 2019-04-03 | An intelligent method for synchronous speech-to-text transcription for meeting participants |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110010130A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130144603A1 (en) * | 2011-12-01 | 2013-06-06 | Richard T. Lord | Enhanced voice conferencing with history |
US20170277784A1 (en) * | 2016-03-22 | 2017-09-28 | International Business Machines Corporation | Audio summarization of meetings driven by user participation |
CN107452166A (en) * | 2017-06-27 | 2017-12-08 | 长江大学 | A kind of library book-borrowing method and device based on Application on Voiceprint Recognition |
CN108022583A (en) * | 2017-11-17 | 2018-05-11 | 平安科技(深圳)有限公司 | Meeting summary generation method, application server and computer-readable recording medium |
CN109388701A (en) * | 2018-08-17 | 2019-02-26 | 深圳壹账通智能科技有限公司 | Minutes generation method, device, equipment and computer storage medium |
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110600039A (en) * | 2019-09-27 | 2019-12-20 | 百度在线网络技术(北京)有限公司 | Speaker attribute determination method and device, electronic equipment and readable storage medium |
CN110600039B (en) * | 2019-09-27 | 2022-05-20 | 百度在线网络技术(北京)有限公司 | Method and device for determining speaker attribute, electronic equipment and readable storage medium |
CN113660378A (en) * | 2020-05-12 | 2021-11-16 | 宁波维度数字科技有限公司 | Intelligent voice automatic conference record generation system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| WD01 | Invention patent application deemed withdrawn after publication | Application publication date: 20190712 |