CN110335612A - Minutes generation method, device and storage medium based on speech recognition - Google Patents
- Publication number
- CN110335612A CN110335612A CN201910627403.1A CN201910627403A CN110335612A CN 110335612 A CN110335612 A CN 110335612A CN 201910627403 A CN201910627403 A CN 201910627403A CN 110335612 A CN110335612 A CN 110335612A
- Authority
- CN
- China
- Prior art keywords
- audio
- minutes
- sentence
- voice segments
- converted
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/60—Information retrieval of audio data
- G06F16/63—Querying
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/04—Segmentation; Word boundary detection
- G10L15/08—Speech classification or search
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L15/26—Speech to text systems
- G10L17/00—Speaker identification or verification
- G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
- G10L17/04—Training, enrolment or model building
- G10L17/22—Interactive procedures; Man-machine interfaces
- G10L2015/088—Word spotting
- G10L2015/223—Execution procedure of a spoken command
Abstract
Disclosed herein is a meeting minutes generation method based on speech recognition. The method comprises: receiving a minutes generation instruction issued by a user and obtaining audio to be converted; performing sentence division on the audio to be converted to obtain the audio sentences of the audio to be converted; extracting a voiceprint feature from each identified audio sentence, comparing the voiceprint feature corresponding to each audio sentence with a preset voiceprint feature library to determine the speaker identity information corresponding to each audio sentence, dividing the audio sentences into voice segments according to the speaker identity information, and determining the voice segment set corresponding to the audio to be converted; calling the target speech recognition model corresponding to each voice segment and obtaining, in turn, the text corresponding to each voice segment; and generating the meeting minutes corresponding to the audio to be converted. The present invention also discloses an electronic device and a computer storage medium. With the present invention, the accuracy and efficiency of minutes generation can be improved.
Description
Technical field
The present invention relates to the field of Internet technology, and in particular to a meeting minutes generation method based on speech recognition, an electronic device, and a computer-readable storage medium.
Background technique
At present, meeting minutes are mainly written as follows: first, keywords are recorded on site during the meeting; second, after the meeting, the keywords are located in the meeting recording, the nearby audio is listened to and transcribed, and the keywords are expanded to form the minutes. However, because there is no correspondence between the keywords and the recording, the recorder has to locate each keyword manually and repeatedly when searching the recording after the meeting, which is time-consuming and cumbersome. Furthermore, if the same keyword occurs several times in the meeting, locating it purely by listening is prone to mis-positioning, which in turn introduces errors into the minutes.
To solve the above problems, minutes products that rely on speech conversion technology to automatically generate minutes text have appeared on the market. However, such existing products are usually simple speech-to-text products: the accuracy of the speech conversion cannot be guaranteed, what the recorder obtains after use is one long text with no link back to the meeting recording, and, because speech-to-text technology is not yet mature enough, the recorder is often at a loss when the transcribed text contains too many errors and in the end can only complete the minutes by manually replaying the recording.
Therefore, how to generate meeting minutes conveniently and accurately has become a technical problem to be urgently solved.
Summary of the invention
In view of the foregoing, the present invention provides a meeting minutes generation method based on speech recognition, an electronic device, and a computer-readable storage medium, the main purpose of which is to improve the efficiency and accuracy of minutes generation.
To achieve the above object, the present invention provides a meeting minutes generation method based on speech recognition, the method comprising:
A receiving step: receiving a minutes generation instruction issued by a user and obtaining audio to be converted according to the minutes generation instruction, or obtaining audio to be converted from a preset storage path periodically or in real time;
A first division step: performing sentence division on the audio to be converted to obtain the audio sentences of the audio to be converted;
A second division step: extracting a voiceprint feature from each audio sentence, comparing and analyzing the voiceprint feature of each audio sentence against a preset voiceprint feature library, determining the speaker identity information corresponding to each audio sentence, dividing the audio sentences into voice segments according to the speaker identity information, and determining the voice segment set corresponding to the audio to be converted;
A speech recognition step: calling, according to the speaker identity information corresponding to each voice segment in the voice segment set, the target speech recognition model corresponding to that voice segment, feeding each voice segment in turn into its corresponding target speech recognition model, and obtaining the text fragment corresponding to each voice segment, wherein the target speech recognition model is obtained by update training based on an accent corpus and an industry corpus; and
A generation step: merging the text fragments corresponding to the voice segments to generate the target text corresponding to the audio to be converted, and associating each text fragment in the target text with its corresponding voice segment and speaker identity information, to generate the meeting minutes corresponding to the audio to be converted.
In addition, to achieve the above object, the present invention also provides an electronic device, the device comprising a memory and a processor, wherein a minutes generation program executable on the processor is stored in the memory, and the minutes generation program, when executed by the processor, can implement any step of the meeting minutes generation method based on speech recognition as described above.
In addition, to achieve the above object, the present invention also provides a computer-readable storage medium in which a minutes generation program is included, and the minutes generation program, when executed by a processor, can implement any step of the meeting minutes generation method based on speech recognition as described above.
With the meeting minutes generation method, electronic device and computer-readable storage medium based on speech recognition proposed by the present invention: 1. sentence division, voiceprint feature extraction and speaker identity matching are performed on the audio to be converted, the voice segment set corresponding to the audio to be converted is determined according to the matching result, and a different target speech recognition model is called to perform speech recognition on each voice segment, which improves the efficiency and accuracy of speech recognition and lays the foundation for subsequently generating complete and accurate minutes; 2. the models are update-trained with the speakers' accent corpora and an industry corpus, which improves the accuracy of speech recognition; 3. the minutes are generated by associating speaker identity information, voice segments, text fragments, keywords and so on, which improves the completeness and convenience of the minutes.
Detailed description of the invention
Fig. 1 is a flow chart of a preferred embodiment of the meeting minutes generation method based on speech recognition of the present invention;
Fig. 2 is a schematic diagram of a preferred embodiment of the electronic device of the present invention;
Fig. 3 is a schematic diagram of the program modules of a preferred embodiment of the minutes generation program in Fig. 2.
The realization of the object, the functional characteristics and the advantages of the present invention will be further described with reference to the accompanying drawings in conjunction with the embodiments.
Specific embodiment
It should be appreciated that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit it.
The present invention provides a meeting minutes generation method based on speech recognition. The method can be executed by a device, and the device can be realized by software and/or hardware.
Referring to Fig. 1, it is a flow chart of a preferred embodiment of the meeting minutes generation method based on speech recognition of the present invention.
In one embodiment of the meeting minutes generation method based on speech recognition of the present invention, the method comprises steps S1 to S5.
Step S1: receiving a minutes generation instruction issued by a user and obtaining audio to be converted according to the minutes generation instruction, or obtaining audio to be converted from a preset storage path periodically or in real time.
In the following description, the embodiments of the present invention are illustrated with an electronic device as the execution subject.
In this embodiment, the user issues a minutes generation instruction to the electronic device through a terminal, wherein the instruction contains the audio to be converted. The audio to be converted is the speech audio recorded during the meeting; it may be input and saved by the user through a speech device such as a microphone, or it may be a speech information file downloaded from the Internet or imported locally by the user. The preset storage path includes, but is not limited to, a database for storing minutes-related audio.
The step of obtaining the audio to be converted from the preset storage path periodically or in real time comprises: periodically (for example, at 9:00 every morning and 5:30 every afternoon) judging whether unconverted minutes-related audio exists in the storage path; if so, taking the unconverted minutes-related audio as the audio to be converted; if not, judging that no audio to be converted exists. Alternatively, whenever a piece of minutes-related audio is written into the preset storage path, it is taken as the audio to be converted and read out, so as to execute the subsequent steps.
Step S2: performing sentence division on the audio to be converted based on a preset sentence division rule, to obtain the audio sentences of the audio to be converted.
The purpose of performing sentence division on the audio to be converted is to obtain short sentences on which speech recognition is easier to perform, thereby improving the accuracy of the subsequent audio-to-text conversion. In this embodiment, performing sentence division on the audio to be converted based on the preset sentence division rule to obtain the audio sentences of the audio to be converted comprises:
a1. identifying the first pause in the audio to be converted, and recording the start time and the end time of the first pause;
a2. identifying the first sentence in the audio to be converted, and taking the end time of the first pause as the start time of the first sentence;
a3. identifying the second pause, recording the start time and the end time of the second pause, and taking the start time of the second pause as the end time of the first sentence, thereby realizing the division of the first sentence;
a4. executing the above steps in turn until the audio to be converted ends, to obtain all the audio sentences of the audio to be converted.
Here, the first pause and the second pause include silent segments and non-speech segments in the audio to be converted, and the first sentence is a speech segment of the audio to be converted. It should be noted that "first pause" and "second pause" are used only to distinguish pauses at different times.
It can be understood that the division result of the audio sentences is closely related to the accuracy of the subsequent audio conversion: the higher the accuracy of the audio sentence division, the higher the accuracy of the audio conversion. In this embodiment, each pause has a minimum length limit, used to ignore short sound gaps such as a speaker's momentary breath, so as to protect the integrity of a sentence; each sentence obtained by division has a minimum length limit, used to filter out transient invalid information in the audio, such as a speaker's cough; meanwhile, each sentence obtained by division also has a maximum length limit, used to restrict the sentence length and improve the accuracy of the subsequent audio conversion.
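Steps a1–a4 together with the three length limits above can be sketched as a minimal Python function. It assumes pause detection has already produced a time-ordered list of (start, end) intervals; the function name and all threshold values are illustrative, not part of the patent.

```python
def _emit(sentences, start, end, min_len, max_len):
    """Append the span [start, end) as one or more sentences."""
    length = end - start
    while length > max_len:            # cap overly long sentences
        sentences.append((start, start + max_len))
        start += max_len
        length = end - start
    if length >= min_len:              # drop cough-like transient blips
        sentences.append((start, end))

def split_sentences(pauses, audio_end, min_pause=0.3, min_len=0.5, max_len=15.0):
    """Derive audio-sentence boundaries (in seconds) from detected pauses."""
    # Ignore very short pauses (e.g. a speaker's quick breath) so that a
    # single utterance is not broken apart -- the minimum pause limit.
    pauses = [p for p in pauses if p[1] - p[0] >= min_pause]
    sentences = []
    cursor = 0.0
    for p_start, p_end in pauses:
        _emit(sentences, cursor, p_start, min_len, max_len)
        cursor = p_end                 # next sentence starts after the pause
    _emit(sentences, cursor, audio_end, min_len, max_len)
    return sentences
```

For instance, with pauses at (2.0, 2.5) and (5.0, 5.6) seconds and a 0.1 s breath at 2.6 s, an 8-second clip splits into three sentences; the breath is filtered out by the minimum pause limit.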
Step S3: extracting a voiceprint feature from each audio sentence, comparing and analyzing the voiceprint feature of each audio sentence against the preset voiceprint feature library, determining the speaker identity information corresponding to each audio sentence, dividing the audio sentences into voice segments according to the speaker identity information, and determining the voice segment set corresponding to the audio to be converted.
Taking company P as an example, the preset voiceprint feature library includes the voiceprint feature of each employee of company P and the corresponding employee information. The speaker identity information includes the speaker's name, native place, accent, and so on.
In this embodiment, the step of "dividing the audio sentences into voice segments according to the speaker identity information" comprises:
merging temporally adjacent audio sentences whose corresponding speaker identity information is identical to generate one voice segment, and determining the start and end times of the voice segment according to the start and end times of the at least one audio sentence included in the voice segment.
Here, the start and end times of each voice segment are determined by the start and end times of the at least one audio sentence it contains; for example, the start time of the first audio sentence of a voice segment is taken as the start time of the voice segment, and the end time of its last audio sentence is taken as the end time of the voice segment.
The voice segment set includes the voice segments and the speaker identity information corresponding to each voice segment. For example, suppose the audio sentence division result, in chronological order, is: sentence 1, sentence 2, sentence 3, sentence 4, sentence 5, and that the speakers corresponding to the audio sentences are respectively X, Y, Y, Z, Y. Then the final voice segment set includes: {voice segment 1 (sentence 1), X}, {voice segment 2 (sentence 2, sentence 3), Y}, {voice segment 3 (sentence 4), Z}, {voice segment 4 (sentence 5), Y}.
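The merging rule in the example above — temporally adjacent sentences with the same speaker collapse into one voice segment spanning from the first sentence's start to the last sentence's end — can be sketched as follows; the (start, end, speaker) tuple layout is an assumption for illustration.

```python
def merge_into_segments(sentences):
    """sentences: list of (start, end, speaker) tuples in time order.
    Adjacent sentences with identical speaker identity are merged into
    one voice segment; the segment's start/end times come from its
    first and last contained sentence."""
    segments = []
    for start, end, speaker in sentences:
        if segments and segments[-1][2] == speaker:
            # Same speaker as the previous segment: extend its end time.
            segments[-1] = (segments[-1][0], end, speaker)
        else:
            segments.append((start, end, speaker))
    return segments
```

Applied to the five sentences above with speakers X, Y, Y, Z, Y, this yields the four segments of the example: sentence 2 and sentence 3 merge, while the later Y sentence stays separate because Z speaks in between.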
It can be understood that the preset voiceprint feature library needs to be updated periodically to improve the efficiency of voiceprint feature comparison. In addition, extracting voiceprint features from audio and comparing voiceprints are relatively mature technologies, which are not repeated here.
In other embodiments, the audio to be converted may also be speech entered in real time through microphones. In that case, the microphone signal channels are numbered in advance, and the correspondence between the microphone signal channel numbers and the speaker identity information is predefined before the meeting. When the audio to be converted is speech entered in real time through microphones, the corresponding speaker identity information can also be confirmed by the microphone signal channel number, which is not repeated here.
By determining the identity of the speaker, the above steps on the one hand determine the speaker corresponding to each audio sentence, which contributes to the completeness of the minutes; on the other hand, they allow the optimal speech conversion model to be called subsequently according to the speaker's identity information, thereby improving the accuracy of the speech conversion.
Step S4: calling, according to the speaker identity information corresponding to each voice segment in the voice segment set, the target speech recognition model corresponding to that voice segment, feeding each voice segment in turn into its corresponding target speech recognition model, and obtaining the text fragment corresponding to each voice segment, wherein the target speech recognition model is obtained by update training based on an accent corpus and an industry corpus.
In order to improve the accuracy of speech recognition, the target speech recognition model has undergone two rounds of update training on the basis of a general speech recognition model:
1) The general speech recognition model is update-trained according to the speakers' accents (that is, their language characteristics) to obtain a first speech recognition model, which is determined by the following steps:
dividing accents into several major classes, for example no accent (that is, standard Mandarin), Beijing accent, Shandong accent, Cantonese accent, Hunan accent, Sichuan accent, and so on, and collecting the recorded audio corresponding to each accent class;
preprocessing the recorded audio corresponding to each accent class, deleting segments that are hard or inconvenient to understand, converting the remaining segments into written text, and obtaining the corpus of each accent class;
feeding the processed audio and written text into the general speech recognition model, so that the model is optimized for the specific accent;
in actual meeting scenarios, feeding any segments found to be transcribed incorrectly back into the model for re-optimization, thereby obtaining the first speech recognition model corresponding to each accent class.
2) The first speech recognition model corresponding to each accent class is update-trained according to the company and the nature of its industry, to obtain a second speech recognition model, which is determined by the following steps:
compiling the company's/industry's special-purpose word list, and saving it in text form;
having designated persons read the special-purpose words aloud in each accent class, to form the audio file corresponding to each accent class;
feeding the text paired with the audio file into the first speech recognition model corresponding to each accent class for training, so that each first speech recognition model is optimized for the specific company/industry;
in actual meeting scenarios, feeding more corpora involving the proper nouns into the models for re-optimization, thereby obtaining the second speech recognition model corresponding to each accent class.
For example, suppose that voiceprint recognition determines that the speaker of current voice segment 1 is speaker X, and that X's identity information shows a Shandong accent; then the second speech recognition model corresponding to the Shandong accent is obtained as the target speech recognition model.
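The model selection in this example reduces to a lookup from speaker identity to accent class to model. The sketch below illustrates that lookup only; the profile entries and model identifiers are invented, and in practice the registry would hold the accent-specific second speech recognition models described above.

```python
# Hypothetical registry: accent class -> second speech recognition model id.
MODEL_REGISTRY = {
    "mandarin": "asr-mandarin-v2",   # the no-accent (standard Mandarin) model
    "shandong": "asr-shandong-v2",
    "cantonese": "asr-cantonese-v2",
}

# Filled in from the preset voiceprint feature library after identification.
SPEAKER_PROFILES = {
    "speaker_x": {"accent": "shandong"},
}

def select_model(speaker_id):
    accent = SPEAKER_PROFILES.get(speaker_id, {}).get("accent", "mandarin")
    # Fall back to the standard-Mandarin model when the speaker's accent
    # class has no fine-tuned model of its own.
    return MODEL_REGISTRY.get(accent, MODEL_REGISTRY["mandarin"])
```

Keeping the mapping in data rather than code means new accent classes or retrained models can be added without touching the recognition loop.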
The above steps train the general speech conversion model in advance, before the audio conversion is performed, and update-train the speech recognition model according to the speaker's accent characteristics, so as to improve the model's ability to recognize the speaker's speech; at the same time, the speech conversion model is also update-trained according to the company/industry, improving its ability to recognize company-specific business speech.
Step S5: merging the text fragments corresponding to the voice segments to generate the target text corresponding to the audio to be converted, and associating each text fragment in the target text with its corresponding voice segment and speaker identity information, to generate the meeting minutes corresponding to the audio to be converted.
For example, suppose the texts corresponding to the voice segments in the above voice segment set are obtained in turn as text 1, text 2, text 3, text 4 and text 5; the obtained texts are merged and spliced to obtain the target text corresponding to the audio to be converted. Then, according to the start and end times of each voice segment, the corresponding voice segment is cut out of the audio to be converted and associated with the corresponding text fragment in the target text; that is to say, each text fragment in the target text is marked with its corresponding speaker information and its corresponding voice segment, so as to generate the minutes. The generated minutes are saved and pushed to the terminal corresponding to the minutes generation instruction.
In this embodiment, the speaker information and the voice segments are associated with the text fragments in the form of hyperlinks. By associating the above information in the minutes, it is convenient for the minutes administrator to listen to relevant segments and to adjust the minutes.
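One way the hyperlink association might look is a plain HTML rendering, sketched below. The clip file naming and the segment field layout are invented for illustration; a real product would link into its own audio player rather than to bare `.wav` files.

```python
def render_minutes(segments):
    """segments: dicts with 'speaker', 'start', 'end', 'text' keys.
    Each text fragment is labelled with its speaker and hyperlinked to
    the audio clip cut out by the segment's start/end times."""
    lines = ["<ol>"]
    for i, seg in enumerate(segments, 1):
        lines.append(
            '<li><b>{speaker}</b> ({start:.1f}s-{end:.1f}s): {text} '
            '<a href="clips/segment_{n}.wav">[audio]</a></li>'.format(
                n=i, **seg))
    lines.append("</ol>")
    return "\n".join(lines)
```

Because each list item carries both the speaker label and the clip link, a reader can jump from any sentence of the minutes straight to the matching audio.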
With the meeting minutes generation method based on speech recognition proposed by the above embodiment, sentence division, voiceprint feature extraction and matching are performed on the audio to be converted, the voice segments corresponding to the audio to be converted are determined according to the matching result, and a different target speech recognition model is called to perform speech recognition on each voice segment, which improves the efficiency and accuracy of speech recognition and thereby the accuracy of the generated minutes; meanwhile, the minutes are generated by associating speaker identity information, voice segments and text fragments, which improves the completeness and convenience of the minutes.
Further, in order to improve the conversion accuracy of the audio to be converted, in another embodiment of the meeting minutes generation method based on speech recognition of the present invention, before step S2 the method further comprises: preprocessing the audio to be converted to obtain preprocessed audio to be converted.
In a typical meeting, various noises are produced under the influence of the surrounding environment, so the audio to be converted that corresponds to the minutes needs to be preprocessed. The preprocessing includes but is not limited to:
b1. echo cancellation; for example, an echo cancellation method can be used, or the magnitude of the echo signal can be estimated and the estimate subtracted from the received signal to cancel the echo;
b2. beamforming; for example, the user's speech information is collected from different directions by multiple microphones, the direction of the sound source is determined, and a weighted summation is performed according to the weights of the different directions — the weight of the sound-source direction is larger than the weight of sounds from other directions, so as to enhance the speech input by the user and weaken the influence of other sounds;
b3. noise reduction; for example, the noise can first be cancelled by a sound with the same frequency and amplitude as the noise but opposite phase, and reverberation then eliminated using a dereverberation audio plug-in or a microphone array;
b4. gain enhancement; for example, the audio is amplified by means of AGC (automatic gain control).
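As a toy illustration of step b4 only, the snippet below applies a single fixed gain so that the clip's peak reaches a target level. Real AGC — like the echo cancellation, beamforming and noise reduction of b1–b3 — adapts continuously over streaming audio, so this is a sketch of the idea, not a production front end.

```python
def auto_gain(samples, target_peak=0.9):
    """Scale a mono clip (floats in [-1, 1]) so its peak amplitude
    reaches target_peak -- a one-shot stand-in for AGC."""
    peak = max((abs(s) for s in samples), default=0.0)
    if peak == 0.0:
        return list(samples)          # silence: nothing to amplify
    gain = target_peak / peak
    return [s * gain for s in samples]
```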
With the meeting minutes generation method based on speech recognition proposed by the above embodiment, the audio to be converted is preprocessed to reduce external interference, which can improve the accuracy of speech recognition and thus lay a good foundation for subsequently generating the minutes.
Further, in order to make the minutes clearer, in another embodiment of the meeting minutes generation method based on speech recognition of the present invention, the method further comprises:
segmenting the minutes into words to obtain the word list after segmentation, and identifying keywords from the list after segmentation;
determining the text fragment set corresponding to each keyword, classifying each text fragment set according to the speaker identity information corresponding to each text fragment, and sorting the keywords and the text fragments corresponding to each keyword in chronological order, to obtain the sorted text fragment set corresponding to each keyword.
Here, the step of "segmenting the minutes into words" comprises: a) matching the minutes against a preset vocabulary to obtain a first list after segmentation, wherein the vocabulary consists of the company-specific special-purpose words compiled for the company's business and the proprietary words of the company's industry; b) for the remaining text left over from step a), performing segmentation using an understanding-based word segmentation method and a statistics-based word segmentation method, to obtain a second list after segmentation; c) removing meaningless stop words (empty function words), to obtain a third list after segmentation; d) merging the first list, the second list and the third list to obtain the final list after segmentation. Word segmentation finally turns the text into a list containing multiple words.
The step of "identifying keywords from the list after segmentation" comprises: a) calculating the information value of each word in the list, for example its tf-idf value (term frequency–inverse document frequency); b) judging whether the information value of each word is greater than or equal to a preset threshold, and determining the words whose information value is greater than or equal to the preset threshold as keywords, wherein the preset threshold can be adjusted according to actual needs.
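A minimal sketch of the tf-idf scoring in steps a)–b), treating each set of minutes in an archive as one document. The token lists are assumed to come from the segmentation pipeline described above, and the threshold value here is only a placeholder for the adjustable preset threshold.

```python
import math
from collections import Counter

def keywords(docs, doc_index, threshold=0.05):
    """docs: list of token lists, one per minutes document.
    Returns the tokens of docs[doc_index] whose tf-idf score is
    greater than or equal to the threshold."""
    tf = Counter(docs[doc_index])
    n_tokens = len(docs[doc_index])
    picked = []
    for word, count in tf.items():
        df = sum(1 for d in docs if word in d)          # document frequency
        score = (count / n_tokens) * math.log(len(docs) / df)
        if score >= threshold:
            picked.append(word)
    return picked
```

A word that appears in every document gets an idf of zero and is never picked, which is exactly why common filler terms fall below the threshold.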
Suppose that keywords A, B and C are identified from the minutes. Taking keyword A as an example, in the above sorting result the text fragment set corresponding to keyword A includes: text fragment 1 corresponding to speaker X, text fragments 2 and 4 corresponding to speaker Y, and text fragment 3 corresponding to speaker Z.
Further, the sorted text fragment set corresponding to each keyword may also include the voice segment corresponding to each text fragment; by associating text fragments with voice segments, it is convenient for the minutes administrator and enquirers to listen to the relevant segments.
With the meeting minutes generation method based on speech recognition proposed by the above embodiment, the minutes are generated by associating speaker identity information, voice segments, text fragments, keywords and so on, which improves the completeness and convenience of the minutes.
In another embodiment of the speech-recognition-based minutes generation method of the present invention, the method further includes:
responding to a minutes viewing instruction issued by a user, and displaying the minutes to the user; and/or
responding to a click operation on a text fragment issued by a user, and displaying the associated information of the text fragment to the user; for example, the associated information includes keywords, speaker identity information, and a link to the corresponding voice segment, and when the user clicks the voice segment link, the corresponding voice segment is played; and/or
responding to a minutes modification instruction issued by a user, and updating and saving the minutes based on the modification instruction; and/or
responding to a query instruction carrying a query field issued by a user, querying the minutes for text fragments matching the query field, and feeding the matched text fragments and their associated information back to the user in a preset form. The query field may or may not be a keyword, and the matching may be fuzzy matching or semantic matching, which will not be detailed here. After the matching text fragments are found, all text fragments corresponding to the query field, together with their associated information (for example, keywords, speaker identity information, and links to the corresponding voice segments), are displayed to the user in a preset form (for example, a tree diagram, or in chronological order).
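Purely as an illustration of the query response described above (the record layout and field names are assumptions of this sketch, not part of the claimed method), a fuzzy match over in-memory text fragments could look like:

```python
def query_minutes(fragments, query_field):
    """Return fragments whose text fuzzily matches the query field.

    `fragments` is a list of dicts with hypothetical keys
    'text', 'speaker', and 'audio_link'. Fuzzy matching here is a
    simple case-insensitive substring test; semantic matching would
    replace this predicate.
    """
    q = query_field.lower()
    hits = [f for f in fragments if q in f["text"].lower()]
    # Feed back the matched fragments with their associated information.
    return [{"text": f["text"], "speaker": f["speaker"],
             "audio_link": f["audio_link"]} for f in hits]

fragments = [
    {"text": "Budget review for Q3", "speaker": "A", "audio_link": "seg1.wav"},
    {"text": "Hiring plan update", "speaker": "B", "audio_link": "seg2.wav"},
]
result = query_minutes(fragments, "budget")
```

The fragments are assumed to be stored in chronological order, so the returned list is already sorted by time.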
The present invention also proposes an electronic device. Referring to Fig. 2, which is a schematic diagram of a preferred embodiment of the electronic device of the present invention.
In this embodiment, the electronic device 1 may be a terminal device with data processing capability, such as a server, a smartphone, a tablet computer, a portable computer, or a desktop computer; the server may be a rack server, a blade server, a tower server, or a cabinet server.
The electronic device 1 includes a memory 11, a processor 12, and a network interface 13.
The memory 11 includes at least one type of readable storage medium, such as flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), magnetic memory, magnetic disk, or optical disk. In some embodiments, the memory 11 may be an internal storage unit of the electronic device 1, such as a hard disk of the electronic device 1. In other embodiments, the memory 11 may also be an external storage device of the electronic device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card equipped on the electronic device 1. Further, the memory 11 may include both an internal storage unit and an external storage device of the electronic device 1.
The memory 11 can be used not only to store the application software installed on the electronic device 1 and various types of data, such as the minutes generation program 10, but also to temporarily store data that has been output or will be output.
In some embodiments, the processor 12 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data processing chip, used to run the program code stored in the memory 11 or to process data, for example, to run the minutes generation program 10.
The network interface 13 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface), and is generally used to establish a communication connection between the electronic device 1 and other electronic equipment, for example, the terminals used by the minutes administrator and minutes inquirers. The components 11-13 of the electronic device 1 communicate with each other through a communication bus.
Fig. 2 shows only the electronic device 1 with components 11-13. Those skilled in the art will understand that the structure shown in Fig. 2 does not constitute a limitation of the electronic device 1, which may include fewer or more components than shown, or combine certain components, or have a different component arrangement.
Optionally, the electronic device 1 may also include a user interface. The user interface may include a display and an input unit such as a keyboard, and may optionally also include a standard wired interface and a wireless interface.
Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch liquid crystal display, an Organic Light-Emitting Diode (OLED) touch device, or the like. The display, which may also be called a display screen or display unit, is used to display information processed in the electronic device 1 and to display a visual user interface.
In the embodiment of the electronic device 1 shown in Fig. 2, the memory 11, as a computer storage medium, stores the program code of the minutes generation program 10. When the processor 12 executes the program code of the minutes generation program 10, the following steps are implemented:
Receiving step: receiving a minutes generation instruction issued by a user and obtaining the audio to be converted according to the minutes generation instruction; alternatively, obtaining the audio to be converted from a preset storage path periodically or in real time.
In this embodiment, the user issues the minutes generation instruction to the electronic device 1 through a terminal, and the instruction includes the audio to be converted. The audio to be converted is the speech audio recorded during the meeting; it may be input and saved by the user through a speech device such as a microphone, or it may be a speech file downloaded from the Internet or imported locally by the user. The preset storage path is not limited to a database for storing minutes-related audio.
The step of obtaining the audio to be converted from the preset storage path periodically or in real time includes: periodically (for example, at 9:00 every morning and 5:30 every afternoon) judging whether unconverted minutes-related audio exists in the storage path; if so, taking the unconverted minutes-related audio as the audio to be converted; if not, judging that there is no audio to be converted. Alternatively, whenever a piece of minutes-related audio is written into the preset storage path, it is taken as the audio to be converted and read out, so as to execute the subsequent steps.
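The periodic check of the preset storage path can be sketched as follows; the file-extension filter and the bookkeeping set of already-converted names are assumptions of this sketch, not part of the method:

```python
import os

def find_unconverted_audio(store_path, converted):
    """Scan a preset storage path and return audio files not yet converted.

    `converted` is the set of file names already processed; any other
    audio file found in the path is treated as audio to be converted.
    """
    audio_exts = (".wav", ".mp3", ".flac")
    pending = []
    for name in sorted(os.listdir(store_path)):
        if name.lower().endswith(audio_exts) and name not in converted:
            pending.append(os.path.join(store_path, name))
    return pending
```

In practice this function would be run by a scheduler at the preset times, or replaced by a file-system watcher for the real-time variant.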
First division step: dividing the audio to be converted into sentences based on a preset sentence division rule to obtain the audio sentences of the audio to be converted.
The purpose of dividing the audio to be converted into sentences is to obtain short sentences that are easier to recognize, improving the accuracy of the subsequent audio-to-text conversion. In this embodiment, dividing the audio to be converted into sentences based on the preset sentence division rule to obtain the audio sentences of the audio to be converted includes:
a1. identifying the first pause in the audio to be converted, and recording the start time and end time of the first pause;
a2. identifying the first sentence in the audio to be converted, and taking the end time of the first pause as the start time of the first sentence;
a3. identifying the second pause, recording its start time and end time, and taking the start time of the second pause as the end time of the first sentence, thereby completing the division of the first sentence;
a4. repeating the above steps until the audio to be converted ends, obtaining all audio sentences of the audio to be converted.
Here, the first pause and the second pause include silent segments and non-speech segments in the audio to be converted; the first sentence is a speech segment of the audio to be converted. Note that "first pause" and "second pause" serve only to distinguish pauses at different times.
It can be understood that the division of audio sentences is closely related to the accuracy of the subsequent audio conversion: the more accurate the sentence division, the higher the conversion accuracy. In this embodiment, each pause has a minimum length limit, used to ignore short sound events such as the speaker's momentary breath, so as to protect the integrity of a sentence; each divided sentence has a minimum length limit, used to filter out short invalid information in the audio, such as the speaker's cough; meanwhile, each divided sentence also has a maximum length limit, used to bound the sentence length and improve the accuracy of the subsequent audio conversion.
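The pause-based sentence division with the minimum-pause, minimum-sentence, and maximum-sentence limits described above can be sketched over frame-level voice-activity flags; the frame size and threshold values are illustrative defaults, not values fixed by the method:

```python
def split_sentences(speech_flags, frame_ms=10,
                    min_pause_ms=200, min_sent_ms=300, max_sent_ms=10000):
    """Split frame-level voice-activity flags into sentence intervals.

    `speech_flags[i]` is True if frame i contains speech. A run of
    non-speech frames counts as a pause only if it lasts at least
    `min_pause_ms` (short breaths are ignored); sentences shorter than
    `min_sent_ms` (e.g. a cough) are dropped, and sentences longer than
    `max_sent_ms` are cut. Returns (start_ms, end_ms) intervals.
    """
    min_pause = min_pause_ms // frame_ms
    sentences, start, silence = [], None, 0
    for i, voiced in enumerate(speech_flags + [False] * min_pause):
        if voiced:
            if start is None:
                start = i
            silence = 0
        elif start is not None:
            silence += 1
            if silence >= min_pause:          # a real pause ends the sentence
                end = i - silence + 1
                if (end - start) * frame_ms >= min_sent_ms:
                    sentences.append((start * frame_ms, end * frame_ms))
                start, silence = None, 0
    # enforce the maximum sentence length by cutting long sentences
    out = []
    for s, e in sentences:
        while e - s > max_sent_ms:
            out.append((s, s + max_sent_ms))
            s += max_sent_ms
        out.append((s, e))
    return out
```

A real implementation would obtain the flags from a voice activity detector rather than receive them directly.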
Second division step: extracting voiceprint features from the audio sentences respectively, comparing and analyzing the voiceprint feature of each audio sentence against a preset voiceprint feature library to determine the speaker identity information corresponding to each audio sentence, dividing the audio sentences into voice segments according to the speaker identity information, and determining the voice segment set corresponding to the audio to be converted.
Taking company P as an example, the preset voiceprint feature library includes the voiceprint feature of each employee of company P and the corresponding employee information. The speaker identity information includes the speaker's name, birthplace, accent, and the like.
In this embodiment, the step of "dividing the audio sentences into voice segments according to the speaker identity information" includes:
merging audio sentences that are adjacent in time and correspond to the same speaker identity information into one voice segment, and determining the start and end times of the voice segment from the start and end times of the at least one audio sentence it contains.
That is, the start and end times of each voice segment are determined by the audio sentences it contains; for example, the start time of the first audio sentence of a voice segment is taken as the start time of the voice segment, and the end time of the last audio sentence as the end time of the voice segment.
The voice segment set includes the voice segments and the speaker identity information corresponding to each voice segment. For example, suppose the audio sentences obtained by division are, in chronological order, sentence 1, sentence 2, sentence 3, sentence 4, and sentence 5, and the speakers corresponding to the audio sentences are A, B, B, C, and B respectively; then the final voice segment set is: {voice segment 1 (sentence 1), A}, {voice segment 2 (sentences 2 and 3), B}, {voice segment 3 (sentence 4), C}, {voice segment 4 (sentence 5), B}.
It can be understood that the preset voiceprint feature library needs to be updated periodically to improve the efficiency of voiceprint comparison. In addition, extracting voiceprint features from audio and comparing voiceprints are mature technologies, which will not be detailed here.
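The merging rule above can be sketched as follows, assuming each audio sentence is represented as a (start, end, speaker) tuple:

```python
def merge_into_segments(sentences):
    """Merge time-adjacent audio sentences with the same speaker.

    `sentences` is a chronological list of (start, end, speaker) tuples.
    Returns voice segments as (start, end, speaker), where the segment's
    start/end come from its first/last contained sentence.
    """
    segments = []
    for start, end, speaker in sentences:
        if segments and segments[-1][2] == speaker:
            prev_start, _, _ = segments[-1]
            segments[-1] = (prev_start, end, speaker)  # extend current segment
        else:
            segments.append((start, end, speaker))
    return segments
```

With the example from the description (speakers A, B, B, C, B), this yields the four voice segments listed above.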
In other embodiments, the audio to be converted may also be speech recorded in real time through microphones. In that case, the microphone signal channels are numbered in advance, and the correspondence between microphone signal channel numbers and speaker identity information is established before the meeting. When the audio to be converted is speech recorded in real time through microphones, the corresponding speaker identity information can also be confirmed by the microphone signal channel number, which will not be detailed here.
By determining the speaker's identity, the above steps on the one hand determine the speaker corresponding to each audio sentence, which helps the completeness of the minutes; on the other hand, they allow the subsequent steps to call the optimal speech recognition model according to the speaker's identity information, improving the accuracy of the speech conversion.
Speech recognition step: calling the target speech recognition model corresponding to each voice segment according to the speaker identity information corresponding to each voice segment in the voice segment set, inputting each voice segment into its corresponding target speech recognition model in turn, and obtaining the text fragment corresponding to each voice segment, wherein the target speech recognition model is obtained by update training based on an accent corpus and an industry corpus.
To improve the accuracy of speech recognition, the target speech recognition model is obtained from a general speech recognition model through two rounds of update training:
1) Updating the general speech recognition model according to the speaker's accent (that is, speech characteristics) to obtain a first speech recognition model. The first speech recognition model is determined by the following steps:
dividing accents into several major categories, for example, no accent (that is, standard Mandarin), Beijing accent, Shandong accent, Guangdong accent, Hunan accent, Sichuan accent, and so on, and collecting recorded audio for each accent category;
preprocessing the recorded audio of each accent category, deleting segments that are hard to understand, and converting the remaining segments to written text to obtain the corpus of each accent category;
feeding the processed audio and written text into the general speech recognition model, so that the model is optimized for the specific accent;
in actual meeting scenarios, feeding segments found to be transcribed incorrectly back into the model for re-optimization, thereby obtaining the first speech recognition model corresponding to each accent category.
2) Updating the first speech recognition model corresponding to each accent category according to the company and industry characteristics to obtain a second speech recognition model. The second speech recognition model is determined by the following steps:
compiling a company/industry special vocabulary list and saving it in text form;
having designated persons read the special vocabulary aloud in each accent category, forming the audio file corresponding to each accent category;
feeding the matched text and audio files into the first speech recognition model of each accent category for training, so that each first speech recognition model is optimized for the specific company/industry;
in actual meeting scenarios, feeding more corpora related to the proprietary vocabulary into the model for re-optimization, thereby obtaining the second speech recognition model corresponding to each accent category.
For example, suppose voiceprint recognition determines that the speaker of the current voice segment 1 is speaker A, and A's identity information indicates a Shandong accent; then the second speech recognition model corresponding to the Shandong accent is obtained as the target speech recognition model.
By training a general speech recognition model in advance and, before the audio conversion, updating the speech recognition model according to the speaker's accent characteristics, the above steps improve the model's ability to recognize the speaker's speech; at the same time, by also updating the model according to the company/industry characteristics, they improve the model's ability to recognize the company's specific business speech.
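Selecting the target speech recognition model from the speaker identity information can be sketched as a registry lookup; the model names and the 'accent' field are hypothetical, and the fallback to the no-accent (standard Mandarin) model is an assumption of this sketch:

```python
# Hypothetical registry of second-stage models, keyed by accent category.
MODEL_REGISTRY = {
    "mandarin": "asr-mandarin-industry-v2",
    "shandong": "asr-shandong-industry-v2",
    "guangdong": "asr-guangdong-industry-v2",
}

def select_target_model(speaker_info, registry=MODEL_REGISTRY,
                        default="asr-mandarin-industry-v2"):
    """Pick the target speech recognition model for a voice segment.

    `speaker_info` is the identity record returned by the voiceprint
    lookup; unknown accents fall back to the standard Mandarin model.
    """
    accent = speaker_info.get("accent", "mandarin")
    return registry.get(accent, default)
```

For the example above, a speaker whose identity record carries a Shandong accent resolves to the Shandong second-stage model.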
Generation step: merging the text fragments corresponding to the voice segments to generate the target text corresponding to the audio to be converted, associating the corresponding voice segment and speaker identity information with each text fragment in the target text, and generating the minutes corresponding to the audio to be converted.
For example, suppose the texts corresponding to the voice segments in the above voice segment set are text 1, text 2, text 3, text 4, and text 5 in turn; the obtained texts are merged and spliced into the target text corresponding to the audio to be converted. Then, according to the start and end times of each voice segment, the corresponding voice segment is cut out of the audio to be converted and associated with the corresponding text fragment in the target text. That is, each text fragment in the target text is annotated with its corresponding speaker information and voice segment, so as to generate the minutes; the generated minutes are saved and pushed to the terminal corresponding to the minutes generation instruction.
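The generation step's association of text fragments with speakers and audio spans can be sketched as follows; the record layout is an assumption of this sketch, not part of the claimed method:

```python
def build_minutes(segments, texts):
    """Assemble minutes records from voice segments and their transcripts.

    `segments` is a list of (start, end, speaker) voice segments and
    `texts` the corresponding recognized text fragments. Each minutes
    entry links the text fragment to its speaker and to the audio span
    used to cut the voice segment out of the audio to be converted.
    """
    minutes = []
    for (start, end, speaker), text in zip(segments, texts):
        minutes.append({
            "text": text,
            "speaker": speaker,
            "audio_span": (start, end),
        })
    target_text = " ".join(texts)  # merged target text
    return target_text, minutes
```

In the described embodiment, the speaker and audio-span fields would be rendered as hyperlinks when the minutes are displayed.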
In this embodiment, the speaker information and voice segments are associated with the text fragments in the form of hyperlinks.
By associating the above information in the minutes, the minutes administrator can conveniently look up and listen to the relevant segments and adjust the minutes. The electronic device 1 proposed in the above embodiment divides the audio to be converted into sentences, extracts and matches voiceprint features, determines the voice segments of the audio to be converted according to the matching result, and calls a different target speech recognition model for each voice segment, improving the efficiency and accuracy of speech recognition and thus the accuracy of the generated minutes; at the same time, by associating speaker identity information, voice segments, and text fragments, it generates minutes with improved completeness and usability.
Further, to improve the conversion accuracy of the audio to be converted, in another embodiment of the electronic device 1 of the present invention, before the first division step, the program code of the minutes generation program 10 executed by the processor 12 also implements a preprocessing step.
Preprocessing step: preprocessing the audio to be converted to obtain preprocessed audio to be converted.
In a typical meeting, the ambient environment introduces various kinds of noise, so the audio to be converted corresponding to the minutes needs to be preprocessed. The preprocessing includes, but is not limited to:
b1. echo cancellation; for example, an echo cancellation method can be used, or the echo can be offset by estimating the magnitude of the echo signal and subtracting the estimate from the received signal;
b2. beamforming; for example, multiple microphones collect the user's speech from different directions to determine the direction of the sound source, and a weighted sum is computed with weights depending on direction, where the weight of the sound-source direction is larger than that of other directions, so as to enhance the speech input by the user and weaken the influence of other sounds;
b3. noise reduction; for example, noise can first be cancelled using sound of the same frequency and amplitude but opposite phase, and reverberation can then be eliminated using a dereverberation audio plug-in or a microphone array;
b4. gain enhancement; for example, amplifying the audio using AGC (automatic gain control).
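As a toy illustration of item b4 only (real automatic gain control adapts the gain over time; this fixed peak rescaling is merely a sketch under that simplification):

```python
def apply_gain_control(samples, target_peak=0.9):
    """Scale float samples in [-1.0, 1.0] so the peak reaches target_peak.

    Silent input is returned unchanged to avoid division by zero. A
    production AGC would instead track signal level and adjust gain
    continuously rather than rescaling the whole buffer at once.
    """
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)
    gain = target_peak / peak
    return [s * gain for s in samples]
```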
The electronic device 1 proposed in the above embodiment preprocesses the audio to be converted, reducing external interference, which improves the accuracy of speech recognition and lays a foundation for the subsequent generation of the minutes.
Further, to make the minutes clearer, in another embodiment of the electronic device 1 of the present invention, when the processor 12 executes the program code of the minutes generation program 10, the following steps are also implemented:
segmenting the minutes into words to obtain a segmented word list, and identifying keywords from the segmented word list; and
determining the text fragment set corresponding to each keyword, classifying the text fragment set according to the speaker identity information corresponding to each text fragment, and sorting each keyword and its corresponding text fragments in chronological order to obtain the sorted text fragment set of each keyword.
The step of "segmenting the minutes into words" includes: a) matching the minutes against a preset vocabulary to obtain a first segmented list, where the vocabulary consists of the company-specific terms prepared for the company's business and the proprietary terms of the company's industry; b) segmenting the text remaining after step a) using an understanding-based segmentation method and a statistics-based segmentation method to obtain a second segmented list; c) removing meaningless stop words to obtain a third segmented list; d) merging the first, second, and third lists to obtain the final segmented list. Through segmentation, the text is finally converted into a list of words.
The step of "identifying keywords from the segmented list" includes: a) calculating the information value of each word in the list, for example, its tf-idf value; b) judging whether the information value of each word is greater than or equal to a preset threshold, and determining the words whose information value is greater than or equal to the preset threshold as keywords, where the preset threshold can be adjusted according to actual needs.
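The tf-idf-based keyword identification can be sketched as follows; the reference corpus, the smoothing scheme, and the threshold value are illustrative choices, not values fixed by the method:

```python
import math
from collections import Counter

def extract_keywords(doc_words, corpus, threshold=0.5):
    """Identify keywords by tf-idf against a small reference corpus.

    `doc_words` is the segmented word list of the minutes; `corpus` is a
    list of other segmented documents used for the idf statistics (the
    document itself is counted once via the +1 smoothing). Words whose
    tf-idf meets `threshold` are returned as keywords.
    """
    tf = Counter(doc_words)
    n_docs = len(corpus) + 1
    keywords = []
    for word, count in tf.items():
        df = 1 + sum(1 for doc in corpus if word in doc)
        idf = math.log(n_docs / df) + 1.0
        score = (count / len(doc_words)) * idf
        if score >= threshold:
            keywords.append(word)
    return keywords
```

Frequent function words score low because they occur in every reference document, while document-specific terms score high.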
Suppose three keywords K1, K2, and K3 are identified from the minutes. Taking keyword K1 as an example, in the above sorting result the text fragment set corresponding to keyword K1 includes: text fragment 1 corresponding to speaker A, text fragment 2 corresponding to speaker B, text fragment 4, and text fragment 3 corresponding to speaker C.
Further, the sorted text fragment set of each keyword may also include the voice segment corresponding to each text fragment. By associating text fragments with voice segments, the minutes administrator and inquirers can conveniently look up and listen to the relevant speech segments.
The electronic device 1 proposed in the above embodiment generates minutes by associating speaker identity information, voice segments, text fragments, keywords, and the like, improving the completeness and usability of the minutes.
In another embodiment of the electronic device 1 of the present invention, when the processor 12 executes the program code of the minutes generation program 10, the following steps are also implemented:
responding to a minutes viewing instruction issued by a user, and displaying the minutes to the user; and/or
responding to a click operation on a text fragment issued by a user, and displaying the associated information of the text fragment to the user; for example, the associated information includes keywords, speaker identity information, and a link to the corresponding voice segment, and when the user clicks the voice segment link, the corresponding voice segment is played; and/or
responding to a minutes modification instruction issued by a user, and updating and saving the minutes based on the modification instruction; and/or
responding to a query instruction carrying a query field issued by a user, querying the minutes for text fragments matching the query field, and feeding the matched text fragments and their associated information back to the user in a preset form. The query field may or may not be a keyword, and the matching may be fuzzy matching or semantic matching, which will not be detailed here. After the matching text fragments are found, all text fragments corresponding to the query field, together with their associated information (for example, keywords, speaker identity information, and links to the corresponding voice segments), are displayed to the user in a preset form (for example, a tree diagram, or in chronological order).
Optionally, in other embodiments, the minutes generation program 10 may also be divided into one or more modules, which are stored in the memory 11 and executed by one or more processors 12 to complete the present invention. A module as referred to in the present invention is a series of computer program instruction segments capable of completing a specific function.
For example, referring to Fig. 3, which is a schematic diagram of the program modules of the minutes generation program 10 in Fig. 2.
In one embodiment of the minutes generation program 10, the program includes modules 110-150, in which:
Receiving module 110, used to receive a minutes generation instruction issued by a user and obtain the audio to be converted according to the minutes generation instruction, or to obtain the audio to be converted from a preset storage path periodically or in real time;
First division module 120, used to divide the audio to be converted into sentences based on a preset sentence division rule to obtain the audio sentences of the audio to be converted;
Second division module 130, used to extract voiceprint features from the audio sentences respectively, compare and analyze the voiceprint feature of each audio sentence against a preset voiceprint feature library to determine the speaker identity information corresponding to each audio sentence, divide the audio sentences into voice segments according to the speaker identity information, and determine the voice segment set corresponding to the audio to be converted;
Speech recognition module 140, used to call the target speech recognition model corresponding to each voice segment according to the speaker identity information corresponding to each voice segment in the voice segment set, input each voice segment into its corresponding target speech recognition model in turn, and obtain the text fragment corresponding to each voice segment, wherein the target speech recognition model is obtained by update training based on an accent corpus and an industry corpus; and
Generation module 150, used to merge the text fragments corresponding to the voice segments to generate the target text corresponding to the audio to be converted, associate the corresponding voice segment and speaker identity information with each text fragment in the target text, and generate the minutes corresponding to the audio to be converted.
The functions or operation steps implemented by the modules 110-150 are similar to those described above and will not be detailed here.
In addition, an embodiment of the present invention also proposes a computer-readable storage medium, which includes a minutes generation program 10, and when the minutes generation program 10 is executed by a processor, the following operations are implemented:
Receiving step: receiving a minutes generation instruction issued by a user and obtaining the audio to be converted according to the minutes generation instruction, or obtaining the audio to be converted from a preset storage path periodically or in real time;
First division step: dividing the audio to be converted into sentences based on a preset sentence division rule to obtain the audio sentences of the audio to be converted;
Second division step: extracting voiceprint features from the audio sentences respectively, comparing and analyzing the voiceprint feature of each audio sentence against a preset voiceprint feature library to determine the speaker identity information corresponding to each audio sentence, dividing the audio sentences into voice segments according to the speaker identity information, and determining the voice segment set corresponding to the audio to be converted;
Speech recognition step: calling the target speech recognition model corresponding to each voice segment according to the speaker identity information corresponding to each voice segment in the voice segment set, inputting each voice segment into its corresponding target speech recognition model in turn, and obtaining the text fragment corresponding to each voice segment, wherein the target speech recognition model is obtained by update training based on an accent corpus and an industry corpus; and
Generation step: merging the text fragments corresponding to the voice segments to generate the target text corresponding to the audio to be converted, associating the corresponding voice segment and speaker identity information with each text fragment in the target text, and generating the minutes corresponding to the audio to be converted.
The specific embodiment of the computer-readable storage medium of the present invention is substantially the same as the specific embodiment of the above speech-recognition-based minutes generation method, and will not be detailed here.
The serial numbers of the above embodiments of the present invention are for description only and do not represent the merits of the embodiments.
It should be noted that, in this document, the terms "include", "comprise", or any other variant thereof are intended to cover a non-exclusive inclusion, so that a process, device, article, or method that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, device, article, or method. In the absence of further restrictions, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, device, article, or method that includes the element.
Through the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, or the part that contributes to the prior art, can be embodied in the form of a software product, which is stored in a storage medium as described above (such as ROM/RAM, magnetic disk, or optical disk) and includes several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to execute the methods described in the embodiments of the present invention.
The above are only preferred embodiments of the present invention and are not intended to limit the scope of the invention. Any equivalent structure or equivalent process transformation made using the content of the specification and drawings of the present invention, applied directly or indirectly in other related technical fields, is likewise included within the protection scope of the present invention.
Claims (10)
1. A meeting minutes generation method based on speech recognition, applicable to an electronic device, the method comprising:
a receiving step: receiving a minutes generation instruction issued by a user and obtaining audio to be converted according to the minutes generation instruction, or obtaining the audio to be converted from a preset storage path periodically or in real time;
a first division step: performing sentence division on the audio to be converted based on a preset sentence division rule to obtain audio sentences of the audio to be converted;
a second division step: extracting a voiceprint feature from each of the audio sentences, comparing and analyzing the voiceprint feature of each audio sentence against a preset voiceprint feature library to determine speaker identity information corresponding to each audio sentence, dividing the audio sentences into voice segments according to the speaker identity information, and determining a voice segment set corresponding to the audio to be converted;
a speech recognition step: invoking, according to the speaker identity information corresponding to each voice segment in the voice segment set, a target speech recognition model corresponding to that voice segment, and inputting each voice segment into its corresponding target speech recognition model in turn to obtain a text fragment corresponding to each voice segment, wherein each target speech recognition model is obtained by update training based on an accent corpus and an industry corpus; and
a generation step: merging the text fragments corresponding to the voice segments to generate a target text corresponding to the audio to be converted, associating each text fragment in the target text with its corresponding voice segment and speaker identity information, and generating the meeting minutes corresponding to the audio to be converted.
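The end-to-end flow of claim 1 can be sketched as follows. This is a minimal illustration, not the patented implementation: the sentence-division, speaker-identification, and recognition components are passed in as hypothetical callables, since the claim leaves their internals to the later claims.

```python
from dataclasses import dataclass

@dataclass
class Sentence:
    start: float
    end: float
    speaker: str = ""

@dataclass
class Segment:
    speaker: str
    sentences: list
    text: str = ""

def generate_minutes(audio, split_into_sentences, identify_speaker, recognizers):
    # First division step: pause-based sentence division.
    sentences = split_into_sentences(audio)
    # Second division step: label each sentence with a speaker, then merge
    # temporally adjacent sentences sharing a speaker into voice segments.
    for s in sentences:
        s.speaker = identify_speaker(audio, s)
    segments = []
    for s in sentences:
        if segments and segments[-1].speaker == s.speaker:
            segments[-1].sentences.append(s)
        else:
            segments.append(Segment(speaker=s.speaker, sentences=[s]))
    # Speech recognition step: route each segment to the recognition model
    # keyed by its speaker identity (standing in for the per-speaker
    # "target speech recognition model").
    for seg in segments:
        seg.text = recognizers[seg.speaker](audio, seg.sentences)
    # Generation step: merge fragments while keeping the speaker association.
    return [(seg.speaker, seg.text) for seg in segments]
```

The speaker-keyed `recognizers` mapping mirrors the claim's idea that each voice segment is sent to the model trained for that speaker's accent and domain.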
2. The meeting minutes generation method based on speech recognition according to claim 1, wherein the first division step comprises:
identifying a first pause in the audio to be converted, and recording a start time and an end time of the first pause;
identifying a first sentence in the audio to be converted, and taking the end time of the first pause as a start time of the first sentence;
identifying a second pause, recording a start time and an end time of the second pause, and taking the start time of the second pause as an end time of the first sentence, thereby completing the division of the first sentence; and
repeating the above steps until the audio to be converted ends, to obtain all audio sentences of the audio to be converted.
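The pause-driven division of claim 2 can be sketched as below. This is an illustrative simplification assuming frame-level energies and a silence threshold; a production system would use a proper voice-activity detector rather than a raw energy gate.

```python
def divide_by_pauses(frame_energy, threshold, frame_s=0.01, min_pause_frames=3):
    """Return (start, end) times of sentences bounded by pauses."""
    sentences, pause_len, sent_start = [], 0, None
    for i, e in enumerate(frame_energy):
        if e < threshold:
            pause_len += 1
            # A sufficiently long pause closes the open sentence; the pause's
            # start time becomes the sentence's end time, as in the claim.
            if pause_len == min_pause_frames and sent_start is not None:
                sentences.append((sent_start, (i - min_pause_frames + 1) * frame_s))
                sent_start = None
        else:
            # The end of a pause becomes the start of the next sentence.
            if sent_start is None:
                sent_start = i * frame_s
            pause_len = 0
    if sent_start is not None:  # audio ended mid-sentence
        sentences.append((sent_start, len(frame_energy) * frame_s))
    return sentences
```

The `min_pause_frames` guard keeps brief intra-sentence silences from splitting a sentence, which matches the claim's notion of identifying a pause as a bounded interval rather than a single silent frame.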
3. The meeting minutes generation method based on speech recognition according to claim 1, wherein dividing the audio sentences into voice segments according to the speaker identity information comprises:
merging temporally adjacent audio sentences whose corresponding speaker identity information is identical into one voice segment, and determining a start time and an end time of the voice segment according to the start and end times of the at least one audio sentence that the voice segment comprises.
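Claim 3's merging rule can be sketched directly: adjacent same-speaker sentences collapse into one voice segment whose start and end times come from the first and last sentence it contains. The tuple/dict shapes here are illustrative, not from the patent.

```python
def merge_into_segments(sentences):
    """sentences: list of (start, end, speaker) tuples in time order."""
    segments = []
    for start, end, speaker in sentences:
        if segments and segments[-1]["speaker"] == speaker:
            # Same speaker as the open segment: extend its end time.
            segments[-1]["end"] = end
        else:
            # Speaker change: open a new voice segment.
            segments.append({"speaker": speaker, "start": start, "end": end})
    return segments
```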
4. The meeting minutes generation method based on speech recognition according to any one of claims 1 to 3, wherein, before the first division step, the method further comprises:
a preprocessing step: preprocessing the audio to be converted to obtain preprocessed audio to be converted, the preprocessing comprising: echo cancellation, beamforming, noise reduction, and gain amplification.
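Of the preprocessing operations named in claim 4, echo cancellation, beamforming, and noise reduction are normally delegated to DSP libraries; the gain-amplification part alone can be sketched as a simple peak normalization. This is an assumed stand-in, not the patent's method.

```python
def normalize_gain(samples, target_peak=0.9):
    """Scale samples so the loudest one reaches target_peak."""
    peak = max(abs(s) for s in samples) or 1.0  # avoid dividing by zero on silence
    return [s * target_peak / peak for s in samples]
```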
5. The meeting minutes generation method based on speech recognition according to claim 4, wherein the method further comprises:
performing word segmentation on the meeting minutes to obtain a segmented word list, and identifying keywords from the segmented word list; and
determining a text fragment set corresponding to each keyword, classifying each text fragment set according to the speaker identity information corresponding to each text fragment, and sorting each keyword and its corresponding text fragments in chronological order to obtain a sorted text fragment set corresponding to each keyword.
6. The meeting minutes generation method based on speech recognition according to claim 5, wherein the method further comprises:
responding to a minutes viewing instruction issued by the user by displaying the meeting minutes to the user; and/or
responding to a click operation by the user on a text fragment by displaying associated information corresponding to the text fragment to the user, the associated information comprising a keyword, speaker identity information, and a link to the corresponding voice segment, wherein when the user clicks the voice segment link, the corresponding voice segment is played; and/or
responding to a minutes modification instruction issued by the user by updating and saving the meeting minutes based on the modification instruction; and/or
responding to a query instruction carrying a query field issued by the user by querying the meeting minutes for text fragments matching the query field and feeding back the matched text fragments and their associated information to the user in a preset form.
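The query branch of claim 6 reduces to matching a query field against the minutes' text fragments and returning each hit with its associated information. A minimal sketch, with assumed field names (`text`, `speaker`, `segment_link`):

```python
def query_minutes(minutes, query_field):
    """minutes: list of dicts with 'text', 'speaker', 'segment_link' keys."""
    q = query_field.lower()
    # Return matched fragments together with their associated information,
    # including the voice-segment link used for playback on click.
    return [
        {"text": f["text"], "speaker": f["speaker"], "link": f["segment_link"]}
        for f in minutes
        if q in f["text"].lower()
    ]
```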
7. An electronic device, comprising a memory and a processor, the memory storing a minutes generation program runnable on the processor, wherein the minutes generation program, when executed by the processor, implements the following steps:
a receiving step: receiving a minutes generation instruction issued by a user and obtaining audio to be converted according to the minutes generation instruction, or obtaining the audio to be converted from a preset storage path periodically or in real time;
a first division step: performing sentence segmentation on the audio to be converted based on a preset audio sentence segmentation rule to obtain audio sentences of the audio to be converted;
a second division step: extracting a voiceprint feature from each of the audio sentences, comparing and analyzing the voiceprint feature of each audio sentence against a preset voiceprint feature library to determine speaker identity information corresponding to each audio sentence, dividing the audio sentences into voice segments according to the speaker identity information, and determining a voice segment set corresponding to the audio to be converted;
a speech recognition step: invoking, according to the speaker identity information corresponding to each voice segment in the voice segment set, a target speech recognition model corresponding to that voice segment, and inputting each voice segment into its corresponding target speech recognition model in turn to obtain a text fragment corresponding to each voice segment, wherein each target speech recognition model is obtained by update training based on an accent corpus and an industry corpus; and
a generation step: merging the text fragments corresponding to the voice segments to generate a target text corresponding to the audio to be converted, associating each text fragment in the target text with its corresponding voice segment and speaker identity information, and generating the meeting minutes corresponding to the audio to be converted.
8. The electronic device according to claim 7, wherein the minutes generation program, when executed by the processor, further implements the following steps:
performing word segmentation on the meeting minutes to obtain a segmented word list, and identifying keywords from the segmented word list; and
determining a text fragment set corresponding to each keyword, classifying each text fragment set according to the speaker identity information corresponding to each text fragment, and sorting each keyword and its corresponding text fragments in chronological order to obtain a sorted text fragment set corresponding to each keyword.
9. The electronic device according to claim 7, wherein the minutes generation program, when executed by the processor, further implements the following steps:
responding to a minutes viewing instruction issued by the user by displaying the meeting minutes to the user; and/or
responding to a click operation by the user on a text fragment by displaying associated information corresponding to the text fragment to the user, the associated information comprising a keyword, speaker identity information, and a link to the corresponding voice segment, wherein when the user clicks the voice segment link, the corresponding voice segment is played; and/or
responding to a minutes modification instruction issued by the user by updating and saving the meeting minutes based on the modification instruction; and/or
responding to a query instruction carrying a query field issued by the user by querying the meeting minutes for text fragments matching the query field and feeding back the matched text fragments and their associated information to the user in a preset form.
10. A computer-readable storage medium, wherein the computer-readable storage medium comprises a minutes generation program which, when executed by a processor, implements the steps of the meeting minutes generation method based on speech recognition according to any one of claims 1 to 6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910627403.1A CN110335612A (en) | 2019-07-11 | 2019-07-11 | Minutes generation method, device and storage medium based on speech recognition |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910627403.1A CN110335612A (en) | 2019-07-11 | 2019-07-11 | Minutes generation method, device and storage medium based on speech recognition |
Publications (1)
Publication Number | Publication Date |
---|---|
CN110335612A true CN110335612A (en) | 2019-10-15 |
Family
ID=68146486
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910627403.1A Pending CN110335612A (en) | 2019-07-11 | 2019-07-11 | Minutes generation method, device and storage medium based on speech recognition |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110335612A (en) |
Patent Citations (18)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20120072211A1 (en) * | 2010-09-16 | 2012-03-22 | Nuance Communications, Inc. | Using codec parameters for endpoint detection in speech recognition |
CN104427292A (en) * | 2013-08-22 | 2015-03-18 | 中兴通讯股份有限公司 | Method and device for extracting a conference summary |
CN105632498A (en) * | 2014-10-31 | 2016-06-01 | 株式会社东芝 | Method, device and system for generating conference record |
CN106683662A (en) * | 2015-11-10 | 2017-05-17 | 中国电信股份有限公司 | Speech recognition method and device |
CN105632484A (en) * | 2016-02-19 | 2016-06-01 | 上海语知义信息技术有限公司 | Voice synthesis database pause information automatic marking method and system |
CN105719642A (en) * | 2016-02-29 | 2016-06-29 | 黄博 | Continuous and long voice recognition method and system and hardware equipment |
CN105845129A (en) * | 2016-03-25 | 2016-08-10 | 乐视控股(北京)有限公司 | Method and system for dividing sentences in audio and automatic caption generation method and system for video files |
CN107039035A (en) * | 2017-01-10 | 2017-08-11 | 上海优同科技有限公司 | A kind of detection method of voice starting point and ending point |
CN108447471A (en) * | 2017-02-15 | 2018-08-24 | 腾讯科技(深圳)有限公司 | Audio recognition method and speech recognition equipment |
CN107689225A (en) * | 2017-09-29 | 2018-02-13 | 福建实达电脑设备有限公司 | A kind of method for automatically generating minutes |
CN108335697A (en) * | 2018-01-29 | 2018-07-27 | 北京百度网讯科技有限公司 | Minutes method, apparatus, equipment and computer-readable medium |
CN108363765A (en) * | 2018-02-06 | 2018-08-03 | 深圳市鹰硕技术有限公司 | The recognition methods of audio paragraph and device |
CN108763338A (en) * | 2018-05-14 | 2018-11-06 | 山东亿云信息技术有限公司 | A kind of News Collection&Edit System based on power industry |
CN108986826A (en) * | 2018-08-14 | 2018-12-11 | 中国平安人寿保险股份有限公司 | Automatically generate method, electronic device and the readable storage medium storing program for executing of minutes |
CN109388701A (en) * | 2018-08-17 | 2019-02-26 | 深圳壹账通智能科技有限公司 | Minutes generation method, device, equipment and computer storage medium |
CN109325737A (en) * | 2018-09-17 | 2019-02-12 | 态度国际咨询管理(深圳)有限公司 | A kind of enterprise intelligent virtual assistant system and its method |
CN109754808A (en) * | 2018-12-13 | 2019-05-14 | 平安科技(深圳)有限公司 | Method, apparatus, computer equipment and the storage medium of voice conversion text |
CN109767757A (en) * | 2019-01-16 | 2019-05-17 | 平安科技(深圳)有限公司 | A kind of minutes generation method and device |
Non-Patent Citations (1)
Title |
---|
ZENG Chao et al.: "A real-time speaker-dependent Chinese speech recognition system with a limited command set", Proceedings of the Third National Conference on Man-Machine Speech Communication (NCMMSC1994) *
Cited By (67)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2021073116A1 (en) * | 2019-10-18 | 2021-04-22 | 平安科技(深圳)有限公司 | Method and apparatus for generating legal document, device and storage medium |
CN110691258A (en) * | 2019-10-30 | 2020-01-14 | 中央电视台 | Program material manufacturing method and device, computer storage medium and electronic equipment |
CN110837557A (en) * | 2019-11-05 | 2020-02-25 | 北京声智科技有限公司 | Abstract generation method, device, equipment and medium |
CN110837557B (en) * | 2019-11-05 | 2023-02-17 | 北京声智科技有限公司 | Abstract generation method, device, equipment and medium |
CN110875036A (en) * | 2019-11-11 | 2020-03-10 | 广州国音智能科技有限公司 | Voice classification method, device, equipment and computer readable storage medium |
CN110767235A (en) * | 2019-11-14 | 2020-02-07 | 北京中电慧声科技有限公司 | Voice transcription processing device with role separation function and control method |
CN110992958B (en) * | 2019-11-19 | 2021-06-22 | 深圳追一科技有限公司 | Content recording method, content recording apparatus, electronic device, and storage medium |
CN110992958A (en) * | 2019-11-19 | 2020-04-10 | 深圳追一科技有限公司 | Content recording method, content recording apparatus, electronic device, and storage medium |
CN110930984A (en) * | 2019-12-04 | 2020-03-27 | 北京搜狗科技发展有限公司 | Voice processing method and device and electronic equipment |
CN110995943A (en) * | 2019-12-25 | 2020-04-10 | 携程计算机技术(上海)有限公司 | Multi-user streaming voice recognition method, system, device and medium |
CN110995943B (en) * | 2019-12-25 | 2021-05-07 | 携程计算机技术(上海)有限公司 | Multi-user streaming voice recognition method, system, device and medium |
CN111192587A (en) * | 2019-12-27 | 2020-05-22 | 拉克诺德(深圳)科技有限公司 | Voice data matching method and device, computer equipment and storage medium |
CN111177353A (en) * | 2019-12-27 | 2020-05-19 | 拉克诺德(深圳)科技有限公司 | Text record generation method and device, computer equipment and storage medium |
CN111625614A (en) * | 2020-01-20 | 2020-09-04 | 全息空间(深圳)智能科技有限公司 | Live broadcast platform voice collection method, system and storage medium |
CN111312216A (en) * | 2020-02-21 | 2020-06-19 | 厦门快商通科技股份有限公司 | Voice marking method containing multiple speakers and computer readable storage medium |
CN113408996A (en) * | 2020-03-16 | 2021-09-17 | 上海博泰悦臻网络技术服务有限公司 | Schedule management method, schedule management device and computer readable storage medium |
CN111405235A (en) * | 2020-04-20 | 2020-07-10 | 杭州大轶科技有限公司 | Video conference method and system based on artificial intelligence recognition and extraction |
CN111629267A (en) * | 2020-04-30 | 2020-09-04 | 腾讯科技(深圳)有限公司 | Audio labeling method, device, equipment and computer readable storage medium |
CN111739536A (en) * | 2020-05-09 | 2020-10-02 | 北京捷通华声科技股份有限公司 | Audio processing method and device |
CN111353038A (en) * | 2020-05-25 | 2020-06-30 | 深圳市友杰智新科技有限公司 | Data display method and device, computer equipment and storage medium |
CN111739553B (en) * | 2020-06-02 | 2024-04-05 | 深圳市未艾智能有限公司 | Conference sound collection, conference record and conference record presentation method and device |
CN111739553A (en) * | 2020-06-02 | 2020-10-02 | 深圳市未艾智能有限公司 | Conference sound acquisition method, conference recording method, conference record presentation method and device |
CN111785275A (en) * | 2020-06-30 | 2020-10-16 | 北京捷通华声科技股份有限公司 | Voice recognition method and device |
CN111785260B (en) * | 2020-07-08 | 2023-10-27 | 泰康保险集团股份有限公司 | Clause method and device, storage medium and electronic equipment |
CN111785260A (en) * | 2020-07-08 | 2020-10-16 | 泰康保险集团股份有限公司 | Sentence dividing method and device, storage medium and electronic equipment |
CN111968657A (en) * | 2020-08-17 | 2020-11-20 | 北京字节跳动网络技术有限公司 | Voice processing method and device, electronic equipment and computer readable medium |
WO2022037388A1 (en) * | 2020-08-17 | 2022-02-24 | 北京字节跳动网络技术有限公司 | Voice generation method and apparatus, device, and computer readable medium |
CN114079695A (en) * | 2020-08-18 | 2022-02-22 | 北京有限元科技有限公司 | Method, device and storage medium for recording voice call content |
CN112017632A (en) * | 2020-09-02 | 2020-12-01 | 浪潮云信息技术股份公司 | Automatic conference record generation method |
CN112165599A (en) * | 2020-10-10 | 2021-01-01 | 广州科天视畅信息科技有限公司 | Automatic conference summary generation method for video conference |
CN112270918A (en) * | 2020-10-22 | 2021-01-26 | 北京百度网讯科技有限公司 | Information processing method, device, system, electronic equipment and storage medium |
CN113010704A (en) * | 2020-11-18 | 2021-06-22 | 北京字跳网络技术有限公司 | Interaction method, device, equipment and medium for conference summary |
WO2022105861A1 (en) * | 2020-11-20 | 2022-05-27 | 北京有竹居网络技术有限公司 | Method and apparatus for recognizing voice, electronic device and medium |
CN112562682A (en) * | 2020-12-02 | 2021-03-26 | 携程计算机技术(上海)有限公司 | Identity recognition method, system, equipment and storage medium based on multi-person call |
CN112837690A (en) * | 2020-12-30 | 2021-05-25 | 科大讯飞股份有限公司 | Audio data generation method, audio data transcription method and device |
CN112820297A (en) * | 2020-12-30 | 2021-05-18 | 平安普惠企业管理有限公司 | Voiceprint recognition method and device, computer equipment and storage medium |
CN112839195A (en) * | 2020-12-30 | 2021-05-25 | 深圳市皓丽智能科技有限公司 | Method and device for consulting meeting record, computer equipment and storage medium |
CN112839195B (en) * | 2020-12-30 | 2023-10-10 | 深圳市皓丽智能科技有限公司 | Conference record consulting method and device, computer equipment and storage medium |
CN112837690B (en) * | 2020-12-30 | 2024-04-16 | 科大讯飞股份有限公司 | Audio data generation method, audio data transfer method and device |
CN112395420A (en) * | 2021-01-19 | 2021-02-23 | 平安科技(深圳)有限公司 | Video content retrieval method and device, computer equipment and storage medium |
CN112800269A (en) * | 2021-01-20 | 2021-05-14 | 上海明略人工智能(集团)有限公司 | Conference record generation method and device |
CN112887659B (en) * | 2021-01-29 | 2023-06-23 | 深圳前海微众银行股份有限公司 | Conference recording method, device, equipment and storage medium |
CN112887659A (en) * | 2021-01-29 | 2021-06-01 | 深圳前海微众银行股份有限公司 | Conference recording method, device, equipment and storage medium |
CN113327619A (en) * | 2021-02-26 | 2021-08-31 | 山东大学 | Conference recording method and system based on cloud-edge collaborative architecture |
CN113327619B (en) * | 2021-02-26 | 2022-11-04 | 山东大学 | Conference recording method and system based on cloud-edge collaborative architecture |
CN113051426A (en) * | 2021-03-18 | 2021-06-29 | 深圳市声扬科技有限公司 | Audio information classification method and device, electronic equipment and storage medium |
CN113055529B (en) * | 2021-03-29 | 2022-12-13 | 深圳市艾酷通信软件有限公司 | Recording control method and recording control device |
CN113055529A (en) * | 2021-03-29 | 2021-06-29 | 深圳市艾酷通信软件有限公司 | Recording control method and recording control device |
CN113113018A (en) * | 2021-04-16 | 2021-07-13 | 钦州云之汇大数据科技有限公司 | Enterprise intelligent management system and method based on big data |
CN112995572A (en) * | 2021-04-23 | 2021-06-18 | 深圳市黑金工业制造有限公司 | Remote conference system and physical display method in remote conference |
CN113207032A (en) * | 2021-04-29 | 2021-08-03 | 读书郎教育科技有限公司 | System and method for increasing subtitles by recording videos in intelligent classroom |
CN113299279A (en) * | 2021-05-18 | 2021-08-24 | 上海明略人工智能(集团)有限公司 | Method, apparatus, electronic device and readable storage medium for associating voice data and retrieving voice data |
CN113595868A (en) * | 2021-06-28 | 2021-11-02 | 深圳云之家网络有限公司 | Voice message processing method and device based on instant messaging and computer equipment |
CN113488025B (en) * | 2021-07-14 | 2024-05-14 | 维沃移动通信(杭州)有限公司 | Text generation method, device, electronic equipment and readable storage medium |
CN113488025A (en) * | 2021-07-14 | 2021-10-08 | 维沃移动通信(杭州)有限公司 | Text generation method and device, electronic equipment and readable storage medium |
CN113539269A (en) * | 2021-07-20 | 2021-10-22 | 上海明略人工智能(集团)有限公司 | Audio information processing method, system and computer readable storage medium |
CN113409774A (en) * | 2021-07-20 | 2021-09-17 | 北京声智科技有限公司 | Voice recognition method and device and electronic equipment |
CN113658599A (en) * | 2021-08-18 | 2021-11-16 | 平安普惠企业管理有限公司 | Conference record generation method, device, equipment and medium based on voice recognition |
CN114125368A (en) * | 2021-11-30 | 2022-03-01 | 北京字跳网络技术有限公司 | Conference audio participant association method and device and electronic equipment |
CN114125368B (en) * | 2021-11-30 | 2024-01-30 | 北京字跳网络技术有限公司 | Conference audio participant association method and device and electronic equipment |
CN114330369A (en) * | 2022-03-15 | 2022-04-12 | 深圳文达智通技术有限公司 | Local production marketing management method, device and equipment based on intelligent voice analysis |
CN115174285B (en) * | 2022-07-26 | 2024-02-27 | 中国工商银行股份有限公司 | Conference record generation method and device and electronic equipment |
CN115174285A (en) * | 2022-07-26 | 2022-10-11 | 中国工商银行股份有限公司 | Conference record generation method and device and electronic equipment |
CN115906781A (en) * | 2022-12-15 | 2023-04-04 | 广州文石信息科技有限公司 | Method, device and equipment for audio identification and anchor point addition and readable storage medium |
CN115906781B (en) * | 2022-12-15 | 2023-11-24 | 广州文石信息科技有限公司 | Audio identification anchor adding method, device, equipment and readable storage medium |
CN115828907A (en) * | 2023-02-16 | 2023-03-21 | 南昌航天广信科技有限责任公司 | Intelligent conference management method, system, readable storage medium and computer equipment |
CN117456984A (en) * | 2023-10-26 | 2024-01-26 | 杭州捷途慧声科技有限公司 | Voice interaction method and system based on voiceprint recognition |
Similar Documents
Publication | Title
---|---
CN110335612A (en) | Minutes generation method, device and storage medium based on speech recognition
CN112804400B (en) | Customer service call voice quality inspection method and device, electronic equipment and storage medium
CN107038220B (en) | Method, intelligent robot and system for generating memorandum
US20170300487A1 (en) | System and method for enhancing voice-enabled search based on automated demographic identification
CN104969288B (en) | Method and system for providing a voice recognition system based on voice recording logs
US7983910B2 (en) | Communicating across voice and text channels with emotion preservation
US11189277B2 (en) | Dynamic gazetteers for personalized entity recognition
CN108305626A (en) | Voice control method and device for application programs
CN108829765A (en) | Information query method, device, computer equipment and storage medium
CN110349564A (en) | Cross-language speech recognition method and device
CN109256150A (en) | Speech emotion recognition system and method based on machine learning
CN109256136A (en) | Speech recognition method and device
CN110134756A (en) | Minutes generation method, electronic device and storage medium
CN109801638B (en) | Voice verification method, device, computer equipment and storage medium
CN110047481A (en) | Speech recognition method and device
CN110933225B (en) | Call information acquisition method and device, storage medium and electronic equipment
CN104252464A (en) | Information processing method and information processing device
CN109190124A (en) | Method and apparatus for word segmentation
CN112925945A (en) | Conference summary generation method, device, equipment and storage medium
CN109754808B (en) | Method, device, computer equipment and storage medium for converting voice into text
US20210118464A1 (en) | Method and apparatus for emotion recognition from speech
KR102312993B1 (en) | Method and apparatus for implementing interactive message using artificial neural network
CN113920986A (en) | Conference record generation method, device, equipment and storage medium
KR20150041592A (en) | Method for updating contact information in callee electronic device, and the electronic device
CN112468665A (en) | Method, device, equipment and storage medium for generating conference summary
Legal Events
Code | Title | Description
---|---|---
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20191015