CN110335612A - Minutes generation method, device and storage medium based on speech recognition - Google Patents


Info

Publication number
CN110335612A
Authority
CN
China
Prior art keywords
audio
minutes
sentence
voice segments
converted
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201910627403.1A
Other languages
Chinese (zh)
Inventor
林子童
邵嘉琦
刘屹
肖金平
郭翼斌
万正勇
沈志勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Merchants Finance Technology Co Ltd
Original Assignee
China Merchants Finance Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Merchants Finance Technology Co Ltd
Priority to CN201910627403.1A
Publication of CN110335612A
Legal status: Pending


Classifications

    • G PHYSICS
        • G06 COMPUTING; CALCULATING OR COUNTING
            • G06F ELECTRIC DIGITAL DATA PROCESSING
                • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
                    • G06F16/60 Information retrieval of audio data
                        • G06F16/63 Querying
        • G10 MUSICAL INSTRUMENTS; ACOUSTICS
            • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
                • G10L15/00 Speech recognition
                    • G10L15/04 Segmentation; Word boundary detection
                    • G10L15/08 Speech classification or search
                        • G10L2015/088 Word spotting
                    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
                        • G10L2015/223 Execution procedure of a spoken command
                    • G10L15/26 Speech-to-text systems
                • G10L17/00 Speaker identification or verification
                    • G10L17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
                    • G10L17/04 Training, enrolment or model building
                    • G10L17/22 Interactive procedures; Man-machine interfaces

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Document Processing Apparatus (AREA)

Abstract

Disclosed herein is a minutes generation method based on speech recognition. The method comprises: receiving a minutes generation instruction issued by a user and obtaining the audio to be converted; performing sentence division on the audio to be converted to obtain its audio sentences; extracting a voiceprint feature from each audio sentence, comparing the voiceprint feature of each audio sentence against a preset voiceprint feature library to determine the speaker identity information corresponding to each audio sentence, dividing the audio sentences into voice segments according to the speaker identity information, and determining the voice segment set corresponding to the audio to be converted; calling the target speech recognition model corresponding to each voice segment and obtaining the text corresponding to each voice segment in turn; and generating the minutes corresponding to the audio to be converted. Also disclosed herein are an electronic device and a computer storage medium. The present invention improves the accuracy and efficiency of minutes generation.

Description

Minutes generation method, device and storage medium based on speech recognition
Technical field
The present invention relates to the field of Internet technology, and more particularly to a minutes generation method based on speech recognition, an electronic device, and a computer-readable storage medium.
Background art
At present, meeting minutes are typically produced as follows: first, keywords are noted down on site during the meeting; second, after the meeting, the note-taker locates each keyword in the meeting recording, listens to the surrounding audio, and expands the keywords into full minutes. Because there is no correspondence between a keyword and its position in the recording, the note-taker must search for it by repeated manual positioning, which is time-consuming and cumbersome. Moreover, if the same keyword occurs several times in the meeting, locating it purely by listening can pick the wrong occurrence, introducing errors into the minutes.
To address these problems, products have appeared on the market that rely on speech conversion technology to automatically generate minutes text. However, such products are usually simple speech-to-text tools: the conversion accuracy is not guaranteed, and the note-taker ends up with one long text that has no link back to the meeting recording. In addition, since speech-to-text technology is not yet mature, the generated text often contains so many transcription errors that the note-taker can do nothing with it and must still fall back on manually replaying the recording to complete the minutes.
Therefore, how to generate meeting minutes conveniently and accurately is a technical problem in urgent need of a solution.
Summary of the invention
In view of the foregoing, the present invention provides a minutes generation method based on speech recognition, an electronic device, and a computer-readable storage medium, whose main purpose is to improve the efficiency and accuracy of minutes generation.
To achieve the above object, the present invention provides a minutes generation method based on speech recognition, the method comprising:
Receiving step: receiving a minutes generation instruction issued by a user and obtaining the audio to be converted according to the instruction; alternatively, obtaining the audio to be converted from a preset store path at fixed intervals or in real time;
First division step: performing sentence division on the audio to be converted to obtain the audio sentences of the audio to be converted;
Second division step: extracting a voiceprint feature from each audio sentence, comparing and analyzing the voiceprint feature of each audio sentence against a preset voiceprint feature library to determine the speaker identity information corresponding to each audio sentence, dividing the audio sentences into voice segments according to the speaker identity information, and determining the voice segment set corresponding to the audio to be converted;
Speech recognition step: calling, according to the speaker identity information of each voice segment in the voice segment set, the target speech recognition model corresponding to that voice segment; inputting each voice segment into its corresponding target speech recognition model in turn to obtain the text fragment corresponding to each voice segment, wherein each target speech recognition model is obtained by update training based on an accent corpus and an industry corpus; and
Generation step: merging the text fragments corresponding to the voice segments to generate the target text corresponding to the audio to be converted, and associating each text fragment in the target text with its corresponding voice segment and speaker identity information, thereby generating the minutes corresponding to the audio to be converted.
In addition, to achieve the above object, the present invention also provides an electronic device comprising a memory and a processor. A minutes generation program that can run on the processor is stored in the memory, and when executed by the processor, the program implements any step of the minutes generation method based on speech recognition described above.
In addition, to achieve the above object, the present invention also provides a computer-readable storage medium containing a minutes generation program which, when executed by a processor, implements any step of the minutes generation method based on speech recognition described above.
The minutes generation method, electronic device, and computer-readable storage medium based on speech recognition proposed by the present invention: 1. divide the audio to be converted into sentences, extract voiceprint features, and match speaker identity information; determine the voice segment set corresponding to the audio to be converted according to the matching results; and call a different target speech recognition model for each voice segment, which improves the efficiency and accuracy of speech recognition and lays the foundation for subsequently generating complete and accurate minutes; 2. update-train the models with speaker accent corpora and industry corpora, which improves the accuracy of speech recognition; 3. generate the minutes by associating speaker identity information, voice segments, text fragments, keywords, and so on, which improves the completeness and convenience of the minutes.
Brief description of the drawings
Fig. 1 is a flow chart of a preferred embodiment of the minutes generation method based on speech recognition of the present invention;
Fig. 2 is a schematic diagram of a preferred embodiment of the electronic device of the present invention;
Fig. 3 is a schematic diagram of the program modules of a preferred embodiment of the minutes generation program in Fig. 2.
The realization of the objects, the functions, and the advantages of the present invention will be further described in the embodiments with reference to the accompanying drawings.
Detailed description of the embodiments
It should be appreciated that the specific embodiments described herein are merely illustrative of the present invention and are not intended to limit it.
The present invention provides a minutes generation method based on speech recognition. The method can be executed by a device, and the device can be realized by software and/or hardware.
Referring to Fig. 1, which is a flow chart of a preferred embodiment of the minutes generation method based on speech recognition of the present invention.
In one embodiment of the minutes generation method based on speech recognition of the present invention, the method comprises steps S1 to S5.
Step S1: receiving a minutes generation instruction issued by a user and obtaining the audio to be converted according to the instruction; alternatively, obtaining the audio to be converted from a preset store path at fixed intervals or in real time.
In the following description, the embodiments of the present invention are illustrated with an electronic device as the executing body.
In this embodiment, the user issues a minutes generation instruction to the electronic device through a terminal, wherein the instruction includes the audio to be converted. The audio to be converted is speech audio recorded during a meeting; it may be captured and saved by the user through a speech device such as a microphone, or it may be a voice file downloaded from the Internet or imported locally by the user. The preset store path is not limited to a database for storing minutes-related audio.
The step of obtaining the audio to be converted from the preset store path at fixed intervals or in real time includes: at fixed times (for example, 9:00 every morning and 5:30 every afternoon), checking whether any unconverted minutes-related audio exists in the store path; if so, taking the unconverted minutes-related audio as the audio to be converted; if not, concluding that there is no audio to be converted. Alternatively, whenever a piece of minutes-related audio is written into the preset store path, it is immediately taken as the audio to be converted and read out, so that the subsequent steps can be executed.
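The timed retrieval from the preset store path described above can be sketched as follows. This is an illustrative Python sketch only; the file extensions, the bookkeeping set of already-converted audio, and the function name are assumptions, not part of the patent.

```python
import os

def fetch_unconverted_audio(store_path, converted):
    """Scan a store path and return audio files not yet converted.

    `converted` is a set of filenames already processed; the patent only
    specifies checking a default store path for unconverted audio, so this
    bookkeeping scheme is an assumption.
    """
    pending = []
    for name in sorted(os.listdir(store_path)):
        if name.endswith((".wav", ".mp3")) and name not in converted:
            pending.append(os.path.join(store_path, name))
    return pending  # an empty list means there is no audio to be converted
```

A scheduler would call this at the fixed times (e.g. 9:00 and 17:30) or on a filesystem-change notification for the real-time variant.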
Step S2: performing sentence division on the audio to be converted based on a preset sentence division rule to obtain the audio sentences of the audio to be converted.
The purpose of sentence division is to obtain short sentences that are easier to feed into speech recognition, thereby improving the accuracy of the subsequent audio-to-text conversion. In this embodiment, performing sentence division on the audio to be converted based on the preset sentence division rule to obtain its audio sentences comprises:
a1. identifying the first pause in the audio to be converted and recording its start and end times;
a2. identifying the first sentence in the audio to be converted, taking the end time of the first pause as the start time of the first sentence;
a3. identifying the second pause, recording its start and end times, and taking the start time of the second pause as the end time of the first sentence, thereby completing the division of the first sentence;
a4. repeating the above steps until the end of the audio to be converted, thereby obtaining all audio sentences of the audio to be converted.
Here, the first pause and the second pause include silent segments and non-speech segments in the audio to be converted, and the first sentence is a speech span of the audio to be converted. Note that "first pause" and "second pause" merely distinguish pauses occurring at different times.
It will be appreciated that the quality of the sentence division is closely tied to the accuracy of the subsequent audio conversion: the more accurate the division, the more accurate the conversion. In this embodiment, each pause has a minimum-length limit, used to ignore short breaks such as a speaker's instantaneous breath, so as to protect the integrity of a sentence; each divided sentence has a minimum-length limit, used to filter out transient non-speech information such as a speaker's cough; and each divided sentence also has a maximum-length limit, used to cap the sentence length and improve the conversion accuracy of the subsequent audio processing.
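The pause-based sentence division with minimum-pause, minimum-sentence, and maximum-sentence limits described above can be sketched as follows. The speech spans are assumed to come from an upstream voice-activity detector, and all threshold values are illustrative, not those of the patent.

```python
def split_sentences(speech_spans, min_pause=0.3, min_len=0.5, max_len=15.0):
    """Divide detected speech into audio sentences.

    `speech_spans` is a list of (start, end) times in seconds produced by a
    voice-activity detector (assumed); thresholds are illustrative.
    """
    if not speech_spans:
        return []
    # Merge spans separated by pauses shorter than min_pause (for example a
    # speaker's instantaneous breath) to protect the integrity of a sentence.
    merged = [list(speech_spans[0])]
    for start, end in speech_spans[1:]:
        if start - merged[-1][1] < min_pause:
            merged[-1][1] = end
        else:
            merged.append([start, end])
    sentences = []
    for start, end in merged:
        if end - start < min_len:      # drop transient noise, e.g. a cough
            continue
        while end - start > max_len:   # cap sentence length for recognition
            sentences.append((start, start + max_len))
            start += max_len
        sentences.append((start, end))
    return sentences
```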
Step S3: extracting a voiceprint feature from each audio sentence, comparing and analyzing the voiceprint feature of each audio sentence against a preset voiceprint feature library to determine the speaker identity information corresponding to each audio sentence, dividing the audio sentences into voice segments according to the speaker identity information, and determining the voice segment set corresponding to the audio to be converted.
Taking company P as an example, the preset voiceprint feature library includes the voiceprint feature of each employee of company P together with the corresponding employee information. The speaker identity information includes the speaker's name, native place, accent, and so on.
In this embodiment, the step of "dividing the audio sentences into voice segments according to the speaker identity information" comprises:
merging temporally adjacent audio sentences whose speaker identity information is identical into one voice segment, and determining the start and end times of the voice segment from the start and end times of the one or more audio sentences it contains.
That is, the start and end times of each voice segment are determined by the start and end times of the audio sentences it contains; for example, the start time of the first audio sentence of a voice segment serves as the start time of the segment, and the end time of its last audio sentence serves as the end time of the segment.
The voice segment set includes the voice segments and the speaker identity information corresponding to each segment. For example, suppose the sentence division yields, in chronological order: sentence 1, sentence 2, sentence 3, sentence 4, sentence 5, with corresponding speakers A, B, B, C, B. The final voice segment set is then: {voice segment 1 (sentence 1), A}, {voice segment 2 (sentences 2 and 3), B}, {voice segment 3 (sentence 4), C}, {voice segment 4 (sentence 5), B}.
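The merging of temporally adjacent same-speaker audio sentences into voice segments, including the start/end-time rule, can be sketched as follows; the data shapes (triples in, dicts out) are illustrative assumptions.

```python
def group_voice_segments(sentences):
    """Merge temporally adjacent sentences by the same speaker into segments.

    `sentences` is a chronological list of (start, end, speaker) triples, as
    would be produced by sentence division plus voiceprint matching.
    """
    segments = []
    for start, end, speaker in sentences:
        if segments and segments[-1]["speaker"] == speaker:
            segments[-1]["end"] = end  # extend the current speaker's segment
        else:
            segments.append({"speaker": speaker, "start": start, "end": end})
    return segments
```

On the example above (speakers A, B, B, C, B), this yields four segments, with segment 2 spanning sentences 2 and 3.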
It will be appreciated that the preset voiceprint feature library needs to be updated periodically to keep voiceprint comparison efficient. In addition, extracting voiceprint features from audio and comparing voiceprints are relatively mature technologies and are not repeated here.
In other embodiments, the audio to be converted may also be speech captured in real time through microphones. By numbering the microphone signal channels in advance and establishing, before the meeting, the correspondence between microphone channel numbers and speaker identity information, the speaker identity corresponding to each utterance can also be confirmed by the microphone channel number when the audio to be converted is speech captured in real time; this is not repeated here.
By determining the identity of the speaker, the above steps on the one hand attach the correct speaker to each audio sentence, which contributes to the completeness of the minutes; on the other hand, they allow the optimal speech conversion model to be called subsequently according to the speaker's identity information, which improves the accuracy of the speech conversion.
Step S4: calling, according to the speaker identity information of each voice segment in the voice segment set, the target speech recognition model corresponding to each voice segment; inputting each voice segment into its corresponding target speech recognition model in turn to obtain the text fragment corresponding to each voice segment, wherein each target speech recognition model is obtained by update training based on an accent corpus and an industry corpus.
To improve the accuracy of speech recognition, the target speech recognition model is obtained by applying two rounds of update training to a general speech recognition model:
1) The general speech recognition model is update-trained according to speaker accent (that is, language characteristics) to obtain a first speech conversion model. The first speech recognition model is determined by the following steps:
accents are divided into several broad classes, for example no accent (that is, standard Mandarin), Beijing accent, Shandong accent, Guangdong accent, Hunan accent, Sichuan accent, and so on, and recorded audio is collected for each accent class;
the recorded audio of each accent class is preprocessed, segments that are hard or inconvenient to understand are deleted, the remaining segments are converted into written text, and the corpus of each accent class is thereby obtained;
the processed audio and the written text are fed into the general speech recognition model so that the model is optimized for the specific accent;
in actual meeting scenarios, any segments found to be transcribed incorrectly are fed back into the model for re-optimization, yielding the first speech recognition model corresponding to each accent class.
2) The first speech recognition model of each accent class is update-trained according to the nature of the company and its industry to obtain a second speech conversion model. The second speech recognition model is determined by the following steps:
a list of company/industry special-purpose terms is compiled and saved in text form;
designated readers read the special-purpose terms aloud in each accent class, producing an audio file for each accent class;
the text, paired with the matching audio file, is fed into the first speech recognition model of each accent class for training, so that each first speech recognition model is optimized for the specific company/industry;
in actual meeting scenarios, additional corpora related to the proprietary terms are fed into the models for re-optimization, yielding the second speech recognition model corresponding to each accent class.
For example, suppose voiceprint recognition determines that the speaker of the current voice segment 1 is speaker A, and A's identity information indicates a Shandong accent; then the second speech conversion model corresponding to the Shandong accent is obtained as the target speech recognition model.
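The selection of a target recognition model by speaker accent and company/industry can be sketched as follows. The registry contents, speaker profiles, model names, and default fallback are all hypothetical placeholders for illustration, not the patented models.

```python
# Hypothetical registry of update-trained models, keyed by (accent, domain);
# the patent describes per-accent models further specialized per company/industry.
MODEL_REGISTRY = {
    ("shandong", "finance"): "asr-shandong-finance-v2",
    ("mandarin", "finance"): "asr-mandarin-finance-v2",
}

# Speaker identity information as resolved by voiceprint matching (assumed).
SPEAKER_PROFILES = {
    "zhang_san": {"accent": "shandong"},
    "li_si": {"accent": "mandarin"},
}

def select_model(speaker_id, domain="finance"):
    """Pick the target speech recognition model from the speaker's accent,
    falling back to a standard-Mandarin model for unknown speakers."""
    accent = SPEAKER_PROFILES.get(speaker_id, {}).get("accent", "mandarin")
    return MODEL_REGISTRY.get((accent, domain),
                              MODEL_REGISTRY[("mandarin", domain)])
```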
In the above steps, a general speech conversion model is trained in advance and, before audio conversion, update-trained on the speaker's accent characteristics, which improves the model's ability to recognize the speaker's speech; the model is also update-trained according to the nature of the company/industry, which improves its ability to recognize the company's domain-specific speech.
Step S5: merging the text fragments corresponding to the voice segments to generate the target text corresponding to the audio to be converted, and associating each text fragment in the target text with its corresponding voice segment and speaker identity information, thereby generating the minutes corresponding to the audio to be converted.
For example, suppose the texts obtained in turn for the voice segments in the above voice segment set are: text 1, text 2, text 3, text 4, text 5. These texts are merged and spliced to obtain the target text corresponding to the audio to be converted. Then, according to the start and end times of each voice segment, the corresponding voice segment is cut from the audio to be converted and associated with the corresponding text fragment in the target text; that is, each text fragment in the target text is marked with its speaker information and its corresponding voice segment, thereby generating the minutes. The minutes are saved and pushed to the terminal corresponding to the minutes generation instruction.
In this embodiment, the speaker information and voice segments are associated with the text fragments in the form of hyperlinks.
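The generation step, merging per-segment text fragments into a target text whose fragments stay associated with their speaker and backing voice segment, can be sketched as follows; the field names are illustrative, and `audio_ref` stands in for whatever a UI would render as the hyperlink described above.

```python
def assemble_minutes(segments):
    """Merge per-segment transcripts into minutes entries, each carrying its
    speaker identity and the time span of its backing voice segment.

    Each input item is a dict with 'speaker', 'start', 'end', and 'text';
    these field names are assumptions for illustration.
    """
    entries = []
    for seg in segments:
        entries.append({
            "speaker": seg["speaker"],
            "audio_ref": (seg["start"], seg["end"]),  # span to cut and link
            "text": seg["text"],
        })
    # Splice the fragments into the single target text of the whole audio.
    full_text = " ".join(seg["text"] for seg in segments)
    return {"target_text": full_text, "entries": entries}
```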
By associating the above information in the minutes, the minutes administrator can conveniently listen to the relevant segments and adjust the minutes.
In the minutes generation method based on speech recognition proposed by the above embodiment, the audio to be converted is divided into sentences, voiceprint features are extracted and matched, the voice segments of the audio to be converted are determined according to the matching results, and different target speech recognition models are called to perform speech recognition on each voice segment. This improves the efficiency and accuracy of speech recognition and hence the accuracy of the generated minutes; at the same time, by associating speaker identity information, voice segments, and text fragments, the completeness and convenience of the minutes are improved.
Further, in order to improve the conversion accuracy of the audio to be converted, in another embodiment of the minutes generation method based on speech recognition of the present invention, before step S2 the method further includes: preprocessing the audio to be converted to obtain preprocessed audio to be converted.
In a typical meeting, the surrounding environment introduces various kinds of noise, so the minutes-related audio to be converted needs to be preprocessed. The preprocessing includes, but is not limited to:
B1. echo cancellation; for example, an echo cancellation method may be used, or the magnitude of the echo signal may be estimated and the estimate subtracted from the received signal to cancel the echo;
B2. beamforming; for example, the user's speech is captured from different directions by multiple microphones and the direction of the sound source is determined, after which a weighted sum is computed according to the weights of the different directions. For example, the weight of the sound-source direction is made larger than the weights of the other directions, enhancing the speech input by the user and weakening the influence of other sounds;
B3. noise reduction; for example, noise may first be cancelled out using a sound with the same frequency and amplitude but opposite phase, after which reverberation is removed using a dereverberation audio plug-in or a microphone array;
B4. gain enhancement; for example, the audio is amplified using AGC (automatic gain control).
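Of the preprocessing stages B1-B4, the AGC stage (B4) is the simplest to illustrate. The sketch below is a toy peak-normalizing gain control over floating-point samples; real echo cancellation, beamforming, and noise reduction (B1-B3) involve signal-processing machinery omitted here, and the target level is an illustrative assumption.

```python
def auto_gain_control(samples, target_peak=0.9):
    """Very simplified AGC: scale samples so the peak amplitude reaches a
    target level. Production AGC adapts gain over time; this sketch applies
    one global gain purely for illustration.
    """
    peak = max(abs(s) for s in samples)
    if peak == 0:
        return list(samples)  # silent input, nothing to amplify
    gain = target_peak / peak
    return [s * gain for s in samples]
```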
In the minutes generation method based on speech recognition proposed by the above embodiment, preprocessing the audio to be converted reduces external interference and can improve the accuracy of speech recognition, laying a good foundation for subsequently generating the minutes.
Further, in order to make the minutes clearer, another embodiment of the minutes generation method based on speech recognition of the present invention further includes:
segmenting the minutes into words to obtain a word list, and identifying keywords from the word list;
determining the text fragment set corresponding to each keyword, classifying each text fragment set according to the speaker identity information corresponding to each text fragment, and sorting the keywords and the text fragments corresponding to each keyword in chronological order, thereby obtaining the sorted text fragment set corresponding to each keyword.
The step of "segmenting the minutes into words" includes: a) matching the minutes against a preset vocabulary to obtain a first word list, wherein the vocabulary contains the company's special-purpose terms and the proprietary terms of the company's industry, prepared in cooperation with the company's business units; b) segmenting the remaining text using understanding-based and statistics-based word segmentation methods to obtain a second word list; c) removing meaningless stop words, such as function words with no semantic content, to obtain a third list; d) merging the first, second, and third lists to obtain the final word list. Word segmentation ultimately turns the text into a list of words.
The step of "identifying keywords from the word list" includes: a) computing an information value for each word in the list, for example its tf-idf (term frequency-inverse document frequency) value; b) checking whether the information value of each word is greater than or equal to a preset threshold, and taking every word whose information value reaches the threshold as a keyword, wherein the threshold can be adjusted according to actual needs.
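The tf-idf-based keyword identification described above can be sketched as follows, assuming word segmentation has already produced tokenized text fragments; the idf smoothing and the threshold value are illustrative choices, not specified by the patent.

```python
import math
from collections import Counter

def tfidf_keywords(documents, threshold=0.05):
    """Score words by tf-idf across text fragments and keep those whose
    score reaches the threshold in at least one fragment.

    `documents` is a list of pre-tokenized word lists (word segmentation is
    taken as given); the threshold is adjustable, as the embodiment notes.
    """
    n_docs = len(documents)
    df = Counter()                      # document frequency per word
    for doc in documents:
        df.update(set(doc))
    keywords = set()
    for doc in documents:
        tf = Counter(doc)
        total = len(doc)
        for word, count in tf.items():
            idf = math.log(n_docs / df[word]) + 1   # smoothed idf
            if (count / total) * idf >= threshold:
                keywords.add(word)
    return keywords
```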
Suppose keywords A, B, and C are identified from the minutes. Taking keyword A as an example, in the above sorted result the text fragment set corresponding to keyword A includes: text fragment 1 from the first speaker, text fragments 2 and 4 from the second speaker, and text fragment 3 from the third speaker.
Further, the sorted text fragment set of each keyword may also include the voice segment corresponding to each text fragment; by associating text fragments with voice segments, the minutes administrator and enquirers can conveniently listen to the relevant segments.
In the minutes generation method based on speech recognition proposed by the above embodiment, the minutes are generated by associating speaker identity information, voice segments, text fragments, keywords, and so on, which improves the completeness and convenience of the minutes.
Another embodiment of the minutes generation method based on speech recognition of the present invention further includes:
responding to a minutes viewing instruction issued by the user by displaying the minutes to the user; and/or
responding to a text-fragment click operation issued by the user by displaying the associated information of that text fragment to the user; for example, the associated information includes keywords, speaker identity information, and a link to the corresponding voice segment, and when the user clicks the voice segment link, the corresponding voice segment is played; and/or
responding to a minutes modification instruction issued by the user by updating and saving the minutes according to the modification instruction; and/or
responding to a query instruction issued by the user that carries a query field by searching the minutes for text fragments matching the query field and feeding the matched text fragments and their associated information back to the user in a preset form. The query field may or may not be a keyword, and the matching may be fuzzy search or semantic search; this is not repeated here. After the matching text fragments are found, all text fragments corresponding to the query field and their associated information, for example keywords, speaker identity information, and links to the corresponding voice segments, are displayed to the user in a preset form (for example, a tree diagram, or in chronological order).
The present invention also proposes an electronic device. Referring to Fig. 2, a schematic diagram of a preferred embodiment of the electronic device of the present invention is shown.
In this embodiment, the electronic device 1 may be a terminal device with a data processing function, such as a server, smart phone, tablet computer, portable computer, or desktop computer; the server may be a rack server, blade server, tower server, or cabinet server.
The electronic device 1 includes a memory 11, a processor 12, and a network interface 13.
The memory 11 includes at least one type of readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), magnetic memory, magnetic disk, optical disc, etc. In some embodiments the memory 11 may be an internal storage unit of the electronic device 1, such as the hard disk of the electronic device 1. In other embodiments the memory 11 may also be an external storage device attached to the electronic device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, or a Flash Card. Further, the memory 11 may also include both the internal storage unit and an external storage device of the electronic device 1.
The memory 11 can be used not only to store the application software installed on the electronic device 1 and various kinds of data, for example the minutes generation program 10, but also to temporarily store data that has been output or will be output.
In some embodiments the processor 12 may be a Central Processing Unit (CPU), controller, microcontroller, microprocessor, or other data processing chip, used to run the program code stored in the memory 11 or to process data, for example the minutes generation program 10.
The network interface 13 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface), and is generally used to establish a communication connection between the electronic device 1 and other electronic equipment, for example the terminals used by the minutes administrator and minutes inquirers. The components 11-13 of the electronic device 1 communicate with each other through a communication bus.
Fig. 2 shows only the electronic device 1 with components 11-13. Those skilled in the art will appreciate that the structure shown in Fig. 2 does not constitute a limitation on the electronic device 1, which may include fewer or more components than illustrated, combine certain components, or arrange the components differently.
Optionally, the electronic device 1 may further include a user interface. The user interface may include a display and an input unit such as a keyboard, and the optional user interface may also include a standard wired interface and a wireless interface.
Optionally, in some embodiments, the display may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an Organic Light-Emitting Diode (OLED) touch device, etc. The display, which may also be called a display screen or display unit, is used to display the information processed in the electronic device 1 and to display a visualized user interface.
In the embodiment of the electronic device 1 shown in Fig. 2, the memory 11, as a kind of computer storage medium, stores the program code of the minutes generation program 10; when the processor 12 executes the program code of the minutes generation program 10, the following steps are implemented:
Receiving step: receiving a minutes generation instruction issued by a user and obtaining the audio to be converted according to the minutes generation instruction, or obtaining the audio to be converted from a preset storage path periodically or in real time.
In this embodiment, the user issues the minutes generation instruction to the electronic device 1 through a terminal, and the instruction contains the audio to be converted. The audio to be converted is the speech audio recorded during a conference; it may be input and saved by the user through a speech device such as a microphone, or it may be a speech file downloaded from the Internet or imported locally by the user. The preset storage path is not limited to a database storing minutes-related audio.
The step of obtaining the audio to be converted from the preset storage path periodically or in real time includes: periodically (for example at 9:00 every morning and 5:30 every afternoon) judging whether unconverted minutes-related audio exists in the storage path; if so, taking the unconverted minutes-related audio as the audio to be converted, and if not, judging that no audio to be converted exists. Alternatively, whenever a piece of minutes-related audio is written into the preset storage path, it is taken as the audio to be converted and read out, so that the subsequent steps are executed.
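The periodic check can be sketched as follows, assuming the storage path is a directory of `.wav` files and the set of already-converted file names is tracked externally (for example in a database); both assumptions are illustrative:

```python
from pathlib import Path

def find_unconverted(store_path, converted_names):
    """Periodic check of the preset storage path: return the minutes-related
    audio files that have not yet been converted. An empty result means
    there is currently no audio to be converted."""
    candidates = sorted(Path(store_path).glob("*.wav"))
    return [p for p in candidates if p.name not in converted_names]
```

A scheduler (cron job, timer thread, or filesystem watcher for the real-time variant) would call this at the configured times and feed any pending files into the conversion pipeline.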
First division step: performing sentence division on the audio to be converted based on a preset sentence division rule, obtaining the audio sentences of the audio to be converted.
The purpose of performing sentence division on the audio to be converted is to obtain short sentences that are easier to recognize, improving the accuracy of the subsequent audio-to-text conversion. In this embodiment, performing sentence division on the audio to be converted based on the preset sentence division rule to obtain the audio sentences of the audio to be converted comprises:
a1. Identifying the first pause in the audio to be converted, and recording the start time and end time of the first pause;
a2. Identifying the first sentence in the audio to be converted, and taking the end time of the first pause as the start time of the first sentence;
a3. Identifying the second pause, recording the start time and end time of the second pause, and taking the start time of the second pause as the end time of the first sentence, thereby completing the division of the first sentence;
a4. Executing the above steps in sequence until the audio to be converted ends, obtaining all audio sentences of the audio to be converted.
The first pause and the second pause include silent segments and non-speech segments in the audio to be converted; the first sentence is a speech segment of the audio to be converted. It should be noted that "first pause" and "second pause" are used only to distinguish pauses occurring at different times.
It can be understood that the result of audio sentence division is closely related to the accuracy of the subsequent audio conversion: the more accurate the sentence division, the higher the conversion accuracy. In this embodiment, each pause has a minimum length limit, used to ignore brief sound breaks such as a speaker's momentary breath, so as to preserve the integrity of a sentence; each divided sentence has a minimum length limit, used to filter out short invalid information in the audio, for example a speaker's cough; meanwhile, each divided sentence also has a maximum length limit, used to bound the sentence length and improve the accuracy of the subsequent audio conversion.
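A minimal sketch of the pause-based division in steps a1-a4, combined with the minimum-pause and minimum/maximum-sentence limits described above. The interval representation and the concrete threshold values are illustrative assumptions, not part of the embodiment:

```python
def split_sentences(intervals, min_pause=0.3, min_sent=0.5, max_sent=15.0):
    """Divide audio into sentences at pauses. `intervals` is a list of
    (start, end, is_speech) tuples in seconds. Pauses shorter than
    `min_pause` are ignored (e.g. a momentary breath), sentences shorter
    than `min_sent` are dropped (e.g. a cough), and sentences longer than
    `max_sent` are cut to bound their length."""
    sentences = []
    cur_start = cur_end = None
    for start, end, is_speech in intervals:
        if is_speech or (end - start) < min_pause:
            # speech, or a pause too short to count: extend current sentence
            if cur_start is None:
                cur_start = start
            cur_end = end
        else:
            if cur_start is not None:
                sentences.append((cur_start, cur_end))
            cur_start = cur_end = None
    if cur_start is not None:
        sentences.append((cur_start, cur_end))
    # apply the minimum and maximum sentence-length limits
    result = []
    for s, e in sentences:
        if e - s < min_sent:
            continue  # too short to be valid speech
        while e - s > max_sent:
            result.append((s, s + max_sent))
            s += max_sent
        result.append((s, e))
    return result
```

In practice the speech/non-speech intervals would come from a voice activity detector running over the audio.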
Second division step: extracting voiceprint features from the audio sentences respectively, comparing the voiceprint feature of each audio sentence with a preset voiceprint feature library, determining the speaker identity information corresponding to each audio sentence, dividing the audio sentences into speech segments according to the speaker identity information, and determining the speech segment set corresponding to the audio to be converted.
Taking company P as an example, the preset voiceprint feature library includes the voiceprint feature of each employee of company P and the corresponding employee information. The speaker identity information includes the speaker's name, native place, accent, etc.
In this embodiment, the step of "dividing the audio sentences into speech segments according to the speaker identity information" includes:
Merging temporally adjacent audio sentences with identical speaker identity information to generate a speech segment, and determining the start and end time of the speech segment according to the start and end times of the at least one audio sentence it includes.
The start and end time of each speech segment is determined by the start and end times of the audio sentences it contains: for example, the start time of the first audio sentence of a speech segment is taken as the start time of the speech segment, and the end time of the last audio sentence as the end time of the speech segment.
The speech segment set includes the speech segments and the speaker identity information corresponding to each speech segment. For example, suppose the audio sentence division result, in chronological order, is: sentence 1, sentence 2, sentence 3, sentence 4, sentence 5, and the speakers corresponding to these audio sentences are respectively: Jia, Yi, Yi, Bing, Yi. Then the final speech segment set includes: {speech segment 1 (sentence 1), Jia}, {speech segment 2 (sentence 2, sentence 3), Yi}, {speech segment 3 (sentence 4), Bing}, {speech segment 4 (sentence 5), Yi}.
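The merging rule — temporally adjacent audio sentences with the same speaker form one speech segment whose start/end times come from its first and last sentence — can be sketched as follows; the tuple representation and the sentence timings in the test are illustrative assumptions:

```python
def merge_into_segments(sentences):
    """Merge temporally adjacent audio sentences with the same speaker
    into speech segments. Each sentence is (start, end, speaker); each
    segment's start time comes from its first sentence and its end time
    from its last sentence."""
    segments = []
    for start, end, speaker in sentences:
        if segments and segments[-1][2] == speaker:
            prev_start, _, _ = segments[-1]
            segments[-1] = (prev_start, end, speaker)  # extend the segment
        else:
            segments.append((start, end, speaker))
    return segments
```

Applied to the worked example above (speakers Jia, Yi, Yi, Bing, Yi), this yields the four segments listed.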
It can be understood that the preset voiceprint feature library needs to be updated periodically to improve the efficiency of voiceprint comparison. In addition, the techniques for extracting voiceprint features from audio and comparing voiceprints are relatively mature and are not repeated here.
In other embodiments, the audio to be converted may also be speech recorded in real time through microphones. In that case the microphone signal channels are numbered in advance, and the correspondence between microphone channel numbers and speaker identity information needs to be determined before the meeting. When the audio to be converted is speech recorded in real time through microphones, the corresponding speaker identity information can also be confirmed through the microphone channel number, which is not repeated here.
By determining the speaker's identity, the above steps on the one hand determine the speaker corresponding to each audio sentence, which contributes to the completeness of the minutes; on the other hand, they make it convenient to subsequently call the optimal speech conversion model according to the speaker's identity information, improving the accuracy of the speech conversion.
Speech recognition step: calling, according to the speaker identity information corresponding to each speech segment in the speech segment set, the target speech recognition model corresponding to each speech segment, and inputting each speech segment into its corresponding target speech recognition model in turn to obtain the text fragment corresponding to each speech segment, wherein the target speech recognition model is obtained by update training based on an accent corpus and an industry corpus;
In order to improve the accuracy of speech recognition, the target speech recognition model is obtained by performing update training twice on the basis of a general speech recognition model:
1) Update training of the general speech recognition model according to the speaker's accent (that is, language features), obtaining a first speech recognition model, which is determined by the following steps:
Dividing accents into several major categories, for example no accent (that is, standard Mandarin), Beijing accent, Shandong accent, Guangdong accent, Hunan accent, Sichuan accent, etc., and collecting the recorded audio corresponding to each accent category respectively;
Preprocessing the recorded audio of each accent category, deleting segments that are hard to understand, converting the remaining segments into written text, and obtaining the corpus of each accent category;
Feeding the processed audio and written text into the general speech recognition model, so that the model is optimized for the specific accent;
In actual meeting scenarios, feeding segments found to be transcribed incorrectly back into the model for re-optimization, obtaining the first speech recognition model corresponding to each accent category.
2) Update training of the first speech recognition model corresponding to each accent according to the company and industry characteristics, obtaining a second speech recognition model, which is determined by the following steps:
Compiling a company/industry special-purpose word list and saving it in text form;
Having designated readers read the special-purpose words aloud in each accent, forming the audio files corresponding to each accent category;
Feeding the text, matched with the audio files, into the first speech recognition model corresponding to each accent for training, so that each first speech recognition model is optimized for the specific company/industry;
In actual meeting scenarios, feeding more corpora related to proper nouns into the model for re-optimization, obtaining the second speech recognition model corresponding to each accent category.
For example, if voiceprint recognition determines that the speaker of the current speech segment 1 is speaker Jia, and Jia's identity information indicates a Shandong accent, the second speech recognition model corresponding to the Shandong accent is obtained as the target speech recognition model.
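The model lookup described in this example can be sketched as a simple two-step mapping. The in-memory identity records, the registry of per-accent models, and all names here (`identity_db`, `model_registry`) are illustrative assumptions:

```python
def pick_model(speaker_id, identity_db, model_registry, default_model):
    """Select the target speech recognition model for a speech segment:
    voiceprint recognition yields the speaker, the speaker's identity
    record gives the accent, and the accent indexes the second-stage
    (accent + industry) model. Falls back to a default model when the
    speaker or accent is unknown."""
    accent = identity_db.get(speaker_id, {}).get("accent", "standard")
    return model_registry.get(accent, default_model)
```

In a real system the registry values would be loaded model objects rather than names, but the dispatch logic is the same.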
By training a general speech conversion model in advance and performing update training on the speech recognition model according to the speaker's accent features before the audio conversion, the above steps improve the model's ability to recognize the speaker's speech; meanwhile, update training according to the company/industry characteristics improves the model's ability to recognize company-specific business speech.
Generation step: merging the text fragments corresponding to the speech segments to generate the target text corresponding to the audio to be converted, associating each text fragment in the target text with its corresponding speech segment and speaker identity information, and generating the minutes corresponding to the audio to be converted.
For example, the texts corresponding to the speech segments in the above speech segment set are obtained in turn as: text 1, text 2, text 3, text 4, text 5; the obtained texts are merged and spliced to obtain the target text corresponding to the audio to be converted. Then, according to the start and end times of each speech segment, the corresponding speech segment is intercepted from the audio to be converted and associated with the corresponding text fragment in the target text; that is, each text fragment in the target text is marked with its corresponding speaker information and speech segment, thereby generating the minutes. The generated minutes are saved and pushed to the terminal corresponding to the minutes generation instruction.
In this embodiment, the speaker information and speech segments are associated with the text fragments in the form of hyperlinks.
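The merge-and-associate step can be sketched as assembling one record per speech segment; the dict structure is an illustrative stand-in for the hyperlinked document the embodiment describes:

```python
def build_minutes(segments, texts):
    """Assemble the minutes: one entry per speech segment, associating the
    text fragment with its speaker and with the time span used to intercept
    the corresponding audio (the embodiment renders these associations as
    hyperlinks; a dict is used here for illustration)."""
    minutes = []
    for (start, end, speaker), text in zip(segments, texts):
        minutes.append({
            "speaker": speaker,
            "audio_span": (start, end),  # used to cut the speech segment
            "text": text,
        })
    return minutes
```

Joining the `text` fields in order reproduces the target text, while each entry keeps the per-fragment associations.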
By associating the above information in the minutes, it is convenient for the minutes administrator to review and listen to the relevant segments and to adjust the minutes. The electronic device 1 proposed in the above embodiment divides the audio to be converted into sentences, extracts and matches voiceprint features, determines the speech segments of the audio to be converted according to the matching result, and calls a different target speech recognition model for each speech segment, improving the efficiency and accuracy of speech recognition and thereby the accuracy of the generated minutes; meanwhile, by associating speaker identity information, speech segments, and text fragments, the generated minutes gain in completeness and usability.
Further, in order to improve the conversion accuracy of the audio to be converted, in another embodiment of the electronic device 1 of the present invention, the processor 12, when executing the program code of the minutes generation program 10, also implements a preprocessing step before the first division step.
Preprocessing step: preprocessing the audio to be converted, obtaining the preprocessed audio to be converted.
In a typical meeting, different kinds of noise are generated due to the influence of the surrounding environment, so the audio to be converted needs to be preprocessed. The preprocessing includes but is not limited to:
b1. Echo cancellation; for example, an echo cancellation method may be used, or the echo may be offset by estimating the size of the echo signal and subtracting the estimate from the received signal;
b2. Beamforming; for example, the user's speech is collected from different directions by multiple microphones to determine the direction of the sound source, and a weighted sum is computed according to the weights of the different directions; for example, the weight of the sound-source direction is larger than that of other directions, so that the speech input by the user is enhanced and the influence of other sounds is weakened;
b3. Noise reduction; for example, noise can first be cancelled using a sound with the same frequency and amplitude but opposite phase, and reverberation can then be eliminated using a dereverberation audio plug-in or a microphone array;
b4. Gain enhancement; for example, the audio is amplified using AGC (automatic gain control).
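As one concrete example of step b4, a minimal one-shot AGC can be sketched as follows. Real AGC implementations adapt the gain over time; the target RMS level and the gain cap here are illustrative assumptions:

```python
import math

def apply_agc(samples, target_rms=0.1, max_gain=10.0):
    """Simplified automatic gain control: scale the signal so that its RMS
    level approaches `target_rms`, capping the gain so that near-silent
    noise is not amplified without bound."""
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    if rms == 0:
        return samples  # pure silence: nothing to amplify
    gain = min(target_rms / rms, max_gain)
    return [x * gain for x in samples]
```

A production AGC would compute the gain over short sliding windows and smooth it, but the level-normalization idea is the same.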
The electronic device 1 proposed in the above embodiment preprocesses the audio to be converted, reducing external interference; this can improve the accuracy of speech recognition and lays a foundation for subsequently generating the minutes.
Further, in order to make the minutes clearer, in another embodiment of the electronic device 1 of the present invention, the processor 12, when executing the program code of the minutes generation program 10, also implements the following steps:
Segmenting the minutes into words to obtain a segmented word list, and identifying keywords from the word list; and
Determining the text fragment set corresponding to each keyword respectively, classifying the text fragment set according to the speaker identity information corresponding to each text fragment, and sorting each keyword and its corresponding text fragments in chronological order, obtaining the sorted text fragment set corresponding to each keyword.
The step of "segmenting the minutes" includes: a) matching the minutes against a preset vocabulary to obtain a first list after segmentation, where the vocabulary is composed of company-specific special-purpose words and industry proper nouns compiled in cooperation with the company's business; b) segmenting the text remaining after step a) using an understanding-based segmentation method and a statistics-based segmentation method, obtaining a second list; c) removing semantically empty stop words, obtaining a third list; d) merging the first list, second list, and third list to obtain the final segmented word list. Through segmentation, the text is finally converted into a list containing multiple words.
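A minimal sketch of steps a) through c): greedy forward maximum matching against the preset vocabulary, with a single-character fallback standing in for the understanding- and statistics-based methods of step b), followed by stop-word removal. The greedy strategy is an illustrative assumption, not the embodiment's actual segmenter:

```python
def segment(text, vocab, stopwords):
    """Segment `text` into words: longest match against the preset
    company/industry vocabulary first, single characters as a fallback,
    then drop stop words and whitespace."""
    max_len = max((len(w) for w in vocab), default=1)
    words, i = [], 0
    while i < len(text):
        for l in range(min(max_len, len(text) - i), 0, -1):
            cand = text[i:i + l]
            if l > 1 and cand in vocab:  # vocabulary match (step a)
                words.append(cand)
                i += l
                break
        else:
            words.append(text[i])  # fallback: single character (step b)
            i += 1
    # step c: remove semantically empty stop words
    return [w for w in words if w not in stopwords and not w.isspace()]
```

Production systems would use a trained segmenter with a user dictionary instead of the character fallback, but the vocabulary-first priority is the same.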
The step of "identifying keywords from the segmented word list" includes: a) calculating the information value of each word in the list, for example its tf-idf value; b) judging whether the information value of each word is greater than or equal to a preset threshold, and determining each word whose information value is greater than or equal to the preset threshold as a keyword, wherein the preset threshold can be adjusted according to actual needs.
Assume that keywords A, B, and C are identified from the minutes. Taking keyword A as an example, in the above sorting result the text fragment set corresponding to keyword A includes: text fragment 1 corresponding to speaker Jia, text fragments 2 and 4 corresponding to speaker Yi, and text fragment 3 corresponding to speaker Bing.
Further, the sorted text fragment set corresponding to each keyword may also include the speech segment corresponding to each text fragment; by associating text fragments with speech segments, it is convenient for the minutes administrator and inquirers to review and listen to the relevant segments.
The electronic device 1 proposed in the above embodiment generates minutes by associating speaker identity information, speech segments, text fragments, keywords, and the like, improving the completeness and usability of the minutes.
In another embodiment of the electronic device 1 of the present invention, when the processor 12 executes the program code of the minutes generation program 10, the following steps are also implemented:
Responding to a minutes viewing instruction issued by the user, and displaying the minutes to the user; and/or
Responding to a text-fragment click operation issued by the user, and displaying to the user the association information corresponding to the text fragment; for example, the association information includes keywords, speaker identity information, and a link to the corresponding speech segment, and when the user clicks the speech-segment link, the corresponding speech segment is played; and/or
Responding to a minutes modification instruction issued by the user, and updating and saving the minutes based on the modification instruction; and/or
Responding to a query instruction carrying a query field issued by the user, searching the minutes for text fragments matching the query field, and feeding back the matched text fragments and the corresponding association information to the user in a preset form. The query field may or may not be a keyword, and the matching of the query field may be fuzzy search or semantic search, which is not repeated here. After the matching text fragments are found, all text fragments corresponding to the query field and their association information (for example, keywords, speaker identity information, and links to the corresponding speech segments) are displayed to the user in a preset form (for example, a tree diagram, or in chronological order).
Optionally, in other embodiments, the minutes generation program 10 may also be divided into one or more modules, which are stored in the memory 11 and executed by one or more processors 12 to accomplish the present invention. A module in the present invention refers to a series of computer program instruction segments capable of completing a specific function.
For example, referring to Fig. 3, a schematic diagram of the program modules of the minutes generation program 10 in Fig. 2 is shown.
In one embodiment of the minutes generation program 10, the minutes generation program 10 includes modules 110-150, in which:
Receiving module 110, for receiving a minutes generation instruction issued by a user and obtaining the audio to be converted according to the minutes generation instruction, or obtaining the audio to be converted from a preset storage path periodically or in real time;
First division module 120, for performing sentence division on the audio to be converted based on a preset sentence division rule, obtaining the audio sentences of the audio to be converted;
Second division module 130, for extracting voiceprint features from the audio sentences respectively, comparing the voiceprint feature of each audio sentence with a preset voiceprint feature library, determining the speaker identity information corresponding to each audio sentence, dividing the audio sentences into speech segments according to the speaker identity information, and determining the speech segment set corresponding to the audio to be converted;
Speech recognition module 140, for calling, according to the speaker identity information corresponding to each speech segment in the speech segment set, the target speech recognition model corresponding to each speech segment, inputting each speech segment into its corresponding target speech recognition model in turn, and obtaining the text fragment corresponding to each speech segment, wherein the target speech recognition model is obtained by update training based on an accent corpus and an industry corpus; and
Generation module 150, for merging the text fragments corresponding to the speech segments to generate the target text corresponding to the audio to be converted, associating each text fragment in the target text with its corresponding speech segment and speaker identity information, and generating the minutes corresponding to the audio to be converted.
The functions or operation steps implemented by modules 110-150 are similar to those described above and are not detailed here.
In addition, an embodiment of the present invention also proposes a computer-readable storage medium containing the minutes generation program 10; when the minutes generation program 10 is executed by a processor, the following operations are implemented:
Receiving step: receiving a minutes generation instruction issued by a user and obtaining the audio to be converted according to the minutes generation instruction, or obtaining the audio to be converted from a preset storage path periodically or in real time;
First division step: performing sentence division on the audio to be converted based on a preset sentence division rule, obtaining the audio sentences of the audio to be converted;
Second division step: extracting voiceprint features from the audio sentences respectively, comparing the voiceprint feature of each audio sentence with a preset voiceprint feature library, determining the speaker identity information corresponding to each audio sentence, dividing the audio sentences into speech segments according to the speaker identity information, and determining the speech segment set corresponding to the audio to be converted;
Speech recognition step: calling, according to the speaker identity information corresponding to each speech segment in the speech segment set, the target speech recognition model corresponding to each speech segment, and inputting each speech segment into its corresponding target speech recognition model in turn to obtain the text fragment corresponding to each speech segment, wherein the target speech recognition model is obtained by update training based on an accent corpus and an industry corpus; and
Generation step: merging the text fragments corresponding to the speech segments to generate the target text corresponding to the audio to be converted, associating each text fragment in the target text with its corresponding speech segment and speaker identity information, and generating the minutes corresponding to the audio to be converted.
The specific embodiments of the computer-readable storage medium of the present invention are roughly the same as those of the above speech-recognition-based minutes generation method, and are not described here again.
The serial numbers of the above embodiments of the present invention are for description only and do not represent the relative merits of the embodiments.
It should be noted that, in this document, the terms "include" and "comprise", or any other variant thereof, are intended to cover a non-exclusive inclusion, so that a process, device, article, or method that includes a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, device, article, or method. In the absence of further limitations, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, device, article, or method that includes that element.
Through the above description of the embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general hardware platform, and of course also by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, or the part contributing to the prior art, can be embodied in the form of a software product stored in a storage medium as described above (such as ROM/RAM, magnetic disk, optical disc), including several instructions for causing a terminal device (which may be a mobile phone, computer, server, or network device, etc.) to execute the methods described in the embodiments of the present invention.
The above is only a preferred embodiment of the present invention and does not limit the scope of the present invention. Any equivalent structure or equivalent process transformation made using the contents of the specification and drawings of the present invention, applied directly or indirectly in other related technical fields, is likewise included within the protection scope of the present invention.

Claims (10)

1. A speech-recognition-based minutes generation method, applied to an electronic device, characterized in that the method comprises:
Receiving step: the minutes that user issues are received and generate instruction, instruction is generated according to the minutes and obtains wait turn Audio is changed, alternatively, timing or obtaining audio to be converted from default store path in real time;
First partiting step: carrying out sentence division to the audio to be converted based on default sentence division rule, obtain it is described to The audio sentence of transducing audio;
Second partiting step: extracting vocal print feature from the audio sentence respectively, and the vocal print of each audio sentence is special Sign is compared and analyzed with default vocal print feature library, determines the corresponding speaker's identity information of each audio sentence, and root The audio sentence is divided into voice segments according to the speaker's identity information, determines the corresponding voice segments of the audio to be converted Set;
Speech recognition steps: each voice segments are called according to the corresponding speaker's identity information of voice segments each in institute's speech segment set Each voice segments are successively inputted corresponding target voice identification model, obtain each voice segments by corresponding target voice identification model Corresponding text fragments, wherein the target voice identification model is updated based on accent corpus and industry corpus What training obtained;And
Generation step: merging the corresponding text fragments of each voice segments, generates the corresponding target text of the audio to be converted, and It is associated with corresponding voice segments and speaker's identity information in each of the target text text fragments, generates described wait turn Change the corresponding minutes of audio.
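For illustration only, the speech recognition and generation steps of claim 1 can be sketched as routing each diarized speech segment to a recognition model selected by speaker identity, while keeping each text segment associated with its source segment. The dictionary layout and the `recognize` interface are assumptions made for this sketch, not part of the claim.

```python
# Hedged sketch of the speech-recognition and generation steps of claim 1:
# each speech segment is routed to a target recognition model chosen by its
# speaker identity, and every resulting text segment retains its association
# with the source speech segment and speaker identity information.

def generate_minutes(segments, models):
    """segments: chronological dicts with 'speaker', 'start', 'end', 'audio'.
    models: speaker_id -> object exposing recognize(audio) -> str."""
    minutes = []
    for seg in segments:
        model = models[seg["speaker"]]              # per-speaker target model
        minutes.append({
            "speaker": seg["speaker"],              # speaker identity info
            "segment": (seg["start"], seg["end"]),  # link back to the audio
            "text": model.recognize(seg["audio"]),
        })
    return minutes
```

Because the association (speaker, segment times, text) travels with every entry, the later claims' viewing, playback, and query features can be built on this structure directly.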
2. The minutes generation method based on speech recognition according to claim 1, wherein the first division step comprises:
identifying a first pause in the audio to be converted, and recording the start time and the end time of the first pause;
identifying a first sentence in the audio to be converted, and taking the end time of the first pause as the start time of the first sentence;
identifying a second pause, recording the start time and the end time of the second pause, and taking the start time of the second pause as the end time of the first sentence, thereby completing the division of the first sentence; and
repeating the above steps until the audio to be converted ends, to obtain all the audio sentences of the audio to be converted.
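A minimal sketch of this pause-based division, assuming pause detection (e.g. by energy thresholding, which the claim leaves to the preset division rule) has already produced a chronological list of `(start, end)` pause intervals; the function name and tuple layout are illustrative:

```python
# Hedged sketch of claim 2: each audio sentence starts at the end time of
# one pause and ends at the start time of the next pause; the final
# sentence ends when the audio itself ends.

def divide_into_sentences(pauses, audio_end):
    """pauses: chronological (start, end) pause intervals, in seconds."""
    sentences = []
    for i, (_, pause_end) in enumerate(pauses):
        next_start = pauses[i + 1][0] if i + 1 < len(pauses) else audio_end
        if next_start > pause_end:          # skip zero-length sentences
            sentences.append((pause_end, next_start))
    return sentences

# Example: pauses at 0.0-0.5s, 3.2-3.8s, 7.1-7.6s in a 10-second recording
print(divide_into_sentences([(0.0, 0.5), (3.2, 3.8), (7.1, 7.6)], 10.0))
# -> [(0.5, 3.2), (3.8, 7.1), (7.6, 10.0)]
```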
3. The minutes generation method based on speech recognition according to claim 1, wherein dividing the audio sentences into speech segments according to the speaker identity information comprises:
merging temporally adjacent audio sentences whose corresponding speaker identity information is identical into one speech segment, and determining the start and end times of the speech segment according to the start and end times of the at least one audio sentence included in the speech segment.
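The merge in claim 3 groups only *adjacent* sentences with the same speaker, which maps naturally onto `itertools.groupby`; the tuple layout below is an assumption for illustration:

```python
from itertools import groupby

# Hedged sketch of claim 3: temporally adjacent audio sentences with the
# same speaker identity are merged into one speech segment whose start and
# end times span the merged sentences. groupby only groups consecutive
# equal keys, matching the "temporally adjacent" condition.

def merge_into_segments(sentences):
    """sentences: chronological list of (speaker_id, start, end) tuples."""
    segments = []
    for speaker, group in groupby(sentences, key=lambda s: s[0]):
        group = list(group)
        # segment spans from the first merged sentence to the last
        segments.append((speaker, group[0][1], group[-1][2]))
    return segments

sents = [("alice", 0.5, 3.2), ("alice", 3.8, 7.1), ("bob", 7.6, 10.0)]
print(merge_into_segments(sents))
# -> [('alice', 0.5, 7.1), ('bob', 7.6, 10.0)]
```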
4. The minutes generation method based on speech recognition according to any one of claims 1 to 3, wherein before the first division step, the method further comprises:
a preprocessing step: preprocessing the audio to be converted to obtain preprocessed audio to be converted, the preprocessing comprising: echo cancellation, beamforming, noise reduction, and gain amplification.
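The preprocessing step is an ordered chain of stages, which can be expressed as a simple pipeline. The stage functions below are toy placeholders; a real system would use DSP routines for echo cancellation, beamforming, noise reduction, and gain amplification (for example from the WebRTC audio processing module), which the claim does not specify.

```python
# Hedged sketch of claim 4's preprocessing as a stage pipeline applied
# in order to the raw audio samples before sentence division.

def preprocess(audio, stages):
    """Apply each preprocessing stage in order to the audio samples."""
    for stage in stages:
        audio = stage(audio)
    return audio

# toy placeholder stages operating on a list of float samples
denoise = lambda a: [s for s in a if abs(s) > 0.01]   # drop near-silence
amplify = lambda a: [2.0 * s for s in a]              # fixed gain

print(preprocess([0.0, 0.1, -0.2], [denoise, amplify]))
# -> [0.2, -0.4]
```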
5. The minutes generation method based on speech recognition according to claim 4, wherein the method further comprises:
performing word segmentation on the minutes to obtain a segmented word list, and identifying keywords from the segmented word list; and
determining a text segment set corresponding to each keyword, classifying the text segment set according to the speaker identity information corresponding to each text segment, and sorting each keyword and the text segments corresponding to each keyword in chronological order to obtain a sorted text segment set corresponding to each keyword.
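As a sketch of this keyword indexing, assuming keyword extraction (word segmentation plus, say, frequency scoring, which the claim does not detail) has already produced the keyword list; the tuple layout is an assumption:

```python
from collections import defaultdict

# Hedged sketch of claim 5: every text segment containing a keyword is
# collected under that keyword, keeping its speaker identity, and the
# segments under each keyword are sorted chronologically by start time.

def index_by_keyword(entries, keywords):
    """entries: (start_time, speaker, text) tuples in any order."""
    index = defaultdict(list)
    for entry in entries:
        for kw in keywords:
            if kw in entry[2]:
                index[kw].append(entry)
    # chronological order within each keyword's text segment set
    return {kw: sorted(segs) for kw, segs in index.items()}

entries = [(7.6, "bob", "budget review"), (0.5, "alice", "budget draft")]
print(index_by_keyword(entries, ["budget"]))
# -> {'budget': [(0.5, 'alice', 'budget draft'), (7.6, 'bob', 'budget review')]}
```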
6. The minutes generation method based on speech recognition according to claim 5, wherein the method further comprises:
in response to a minutes viewing instruction issued by the user, displaying the minutes to the user; and/or
in response to a click operation by the user on a text segment, displaying to the user the association information corresponding to the text segment, the association information comprising: a keyword, speaker identity information, and a link to the corresponding speech segment, wherein when the user clicks the speech segment link, the corresponding speech segment is played; and/or
in response to a minutes modification instruction issued by the user, updating and saving the minutes based on the modification instruction; and/or
in response to a query instruction carrying a query field issued by the user, retrieving from the minutes the text segments matching the query field, and feeding back the retrieved text segments and the corresponding association information to the user in a preset form.
7. An electronic device, wherein the device comprises a memory and a processor, the memory storing a minutes generation program executable on the processor, and the minutes generation program, when executed by the processor, implements the following steps:
a receiving step: receiving a minutes generation instruction issued by a user and obtaining audio to be converted according to the minutes generation instruction; or obtaining the audio to be converted from a preset storage path periodically or in real time;
a first division step: performing sentence segmentation on the audio to be converted based on a preset audio sentence segmentation rule to obtain audio sentences of the audio to be converted;
a second division step: extracting a voiceprint feature from each audio sentence, comparing the voiceprint feature of each audio sentence with a preset voiceprint feature library to determine speaker identity information corresponding to each audio sentence, dividing the audio sentences into speech segments according to the speaker identity information, and determining a speech segment set corresponding to the audio to be converted;
a speech recognition step: invoking, according to the speaker identity information corresponding to each speech segment in the speech segment set, a target speech recognition model corresponding to that speech segment, and inputting each speech segment into its corresponding target speech recognition model in turn to obtain a text segment corresponding to each speech segment, wherein the target speech recognition model is obtained by update training based on an accent corpus and an industry corpus; and
a generation step: merging the text segments corresponding to the speech segments to generate a target text corresponding to the audio to be converted, associating each text segment in the target text with its corresponding speech segment and speaker identity information, and generating the minutes corresponding to the audio to be converted.
8. The electronic device according to claim 7, wherein the minutes generation program, when executed by the processor, further implements the following steps:
performing word segmentation on the minutes to obtain a segmented word list, and identifying keywords from the segmented word list; and
determining a text segment set corresponding to each keyword, classifying the text segment set according to the speaker identity information corresponding to each text segment, and sorting each keyword and the text segments corresponding to each keyword in chronological order to obtain a sorted text segment set corresponding to each keyword.
9. The electronic device according to claim 7, wherein the minutes generation program, when executed by the processor, further implements the following steps:
in response to a minutes viewing instruction issued by the user, displaying the minutes to the user; and/or
in response to a click operation by the user on a text segment, displaying to the user the association information corresponding to the text segment, the association information comprising: a keyword, speaker identity information, and a link to the corresponding speech segment, wherein when the user clicks the speech segment link, the corresponding speech segment is played; and/or
in response to a minutes modification instruction issued by the user, updating and saving the minutes based on the modification instruction; and/or
in response to a query instruction carrying a query field issued by the user, retrieving from the minutes the text segments matching the query field, and feeding back the retrieved text segments and the corresponding association information to the user in a preset form.
10. A computer-readable storage medium, wherein the computer-readable storage medium comprises a minutes generation program which, when executed by a processor, implements the steps of the minutes generation method based on speech recognition according to any one of claims 1 to 6.
CN201910627403.1A 2019-07-11 2019-07-11 Minutes generation method, device and storage medium based on speech recognition Pending CN110335612A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910627403.1A CN110335612A (en) 2019-07-11 2019-07-11 Minutes generation method, device and storage medium based on speech recognition

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910627403.1A CN110335612A (en) 2019-07-11 2019-07-11 Minutes generation method, device and storage medium based on speech recognition

Publications (1)

Publication Number Publication Date
CN110335612A true CN110335612A (en) 2019-10-15

Family

ID=68146486

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910627403.1A Pending CN110335612A (en) 2019-07-11 2019-07-11 Minutes generation method, device and storage medium based on speech recognition

Country Status (1)

Country Link
CN (1) CN110335612A (en)


Citations (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20120072211A1 (en) * 2010-09-16 2012-03-22 Nuance Communications, Inc. Using codec parameters for endpoint detection in speech recognition
CN104427292A (en) * 2013-08-22 2015-03-18 中兴通讯股份有限公司 Method and device for extracting a conference summary
CN105632484A (en) * 2016-02-19 2016-06-01 上海语知义信息技术有限公司 Voice synthesis database pause information automatic marking method and system
CN105632498A (en) * 2014-10-31 2016-06-01 株式会社东芝 Method, device and system for generating conference record
CN105719642A (en) * 2016-02-29 2016-06-29 黄博 Continuous and long voice recognition method and system and hardware equipment
CN105845129A (en) * 2016-03-25 2016-08-10 乐视控股(北京)有限公司 Method and system for dividing sentences in audio and automatic caption generation method and system for video files
CN106683662A (en) * 2015-11-10 2017-05-17 中国电信股份有限公司 Speech recognition method and device
CN107039035A (en) * 2017-01-10 2017-08-11 上海优同科技有限公司 A kind of detection method of voice starting point and ending point
CN107689225A (en) * 2017-09-29 2018-02-13 福建实达电脑设备有限公司 A kind of method for automatically generating minutes
CN108335697A (en) * 2018-01-29 2018-07-27 北京百度网讯科技有限公司 Minutes method, apparatus, equipment and computer-readable medium
CN108363765A (en) * 2018-02-06 2018-08-03 深圳市鹰硕技术有限公司 The recognition methods of audio paragraph and device
CN108447471A (en) * 2017-02-15 2018-08-24 腾讯科技(深圳)有限公司 Audio recognition method and speech recognition equipment
CN108763338A (en) * 2018-05-14 2018-11-06 山东亿云信息技术有限公司 A kind of News Collection&Edit System based on power industry
CN108986826A (en) * 2018-08-14 2018-12-11 中国平安人寿保险股份有限公司 Automatically generate method, electronic device and the readable storage medium storing program for executing of minutes
CN109325737A (en) * 2018-09-17 2019-02-12 态度国际咨询管理(深圳)有限公司 A kind of enterprise intelligent virtual assistant system and its method
CN109388701A (en) * 2018-08-17 2019-02-26 深圳壹账通智能科技有限公司 Minutes generation method, device, equipment and computer storage medium
CN109754808A (en) * 2018-12-13 2019-05-14 平安科技(深圳)有限公司 Method, apparatus, computer equipment and the storage medium of voice conversion text
CN109767757A (en) * 2019-01-16 2019-05-17 平安科技(深圳)有限公司 A kind of minutes generation method and device


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
曾超 et al.: "A real-time speaker-dependent Chinese speech recognition system for a limited command set", Proceedings of the 3rd National Conference on Man-Machine Speech Communication (NCMMSC1994) *

Cited By (67)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021073116A1 (en) * 2019-10-18 2021-04-22 平安科技(深圳)有限公司 Method and apparatus for generating legal document, device and storage medium
CN110691258A (en) * 2019-10-30 2020-01-14 中央电视台 Program material manufacturing method and device, computer storage medium and electronic equipment
CN110837557A (en) * 2019-11-05 2020-02-25 北京声智科技有限公司 Abstract generation method, device, equipment and medium
CN110837557B (en) * 2019-11-05 2023-02-17 北京声智科技有限公司 Abstract generation method, device, equipment and medium
CN110875036A (en) * 2019-11-11 2020-03-10 广州国音智能科技有限公司 Voice classification method, device, equipment and computer readable storage medium
CN110767235A (en) * 2019-11-14 2020-02-07 北京中电慧声科技有限公司 Voice transcription processing device with role separation function and control method
CN110992958B (en) * 2019-11-19 2021-06-22 深圳追一科技有限公司 Content recording method, content recording apparatus, electronic device, and storage medium
CN110992958A (en) * 2019-11-19 2020-04-10 深圳追一科技有限公司 Content recording method, content recording apparatus, electronic device, and storage medium
CN110930984A (en) * 2019-12-04 2020-03-27 北京搜狗科技发展有限公司 Voice processing method and device and electronic equipment
CN110995943A (en) * 2019-12-25 2020-04-10 携程计算机技术(上海)有限公司 Multi-user streaming voice recognition method, system, device and medium
CN110995943B (en) * 2019-12-25 2021-05-07 携程计算机技术(上海)有限公司 Multi-user streaming voice recognition method, system, device and medium
CN111192587A (en) * 2019-12-27 2020-05-22 拉克诺德(深圳)科技有限公司 Voice data matching method and device, computer equipment and storage medium
CN111177353A (en) * 2019-12-27 2020-05-19 拉克诺德(深圳)科技有限公司 Text record generation method and device, computer equipment and storage medium
CN111625614A (en) * 2020-01-20 2020-09-04 全息空间(深圳)智能科技有限公司 Live broadcast platform voice collection method, system and storage medium
CN111312216A (en) * 2020-02-21 2020-06-19 厦门快商通科技股份有限公司 Voice marking method containing multiple speakers and computer readable storage medium
CN113408996A (en) * 2020-03-16 2021-09-17 上海博泰悦臻网络技术服务有限公司 Schedule management method, schedule management device and computer readable storage medium
CN111405235A (en) * 2020-04-20 2020-07-10 杭州大轶科技有限公司 Video conference method and system based on artificial intelligence recognition and extraction
CN111629267A (en) * 2020-04-30 2020-09-04 腾讯科技(深圳)有限公司 Audio labeling method, device, equipment and computer readable storage medium
CN111739536A (en) * 2020-05-09 2020-10-02 北京捷通华声科技股份有限公司 Audio processing method and device
CN111353038A (en) * 2020-05-25 2020-06-30 深圳市友杰智新科技有限公司 Data display method and device, computer equipment and storage medium
CN111739553B (en) * 2020-06-02 2024-04-05 深圳市未艾智能有限公司 Conference sound collection, conference record and conference record presentation method and device
CN111739553A (en) * 2020-06-02 2020-10-02 深圳市未艾智能有限公司 Conference sound acquisition method, conference recording method, conference record presentation method and device
CN111785275A (en) * 2020-06-30 2020-10-16 北京捷通华声科技股份有限公司 Voice recognition method and device
CN111785260B (en) * 2020-07-08 2023-10-27 泰康保险集团股份有限公司 Clause method and device, storage medium and electronic equipment
CN111785260A (en) * 2020-07-08 2020-10-16 泰康保险集团股份有限公司 Sentence dividing method and device, storage medium and electronic equipment
CN111968657A (en) * 2020-08-17 2020-11-20 北京字节跳动网络技术有限公司 Voice processing method and device, electronic equipment and computer readable medium
WO2022037388A1 (en) * 2020-08-17 2022-02-24 北京字节跳动网络技术有限公司 Voice generation method and apparatus, device, and computer readable medium
CN114079695A (en) * 2020-08-18 2022-02-22 北京有限元科技有限公司 Method, device and storage medium for recording voice call content
CN112017632A (en) * 2020-09-02 2020-12-01 浪潮云信息技术股份公司 Automatic conference record generation method
CN112165599A (en) * 2020-10-10 2021-01-01 广州科天视畅信息科技有限公司 Automatic conference summary generation method for video conference
CN112270918A (en) * 2020-10-22 2021-01-26 北京百度网讯科技有限公司 Information processing method, device, system, electronic equipment and storage medium
CN113010704A (en) * 2020-11-18 2021-06-22 北京字跳网络技术有限公司 Interaction method, device, equipment and medium for conference summary
WO2022105861A1 (en) * 2020-11-20 2022-05-27 北京有竹居网络技术有限公司 Method and apparatus for recognizing voice, electronic device and medium
CN112562682A (en) * 2020-12-02 2021-03-26 携程计算机技术(上海)有限公司 Identity recognition method, system, equipment and storage medium based on multi-person call
CN112837690A (en) * 2020-12-30 2021-05-25 科大讯飞股份有限公司 Audio data generation method, audio data transcription method and device
CN112820297A (en) * 2020-12-30 2021-05-18 平安普惠企业管理有限公司 Voiceprint recognition method and device, computer equipment and storage medium
CN112839195A (en) * 2020-12-30 2021-05-25 深圳市皓丽智能科技有限公司 Method and device for consulting meeting record, computer equipment and storage medium
CN112839195B (en) * 2020-12-30 2023-10-10 深圳市皓丽智能科技有限公司 Conference record consulting method and device, computer equipment and storage medium
CN112837690B (en) * 2020-12-30 2024-04-16 科大讯飞股份有限公司 Audio data generation method, audio data transfer method and device
CN112395420A (en) * 2021-01-19 2021-02-23 平安科技(深圳)有限公司 Video content retrieval method and device, computer equipment and storage medium
CN112800269A (en) * 2021-01-20 2021-05-14 上海明略人工智能(集团)有限公司 Conference record generation method and device
CN112887659B (en) * 2021-01-29 2023-06-23 深圳前海微众银行股份有限公司 Conference recording method, device, equipment and storage medium
CN112887659A (en) * 2021-01-29 2021-06-01 深圳前海微众银行股份有限公司 Conference recording method, device, equipment and storage medium
CN113327619A (en) * 2021-02-26 2021-08-31 山东大学 Conference recording method and system based on cloud-edge collaborative architecture
CN113327619B (en) * 2021-02-26 2022-11-04 山东大学 Conference recording method and system based on cloud-edge collaborative architecture
CN113051426A (en) * 2021-03-18 2021-06-29 深圳市声扬科技有限公司 Audio information classification method and device, electronic equipment and storage medium
CN113055529B (en) * 2021-03-29 2022-12-13 深圳市艾酷通信软件有限公司 Recording control method and recording control device
CN113055529A (en) * 2021-03-29 2021-06-29 深圳市艾酷通信软件有限公司 Recording control method and recording control device
CN113113018A (en) * 2021-04-16 2021-07-13 钦州云之汇大数据科技有限公司 Enterprise intelligent management system and method based on big data
CN112995572A (en) * 2021-04-23 2021-06-18 深圳市黑金工业制造有限公司 Remote conference system and physical display method in remote conference
CN113207032A (en) * 2021-04-29 2021-08-03 读书郎教育科技有限公司 System and method for increasing subtitles by recording videos in intelligent classroom
CN113299279A (en) * 2021-05-18 2021-08-24 上海明略人工智能(集团)有限公司 Method, apparatus, electronic device and readable storage medium for associating voice data and retrieving voice data
CN113595868A (en) * 2021-06-28 2021-11-02 深圳云之家网络有限公司 Voice message processing method and device based on instant messaging and computer equipment
CN113488025B (en) * 2021-07-14 2024-05-14 维沃移动通信(杭州)有限公司 Text generation method, device, electronic equipment and readable storage medium
CN113488025A (en) * 2021-07-14 2021-10-08 维沃移动通信(杭州)有限公司 Text generation method and device, electronic equipment and readable storage medium
CN113539269A (en) * 2021-07-20 2021-10-22 上海明略人工智能(集团)有限公司 Audio information processing method, system and computer readable storage medium
CN113409774A (en) * 2021-07-20 2021-09-17 北京声智科技有限公司 Voice recognition method and device and electronic equipment
CN113658599A (en) * 2021-08-18 2021-11-16 平安普惠企业管理有限公司 Conference record generation method, device, equipment and medium based on voice recognition
CN114125368A (en) * 2021-11-30 2022-03-01 北京字跳网络技术有限公司 Conference audio participant association method and device and electronic equipment
CN114125368B (en) * 2021-11-30 2024-01-30 北京字跳网络技术有限公司 Conference audio participant association method and device and electronic equipment
CN114330369A (en) * 2022-03-15 2022-04-12 深圳文达智通技术有限公司 Local production marketing management method, device and equipment based on intelligent voice analysis
CN115174285B (en) * 2022-07-26 2024-02-27 中国工商银行股份有限公司 Conference record generation method and device and electronic equipment
CN115174285A (en) * 2022-07-26 2022-10-11 中国工商银行股份有限公司 Conference record generation method and device and electronic equipment
CN115906781A (en) * 2022-12-15 2023-04-04 广州文石信息科技有限公司 Method, device and equipment for audio identification and anchor point addition and readable storage medium
CN115906781B (en) * 2022-12-15 2023-11-24 广州文石信息科技有限公司 Audio identification anchor adding method, device, equipment and readable storage medium
CN115828907A (en) * 2023-02-16 2023-03-21 南昌航天广信科技有限责任公司 Intelligent conference management method, system, readable storage medium and computer equipment
CN117456984A (en) * 2023-10-26 2024-01-26 杭州捷途慧声科技有限公司 Voice interaction method and system based on voiceprint recognition

Similar Documents

Publication Publication Date Title
CN110335612A (en) Minutes generation method, device and storage medium based on speech recognition
CN112804400B (en) Customer service call voice quality inspection method and device, electronic equipment and storage medium
CN107038220B (en) Method, intelligent robot and system for generating memorandum
US20170300487A1 (en) System And Method For Enhancing Voice-Enabled Search Based On Automated Demographic Identification
CN104969288B (en) The method and system of voice recognition system is provided based on voice recording daily record
US7983910B2 (en) Communicating across voice and text channels with emotion preservation
US11189277B2 (en) Dynamic gazetteers for personalized entity recognition
CN108305626A (en) The sound control method and device of application program
CN108829765A (en) A kind of information query method, device, computer equipment and storage medium
CN110349564A (en) Across the language voice recognition methods of one kind and device
CN109256150A (en) Speech emotion recognition system and method based on machine learning
CN109256136A (en) A kind of audio recognition method and device
CN110134756A (en) Minutes generation method, electronic device and storage medium
CN109801638B (en) Voice verification method, device, computer equipment and storage medium
CN110047481A (en) Method for voice recognition and device
CN110933225B (en) Call information acquisition method and device, storage medium and electronic equipment
CN104252464A (en) Information processing method and information processing device
CN109190124A (en) Method and apparatus for participle
CN112925945A (en) Conference summary generation method, device, equipment and storage medium
CN109754808B (en) Method, device, computer equipment and storage medium for converting voice into text
US20210118464A1 (en) Method and apparatus for emotion recognition from speech
KR102312993B1 (en) Method and apparatus for implementing interactive message using artificial neural network
CN113920986A (en) Conference record generation method, device, equipment and storage medium
KR20150041592A (en) Method for updating contact information in callee electronic device, and the electronic device
CN112468665A (en) Method, device, equipment and storage medium for generating conference summary

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20191015)