CN109887508A

CN109887508A - Voiceprint-based automatic meeting recording method, electronic device and storage medium

Info

Publication number: CN109887508A
Application number: CN201910072434.5A
Authority: CN
Inventors: 傅峰峰
Original assignee: Guangzhou Fugang Wanjia Intelligent Technology Co Ltd
Current assignee: Guangzhou Fugang Wanjia Intelligent Technology Co Ltd
Priority date: 2019-01-25
Filing date: 2019-01-25
Publication date: 2019-06-14

Abstract

The invention discloses a voiceprint-based automatic meeting recording method comprising the following steps. Obtaining step: acquire the voice information of the current user through a sound collection device. Extraction step: extract the sound information and the voiceprint feature information from the voice information. Judgment step: determine whether the voiceprint feature information is stored in the voiceprint recognition model library of a server and, if so, convert the sound information into text information and record it. The invention also provides an electronic device and a computer-readable storage medium. By using voiceprint feature information to decide whether a speaker's remarks should be recorded, the method records meetings more efficiently and makes the meeting content more convenient to consult later.

Description

Voiceprint-based automatic meeting recording method, electronic device and storage medium

Technical field

The present invention relates to the technical field of meeting minutes, and in particular to a voiceprint-based automatic meeting recording method, an electronic device and a storage medium.

Background art

At present, conventional meetings rely on dedicated staff to take minutes. A more advanced existing approach uses equipment such as video cameras, microphones and voice recorders to record audio and video of everyone's remarks during the meeting. After the meeting, the person responsible for the minutes reviews and plays back the recordings to compile them. However, manually annotating and extracting the voice data is time-consuming and very inconvenient for the user.

Those skilled in the art have therefore continued to put forward new ideas for meeting minutes. For example, invention patent application No. CN201810328377.8, which belongs to the technical field of conference audio processing, discloses an automatic meeting recording method comprising data acquisition, noise reduction, tone recognition, speech recognition, key-content marking and automatic typesetting. That invention can automatically mark the key points of the minutes, highlight the meeting topic and typeset the result automatically to form organized minutes, saving the time of a second round of editing. It improves the efficiency of taking minutes to a certain extent.

Summary of the invention

To overcome the deficiencies of the prior art, the first object of the present invention is to provide a voiceprint-based automatic meeting recording method that solves the technical problem of taking meeting minutes based on participant identity.

The second object of the present invention is to provide an electronic device that solves the technical problem of taking meeting minutes based on participant identity.

The third object of the present invention is to provide a computer-readable storage medium that solves the technical problem of taking meeting minutes based on participant identity.

The first object of the present invention is achieved by the following technical solution:

A voiceprint-based automatic meeting recording method, comprising the following steps:

Obtaining step: acquire the voice information of the current user through a sound collection device;

Extraction step: extract the sound information and the voiceprint feature information from the voice information;

Judgment step: determine whether the voiceprint feature information is stored in the voiceprint recognition model library of a server and, if so, convert the sound information into text information through a speech recognition module and record it.

Further, before the obtaining step the method also includes a wake-up step: when a preset wake-up word is received, start the sound collection device.

Further, the wake-up step is specifically: when the preset wake-up word is received, determine whether the voiceprint information corresponding to the preset wake-up word is stored in the server and, if so, start the sound collection device.

Further, the obtaining step is specifically: acquire the sound information of the current user through an annular microphone array.

Further, the voiceprint recognition model library in the judgment step is built as follows:

Obtain the voice information of all users to be registered;

Extract the voiceprint feature information from the voice information of all users to be registered;

Store all the voiceprint information to complete the voiceprint recognition model library.

Further, the voiceprint feature information is represented using classical Mel-frequency cepstral coefficients (MFCC), perceptual linear prediction (PLP) coefficients, deep features, or power-normalized cepstral coefficients (PNCC).

Further, the method includes an end judgment step after the judgment step: when the sound collection device receives no sound information within a preset time, send a prompt to determine whether the meeting has ended and, if so, output the complete meeting document.

Further, the method includes a document sending step after the end judgment step: send the complete meeting document to a printer for printing, or send it to users according to pre-stored user information.

The second object of the present invention is achieved by the following technical solution:

An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the voiceprint-based automatic meeting recording method according to any of the variants described for the first object of the invention.

The third object of the present invention is achieved by the following technical solution:

A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the voiceprint-based automatic meeting recording method according to any of the variants described for the first object of the invention.

Compared with the prior art, the beneficial effects of the present invention are:

The voiceprint-based automatic meeting recording method of the invention uses voiceprint feature information to decide whether a speaker's remarks should be recorded. It records meetings more efficiently and makes the meeting content more convenient to consult later.

Brief description of the drawings

Fig. 1 is a flowchart of the voiceprint-based automatic meeting recording method of embodiment one.

Detailed description of the embodiments

The present invention is further described below with reference to the drawing and specific embodiments. It should be noted that, provided they do not conflict, the embodiments described below and their individual technical features can be combined freely to form new embodiments.

Embodiment one

As shown in Fig. 1, this embodiment provides a voiceprint-based automatic meeting recording method comprising the following steps:

S0: When a preset wake-up word is received, start the sound collection device. Before sound is collected, it must first be decided whether to start the sound collection device. There are many ways to switch it on. For example, the speech recognition system can be switched on and off directly with a power button; this is the most primitive approach and not very intelligent, but it remains available as an alternative in this embodiment. The speech recognition system can also be kept always on; this consumes more power and easily produces many unnecessary "meeting documents", but it can still be implemented as one option. Neither is the preferred approach on which the present invention focuses.

Most preferably in this embodiment, the speech recognition system is woken by a keyword. For example, the wake-up keyword is set to "meeting starts"; when the sound collection device picks up this phrase, the speech recognition system in standby is woken and begins working, which achieves truly automatic processing and lets the meeting proceed more smoothly. Because not everyone should be able to control the progress of the meeting, one or more users are designated as coordinators. Their voiceprint information is stored on the server in advance, and only a preset wake-up word spoken by one of them can start the system. A voiceprint is the sound-wave spectrum carrying speech information, as displayed by an electro-acoustic instrument. Modern scientific research shows that voiceprints are not only specific to the individual but also relatively stable. After adulthood, a person's voice remains relatively stable for a long time. Experiments show that whether a speaker deliberately imitates another person's voice and tone or speaks in a whisper, and however lifelike the imitation, the voiceprint remains different. Identification by voiceprint is therefore more efficient.

When the preset wake-up word is received, determine whether the voiceprint information corresponding to the preset wake-up word is stored in the server and, if so, start the sound collection device. The preset wake-up word can be set according to the user's habits. A common choice is a conventional phrase such as "meeting starts", and an enterprise can also set a different wake-up word according to its own corporate culture. For example, Alibaba could set its wake-up word to "The 12 robbers have gathered, meeting starts". A more personalized, distinctive wake-up phrase of this kind gives the system higher user stickiness.
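The wake-up check described above combines two conditions: the phrase must match, and the speaker must be one of the coordinators whose voiceprints are stored on the server. A minimal sketch of that gate follows. It assumes that speech-to-text and speaker identification have already produced a transcript and a speaker ID; the names `WakeGate`, `normalize` and `authorized_speakers` are hypothetical and do not come from the patent.

```python
from dataclasses import dataclass, field


def normalize(text: str) -> str:
    """Lowercase and drop punctuation/whitespace so matching tolerates small variations."""
    return "".join(ch for ch in text.lower() if ch.isalnum())


@dataclass
class WakeGate:
    wake_phrase: str  # configurable preset wake-up word
    # Hypothetical IDs of the coordinators whose voiceprints are stored on the server.
    authorized_speakers: set = field(default_factory=set)

    def should_start(self, transcript: str, speaker_id) -> bool:
        """Start sound collection only if the phrase matches AND the speaker is authorized."""
        if normalize(self.wake_phrase) not in normalize(transcript):
            return False
        return speaker_id in self.authorized_speakers


gate = WakeGate("meeting starts", {"host"})
started = gate.should_start("OK, meeting starts!", "host")
```

Keeping the phrase configurable mirrors the point that each enterprise can choose its own wake-up word.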

The wake-up word can also be set as follows: "Everyone please be quiet, the meeting is about to start." This setting is more resistant to interference. When users enter a meeting room, the participants may be talking among themselves and may sometimes trigger the speech recognition system by accident. Such false starts reduce the user's trust in the stability of the system to some extent. Setting a relatively long phrase makes accidental wake-up harder. More importantly, after "Everyone please be quiet" has been spoken, the ambient noise can be measured to confirm whether it was a clear instruction from the meeting host: if the ambient noise drops markedly afterwards, the meeting has indeed formally begun, and the speech recognition system is started. After starting, the speech recognition system can also issue a query to confirm whether the meeting has begun, and it starts fully when the reply is affirmative.
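The ambient-noise confirmation can be illustrated by comparing the signal level shortly before and shortly after the wake-up phrase. The sketch below is only one possible reading: the patent does not say how the noise is measured, so the RMS level in decibels and the 6 dB threshold (`min_drop_db`) are assumptions chosen for illustration.

```python
import numpy as np


def rms_db(samples: np.ndarray) -> float:
    """Root-mean-square level in dB relative to full scale (samples assumed in [-1, 1])."""
    rms = np.sqrt(np.mean(np.square(samples, dtype=np.float64)))
    return 20.0 * np.log10(max(rms, 1e-12))  # floor avoids log(0) on pure silence


def noise_dropped(before: np.ndarray, after: np.ndarray, min_drop_db: float = 6.0) -> bool:
    """True if the room became quieter by at least `min_drop_db` after the wake-up phrase."""
    return rms_db(before) - rms_db(after) >= min_drop_db


# Synthetic example: chatter before the phrase, a much quieter room afterwards.
rng = np.random.default_rng(0)
chatter = 0.5 * rng.standard_normal(16000)
quiet_room = 0.05 * rng.standard_normal(16000)
confirmed = noise_dropped(chatter, quiet_room)
```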

S1: Acquire the sound information of the current user through the sound collection device. Most preferably, the sound collection device is an annular microphone array. This step obtains the sound information of the corresponding user and is the basis of all subsequent steps. An annular microphone array picks up sound around a round table more efficiently and accurately; the clearer the captured source signal, the more accurate the later speech-to-text conversion.

S2: Extract the sound information and the voiceprint feature information from the voice information. The voiceprint feature information is represented using classical Mel-frequency cepstral coefficients (MFCC), perceptual linear prediction (PLP) coefficients, deep features, or power-normalized cepstral coefficients (PNCC).

From the standpoint of what the user says, voiceprint recognition systems fall into two broad categories: content-dependent and content-independent (often called text-dependent and text-independent). As the names suggest, "content-dependent" means the system assumes the user only speaks content prompted by the system or content within a small permitted range, while "content-independent" places no restriction on what the user says. In the former case the system only has to handle differences in voice characteristics between users within a narrow range; because the content is essentially the same, only the differences in the voices themselves need to be considered, and the task is relatively easy. In the latter case, because the content is unrestricted, the system must consider not only the specific differences between users' voices but also the variation in speech caused by differing content, and the task is harder.

A technique lying between the two, which may be called "limited content-dependent", is also available. The system randomly combines some digits or symbols, and the user must read the content aloud correctly for the voiceprint to be recognized. This randomness means that each voiceprint sample collected in content-dependent recognition differs in content and timing. The property fits well with the short random digit strings (such as numeric verification codes) widely used on the internet, so it can be used to verify identity, or combined with other biometric features such as the face to form a multi-factor authentication scheme. This embodiment uses the content-independent technique, because here it is only necessary to recognize who the speaking user is and no further verification is needed: the voice recognition system is deployed in a closed environment rather than an open one. The design can nevertheless be set to limited content-dependent or content-dependent, in which case a specific word is added to each utterance to make the record more accurate. The above is only a brief account of the general technical direction; the technical details of the voiceprint recognition algorithm are described next.
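The "limited content-dependent" variant can be pictured as a challenge and response over a short random digit string. The sketch below covers only the content check, assuming a speech recognizer has already turned the user's reply into text; the voiceprint comparison itself is out of scope here. The function names are hypothetical.

```python
import secrets


def make_challenge(length: int = 6) -> str:
    """Generate a random digit string for the user to read aloud."""
    return "".join(secrets.choice("0123456789") for _ in range(length))


def challenge_passed(challenge: str, transcript: str) -> bool:
    """Compare the digits found in the recognized speech with the challenge."""
    spoken_digits = "".join(ch for ch in transcript if ch.isdigit())
    return spoken_digits == challenge


challenge = make_challenge()
# A transcript that spells out the digits with spaces still passes the content check.
passed = challenge_passed(challenge, " ".join(challenge))
```

Because a fresh string is drawn each time, a replayed recording of an earlier session would not match the new challenge, which is the property the paragraph above attributes to the random prompt.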

As for the technical details of the voiceprint recognition algorithm: at the feature level, classical MFCC, PLP, deep features and PNCC can all serve as good acoustic features for model learning. MFCC is the most widely used, and several features can also be combined at the feature level or the model level. At the machine-learning model level, another approach is to learn within the iVector framework. Because deep learning is currently a hot research topic, the voiceprint field has inevitably been influenced by it: DNN-iVector was derived from the traditional UBM-iVector framework, but it merely uses a DNN (or BN) to extract features that replace or supplement MFCC, and the back-end learning framework is still iVector. These are all concrete ways of extracting voiceprint feature information. Since the present invention does not concern improvements to these techniques, they are only listed here, and those skilled in the art can build a suitable recognition module according to them and the actual requirements.
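Since MFCC is named as the most widely used feature, a compact textbook-style MFCC computation may help make the feature level concrete. This is a sketch using only NumPy, not the patent's implementation; the frame length (25 ms), hop (10 ms), FFT size, 26 mel filters and 13 coefficients are conventional defaults assumed here, and production systems would normally use a dedicated audio library.

```python
import numpy as np


def hz_to_mel(hz):
    return 2595.0 * np.log10(1.0 + hz / 700.0)


def mel_to_hz(mel):
    return 700.0 * (10.0 ** (mel / 2595.0) - 1.0)


def mel_filterbank(n_filters, n_fft, sample_rate):
    """Triangular filters evenly spaced on the mel scale; shape (n_filters, n_fft // 2 + 1)."""
    mel_points = np.linspace(hz_to_mel(0.0), hz_to_mel(sample_rate / 2.0), n_filters + 2)
    bins = np.floor((n_fft + 1) * mel_to_hz(mel_points) / sample_rate).astype(int)
    bank = np.zeros((n_filters, n_fft // 2 + 1))
    for i in range(1, n_filters + 1):
        left, center, right = bins[i - 1], bins[i], bins[i + 1]
        for k in range(left, center):       # rising edge of the triangle
            bank[i - 1, k] = (k - left) / max(center - left, 1)
        for k in range(center, right):      # falling edge of the triangle
            bank[i - 1, k] = (right - k) / max(right - center, 1)
    return bank


def mfcc(signal, sample_rate=16000, frame_len=400, hop=160,
         n_fft=512, n_filters=26, n_coeffs=13):
    """Return MFCCs with shape (n_frames, n_coeffs)."""
    signal = np.asarray(signal, dtype=np.float64)
    if len(signal) < frame_len:
        raise ValueError("signal is shorter than one frame")
    # Pre-emphasis boosts high frequencies.
    emphasized = np.append(signal[0], signal[1:] - 0.97 * signal[:-1])
    # Slice into overlapping frames and apply a Hamming window.
    n_frames = 1 + (len(emphasized) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    frames = emphasized[idx] * np.hamming(frame_len)
    # Power spectrum -> mel filterbank energies -> log.
    power = np.abs(np.fft.rfft(frames, n=n_fft, axis=1)) ** 2 / n_fft
    energies = np.log(power @ mel_filterbank(n_filters, n_fft, sample_rate).T + 1e-10)
    # DCT-II (unnormalized) decorrelates the log energies; keep the first n_coeffs.
    basis = np.cos(np.pi * np.outer(np.arange(n_coeffs),
                                    2 * np.arange(n_filters) + 1) / (2 * n_filters))
    return energies @ basis.T


# One second of synthetic audio at 16 kHz.
rng = np.random.default_rng(0)
features = mfcc(rng.standard_normal(16000))
```

Averaging `features` over the frame axis gives a crude fixed-length vector per utterance; the iVector and DNN-based approaches mentioned above are far more discriminative ways to turn frame-level features into a speaker representation.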

S3: Determine whether the voiceprint feature information is stored in the voiceprint recognition model library of the server and, if so, convert the sound information into text information and record it. The voiceprint recognition model library in the judgment step is built as follows:

Obtain the voice information of all users to be registered; extract the voiceprint feature information from the voice information of all users to be registered; store all the voiceprint information to complete the voiceprint recognition model library. The key idea of the invention is to take minutes for specific people rather than for everyone. Before the minutes of these participants can be taken, their voiceprint information must first be obtained. For example, when a meeting is held, the key remarks of the department head and the project leader must be captured; these users need to enter their voices into the system in advance, to serve as the criterion for the corresponding judgment. During registration the user may say anything, for example "I will be registered as member"; the user's voiceprint feature information is then extracted and stored as that user's identity information. One benefit of this arrangement is that the user's position is not restricted, since identification relies on the sound characteristics unique to the user. And because the user's sound is picked up by the annular microphone array, the array can localize the sound source wherever the user moves.
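The registration and judgment logic can be sketched as a small library of reference vectors with a similarity threshold. The patent does not specify how a voiceprint is compared against the library, so the fixed-length embedding, cosine similarity and the 0.8 threshold below are assumptions; `VoiceprintLibrary`, `record_if_registered` and the `transcribe` callback are hypothetical names.

```python
import numpy as np


def _unit(vector):
    """Convert to a unit-length float vector; reject the zero vector."""
    v = np.asarray(vector, dtype=np.float64)
    norm = np.linalg.norm(v)
    if norm == 0.0:
        raise ValueError("embedding must be non-zero")
    return v / norm


class VoiceprintLibrary:
    """Stores one reference voiceprint per registered user (assumed fixed-length vectors)."""

    def __init__(self, threshold: float = 0.8):
        self.threshold = threshold  # assumed minimum cosine similarity for a match
        self._prints = {}

    def enroll(self, user_id, embedding):
        self._prints[user_id] = _unit(embedding)

    def identify(self, embedding):
        """Return the best-matching user ID, or None if nothing reaches the threshold."""
        v = _unit(embedding)
        best_id, best_score = None, self.threshold
        for user_id, reference in self._prints.items():
            score = float(reference @ v)  # cosine similarity of unit vectors
            if score >= best_score:
                best_id, best_score = user_id, score
        return best_id


def record_if_registered(library, embedding, audio, transcribe, minutes):
    """Judgment step: transcribe and record only when the speaker is in the library."""
    speaker = library.identify(embedding)
    if speaker is None:
        return False
    minutes.append((speaker, transcribe(audio)))
    return True


library = VoiceprintLibrary()
library.enroll("department_head", [1.0, 0.0, 0.0])
library.enroll("project_leader", [0.0, 1.0, 0.0])
minutes = []
# `transcribe` stands in for a real speech recognition module.
record_if_registered(library, [0.9, 0.1, 0.0], b"...", lambda audio: "Let us begin.", minutes)
```

Speech from anyone whose voiceprint falls below the threshold is simply not written to `minutes`, which is how the method restricts the record to registered participants.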

After the sound source localization technique has located the sound, all microphones other than the one nearest to the sound position are switched off. Once the specific position has been located, preferably only the microphone facing the speaker is switched on and the rest are switched off. This captures the current speaker's sound more effectively and masks the low-voiced talk of other speakers, so that sound capture does not become confused because several sources are active. If there is more than one talker and several microphones are open at the same time, the relative strength of the sources cannot be judged and all the sound can only be recorded together, which causes confusion. If only the microphone facing the speaker is open, sound intensity, sound direction and so on can be used for localization, to judge whether the corresponding information needs to be recorded and whether the recording microphone should be switched. The sound source localization technique is an algorithm based on time-delay estimation (TDE), on high-resolution spectral estimation, or on sparse representation. The core of a TDE-based algorithm is accurate estimation of the propagation delay, generally obtained by cross-correlating the microphone signals. The source position can then be obtained by simple delay-and-sum, by geometric calculation, or by a steered response power search that uses the cross-correlation results directly. Such algorithms are relatively easy to implement and computationally light, which suits real-time processing, so they are the most widely used in practice.
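The time-delay estimation at the heart of the TDE approach can be illustrated with plain cross-correlation between two microphone channels. The sketch below estimates the lag at which the correlation peaks and then picks the microphone the sound reaches first, as a stand-in for "the nearest microphone". Treating the earliest arrival as the nearest microphone, and the function names, are assumptions for illustration; real arrays typically use more robust variants of cross-correlation.

```python
import numpy as np


def estimate_delay(reference, delayed, sample_rate):
    """Estimate by how many seconds `delayed` lags `reference`, using cross-correlation."""
    reference = np.asarray(reference, dtype=np.float64)
    delayed = np.asarray(delayed, dtype=np.float64)
    corr = np.correlate(delayed, reference, mode="full")
    # Index 0 of the full correlation corresponds to a lag of -(len(reference) - 1).
    lag = int(np.argmax(corr)) - (len(reference) - 1)
    return lag / sample_rate


def nearest_microphone(channels, sample_rate):
    """Return the index of the channel the sound reaches first (smallest relative delay)."""
    delays = [estimate_delay(channels[0], ch, sample_rate) for ch in channels]
    return int(np.argmin(delays))


# Synthetic example: the same source arrives at three microphones 5, 2 and 9 samples late.
rng = np.random.default_rng(1)
source = rng.standard_normal(1000)


def arrive(delay_samples, total_pad=20):
    return np.concatenate([np.zeros(delay_samples), source, np.zeros(total_pad - delay_samples)])


channels = np.stack([arrive(5), arrive(2), arrive(9)])
active_mic = nearest_microphone(channels, sample_rate=16000)
```

In this example the second microphone (index 1) receives the sound earliest, so it would be the one left switched on while the others are closed.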

When building the voiceprint recognition model library, two parts can be set up: permanently valid registered users and temporarily valid registered users. Permanently valid registered users are the important, regular leaders of meetings, whose voices must be recorded. Temporarily valid registered users cover the following situation: a participant of lower rank puts forward a useful idea during a meeting, and the idea needs to be discussed further. That user then becomes an important part of the minutes, so a temporarily valid identity is granted, allowing everything the user says for the rest of the meeting to be recorded.
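One way to model the two kinds of registration is to give each entry an optional expiry time, with no expiry meaning permanent. The patent only says the temporary identity is valid for the meeting, so the time-to-live parameter below is an assumed mechanism, and `Registration` and `VoiceprintRegistry` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Registration:
    user_id: str
    expires_at: Optional[float] = None  # None means permanently valid; otherwise a time in seconds

    def is_valid(self, now: float) -> bool:
        return self.expires_at is None or now < self.expires_at


class VoiceprintRegistry:
    """Tracks which users are currently entitled to have their speech recorded."""

    def __init__(self):
        self._entries = {}

    def register_permanent(self, user_id):
        self._entries[user_id] = Registration(user_id)

    def register_temporary(self, user_id, now, ttl_seconds):
        # Never downgrade a user who is already permanently registered.
        existing = self._entries.get(user_id)
        if existing is not None and existing.expires_at is None:
            return
        self._entries[user_id] = Registration(user_id, now + ttl_seconds)

    def is_registered(self, user_id, now):
        entry = self._entries.get(user_id)
        return entry is not None and entry.is_valid(now)


registry = VoiceprintRegistry()
registry.register_permanent("department_head")
# A junior participant is granted a one-hour temporary identity at t = 0.
registry.register_temporary("junior_engineer", now=0.0, ttl_seconds=3600)
```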

S4: When the sound collection device receives no sound information within a preset time, send a prompt to determine whether the meeting has ended and, if so, output the complete meeting document. Just as the start of the meeting is confirmed with a wake-up word, the end can be judged in the same way. For example, when the speech recognition system picks up a sentence such as "that is all for today's meeting" or "the meeting is over", it can judge that the current meeting has finished. Switching the speech recognition system off directly is one way to end. Another is to determine that the meeting is over when no voice information has been received for 5 minutes. Pauses of that length for thinking are generally not possible in a meeting, where attention is highly concentrated and the participants are gathered together, so a five-minute lull does not occur; when a gap exceeds five minutes, the system is therefore shut down. For greater safety, some designs can also issue a voice prompt to further determine whether the meeting has ended: if the reply is affirmative, or if no reply is received within a period of time, it is judged that no one is in the meeting and the speech recognition system is closed.
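The end-of-meeting logic combines an explicit end phrase, a silence timeout and a confirmation prompt. A minimal sketch follows; the 300-second default matches the five minutes mentioned above, while the phrase list, the set of affirmative replies and the class name `EndDetector` are illustrative assumptions.

```python
END_PHRASES = ("meeting is over", "that is all for today's meeting")  # illustrative phrases


class EndDetector:
    """Tracks speech activity and decides when to ask whether the meeting has ended."""

    def __init__(self, silence_timeout: float = 300.0):
        self.silence_timeout = silence_timeout  # five minutes, as in the description
        self.last_speech_at = None

    def on_speech(self, transcript: str, now: float) -> bool:
        """Note speech activity; return True if an explicit end phrase was spoken."""
        self.last_speech_at = now
        text = transcript.lower()
        return any(phrase in text for phrase in END_PHRASES)

    def should_prompt(self, now: float) -> bool:
        """True once the silence since the last speech has reached the timeout."""
        if self.last_speech_at is None:
            return False
        return now - self.last_speech_at >= self.silence_timeout


def meeting_ended(reply) -> bool:
    """Interpret the reply to the prompt: an affirmative answer or no answer ends the meeting."""
    return reply is None or reply.strip().lower() in {"yes", "y"}


detector = EndDetector()
detector.on_speech("Next item on the agenda", now=10.0)
prompt_needed = detector.should_prompt(now=310.0)  # 300 s of silence have passed
```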

S5: Send the complete meeting document to a printer for printing, or send it to users according to pre-stored user information. After the meeting ends, the meeting record document is output. If the information of the person sitting in each seat was entered beforehand, the document can be output directly using that information. If seat information was not entered in advance, the person sitting in each seat must be entered before the document is exported, and the minutes are then output. The minutes document is filled in according to a standard meeting template. Each company can set its own meeting template at the outset; when the template is completed, the system imports all the corresponding text directly into the matching template module, making the content of the minutes more standardized. Once all the content has been entered, it can be archived electronically or sent directly to a connected printing system to produce a paper copy for storage. More preferably, before output, the complete meeting document is sent to the relevant personnel for review. Speech recognition may leave omissions or inaccuracies in places, so a further review reduces the errors in the final meeting document.
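Filling a company-specific template with the recorded text can be sketched with the standard library's `string.Template`. The template layout and field names below are invented for illustration; the patent leaves the template format to each company.

```python
from string import Template

# Hypothetical company template; each company could define its own layout.
MINUTES_TEMPLATE = Template(
    "Meeting: $title\n"
    "Date: $date\n"
    "Attendees: $attendees\n"
    "\n"
    "$body\n"
)


def render_minutes(title, date, entries):
    """Fill the template from (speaker, text) pairs collected during the meeting."""
    # dict.fromkeys keeps each speaker once, in order of first appearance.
    attendees = ", ".join(dict.fromkeys(speaker for speaker, _ in entries))
    body = "\n".join(f"{speaker}: {text}" for speaker, text in entries)
    return MINUTES_TEMPLATE.substitute(title=title, date=date, attendees=attendees, body=body)


document = render_minutes(
    "Weekly sync",
    "2019-01-25",
    [("Head", "Start."), ("Lead", "Done."), ("Head", "OK.")],
)
```

The rendered string can then be passed to a reviewer, archived, or handed to whatever printing or delivery mechanism the deployment uses.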

Embodiment two

Embodiment two discloses an electronic device comprising a processor, a memory and a program. One or more processors and memories may be used. The program is stored in the memory and configured to be executed by the processor; when the processor executes the program, the voiceprint-based automatic meeting recording method of embodiment one is implemented. The electronic device may be any of a range of devices such as a mobile phone, a computer or a tablet computer.

Embodiment three

Embodiment three discloses a computer-readable storage medium for storing a program; when the program is executed by a processor, the voiceprint-based automatic meeting recording method of embodiment one is implemented.

Of course, in the storage medium containing computer-executable instructions provided by the embodiments of the present invention, the computer-executable instructions are not limited to the method operations described above and can also perform related operations of the method provided by any embodiment of the invention.

From the above description of the embodiments, those skilled in the art will clearly understand that the present invention can be implemented by software together with the necessary general-purpose hardware, or of course by hardware alone, although in many cases the former is the better implementation. On this understanding, the technical solution of the invention, or the part of it that contributes over the prior art, can in essence be embodied as a software product. The computer software product can be stored in a computer-readable storage medium, such as a computer floppy disk, read-only memory (ROM), random access memory (RAM), flash memory (FLASH), hard disk or optical disc, and includes a number of instructions that cause an electronic device (which may be a personal computer, a server, a network device or the like) to execute the methods described in the embodiments of the invention.

It is worth noting that the units and modules included in the above embodiments are divided only according to functional logic; the division is not limited to the one described, provided the corresponding functions can be realized. In addition, the specific names of the functional units serve only to distinguish them from one another and are not intended to limit the scope of protection of the invention.

The above embodiments are only preferred embodiments of the present invention and do not limit its scope of protection. Any insubstantial variation or substitution made by those skilled in the art on the basis of the present invention falls within the scope claimed by the invention.

Claims

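1. A voiceprint-based automatic meeting recording method, characterized by comprising the following steps:

Obtaining step: acquire the voice information of the current user through a sound collection device;

Extraction step: extract the sound information and the voiceprint feature information from the voice information;

Judgment step: determine whether the voiceprint feature information is stored in the voiceprint recognition model library of a server and, if so, convert the sound information into text information and record it.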

2. The voiceprint-based automatic meeting recording method according to claim 1, characterized in that before the obtaining step the method further includes a wake-up step: when a preset wake-up word is received, start the sound collection device.

3. The voiceprint-based automatic meeting recording method according to claim 2, characterized in that the wake-up step is specifically: when the preset wake-up word is received, determine whether the voiceprint information corresponding to the preset wake-up word is stored in the server and, if so, start the sound collection device.

4. The voiceprint-based automatic meeting recording method according to claim 1, characterized in that the obtaining step is specifically: acquire the sound information of the current user through an annular microphone array.

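5. The voiceprint-based automatic meeting recording method according to claim 1, characterized in that the voiceprint recognition model library in the judgment step is built as follows:

Obtain the voice information of all users to be registered;

Extract the voiceprint feature information from the voice information of all users to be registered;

Store all the voiceprint information to complete the voiceprint recognition model library.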

6. The voiceprint-based automatic meeting recording method according to claim 5, characterized in that the voiceprint feature information is represented using classical Mel-frequency cepstral coefficients (MFCC), perceptual linear prediction (PLP) coefficients, deep features, or power-normalized cepstral coefficients (PNCC).

7. The voiceprint-based automatic meeting recording method according to any one of claims 1-6, characterized in that the method further includes an end judgment step after the judgment step: when the sound collection device receives no sound information within a preset time, send a prompt to determine whether the meeting has ended and, if so, output the complete meeting document.

8. The voiceprint-based automatic meeting recording method according to claim 7, characterized in that the method further includes a document sending step after the end judgment step: send the complete meeting document to a printer for printing, or send it to users according to pre-stored user information.

9. An electronic device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, characterized in that the processor, when executing the computer program, implements the voiceprint-based automatic meeting recording method according to any one of claims 1-8.

10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the voiceprint-based automatic meeting recording method according to any one of claims 1-8.