CN109767757A - Meeting minutes generation method and device - Google Patents
- Publication number: CN109767757A
- Application number: CN201910038460.6A
- Authority
- CN
- China
- Prior art keywords
- speech segment
- classification
- speaker
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
- G06F16/60 — Information retrieval; database structures therefor; file system structures therefor, of audio data
- G10L15/04 — Speech recognition; segmentation; word boundary detection
- G10L15/08 — Speech recognition; speech classification or search
- G10L17/00 — Speaker identification or verification
Abstract
An embodiment of the present invention provides a meeting minutes generation method and device. The present invention relates to the field of artificial intelligence. The method comprises: obtaining conference voice; splitting the conference voice to obtain N speech segments, N being a natural number greater than or equal to 2; clustering the N speech segments to obtain speech segments of M categories, M being a natural number greater than or equal to 2, M≤N, the speech segments of the M categories being in one-to-one correspondence with M speakers; determining the speaker corresponding to the speech segments of each of the M categories; determining the speech content of each of the M speakers according to the speech segments of the M categories; and generating meeting minutes according to the speech content of each of the M speakers. The technical solution provided by the embodiment of the present invention therefore solves the prior-art problem that manually sorting meeting minutes is time-consuming, laborious, and inefficient.
Description
[technical field]
The present invention relates to the field of artificial intelligence, and in particular to a meeting minutes generation method and device.
[background technique]
During a meeting, a designated recorder writes down and organizes the speech content of each speaker to form the meeting minutes. When the meeting runs long and there is much content to record, manually sorting the meeting minutes is time-consuming, laborious, and inefficient.
[summary of the invention]
In view of this, embodiments of the present invention provide a meeting minutes generation method and device to solve the prior-art problem that manually sorting meeting minutes is time-consuming, laborious, and inefficient.
In one aspect, an embodiment of the present invention provides a meeting minutes generation method, the method comprising: obtaining conference voice; splitting the conference voice to obtain N speech segments, N being a natural number greater than or equal to 2; clustering the N speech segments to obtain speech segments of M categories, M being a natural number greater than or equal to 2, M≤N, the speech segments of the M categories being in one-to-one correspondence with M speakers; determining the speaker corresponding to the speech segments of each of the M categories; determining the speech content of each of the M speakers according to the speech segments of the M categories; and generating meeting minutes according to the speech content of each of the M speakers.
Further, determining the speaker corresponding to the speech segments of each of the M categories comprises: selecting at least one speech segment from the speech segments of each of the M categories and converting it into a text fragment, obtaining L text fragments, L being a natural number, L≥M; displaying the L text fragments and a speaker list to a user, the speaker list including information on each of the M speakers; receiving a matching instruction, the matching instruction being an instruction issued by the user to match each of the L text fragments with a speaker; and determining, according to the matching instruction, the speaker corresponding to the speech segments of each of the M categories.
Further, determining the speaker corresponding to the speech segments of each of the M categories comprises: selecting at least one speech segment from the speech segments of each of the M categories, obtaining Z speech segments, Z being a natural number, Z≥M; playing the Z selected speech segments to the user and displaying a speaker list, the speaker list including information on each of the M speakers; receiving a matching instruction, the matching instruction being an instruction issued by the user to match each of the Z speech segments with a speaker; and determining, according to the matching instruction, the speaker corresponding to the speech segments of each of the M categories.
Further, clustering the N speech segments comprises: S1: randomly selecting M speech segments from the N speech segments and taking the M selected speech segments as the cluster centers of M categories; S2: for the i-th speech segment among the remaining N-M speech segments, calculating the distance between the i-th speech segment and each of the M cluster centers, and assigning the i-th speech segment to the category corresponding to the cluster center nearest to it, i successively taking the natural numbers from 1 to N-M; S3: after the remaining N-M speech segments have all been assigned, recalculating the cluster centers of the M categories according to the speech segments included in each of the M categories and updating the cluster centers of the M categories; and repeating S2 and S3 until, for each of the M categories, the distance between two successive cluster centers is within a preset distance.
Further, splitting the conference voice to obtain the N speech segments comprises: determining silence segments in the conference voice; removing the silence segments from the conference voice; splitting, according to the silence segments, the conference voice from which the silence segments have been removed, obtaining W long speech segments, W being a natural number greater than or equal to 2, W<N; extracting the acoustic features of each of the W long speech segments; performing relative entropy analysis on the acoustic features of each of the W long speech segments; and cutting the W long speech segments according to the result of the relative entropy analysis, obtaining the N speech segments.
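The claims do not fix a particular silence detector, acoustic feature, or entropy measure. The two stages can be sketched minimally as follows, assuming frame-energy thresholding for silence removal and a symmetric KL divergence (a relative entropy) between diagonal-Gaussian models of adjacent feature windows as the cut criterion; all names and thresholds here are illustrative, not from the patent:

```python
import numpy as np

def remove_silence(signal, frame_len=400, energy_thresh=1e-4):
    """Drop frames whose mean energy falls below the threshold."""
    frames = [signal[i:i + frame_len]
              for i in range(0, len(signal) - frame_len + 1, frame_len)]
    voiced = [f for f in frames if np.mean(f ** 2) >= energy_thresh]
    return np.concatenate(voiced) if voiced else np.array([])

def kl_change_points(features, win=50, thresh=2.0):
    """Mark a cut wherever the symmetric KL divergence between
    diagonal-Gaussian models of two adjacent windows of feature
    vectors exceeds the threshold."""
    cuts = []
    for t in range(win, len(features) - win):
        a, b = features[t - win:t], features[t:t + win]
        ma, mb = a.mean(0), b.mean(0)
        va, vb = a.var(0) + 1e-8, b.var(0) + 1e-8
        # symmetric KL divergence between two diagonal Gaussians
        kl = 0.5 * np.sum(va / vb + vb / va + (ma - mb) ** 2 * (1 / va + 1 / vb) - 2)
        if kl > thresh:
            cuts.append(t)
    return cuts
```

A real system would typically compute MFCC feature vectors per frame and replace the fixed `thresh` with a tuned or BIC-style criterion; the sketch only shows the shape of the computation.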
In one aspect, an embodiment of the present invention provides a meeting minutes generation device, the device comprising: an acquiring unit for obtaining conference voice; a splitting unit for splitting the conference voice to obtain N speech segments, N being a natural number greater than or equal to 2; a clustering unit for clustering the N speech segments to obtain speech segments of M categories, M being a natural number greater than or equal to 2, M≤N, the speech segments of the M categories being in one-to-one correspondence with M speakers; a first determination unit for determining the speaker corresponding to the speech segments of each of the M categories; a second determination unit for determining the speech content of each of the M speakers according to the speech segments of the M categories; and a generation unit for generating meeting minutes according to the speech content of each of the M speakers.
Further, the first determination unit comprises: a first selection subunit for selecting at least one speech segment from the speech segments of each of the M categories and converting it into a text fragment, obtaining L text fragments, L being a natural number, L≥M; a first display subunit for displaying the L text fragments and a speaker list to the user, the speaker list including information on each of the M speakers; a first receiving subunit for receiving a matching instruction, the matching instruction being an instruction issued by the user to match each of the L text fragments with a speaker; and a first determination subunit for determining, according to the matching instruction, the speaker corresponding to the speech segments of each of the M categories.
Further, the first determination unit comprises: a second selection subunit for selecting at least one speech segment from the speech segments of each of the M categories, obtaining Z speech segments, Z being a natural number, Z≥M; a second display subunit for playing the Z selected speech segments to the user and displaying a speaker list, the speaker list including information on each of the M speakers; a second receiving subunit for receiving a matching instruction, the matching instruction being an instruction issued by the user to match each of the Z speech segments with a speaker; and a second determination subunit for determining, according to the matching instruction, the speaker corresponding to the speech segments of each of the M categories.
Further, the clustering unit is configured to execute the following steps: S1: randomly select M speech segments from the N speech segments and take the M selected speech segments as the cluster centers of M categories; S2: for the i-th speech segment among the remaining N-M speech segments, calculate the distance between the i-th speech segment and each of the M cluster centers, and assign the i-th speech segment to the category corresponding to the cluster center nearest to it, i successively taking the natural numbers from 1 to N-M; S3: after the remaining N-M speech segments have all been assigned, recalculate the cluster centers of the M categories according to the speech segments included in each of the M categories and update the cluster centers of the M categories; repeat S2 and S3 until, for each of the M categories, the distance between two successive cluster centers is within a preset distance.
Further, the splitting unit comprises: a third determination subunit for determining silence segments in the conference voice; a removal subunit for removing the silence segments from the conference voice; a division subunit for splitting, according to the silence segments, the conference voice from which the silence segments have been removed, obtaining W long speech segments, W being a natural number greater than or equal to 2, W<N; an extraction subunit for extracting the acoustic features of each of the W long speech segments; a relative entropy analysis subunit for performing relative entropy analysis on the acoustic features of each of the W long speech segments; and a cutting subunit for cutting the W long speech segments according to the result of the relative entropy analysis, obtaining the N speech segments.
In one aspect, an embodiment of the present invention provides a storage medium, the storage medium comprising a stored program, wherein, when the program runs, a device on which the storage medium resides is controlled to execute the above meeting minutes generation method.
In one aspect, an embodiment of the present invention provides a computer device comprising a memory and a processor, the memory being used to store information including program instructions, the processor being used to control the execution of the program instructions, the program instructions, when loaded and executed by the processor, implementing the steps of the above meeting minutes generation method.
In embodiments of the present invention, the conference voice is split to obtain N speech segments; the N speech segments are clustered to obtain speech segments of M categories; the speaker corresponding to the speech segments of each category is determined; the speech content of the M speakers is determined according to the speech segments of the M categories; and meeting minutes are generated according to the speech content of each speaker. This solves the prior-art problem that manually sorting meeting minutes is time-consuming, laborious, and inefficient, and achieves the effect of intelligently analyzing the speech content of a meeting and efficiently sorting out the meeting minutes.
[Description of the drawings]
To describe the technical solutions of the embodiments of the present invention more clearly, the accompanying drawings needed in the embodiments are briefly introduced below. Apparently, the drawings in the following description are only some embodiments of the present invention; those of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of an optional meeting minutes generation method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of an optional meeting minutes generation device according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of an optional computer device provided by an embodiment of the present invention.
[specific embodiment]
For a better understanding of the technical solutions of the present invention, the embodiments of the present invention are described in detail below with reference to the accompanying drawings.
It should be understood that the described embodiments are only some of the embodiments of the present invention, not all of them. Based on the embodiments of the present invention, all other embodiments obtained by those of ordinary skill in the art without creative effort shall fall within the protection scope of the present invention.
The terms used in the embodiments of the present invention are only for the purpose of describing particular embodiments and are not intended to limit the present invention. The singular forms "a", "said", and "the" used in the embodiments of the present invention and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise.
It should be understood that the term "and/or" used herein only describes an association relationship between associated objects and indicates that three relationships may exist; for example, "A and/or B" may indicate three cases: A exists alone, A and B exist simultaneously, and B exists alone. In addition, the character "/" herein generally indicates that the associated objects before and after it are in an "or" relationship.
Fig. 1 is a flowchart of an optional meeting minutes generation method according to an embodiment of the present invention. As shown in Fig. 1, the method includes:
Step S102: obtain conference voice.
Step S104: split the conference voice to obtain N speech segments, N being a natural number greater than or equal to 2.
Step S106: cluster the N speech segments to obtain speech segments of M categories, M being a natural number greater than or equal to 2, M≤N, the speech segments of the M categories being in one-to-one correspondence with M speakers.
Step S108: determine the speaker corresponding to the speech segments of each of the M categories.
Step S110: determine the speech content of each of the M speakers according to the speech segments of the M categories.
Step S112: generate meeting minutes according to the speech content of each of the M speakers.
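Once steps S106-S108 have produced a category for each segment and a speaker for each category, steps S110-S112 amount to replacing each segment's category with its matched speaker and emitting the recognized utterances in time order. A minimal sketch under that assumption (the function name and toy data are illustrative, not from the patent):

```python
def generate_minutes(segments, speaker_of_category):
    """segments: list of (category_id, recognized_text) pairs in time order.
    speaker_of_category: category_id -> speaker name, as fixed by the
    user's matching instruction. Emits one minutes line per utterance."""
    return "\n".join(f"{speaker_of_category[cat]}: {text}" for cat, text in segments)

# toy data standing in for the output of segmentation + clustering + recognition
segments = [(1, "Let's begin."), (2, "First item: the budget."), (1, "Agreed.")]
minutes = generate_minutes(segments, {1: "Speaker A", 2: "Speaker B"})
```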
In embodiments of the present invention, a meeting may take either of two forms:
The first case: all participants are present in person. For example, a department holds a meeting and everyone gathers in the same meeting room.
The second case: the meeting is held with the aid of application software. For example, a department holds a meeting; some participants gather in the same meeting room while others, away on business, join through WeChat, QQ, or other application software. For another example, a company holds a meeting with three participants, the department managers in Beijing, Shanghai, and Shenzhen respectively; the three are located in different cities and meet through WeChat, QQ, or other application software.
In embodiments of the present invention, the conference voice may be the voice generated during a meeting held in either of the above forms. The conference voice may be recorded on site during the meeting: for example, several people meet in person, and one of them records the voice generated during the meeting with a mobile phone, recorder, recording pen, or other recording equipment, obtaining the conference voice. The conference voice may also be obtained by recording the voice generated during a meeting held through an instant messaging application: for example, several people meet through WeChat or QQ, and one of them records the WeChat/QQ voice generated during the meeting with a mobile phone, recorder, recording pen, or other recording equipment, obtaining the conference voice.
In embodiments of the present invention, a speaker is a person who speaks during the meeting. The number of speakers is less than or equal to the number of participants: if all participants speak, the number of speakers equals the number of participants; if only some participants speak, the number of speakers is less than the number of participants.
The meeting minutes generation method provided by the embodiments of the present invention is illustrated below with a specific example.
For example, several people hold a meeting and the voice generated during the meeting is recorded, obtaining the conference voice. Suppose the conference voice is 20 minutes long. The conference voice is split to obtain, say, 6000 (N=6000) speech segments, and these 6000 speech segments are clustered to obtain speech segments of 3 (M=3) categories. Category 1 includes 3000 speech segments, namely speech segment P(1,1), speech segment P(1,2), ..., speech segment P(1,3000), and these 3000 speech segments correspond to the same speaker. Category 2 includes 1000 speech segments, namely speech segment P(2,1), speech segment P(2,2), ..., speech segment P(2,1000), and these 1000 speech segments correspond to the same speaker. Category 3 includes 2000 speech segments, namely speech segment P(3,1), speech segment P(3,2), ..., speech segment P(3,2000), and these 2000 speech segments correspond to the same speaker. Then the speaker corresponding to the speech segments of each category is determined. For example, suppose it is determined that the 3000 speech segments in category 1 correspond to speaker A, the 1000 speech segments in category 2 correspond to speaker B, and the 2000 speech segments in category 3 correspond to speaker C, as shown in Table 1. The speech content of speaker A is determined according to speech segments P(1,1), P(1,2), ..., P(1,3000); the speech content of speaker B is determined according to speech segments P(2,1), P(2,2), ..., P(2,1000); and the speech content of speaker C is determined according to speech segments P(3,1), P(3,2), ..., P(3,2000). Meeting minutes are then generated according to the speech content of speakers A, B, and C.
Table 1
Category | Corresponding speaker |
Category 1 (3000 speech segments) | A |
Category 2 (1000 speech segments) | B |
Category 3 (2000 speech segments) | C |
In embodiments of the present invention, the conference voice is split to obtain N speech segments; the N speech segments are clustered to obtain speech segments of M categories; the speaker corresponding to the speech segments of each category is determined; the speech content of the M speakers is determined according to the speech segments of the M categories; and meeting minutes are generated according to the speech content of each speaker. This solves the prior-art problem that manually sorting meeting minutes is time-consuming, laborious, and inefficient, and achieves the effect of intelligently analyzing the speech content of a meeting and efficiently sorting out the meeting minutes.
There are many specific methods for determining the speaker corresponding to the speech segments of each category; several are enumerated below.
Method one:
At least one speech segment is selected from the speech segments of each of the M categories and converted into a text fragment, obtaining L text fragments, L being a natural number, L≥M; the L text fragments and a speaker list are displayed to the user, the speaker list including information on each of the M speakers; a matching instruction is received, the matching instruction being an instruction issued by the user to match each of the L text fragments with a speaker; and the speaker corresponding to the speech segments of each of the M categories is determined according to the matching instruction.
Specifically, when selecting at least one speech segment from the speech segments of each of the M categories to convert into text fragments, at least one speech segment may be randomly selected from the speech segments of each category.
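The per-category random sampling described above can be sketched as follows; the speech-to-text conversion itself is left out, and `sample_segments` and its inputs are illustrative names, not the patent's API:

```python
import random

def sample_segments(segments_by_category, k=1, seed=None):
    """Randomly pick k speech segments from each category. The picked
    segments would then be sent to a speech-to-text engine (not shown)
    to produce the L text fragments shown to the user."""
    rng = random.Random(seed)
    return {cat: rng.sample(segs, k) for cat, segs in segments_by_category.items()}
```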
For example, one speech segment is selected from the speech segments of each of the 3 categories shown in Table 1: the speech segment selected from category 1 is speech segment P(1,1); the speech segment selected from category 2 is speech segment P(2,1); and the speech segment selected from category 3 is speech segment P(3,1). These 3 speech segments are converted separately, obtaining text fragment F(1,1), text fragment F(2,1), and text fragment F(3,1). The correspondence between these three text fragments and the above three speech segments is shown in Table 2.
Table 2
Speech segment | Text fragment converted from the speech segment |
Speech segment P(1,1) | Text fragment F(1,1) |
Speech segment P(2,1) | Text fragment F(2,1) |
Speech segment P(3,1) | Text fragment F(3,1) |
These 3 (L=3) text fragments and a speaker list are displayed to the user, the speaker list including information on each of the 3 speakers. The speaker information may include the speaker's name, position, and so on.
The user may be the host of the meeting or another participant.
After seeing the 3 text fragments, the user knows which participant's speech each text fragment corresponds to. For example, suppose the content of one text fragment is: "Hello, I am the host of today's meeting." After seeing this text fragment, the user knows that it corresponds to the speech of the meeting host. The user may issue a matching instruction, which instructs that each of the 3 text fragments be matched with a speaker; for example, the matching instruction instructs that the text fragments be matched with the speakers according to Table 3.
Table 3
Text fragment | Corresponding speaker |
Text fragment F(1,1) | A |
Text fragment F(2,1) | B |
Text fragment F(3,1) | C |
Since text fragment F(1,1) was converted from a speech segment in category 1, and all speech segments in category 1 correspond to the same speaker, speaker A, who corresponds to text fragment F(1,1), is the speaker corresponding to all speech segments in category 1; that is, all speech segments in category 1 were uttered by speaker A. Similarly, since text fragment F(2,1) was converted from a speech segment in category 2, all speech segments in category 2 were uttered by speaker B; and since text fragment F(3,1) was converted from a speech segment in category 3, all speech segments in category 3 were uttered by speaker C. The correspondence between speech segments and speakers is shown in Table 4.
Table 4
Speech segment | Corresponding speaker |
All speech segments in category 1 | A |
All speech segments in category 2 | B |
All speech segments in category 3 | C |
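The propagation step, in which the speaker matched to one sampled segment is inherited by every segment of the same category, can be sketched as follows (all names are illustrative):

```python
def propagate_matching(category_of_segment, matched):
    """category_of_segment: dict mapping segment id -> category id
    (the clustering result). matched: dict mapping one sampled segment
    id per category -> speaker name (the user's matching instruction).
    Every segment inherits the speaker matched to the sample from its
    own category."""
    speaker_of_category = {category_of_segment[seg]: spk for seg, spk in matched.items()}
    return {seg: speaker_of_category[cat] for seg, cat in category_of_segment.items()}
```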
Method two:
At least one speech segment is selected from the speech segments of each of the M categories, obtaining Z speech segments, Z being a natural number, Z≥M; the Z selected speech segments are played to the user and a speaker list is displayed, the speaker list including information on each of the M speakers; a matching instruction is received, the matching instruction being an instruction issued by the user to match each of the Z speech segments with a speaker; and the speaker corresponding to the speech segments of each of the M categories is determined according to the matching instruction.
Specifically, when selecting at least one speech segment from the speech segments of each of the M categories, at least one speech segment may be randomly selected from the speech segments of each category.
Suppose 2 speech segments have been randomly selected from category 1, namely speech segment P(1,32) and speech segment P(1,450); 2 speech segments have been randomly selected from category 2, namely speech segment P(2,100) and speech segment P(2,400); and 2 speech segments have been randomly selected from category 3, namely speech segment P(3,900) and speech segment P(3,600).
These 6 (Z=6) speech segments are played to the user. After hearing the 6 speech segments, the user can easily identify, by the timbre of the voice, which participant's speech each segment is. The user may then issue a matching instruction, which instructs that each of the 6 speech segments be matched with a speaker, as shown in Table 5.
Table 5
Speech segment | Corresponding speaker |
Speech segment P(1,32), speech segment P(1,450) | A |
Speech segment P(2,100), speech segment P(2,400) | B |
Speech segment P(3,900), speech segment P(3,600) | C |
Since speech segment P(1,32) and speech segment P(1,450) are speech segments in category 1, and all speech segments in category 1 correspond to the same speaker, speaker A, who corresponds to speech segments P(1,32) and P(1,450), is the speaker corresponding to all speech segments in category 1; that is, all speech segments in category 1 were uttered by speaker A. Similarly, since speech segments P(2,100) and P(2,400) are speech segments in category 2, all speech segments in category 2 were uttered by speaker B; and since speech segments P(3,900) and P(3,600) are speech segments in category 3, all speech segments in category 3 were uttered by speaker C. The correspondence between speech segments and speakers is shown in Table 4.
In embodiments of the present invention, the speech segments corresponding to the same speaker are clustered together according to a clustering algorithm; one or more speech segments are then randomly selected from each category, and the selected speech segments are played to the user, who is asked to match the speech segments with the speakers; or the selected speech segments are converted into text fragments, which are displayed to the user, who is asked to match the text fragments with the speakers. This is simple and convenient and does not require knowing the speakers' voiceprint features or other sound-related features in advance.
The detailed process of clustering the N sound bites is as follows:
S1: randomly select M sound bites from the N sound bites, and take the M selected sound bites as the cluster centres of M classifications.
S2: for the i-th sound bite among the remaining N-M sound bites, calculate the distance between the i-th sound bite and each of the M cluster centres, and assign the i-th sound bite to the classification whose cluster centre is closest to it; i successively takes the natural numbers from 1 to N-M.
S3: after all the sound bites have been assigned, recalculate the cluster centre of each of the M classifications from the sound bites it contains, and update the M cluster centres. Steps S2 and S3 are executed in a loop until, for every classification, the distance between two consecutive cluster centres is within a preset distance.
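The three steps above amount to a standard K-means loop over per-segment feature vectors. A minimal sketch follows, assuming each sound bite has already been reduced to a fixed-length numeric feature vector; the function name, the squared-Euclidean distance and the empty-class handling are illustrative choices, not specified by the patent:

```python
import random

def kmeans_segments(features, M, tol=1e-4, seed=0):
    """Cluster N segment feature vectors into M classes, following S1-S3.

    features: list of equal-length numeric feature vectors, one per sound bite.
    Returns a list of cluster labels, one per segment.
    """
    rng = random.Random(seed)
    # S1: pick M segments at random as the initial cluster centres.
    centres = [list(f) for f in rng.sample(features, M)]
    while True:
        # S2: assign every segment to the nearest centre.
        labels = []
        for f in features:
            dists = [sum((a - b) ** 2 for a, b in zip(f, c)) for c in centres]
            labels.append(dists.index(min(dists)))
        # S3: recompute each centre as the mean of its members.
        new_centres = []
        for k in range(M):
            members = [f for f, lab in zip(features, labels) if lab == k]
            if not members:  # keep the old centre if a class ends up empty
                new_centres.append(centres[k])
                continue
            dim = len(members[0])
            new_centres.append([sum(m[d] for m in members) / len(members)
                                for d in range(dim)])
        # Loop S2/S3 until no centre moved more than the preset distance.
        shift = max(sum((a - b) ** 2 for a, b in zip(c, n)) ** 0.5
                    for c, n in zip(centres, new_centres))
        centres = new_centres
        if shift <= tol:
            return labels
```

In the patent's setting M would be the number of speakers supplied by the chairperson, and the feature vectors would be the voiceprint features discussed below.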
In the embodiments of the present invention, the K-means algorithm can be used to cluster the sound bites. M is the number of speakers; this number can be provided by the meeting chairperson or by other participants.
K-means is a typical distance-based clustering algorithm: it uses distance as its similarity measure, taking two objects to be more similar the closer together they are. The algorithm regards a cluster as being composed of objects that lie close to each other, and therefore takes obtaining compact and independent clusters as its final goal. The choice of the initial cluster centres has a considerable influence on the clustering result, because the first step of the algorithm is to randomly select k objects (in the embodiments of the present invention, k = M) as the initial cluster centres, each initially representing one cluster. In each iteration, the algorithm reassigns every remaining object in the data set to the nearest cluster according to its distance from each cluster centre. Once all data objects have been examined, one iteration is complete and new cluster centres are computed. If, between one iteration and the next, each new centroid equals the previous centroid or moves less than a specified threshold, the algorithm terminates. In the embodiments of the present invention, the loop ends and the clustering result is obtained when, for every one of the M classifications, the distance between two consecutive cluster centres is within the preset distance.
In the above step S2, the distance between the i-th of the remaining N-M sound bites and each of the M cluster centres can be calculated from voiceprint features. The detailed process can be: extract the voiceprint feature of the i-th sound bite (the sound bite to be clustered); extract the voiceprint feature of each of the M cluster centres; compute the similarity between the voiceprint feature of the i-th sound bite and the voiceprint feature of each cluster centre, and use the computed similarity as the distance between the i-th sound bite and that cluster centre.
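The text does not pin the similarity computation to a particular measure; cosine similarity between voiceprint feature vectors is one common choice. A sketch under that assumption (function names are illustrative):

```python
import math

def cosine_similarity(u, v):
    """Cosine similarity between two voiceprint feature vectors:
    1.0 for identical directions, 0.0 for orthogonal ones."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)
```

Note that under this convention a segment would be assigned to the centre with the *highest* similarity, since the patent uses the similarity itself as the "distance".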
Since everyone's vocal anatomy differs, and is further shaped by influences such as socioeconomic status, level of education and place of birth, no two people's voiceprint features are exactly the same. The voiceprint features extracted in the embodiments of the present invention can be prosodic features. Timbre, intensity, pitch and the like are collectively referred to as the prosodic features of speech, also known as suprasegmental features. Intensity reflects variations in strength such as stressed and unstressed syllables; pitch reflects the tone of words and the intonation of speech.
In the embodiments of the present invention, the voiceprint feature of each sound bite is extracted and the sound bites are clustered by voiceprint feature, so that sound bites with high voiceprint similarity are grouped together as sound bites uttered by the same speaker. In this process, the speakers' voiceprint features neither need to be known in advance nor stored in advance, which protects the speakers' privacy; the approach is highly secure and gives a good user experience.
Optionally, splitting the conference voice to obtain the N sound bites comprises: determining the silence clips in the conference voice; removing the silence clips from the conference voice; splitting the conference voice, after removal of the silence clips, according to the silence clips to obtain W long sound bites, W being a natural number greater than or equal to 2, W < N; extracting the acoustic feature of each of the W long sound bites; carrying out relative-entropy analysis on the acoustic feature of each of the W long sound bites; and cutting the W long sound bites according to the result of the relative-entropy analysis to obtain the N sound bites.
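The silence-based splitting step can be sketched over per-frame energies. The threshold values and the energy representation below are assumptions for illustration; the patent does not specify how silence is detected:

```python
def split_on_silence(energies, silence_thresh=0.01, min_silence_frames=10):
    """Split a sequence of per-frame energies into long segments.

    Frames whose energy falls below silence_thresh are treated as silence;
    a run of at least min_silence_frames silent frames separates two long
    segments. Returns (start, end) frame-index pairs with silence removed
    from the boundaries between segments.
    """
    segments, start, silent_run = [], None, 0
    for i, e in enumerate(energies):
        if e < silence_thresh:
            silent_run += 1
            if start is not None and silent_run >= min_silence_frames:
                # Close the segment at the first frame of the silent run.
                segments.append((start, i - silent_run + 1))
                start = None
        else:
            if start is None:
                start = i
            silent_run = 0
    if start is not None:
        segments.append((start, len(energies)))
    return segments
```

Each returned (start, end) pair would then be one of the W long sound bites handed to the relative-entropy analysis.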
Optionally, carrying out relative-entropy analysis on the acoustic feature of each long sound bite and cutting the long sound bite according to the result comprises: dividing the long sound bite into frames; extracting the acoustic feature of each speech frame; carrying out relative-entropy analysis on the acoustic features and locating the maximum of the relative entropy; judging whether the duration of the long sound bite exceeds a preset duration; and, if the duration of the long sound bite exceeds the preset duration, cutting the long sound bite at the maximum of the relative entropy.
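Assuming a per-frame relative-entropy curve has already been computed for a long segment, the cutting rule reduces to: if the segment is longer than the preset duration, split it at the entropy maximum. A sketch with illustrative names:

```python
def cut_at_entropy_peak(relative_entropy, duration, max_duration):
    """Cut an over-long segment at its relative-entropy maximum.

    relative_entropy: per-frame relative-entropy values for the segment.
    duration / max_duration: segment duration and the preset limit (seconds).
    Returns a list of (start_frame, end_frame) pieces.
    """
    n = len(relative_entropy)
    if duration <= max_duration:
        return [(0, n)]           # short enough: keep the segment whole
    peak = max(range(n), key=lambda i: relative_entropy[i])
    return [(0, peak), (peak, n)]  # split where the feature change peaks
```

A fuller implementation might recurse on the two pieces until every piece is under the preset duration.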
In probability theory and information theory, relative entropy (also known as KL divergence, Kullback-Leibler divergence) is a way of describing the difference between two probability distributions P and Q. It is asymmetric, meaning that D(P||Q) ≠ D(Q||P). In particular, in information theory, D(P||Q) denotes the information loss incurred when the probability distribution Q is used to fit the true distribution P, where P denotes the true distribution and Q denotes the fitted approximation of P.
For two probability distributions P and Q of a discrete random variable, their KL divergence is defined as D(P||Q) = Σᵢ P(i) ln(P(i)/Q(i)); the definition for continuous random variables is analogous.
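The discrete definition can be written directly in code; terms with P(i) = 0 are conventionally taken as zero, and Q(i) must be positive wherever P(i) is:

```python
import math

def kl_divergence(p, q):
    """D(P || Q) = sum_i P(i) * ln(P(i) / Q(i)) for discrete distributions.

    p, q: sequences of probabilities over the same outcomes.
    Terms with P(i) == 0 contribute nothing to the sum.
    """
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
```

The asymmetry mentioned above is easy to observe: for p = [0.9, 0.1] and q = [0.5, 0.5], kl_divergence(p, q) and kl_divergence(q, p) differ.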
Fig. 2 is a schematic diagram of an optional minutes generating apparatus according to an embodiment of the present invention; the apparatus is used to execute the above-mentioned minutes generation method. As shown in Fig. 2, the apparatus includes: an acquiring unit 10, a cutting unit 20, a cluster unit 30, a first determination unit 40, a second determination unit 50 and a generation unit 60.
The acquiring unit 10 is used for obtaining the conference voice.
The cutting unit 20 is used for splitting the conference voice to obtain N sound bites, N being a natural number greater than or equal to 2.
The cluster unit 30 is used for clustering the N sound bites to obtain the sound bites of M classifications, M being a natural number greater than or equal to 2, M ≤ N; the sound bites of the M classifications are respectively in one-to-one correspondence with M speakers.
The first determination unit 40 is used for determining the speaker corresponding to the sound bites of each of the M classifications.
The second determination unit 50 is used for determining, from the sound bites of the M classifications, the speech content of each of the M speakers.
The generation unit 60 is used for generating the minutes from the speech content of each of the M speakers.
In the embodiments of the present invention, the conference voice is split to obtain N sound bites; the N sound bites are clustered to obtain the sound bites of M classifications; the speaker corresponding to the sound bites of each classification is determined; the speech content of the M speakers is determined from the sound bites of the M classifications; and the minutes are generated from the speech content of each speaker. This solves the problem in the prior art that manually compiling minutes is time-consuming, laborious and inefficient, and achieves the effect of intelligently analysing the speech content of a meeting and efficiently compiling the minutes.
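Taken together, the units described here form a pipeline from raw audio to minutes. A minimal orchestration sketch, with every stage supplied as a callable so that the unit boundaries stay visible (all names are illustrative, not from the patent):

```python
def generate_minutes(conference_audio, M, split, cluster,
                     ask_user_to_match, transcribe):
    """End-to-end sketch of the claimed method.

    split:             cutting unit -- audio -> list of segments
    cluster:           cluster unit -- (segments, M) -> class label per segment
    ask_user_to_match: first determination unit -- returns {label: speaker}
    transcribe:        speech-to-text for one segment
    Returns {speaker: [utterances in order]}.
    """
    segments = split(conference_audio)
    labels = cluster(segments, M)
    speaker_of = ask_user_to_match(segments, labels)
    minutes = {}
    for seg, lab in zip(segments, labels):
        minutes.setdefault(speaker_of[lab], []).append(transcribe(seg))
    return minutes
```

With trivial stand-ins for the four stages this runs end to end, which is the point of the sketch: the method is a composition of independent, replaceable stages.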
Optionally, the first determination unit 40 includes a first selection subunit, a first display subunit, a first receiving subunit and a first determining subunit. The first selection subunit is used for selecting at least one sound bite from the sound bites of each of the M classifications and converting it into a text fragment, obtaining L text fragments, L being a natural number, L ≥ M. The first display subunit is used for showing the L text fragments and a speaker list to the user, the speaker list including information on each of the M speakers. The first receiving subunit is used for receiving a matching instruction, the matching instruction being an instruction issued by the user for matching each of the L text fragments with a speaker. The first determining subunit is used for determining, according to the matching instruction, the speaker corresponding to the sound bites of each of the M classifications.
Optionally, the first determination unit 40 includes a second selection subunit, a second display subunit, a second receiving subunit and a second determining subunit. The second selection subunit is used for selecting at least one sound bite from the sound bites of each of the M classifications, obtaining Z sound bites, Z being a natural number, Z ≥ M. The second display subunit is used for playing the Z selected sound bites to the user and showing a speaker list, the speaker list including information on each of the M speakers. The second receiving subunit is used for receiving a matching instruction, the matching instruction being an instruction issued by the user for matching each of the Z sound bites with a speaker. The second determining subunit is used for determining, according to the matching instruction, the speaker corresponding to the sound bites of each of the M classifications.
Optionally, the cluster unit is used for executing the following steps. S1: randomly select M sound bites from the N sound bites, and take the M selected sound bites as the cluster centres of M classifications. S2: for the i-th sound bite among the remaining N-M sound bites, calculate the distance between the i-th sound bite and each of the M cluster centres, and assign the i-th sound bite to the classification whose cluster centre is closest to it; i successively takes the natural numbers from 1 to N-M. S3: after all the sound bites have been assigned, recalculate the cluster centre of each of the M classifications from the sound bites it contains, and update the M cluster centres. Steps S2 and S3 are executed in a loop until, for every classification, the distance between two consecutive cluster centres is within the preset distance.
Optionally, the cutting unit 20 includes a third determining subunit, a removal subunit, a segmentation subunit, an extraction subunit, a relative-entropy analysis subunit and a cutting subunit. The third determining subunit is used for determining the silence clips in the conference voice. The removal subunit is used for removing the silence clips from the conference voice. The segmentation subunit is used for splitting the conference voice, after removal of the silence clips, according to the silence clips to obtain W long sound bites, W being a natural number greater than or equal to 2, W < N. The extraction subunit is used for extracting the acoustic feature of each of the W long sound bites. The relative-entropy analysis subunit is used for carrying out relative-entropy analysis on the acoustic feature of each of the W long sound bites. The cutting subunit is used for cutting the W long sound bites according to the result of the relative-entropy analysis to obtain the N sound bites.
In one aspect, an embodiment of the present invention provides a storage medium. The storage medium includes a stored program; when the program runs, the device on which the storage medium resides is controlled to execute the following steps: obtaining a conference voice; splitting the conference voice to obtain N sound bites, N being a natural number greater than or equal to 2; clustering the N sound bites to obtain the sound bites of M classifications, M being a natural number greater than or equal to 2, M ≤ N, the sound bites of the M classifications being respectively in one-to-one correspondence with M speakers; determining the speaker corresponding to the sound bites of each of the M classifications; determining, from the sound bites of the M classifications, the speech content of each of the M speakers; and generating minutes from the speech content of each of the M speakers.
Optionally, when the program runs, the device on which the storage medium resides is also controlled to execute the following steps: selecting at least one sound bite from the sound bites of each of the M classifications and converting it into a text fragment, obtaining L text fragments, L being a natural number, L ≥ M; showing the L text fragments and a speaker list to the user, the speaker list including information on each of the M speakers; receiving a matching instruction, the matching instruction being an instruction issued by the user for matching each of the L text fragments with a speaker; and determining, according to the matching instruction, the speaker corresponding to the sound bites of each of the M classifications.
Optionally, when the program runs, the device on which the storage medium resides is also controlled to execute the following steps: selecting at least one sound bite from the sound bites of each of the M classifications, obtaining Z sound bites, Z being a natural number, Z ≥ M; playing the Z selected sound bites to the user and showing a speaker list, the speaker list including information on each of the M speakers; receiving a matching instruction, the matching instruction being an instruction issued by the user for matching each of the Z sound bites with a speaker; and determining, according to the matching instruction, the speaker corresponding to the sound bites of each of the M classifications.
Optionally, when the program runs, the device on which the storage medium resides is also controlled to execute the following steps. S1: randomly select M sound bites from the N sound bites, and take the M selected sound bites as the cluster centres of M classifications. S2: for the i-th sound bite among the remaining N-M sound bites, calculate the distance between the i-th sound bite and each of the M cluster centres, and assign the i-th sound bite to the classification whose cluster centre is closest to it; i successively takes the natural numbers from 1 to N-M. S3: after all the sound bites have been assigned, recalculate the cluster centre of each of the M classifications from the sound bites it contains, and update the M cluster centres; steps S2 and S3 are executed in a loop until, for every classification, the distance between two consecutive cluster centres is within the preset distance.
Optionally, when the program runs, the device on which the storage medium resides is also controlled to execute the following steps: determining the silence clips in the conference voice; removing the silence clips from the conference voice; splitting the conference voice, after removal of the silence clips, according to the silence clips to obtain W long sound bites, W being a natural number greater than or equal to 2, W < N; extracting the acoustic feature of each of the W long sound bites; carrying out relative-entropy analysis on the acoustic feature of each of the W long sound bites; and cutting the W long sound bites according to the result of the relative-entropy analysis to obtain the N sound bites.
In one aspect, an embodiment of the present invention provides a computer device including a memory and a processor, the memory being used to store information including program instructions and the processor being used to control the execution of the program instructions. When the program instructions are loaded and executed by the processor, the following steps are implemented: obtaining a conference voice; splitting the conference voice to obtain N sound bites, N being a natural number greater than or equal to 2; clustering the N sound bites to obtain the sound bites of M classifications, M being a natural number greater than or equal to 2, M ≤ N, the sound bites of the M classifications being respectively in one-to-one correspondence with M speakers; determining the speaker corresponding to the sound bites of each of the M classifications; determining, from the sound bites of the M classifications, the speech content of each of the M speakers; and generating minutes from the speech content of each of the M speakers.
Optionally, when the program instructions are loaded and executed by the processor, the following steps are also implemented: selecting at least one sound bite from the sound bites of each of the M classifications and converting it into a text fragment, obtaining L text fragments, L being a natural number, L ≥ M; showing the L text fragments and a speaker list to the user, the speaker list including information on each of the M speakers; receiving a matching instruction, the matching instruction being an instruction issued by the user for matching each of the L text fragments with a speaker; and determining, according to the matching instruction, the speaker corresponding to the sound bites of each of the M classifications.
Optionally, when the program instructions are loaded and executed by the processor, the following steps are also implemented: selecting at least one sound bite from the sound bites of each of the M classifications, obtaining Z sound bites, Z being a natural number, Z ≥ M; playing the Z selected sound bites to the user and showing a speaker list, the speaker list including information on each of the M speakers; receiving a matching instruction, the matching instruction being an instruction issued by the user for matching each of the Z sound bites with a speaker; and determining, according to the matching instruction, the speaker corresponding to the sound bites of each of the M classifications.
Optionally, when the program instructions are loaded and executed by the processor, the following steps are also implemented. S1: randomly select M sound bites from the N sound bites, and take the M selected sound bites as the cluster centres of M classifications. S2: for the i-th sound bite among the remaining N-M sound bites, calculate the distance between the i-th sound bite and each of the M cluster centres, and assign the i-th sound bite to the classification whose cluster centre is closest to it; i successively takes the natural numbers from 1 to N-M. S3: after all the sound bites have been assigned, recalculate the cluster centre of each of the M classifications from the sound bites it contains, and update the M cluster centres; steps S2 and S3 are executed in a loop until, for every classification, the distance between two consecutive cluster centres is within the preset distance.
Optionally, when the program instructions are loaded and executed by the processor, the following steps are also implemented: determining the silence clips in the conference voice; removing the silence clips from the conference voice; splitting the conference voice, after removal of the silence clips, according to the silence clips to obtain W long sound bites, W being a natural number greater than or equal to 2, W < N; extracting the acoustic feature of each of the W long sound bites; carrying out relative-entropy analysis on the acoustic feature of each of the W long sound bites; and cutting the W long sound bites according to the result of the relative-entropy analysis to obtain the N sound bites.
Fig. 3 is a schematic diagram of a computer device provided by an embodiment of the present invention. As shown in Fig. 3, the computer device 50 of this embodiment includes a processor 51, a memory 52, and a computer program 53 stored in the memory 52 and runnable on the processor 51. When the computer program 53 is executed by the processor 51, it implements the minutes generation method of the embodiments; to avoid repetition, the details are not restated here one by one. Alternatively, when the computer program is executed by the processor 51, it implements the functions of each model/unit of the minutes generating apparatus of the embodiments; again, to avoid repetition, the details are not restated here.
The computer device 50 can be a computing device such as a desktop computer, a notebook, a palmtop computer or a cloud server. The computer device may include, but is not limited to, the processor 51 and the memory 52. Those skilled in the art will understand that Fig. 3 is only an example of the computer device 50 and does not limit the computer device 50, which may include more or fewer components than illustrated, combine certain components, or use different components; for example, the computer device may also include input/output devices, network access devices, buses and the like.
The processor 51 can be a central processing unit (Central Processing Unit, CPU), or another general-purpose processor, a digital signal processor (Digital Signal Processor, DSP), an application-specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field-programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor can be a microprocessor, or the processor can be any conventional processor.
The memory 52 can be an internal storage unit of the computer device 50, such as a hard disk or internal memory of the computer device 50. The memory 52 can also be an external storage device of the computer device 50, such as a plug-in hard disk, an intelligent memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card or a flash card (Flash Card). Further, the memory 52 can include both an internal storage unit of the computer device 50 and an external storage device. The memory 52 is used for storing the computer program and the other programs and data needed by the computer device. The memory 52 can also be used for temporarily storing data that has been output or is to be output.
It will be apparent to those skilled in the art that, for convenience and brevity of description, the specific working processes of the systems, apparatuses and units described above may refer to the corresponding processes in the foregoing method embodiments, and are not restated here.
In the several embodiments provided by the present invention, it should be understood that the disclosed systems, apparatuses and methods can be realized in other ways. For example, the apparatus embodiments described above are merely exemplary: the division into units is only a division by logical function, and in actual implementation there may be other ways of dividing; for example, multiple units or components can be combined or integrated into another system, or some features can be ignored or not executed. Furthermore, the mutual couplings, direct couplings or communication connections shown or discussed can be indirect couplings or communication connections through interfaces, devices or units, and can be electrical, mechanical or in other forms.
The units described as separate members may or may not be physically separate, and the components shown as units may or may not be physical units; they can be located in one place or distributed over multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the embodiment's scheme.
In addition, the functional units in the various embodiments of the present invention may be integrated into one processing unit, or each unit may physically exist alone, or two or more units may be integrated into one unit. The above integrated unit can be realized in the form of hardware, or in the form of hardware plus a software functional unit.
The above integrated unit realized in the form of a software functional unit can be stored in a computer-readable storage medium. Such a software functional unit is stored in a storage medium and includes a number of instructions used to make a computer device (which can be a personal computer, a server, a network device or the like) or a processor execute part of the steps of the methods of the various embodiments of the present invention. The aforementioned storage media include various media that can store program code, such as a USB flash disk, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk or an optical disk.
The foregoing is merely a description of preferred embodiments of the present invention and is not intended to limit the invention; any modification, equivalent substitution, improvement and the like made within the spirit and principle of the present invention shall be included within the scope of protection of the present invention.
Claims (10)
1. A minutes generation method, characterized in that the method includes:
obtaining a conference voice;
splitting the conference voice to obtain N sound bites, N being a natural number greater than or equal to 2;
clustering the N sound bites to obtain the sound bites of M classifications, M being a natural number greater than or equal to 2, M ≤ N, the sound bites of the M classifications being respectively in one-to-one correspondence with M speakers;
determining the speaker corresponding to the sound bites of each of the M classifications;
determining, from the sound bites of the M classifications, the speech content of each of the M speakers;
generating minutes from the speech content of each of the M speakers.
2. The method according to claim 1, characterized in that determining the speaker corresponding to the sound bites of each of the M classifications comprises:
selecting at least one sound bite from the sound bites of each of the M classifications and converting it into a text fragment, obtaining L text fragments, L being a natural number, L ≥ M;
showing the L text fragments and a speaker list to the user, the speaker list including information on each of the M speakers;
receiving a matching instruction, the matching instruction being an instruction issued by the user for matching each of the L text fragments with a speaker;
determining, according to the matching instruction, the speaker corresponding to the sound bites of each of the M classifications.
3. The method according to claim 1, characterized in that determining the speaker corresponding to the sound bites of each of the M classifications comprises:
selecting at least one sound bite from the sound bites of each of the M classifications, obtaining Z sound bites, Z being a natural number, Z ≥ M;
playing the Z selected sound bites to the user and showing a speaker list, the speaker list including information on each of the M speakers;
receiving a matching instruction, the matching instruction being an instruction issued by the user for matching each of the Z sound bites with a speaker;
determining, according to the matching instruction, the speaker corresponding to the sound bites of each of the M classifications.
4. The method according to claim 1, characterized in that clustering the N sound bites comprises:
S1: randomly selecting M sound bites from the N sound bites, and taking the M selected sound bites as the cluster centres of M classifications;
S2: for the i-th sound bite among the remaining N-M sound bites, calculating the distance between the i-th sound bite and each of the M cluster centres, and assigning the i-th sound bite to the classification whose cluster centre is closest to it, i successively taking the natural numbers from 1 to N-M;
S3: after all the sound bites have been assigned, recalculating the cluster centre of each of the M classifications from the sound bites it contains, and updating the cluster centres of the M classifications;
executing steps S2 and S3 in a loop until, for every one of the M classifications, the distance between two consecutive cluster centres is within a preset distance.
5. The method according to any one of claims 1 to 4, characterized in that splitting the conference voice to obtain the N sound bites comprises:
determining the silence clips in the conference voice;
removing the silence clips from the conference voice;
splitting the conference voice, after removal of the silence clips, according to the silence clips to obtain W long sound bites, W being a natural number greater than or equal to 2, W < N;
extracting the acoustic feature of each of the W long sound bites;
carrying out relative-entropy analysis on the acoustic feature of each of the W long sound bites;
cutting the W long sound bites according to the result of the relative-entropy analysis to obtain the N sound bites.
6. A minutes generating apparatus, characterized in that the apparatus includes:
an acquiring unit, for obtaining a conference voice;
a cutting unit, for splitting the conference voice to obtain N sound bites, N being a natural number greater than or equal to 2;
a cluster unit, for clustering the N sound bites to obtain the sound bites of M classifications, M being a natural number greater than or equal to 2, M ≤ N, the sound bites of the M classifications being respectively in one-to-one correspondence with M speakers;
a first determination unit, for determining the speaker corresponding to the sound bites of each of the M classifications;
a second determination unit, for determining, from the sound bites of the M classifications, the speech content of each of the M speakers;
a generation unit, for generating minutes from the speech content of each of the M speakers.
7. The apparatus according to claim 6, characterized in that the first determination unit includes:
a first selection subunit, for selecting at least one sound bite from the sound bites of each of the M classifications and converting it into a text fragment, obtaining L text fragments, L being a natural number, L ≥ M;
a first display subunit, for showing the L text fragments and a speaker list to the user, the speaker list including information on each of the M speakers;
a first receiving subunit, for receiving a matching instruction, the matching instruction being an instruction issued by the user for matching each of the L text fragments with a speaker;
a first determining subunit, for determining, according to the matching instruction, the speaker corresponding to the sound bites of each of the M classifications.
8. The apparatus according to claim 6, characterized in that the first determination unit comprises:
a second selection subunit, configured to select at least one speech fragment from the speech fragments of each of the M categories, to obtain Z speech fragments, Z being a natural number, Z ≥ M;
a second display subunit, configured to play the selected Z speech fragments to a user and display a speaker list, the speaker list comprising information of each of the M speakers;
a second receiving subunit, configured to receive a matching instruction issued by the user, the matching instruction being used to match each of the Z speech fragments with a speaker;
a second determination subunit, configured to determine, according to the matching instruction, the speaker corresponding to the speech fragments of each of the M categories.
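Claims 7 and 8 describe the same matching flow with different media: show one representative fragment per category to the user (as text in claim 7, as audio playback in claim 8) alongside the speaker list, then apply the user's matching instruction. A hypothetical sketch, with all names invented for illustration:

```python
def pick_representatives(labels, durations):
    """For each category, pick its longest fragment as the sample to
    display (claim 7, as text) or play back (claim 8, as audio)."""
    reps = {}
    for i, (k, d) in enumerate(zip(labels, durations)):
        if k not in reps or d > durations[reps[k]]:
            reps[k] = i
    return reps

def apply_matching(num_categories, match_instruction):
    """match_instruction: {category: speaker_name} issued by the user after
    reviewing the representatives; returns the per-category speaker list."""
    return [match_instruction[k] for k in range(num_categories)]
```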
9. A storage medium, characterized in that the storage medium comprises a stored program, wherein when the program runs, a device where the storage medium is located is controlled to perform the meeting minutes generation method according to any one of claims 1 to 5.
10. A computer device, comprising a memory and a processor, the memory being configured to store information comprising program instructions, and the processor being configured to control execution of the program instructions, characterized in that the program instructions, when loaded and executed by the processor, implement the steps of the meeting minutes generation method according to any one of claims 1 to 5.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910038460.6A CN109767757A (en) | 2019-01-16 | 2019-01-16 | A kind of minutes generation method and device |
PCT/CN2019/118256 WO2020147407A1 (en) | 2019-01-16 | 2019-11-14 | Conference record generation method and apparatus, storage medium and computer device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910038460.6A CN109767757A (en) | 2019-01-16 | 2019-01-16 | A kind of minutes generation method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN109767757A true CN109767757A (en) | 2019-05-17 |
Family
ID=66452786
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910038460.6A Pending CN109767757A (en) | 2019-01-16 | 2019-01-16 | A kind of minutes generation method and device |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN109767757A (en) |
WO (1) | WO2020147407A1 (en) |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102185702A (en) * | 2011-04-27 | 2011-09-14 | 华东师范大学 | Intelligent conference system terminal controller, and operating method and application thereof |
CN103530432A (en) * | 2013-09-24 | 2014-01-22 | 华南理工大学 | Conference recorder with speech extracting function and speech extracting method |
CN106487757A (en) * | 2015-08-28 | 2017-03-08 | 华为技术有限公司 | Carry out method, conference client and the system of voice conferencing |
US20170270930A1 (en) * | 2014-08-04 | 2017-09-21 | Flagler Llc | Voice tallying system |
CN107689225A (en) * | 2017-09-29 | 2018-02-13 | 福建实达电脑设备有限公司 | A kind of method for automatically generating minutes |
CN108986826A (en) * | 2018-08-14 | 2018-12-11 | 中国平安人寿保险股份有限公司 | Automatically generate method, electronic device and the readable storage medium storing program for executing of minutes |
Family Cites Families (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103559882B (en) * | 2013-10-14 | 2016-08-10 | 华南理工大学 | A kind of meeting presider's voice extraction method based on speaker's segmentation |
CN105810207A (en) * | 2014-12-30 | 2016-07-27 | 富泰华工业(深圳)有限公司 | Meeting recording device and method thereof for automatically generating meeting record |
US10424317B2 (en) * | 2016-09-14 | 2019-09-24 | Nuance Communications, Inc. | Method for microphone selection and multi-talker segmentation with ambient automated speech recognition (ASR) |
CN109767757A (en) * | 2019-01-16 | 2019-05-17 | 平安科技(深圳)有限公司 | A kind of minutes generation method and device |
2019
- 2019-01-16: CN application CN201910038460.6A filed (published as CN109767757A), status Pending
- 2019-11-14: PCT application PCT/CN2019/118256 filed (published as WO2020147407A1), Application Filing
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020147407A1 (en) * | 2019-01-16 | 2020-07-23 | 平安科技(深圳)有限公司 | Conference record generation method and apparatus, storage medium and computer device |
CN110265032A (en) * | 2019-06-05 | 2019-09-20 | 平安科技(深圳)有限公司 | Conferencing data analysis and processing method, device, computer equipment and storage medium |
CN110543559A (en) * | 2019-06-28 | 2019-12-06 | 谭浩 | Method for generating interview report, computer-readable storage medium and terminal device |
CN110335612A (en) * | 2019-07-11 | 2019-10-15 | 招商局金融科技有限公司 | Minutes generation method, device and storage medium based on speech recognition |
CN110930984A (en) * | 2019-12-04 | 2020-03-27 | 北京搜狗科技发展有限公司 | Voice processing method and device and electronic equipment |
CN111933144A (en) * | 2020-10-09 | 2020-11-13 | 融智通科技(北京)股份有限公司 | Conference voice transcription method and device for post-creation of voiceprint and storage medium |
CN112562682A (en) * | 2020-12-02 | 2021-03-26 | 携程计算机技术(上海)有限公司 | Identity recognition method, system, equipment and storage medium based on multi-person call |
CN113674755A (en) * | 2021-08-19 | 2021-11-19 | 北京百度网讯科技有限公司 | Voice processing method, device, electronic equipment and medium |
CN113674755B (en) * | 2021-08-19 | 2024-04-02 | 北京百度网讯科技有限公司 | Voice processing method, device, electronic equipment and medium |
DE202022101429U1 (en) | 2022-03-17 | 2022-04-06 | Waseem Ahmad | Intelligent system for creating meeting minutes using artificial intelligence and machine learning |
Also Published As
Publication number | Publication date |
---|---|
WO2020147407A1 (en) | 2020-07-23 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109767757A (en) | A kind of minutes generation method and device | |
CN107680600B (en) | Sound-groove model training method, audio recognition method, device, equipment and medium | |
CN109741754A (en) | A kind of conference voice recognition methods and system, storage medium and terminal | |
CN107545897A (en) | Conversation activity presumption method, conversation activity estimating device and program | |
CN106407178A (en) | Session abstract generation method and device | |
CN110119673A (en) | Noninductive face Work attendance method, device, equipment and storage medium | |
CN102486922B (en) | Speaker recognition method, device and system | |
CN108536595B (en) | Intelligent matching method and device for test cases, computer equipment and storage medium | |
CN108257594A (en) | A kind of conference system and its information processing method | |
CN108597525A (en) | Voice vocal print modeling method and device | |
CN109871762B (en) | Face recognition model evaluation method and device | |
CN109214446A (en) | Potentiality good performance personnel kind identification method, system, terminal and computer readable storage medium | |
CN108269122A (en) | The similarity treating method and apparatus of advertisement | |
CN108764114B (en) | Signal identification method and device, storage medium and terminal thereof | |
CN109800309A (en) | Classroom Discourse genre classification methods and device | |
CN112287082A (en) | Data processing method, device, equipment and storage medium combining RPA and AI | |
CN107195312B (en) | Method and device for determining emotion releasing mode, terminal equipment and storage medium | |
CN109242106A (en) | sample processing method, device, equipment and storage medium | |
CN111816170A (en) | Training of audio classification model and junk audio recognition method and device | |
CN106844743B (en) | Emotion classification method and device for Uygur language text | |
JP5083951B2 (en) | Voice processing apparatus and program | |
CN109948718B (en) | System and method based on multi-algorithm fusion | |
CN107506407A (en) | A kind of document classification, the method and device called | |
CN104978395B (en) | Visual dictionary building and application method and device | |
CN108228950A (en) | A kind of information processing method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |