CN116911817B - Paperless conference record archiving method and paperless conference record archiving system


Info

Publication number: CN116911817B
Authority: CN (China)
Prior art keywords: conference, spectrum, feature extraction, node, recording
Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Application number: CN202311158138.XA
Other languages: Chinese (zh)
Other versions: CN116911817A
Inventors: Shi Yinjie (施寅杰), Liu Likai (刘立恺), Xu Yi (徐义), Li Yu (李渝)
Current Assignee: Zhejiang Zhijia Information Technology Co., Ltd. (the listed assignees may be inaccurate; Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list)
Original Assignee: Zhejiang Zhijia Information Technology Co., Ltd.
Application filed by Zhejiang Zhijia Information Technology Co., Ltd.
Priority to CN202311158138.XA
Publication of CN116911817A
Application granted
Publication of CN116911817B
Legal status: Active

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q 10/00: Administration; Management
    • G06Q 10/10: Office automation; Time management
    • G06Q 10/109: Time management, e.g. calendars, reminders, meetings or time accounting
    • G06Q 10/1093: Calendar-based scheduling for persons or groups
    • G06Q 10/1095: Meeting or appointment
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00: Speaker identification or verification techniques
    • G10L 17/02: Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00: Speaker identification or verification techniques
    • G10L 17/18: Artificial neural networks; Connectionist approaches
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management


Abstract

The invention provides a paperless conference record archiving method and system, relating to the technical field of data processing. Recording information of a speaker is collected; a first spectral feature extraction base channel and a second spectral feature extraction base channel are activated to extract spectral feature matrices, which are checked against each other to generate a spectral feature similarity; the identity of the speaker is determined and the conference content is filled in. When the conference ends, semantic association and content adjustment are performed on the conference theme and the entered text information, and an archiving server is activated to classify and archive the results. This solves the technical problem in the prior art that, because a conference server must serve a dynamically changing group of clients, voiceprint-based recognition and recording of users cannot be achieved, so that paperless record archiving poses a considerable challenge. By building a voice recognition channel, spectral feature recognition is performed on the speaker recording information collected in real time and on the participant voice samples; similarity analysis yields the speaker's identity, the conference is recorded, and the conference content is semantically associated with the conference theme and archived, achieving intelligent paperless record archiving.

Description

Paperless conference record archiving method and paperless conference record archiving system
Technical Field
The invention relates to the technical field of data processing, in particular to a paperless conference record archiving method and system.
Background
Meeting minutes are a basic element of an organization's daily work. The content discussed in a meeting is time-sensitive, and the remarks of different speakers need to be recorded and archived; owing to current technical limitations there is no standard way of recording meetings, so records of the same content differ. In the prior art, because a conference server must serve a dynamically changing group of clients, voiceprint-based recognition and recording of users cannot be achieved, and paperless record archiving remains a considerable challenge.
Disclosure of Invention
The application provides a paperless conference record archiving method and system to solve the technical problem that, in the prior art, a conference server must serve a dynamically changing client group and therefore cannot achieve voiceprint-recognition-based recording of users, so that paperless record archiving poses a considerable challenge.
In view of the above problems, the present application provides a paperless conference recording and archiving method and system.
In a first aspect, the present application provides a paperless conference record archiving method, the method comprising:
when a conference starts, recording information of a speaker is collected through recording software;
activating a first spectral feature extraction base channel of a voice recognition channel embedded in a recording server to process the speaker recording information and generate a first spectral feature matrix;
activating a second spectral feature extraction base channel embedded in the voice recognition channel of the recording server to process a participant voice sample and generate a second spectral feature matrix;
performing similarity analysis on the first spectral feature matrix and the second spectral feature matrix to generate a spectral feature similarity;
determining participant identity information of the speaker according to the spectral feature similarity, and filling the speech-recognition content of the speaker recording information into a conference record table;
when the conference ends, performing semantic association on the text information entered in the conference record table according to the conference theme to generate first key conference content, sending the first key conference content to a preset user terminal for adjustment, and generating second key conference content;
and activating an archiving server to classify and archive the conference record table, the conference theme, the conference time and the second key conference content, and erasing the conference record table from a first temporary memory after archiving is completed.
In a second aspect, the present application provides a paperless conference record archiving system, the system comprising:
the information acquisition module is used for acquiring the recording information of the speaker through recording software when the conference starts;
the first spectral feature matrix generation module, used for activating a first spectral feature extraction base channel embedded in a voice recognition channel of the recording server to process the speaker recording information and generate a first spectral feature matrix;
the second spectral feature matrix generation module, used for activating a second spectral feature extraction base channel embedded in the voice recognition channel of the recording server to process a participant voice sample and generate a second spectral feature matrix;
the similarity analysis module, used for performing similarity analysis on the first spectral feature matrix and the second spectral feature matrix to generate a spectral feature similarity;
the conference recording module, used for determining participant identity information of the speaker according to the spectral feature similarity and filling the speech-recognition content of the speaker recording information into a conference record table;
the conference content generation module, used for performing, when the conference ends, semantic association on the text information entered in the conference record table according to the conference theme to generate first key conference content, sending the first key conference content to a preset user terminal for adjustment, and generating second key conference content;
and the content classification archiving module, used for activating an archiving server to classify and archive the conference record table, the conference theme, the conference time and the second key conference content, and erasing the conference record table from the first temporary memory after archiving is completed.
The one or more technical solutions provided by the application have at least the following technical effects or advantages:

According to the paperless conference record archiving method provided by the embodiments of the application, when a conference starts, recording information of a speaker is collected through recording software, and a first spectral feature extraction base channel embedded in a voice recognition channel of a recording server is activated to process the speaker recording information and generate a first spectral feature matrix. A second spectral feature extraction base channel embedded in the voice recognition channel of the recording server is activated to process the participant voice sample and generate a second spectral feature matrix, and feature similarity analysis is performed to generate a spectral feature similarity, so that the participant identity information of the speaker is determined and the conference content is filled in. When the conference ends, semantic association is performed according to the conference theme and the entered text information, the result is adjusted via the preset user terminal to generate the second key conference content, and, together with the conference record table, the conference theme and the conference time, the archiving server is activated to classify and archive it. This solves the technical problem in the prior art that, because a conference server must serve a dynamically changing group of clients, voiceprint-based recognition and recording of users cannot be achieved, so that paperless record archiving poses a considerable challenge. By building a voice recognition channel, spectral feature recognition is performed on the speaker recording information collected in real time and on the participant voice samples, similarity analysis determines the speaker identity for conference recording, and the conference content is semantically associated with the conference theme and archived, achieving intelligent paperless record archiving.
Drawings
FIG. 1 is a schematic flow chart of a paperless conference record archiving method;
FIG. 2 is a schematic diagram of the recording and archiving flow of conference contents in the paperless conference record archiving method;
FIG. 3 is a schematic structural diagram of the paperless conference record archiving system.
Reference numerals illustrate: the system comprises an information acquisition module 11, a first spectrum feature matrix generation module 12, a second spectrum feature matrix generation module 13, a similarity analysis module 14, a conference recording module 15, a conference content generation module 16 and a content classification filing module 17.
Detailed Description
The application provides a paperless conference record archiving method and system. Recording information of a speaker is collected; a first spectral feature extraction base channel and a second spectral feature extraction base channel are activated to generate a first spectral feature matrix and a second spectral feature matrix; the two are checked against each other to generate a spectral feature similarity, the identity of the speaker is determined, and the conference content is filled in. When the conference ends, semantic association and content adjustment of the conference theme and the entered text information are performed, and an archiving server is activated to classify and archive the results, thereby solving the technical problem in the prior art that, because a conference server must serve a dynamically changing group of clients, voiceprint-based recognition and recording of users cannot be achieved, so that paperless record archiving poses a considerable challenge.
Example One
As shown in FIG. 1 and FIG. 2, the present application provides a paperless conference record archiving method applied to a paperless conference record archiving system, wherein the system includes a user side, a recording server and an archiving server. The method includes:
s1: when a conference starts, recording information of a speaker is collected through recording software;
Wherein, prior to collecting the recording information of the speaker through the recording software when the conference starts, S1 further comprises the following preparatory steps:
s11: when receiving a conference request signal, acquiring basic conference information, wherein the basic conference information comprises a conference theme, a conference time and participants;
s12: activating a first temporary memory of the recording server, and generating the conference record table according to the conference theme, the conference time and the participants, wherein the conference record table includes a preset user, and the preset user is a conference decision user;
s13: interacting with the participant user sides to send a voice test sample and collect participant voice samples, wherein the voice test sample contains at least 15 Chinese characters.
Meeting minutes are a basic element of an organization's daily work. The content discussed in a meeting is time-sensitive, and the remarks of different speakers need to be recorded and archived; owing to current technical limitations there is no standard way of recording meetings, so records of the same content differ. The paperless conference record archiving method of the application is applied to a paperless conference record archiving system, which is the master control system managing the whole cycle of conference record archiving. The system includes a user side, a recording server and an archiving server; the user side is the mobile port of a participant, and the recording server and the archiving server execute temporary conference recording and archival storage of conference content. By building a voice recognition channel, spectral feature recognition is performed on the speaker recording information collected in real time and on the participant voice samples, similarity analysis determines the speaker identity for conference recording, and the conference content is semantically associated with the conference theme and archived, realizing intelligent paperless record archiving.

Specifically, before the meeting formally starts, when the conference request signal is received, the conference theme, the conference time and the participants are determined as the basic conference information, where the conference request signal is the signal identifying that a meeting is about to start, and the basic conference information is predetermined meeting information that can be determined directly by extraction. As the basic conference information is acquired, the first temporary memory of the recording server is synchronously activated. The recording server is the execution module that performs conference recording, and its first temporary memory is used to generate the conference record table based on the conference theme, the conference time and the participants; the conference record table takes the form of a conference minutes form.

The conference record table includes the preset user, i.e. the conference decision user, who makes the overall decisions on the speech content of the conference. The participant user sides are interacted with and a voice test sample is sent, where the voice test sample contains at least 15 Chinese characters so that the voice characteristics of the participants are accurately covered; the participant voice samples are collected and serve as the basis for identifying the speaking personnel. Further, when the conference starts, the recording information of the speaker is collected through the recording software, and the speaker of the speaker recording information can be determined based on the participant voice samples.
S2: activating a first spectral feature extraction base channel of a voice recognition channel embedded in the recording server to process the speaker recording information and generate a first spectral feature matrix;

Wherein activating the first spectral feature extraction base channel embedded in the voice recognition channel of the recording server to process the speaker recording information and generate the first spectral feature matrix, S2 further comprises:
s21: the first spectral feature extraction base channel comprises a Fourier transform feature extraction node, a sinusoidal harmonic feature extraction node and a time domain feature extraction node;

s22: activating the Fourier transform feature extraction node to perform Fourier transform on the spectral sound signal of the speaker recording information and generate a first voiceprint description factor set;

s23: activating the sinusoidal harmonic feature extraction node to extract harmonic features of the spectral sound signal of the speaker recording information and generate a second voiceprint description factor set;

s24: activating the time domain feature extraction node to extract time domain features of the spectral sound signal of the speaker recording information and generate a third voiceprint description factor set;

wherein the first voiceprint description factor set and the second voiceprint description factor set comprise: spectral centroid, spectral peak, spectral energy, spectral decay rate, spectral slope and spectral flux; and the third voiceprint description factor set comprises: onset time, sound decay time, RMS energy envelope, zero-crossing rate and autocorrelation;
S25: constructing the first spectral feature matrix from the first voiceprint description factor set, the second voiceprint description factor set and the third voiceprint description factor set:

$$M_1 = \begin{bmatrix} F_1 \\ F_2 \\ F_3 \end{bmatrix}$$

where $M_1$ is the first spectral feature matrix, $F_1$ characterizes the first voiceprint description factor set, $F_2$ characterizes the second voiceprint description factor set, and $F_3$ characterizes the third voiceprint description factor set.
Wherein S21 further comprises:

S211: building the first spectral feature extraction base channel according to a neural network structure, wherein the first spectral feature extraction base channel comprises an input layer to which the Fourier transform feature extraction node, the sinusoidal harmonic feature extraction node and the time domain feature extraction node are connected in parallel; the Fourier transform feature extraction node comprises 6 output nodes, the sinusoidal harmonic feature extraction node comprises 6 output nodes, the time domain feature extraction node comprises 5 output nodes, and different output nodes output different voiceprint features;
S212: invoking a sound spectrum record data set and a first voiceprint description factor identification set to train the Fourier transform feature extraction node;

S213: training the sinusoidal harmonic feature extraction node according to the sound spectrum record data set and a second voiceprint description factor identification set;

S214: training the time domain feature extraction node according to the sound spectrum record data set and a third voiceprint description factor identification set;
wherein a sound spectrum record data set is collected;
performing Fourier transform identification on the sound spectrum record data set by an expert group according to the spectral centroid, spectral peak, spectral energy, spectral decay rate, spectral slope and spectral flux to generate the first voiceprint description factor identification set;

performing sinusoidal harmonic identification on the sound spectrum record data set by an expert group according to the spectral centroid, spectral peak, spectral energy, spectral decay rate, spectral slope and spectral flux to generate the second voiceprint description factor identification set;

and performing time domain feature identification on the sound spectrum record data set by an expert group according to the onset time, sound decay time, RMS energy envelope, zero-crossing rate and autocorrelation to generate the third voiceprint description factor identification set.
Wherein invoking the sound spectrum record data set and the first voiceprint description factor identification set to train the Fourier transform feature extraction node, S212 further comprises:

S2121: constructing a single-node loss function:

$$L_{s} = \frac{1}{n}\sum_{i=1}^{n}\left(y_{i}-\hat{y}_{i}\right)^{2}$$

where $L_{s}$ characterizes the single-node loss function, $n$ is the number of training evaluations used to assess the loss of a single node, $y_{i}$ characterizes the output value of the node at the $i$-th training evaluation, and $\hat{y}_{i}$ characterizes the identification value of the corresponding voiceprint description factor dimension at the $i$-th evaluation;

S2122: constructing a joint-node first loss function:

$$L_{J1} = \frac{1}{K}\sum_{k=1}^{K} L_{s,k}$$

where $L_{J1}$ characterizes the joint-node first loss function and $L_{s,k}$ characterizes the single-node loss function value of the $k$-th of the $K$ output nodes;

S2123: constructing a joint-node second loss function:

$$L_{J2} = \sum_{k=1}^{K} w_{k} L_{s,k}$$

where $L_{J2}$ characterizes the joint-node second loss function and $w_{k}$ are weighting parameters determined from the actual training data;

S2124: when each single-node loss function is less than or equal to a single-node loss function threshold, the joint-node first loss function is less than or equal to a joint-node first loss threshold, and the joint-node second loss function is less than or equal to a joint-node second loss threshold, generating the Fourier transform feature extraction node; otherwise, performing cyclic training.
A voice recognition channel is embedded in the recording server to perform voice feature recognition on the collected speaker recording information and thereby determine the speaking user. The voice recognition channel includes a first spectral feature extraction base channel and a second spectral feature extraction base channel configured in parallel, which extract the voice features of the speaker recording information and of the participant voice samples respectively; feature verification is then performed to determine the participant identity information of the speaker.
The first spectral feature extraction base channel is generated through supervised neural network training. Its basic framework is an input layer with the Fourier transform feature extraction node, the sinusoidal harmonic feature extraction node and the time domain feature extraction node connected to it in parallel. The Fourier transform feature extraction node converts the collected sound signal from the time domain to the frequency domain, which makes it straightforward to extract the spectral centroid, spectral peak, spectral energy, spectral decay rate, spectral slope and spectral flux of the sound signal of the speaker recording information; it includes 6 output nodes, and different output nodes output different voiceprint features. The sinusoidal harmonic feature extraction node exists because the quality of the sound signal is affected by harmonic distortion, which must therefore be extracted to ensure recognition accuracy; it includes 6 output nodes with the same signal features as the corresponding output nodes of the Fourier transform feature extraction node. The time domain feature extraction node extracts the signal features of the sound signal along the time axis and includes 5 output nodes corresponding to the extraction of the onset time, sound decay time, RMS energy envelope, zero-crossing rate and autocorrelation.
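To make the three processing dimensions concrete, the following is a hedged NumPy sketch of the kinds of descriptors the text names; it is an illustration under stated assumptions (frame-wise analysis, the 85% energy roll-off as a proxy for the decay rate), not the patent's implementation, and all function names are hypothetical:

```python
# Illustrative extraction of the named voiceprint descriptors with NumPy.
import numpy as np

def fourier_factors(frame: np.ndarray, sr: int) -> dict:
    """Frequency-domain descriptors of one windowed audio frame."""
    spec = np.abs(np.fft.rfft(frame * np.hanning(len(frame))))
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    cum = np.cumsum(spec ** 2)
    return {
        "spectral_centroid": float(np.sum(freqs * spec) / (np.sum(spec) + 1e-12)),
        "spectral_peak": float(freqs[np.argmax(spec)]),
        "spectral_energy": float(np.sum(spec ** 2)),
        # decay rate approximated here by the 85% energy roll-off frequency
        "spectral_decay": float(freqs[np.searchsorted(cum, 0.85 * cum[-1])]),
        "spectral_slope": float(np.polyfit(freqs, spec, 1)[0]),
    }

def spectral_flux(prev_frame: np.ndarray, frame: np.ndarray) -> float:
    """Change of the magnitude spectrum between consecutive frames."""
    a, b = np.abs(np.fft.rfft(prev_frame)), np.abs(np.fft.rfft(frame))
    return float(np.sqrt(np.sum((b - a) ** 2)))

def time_domain_factors(signal: np.ndarray, frame: int = 1024) -> dict:
    """Time-axis descriptors: RMS envelope, zero-crossing rate, autocorrelation."""
    rms = [float(np.sqrt(np.mean(signal[i:i + frame] ** 2)))
           for i in range(0, len(signal) - frame, frame)]
    zcr = float(np.mean(np.abs(np.diff(np.sign(signal))) > 0))
    ac = np.correlate(signal, signal, mode="full")[len(signal) - 1:]
    return {"rms_envelope": rms, "zero_crossing_rate": zcr,
            "autocorrelation_peak_lag": int(np.argmax(ac[1:]) + 1)}
```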
The sound spectrum record data set, i.e. the sound spectrum records collected over a historical time interval, is then invoked. An expert group performs Fourier transform identification on the sound spectrum record data set according to the spectral centroid, spectral peak, spectral energy, spectral decay rate, spectral slope and spectral flux, generating the first voiceprint description factor identification set. The sound spectrum record data set and the first voiceprint description factor identification set are mapped to each other and taken as training samples; the Fourier transform feature extraction node is trained on these samples under supervision, the mapping deviation between the node output and the first voiceprint description factor identification set is measured, it is judged whether the deviation range allowed by the node training accuracy is met, the training loss is measured, and the node is retrained in combination with the training loss so as to guarantee the analysis accuracy of the Fourier transform feature extraction node.
A loss function is introduced to measure the training loss of the Fourier transform feature extraction node. Specifically, a single-node loss function is constructed:

$$L_{s} = \frac{1}{n}\sum_{i=1}^{n}\left(y_{i}-\hat{y}_{i}\right)^{2}$$

where $L_{s}$ characterizes the single-node loss function, $n$ is the number of training evaluations used to assess the loss of a single node, $y_{i}$ characterizes the output value of the node at the $i$-th training evaluation, and $\hat{y}_{i}$ characterizes the identification value of the corresponding voiceprint description factor dimension at the $i$-th evaluation.

A joint-node first loss function is constructed:

$$L_{J1} = \frac{1}{K}\sum_{k=1}^{K} L_{s,k}$$

where $L_{J1}$ characterizes the joint-node first loss function and $L_{s,k}$ characterizes the single-node loss function value of the $k$-th node. A joint-node second loss function is likewise constructed:

$$L_{J2} = \sum_{k=1}^{K} w_{k} L_{s,k}$$

where $L_{J2}$ characterizes the joint-node second loss function, and the parameters $w_{k}$ can be determined based on the actual training data of the Fourier transform feature extraction node and are known parameters.
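A compact sketch of these three losses and the stopping test follows. The exact aggregations are reconstructions of the garbled formulas above (the MSE form for the single-node loss and the weighted sum for the second joint loss are assumptions), so the code mirrors that reading rather than a verified original:

```python
# Reconstructed training losses; aggregation forms are assumptions (see text).
import numpy as np

def single_node_loss(outputs: np.ndarray, labels: np.ndarray) -> float:
    """Mean squared deviation between node outputs and expert identification values."""
    return float(np.mean((outputs - labels) ** 2))

def joint_first_loss(node_losses: np.ndarray) -> float:
    """Average of the K single-node losses."""
    return float(np.mean(node_losses))

def joint_second_loss(node_losses: np.ndarray, weights: np.ndarray) -> float:
    """Weighted aggregate of the K single-node losses (assumed form)."""
    return float(np.dot(weights, node_losses))

def converged(node_losses, weights, t_single, t_first, t_second) -> bool:
    """Cyclic training stops once every loss is at or below its threshold."""
    return (bool(np.all(node_losses <= t_single))
            and joint_first_loss(node_losses) <= t_first
            and joint_second_loss(node_losses, weights) <= t_second)
```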
Further, the single-node loss function threshold, the joint-node first loss threshold and the joint-node second loss threshold are set, i.e. the critical loss values custom-set by a person skilled in the art for measuring the training accuracy requirements of the Fourier transform feature extraction node. Based on the training verification data of the Fourier transform feature extraction node, the training loss is measured with the constructed loss functions and it is judged whether the corresponding loss thresholds are met. If not, the training samples that fail the loss thresholds are screened out and computing power is redistributed, where the allocated computing power is positively correlated with the degree of deviation, and the samples are fed into the Fourier transform feature extraction node again for retraining. The training steps are repeated until every node output deviation meets its corresponding loss threshold, i.e. when each single-node loss function is less than or equal to the single-node loss function threshold, the joint-node first loss function is less than or equal to the joint-node first loss threshold, and the joint-node second loss function is less than or equal to the joint-node second loss threshold, indicating that the loss of the current training is within the allowed accuracy limit, and the trained Fourier transform feature extraction node is obtained.
Similarly, based on the sound spectrum record data set, the expert group performs sinusoidal harmonic identification on it according to the spectral centroid, spectral peak, spectral energy, spectral decay rate, spectral slope and spectral flux, generating the second voiceprint description factor identification set. The corresponding sound spectrum record data set and second voiceprint description factor identification set are mapped as training samples, supervised neural network training and node-loss measurement analysis are performed, and iterative training produces the sinusoidal harmonic feature extraction node that meets the loss thresholds. Likewise, based on the sound spectrum record data set, the expert group performs time domain feature identification on it according to the onset time, sound decay time, RMS energy envelope, zero-crossing rate and autocorrelation, generating the third voiceprint description factor identification set; the sound spectrum record data set and the third voiceprint description factor identification set are mapped and associated as training data, supervised neural network training and node-loss measurement are performed, and iterative training produces the time domain feature extraction node that meets the loss thresholds.
The training modes and steps of the Fourier transform feature extraction node, the sinusoidal harmonic feature extraction node and the time domain feature extraction node are the same; only the specific training data differ. The single-node loss function, the joint-node first loss function and the joint-node second loss function are general-purpose loss analysis functions applicable to the training analysis of each feature extraction node, and the constructed first spectral feature extraction base channel can extract features of diverse sound signals efficiently and accurately.

As the speaker recording information is collected, the Fourier transform feature extraction node, the sinusoidal harmonic feature extraction node and the time domain feature extraction node are activated synchronously, feature extraction is executed on the speaker recording information collected in synchronous sequence, and the first voiceprint description factor set, the second voiceprint description factor set and the third voiceprint description factor set are output.
Matrix conversion is then performed on the first voiceprint description factor set, the second voiceprint description factor set and the third voiceprint description factor set to obtain the first spectral feature matrix:

$$M_1 = \begin{bmatrix} F_1 \\ F_2 \\ F_3 \end{bmatrix}$$

where $M_1$ is the first spectral feature matrix, and $F_1$, $F_2$ and $F_3$ characterize the first, second and third voiceprint description factor sets respectively. This keeps the characterization of the voiceprint expression factors regular and ordered, facilitates the subsequent mapping and verification analysis of the voiceprint description factors, and makes the first spectral feature matrix the basis for participant identity verification.
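A minimal sketch of this matrix conversion is given below; since the three factor sets have 6, 6 and 5 elements, zero-padding the short row is assumed purely so the rows can be stacked (the patent does not specify the alignment):

```python
# Stack the three voiceprint description factor sets into one feature matrix;
# zero-padding of the 5-element time-domain row is an assumption.
import numpy as np

def build_feature_matrix(f1, f2, f3) -> np.ndarray:
    rows = [np.asarray(f, dtype=float) for f in (f1, f2, f3)]
    width = max(len(r) for r in rows)
    return np.vstack([np.pad(r, (0, width - len(r))) for r in rows])  # (3, 6)
```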
S3: activating a second spectral feature extraction base channel embedded in the voice recognition channel of the recording server to process the participant voice sample and generate a second spectral feature matrix;

Similarly, along with the activation of the first spectral feature extraction base channel, the second spectral feature extraction base channel of the voice recognition channel embedded in the recording server is activated synchronously. The second spectral feature extraction base channel is constructed in the same way as the first: the channel architecture and the underlying execution logic are the same, and only the specific construction data differ. Based on the second spectral feature extraction base channel, Fourier transform features, sinusoidal harmonic features and time domain features are extracted from the participant voice sample, the first, second and third voiceprint description factor sets mapped to the participant voice sample are obtained, and matrix conversion generates the second spectral feature matrix. The first spectral feature matrix and the second spectral feature matrix have the same characterization format, and the distribution positions of their matrix items are in mapping correspondence.
S4: performing similarity analysis on the first spectral feature matrix and the second spectral feature matrix to generate a spectral feature similarity;

Wherein performing the similarity analysis on the first spectral feature matrix and the second spectral feature matrix to generate the spectral feature similarity, S4 further comprises:
s41: performing single-element comparison between the first spectral feature matrix and the second spectral feature matrix and taking absolute values to generate a plurality of single-element comparison deviations;

s42: acquiring a plurality of single-element comparison preset deviations, the plurality of single-element comparison preset deviations corresponding one-to-one with the plurality of single-element comparison deviations;

s43: comparing the single-element comparison deviations with the single-element comparison preset deviations to obtain the dimension proportion coefficient of the elements whose single-element comparison deviation is less than or equal to the single-element comparison preset deviation;

s44: when the dimension proportion coefficient is greater than or equal to a dimension proportion coefficient threshold, setting the dimension proportion coefficient as the spectral feature similarity;

s45: when the dimension proportion coefficient is smaller than the dimension proportion coefficient threshold, collecting the single-element comparison deviations that are larger than their preset deviations, normalizing them and calculating the mean, and then taking the opposite number to set the spectral feature similarity.
Taking the mapping correspondence as a reference, the associated matrix items are extracted in pairs; for each mapped pair, the difference between the single elements is calculated and its absolute value taken, and the results are integrated into the plurality of single-element comparison deviations. For each element in the matrix, a single-element comparison preset deviation is acquired, i.e. a critical deviation value custom-set by a person skilled in the art for measuring element similarity.

The corresponding single-element comparison deviations and single-element comparison preset deviations are mapped and checked against each other, the number of elements whose comparison deviation is less than or equal to the preset deviation is counted, and its ratio to the total number of elements is calculated to obtain the dimension proportion coefficient, which is the proportion of single elements with high similarity.

The dimension proportion coefficient threshold, i.e. the critical proportion for measuring overall similarity, is then set. The dimension proportion coefficient is checked against the threshold: if it is greater than or equal to the threshold, it is set as the spectral feature similarity. If it is smaller than the threshold, the single-element comparison deviations larger than their preset deviations are screened out; because elements of different dimensions differ in magnitude, the deviation data are normalized into a unified data format, and the mean is then calculated to measure the overall degree of deviation. The opposite number of the calculated mean is taken (the greater the deviation, the smaller the corresponding negative number and the lower the similarity) and set as the spectral feature similarity. By configuring a targeted processing mode for the different deviation situations, the accuracy of the similarity determination is effectively improved and the spectral feature similarity is made to fit the actual situation.
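The two-branch rule just described can be sketched directly; the min-max normalization scheme and the example threshold value are assumptions, as the patent leaves both to the practitioner:

```python
# Sketch of the spectral feature similarity rule; normalization scheme and
# ratio_threshold value are assumed, not specified by the patent.
import numpy as np

def spectral_feature_similarity(m1: np.ndarray, m2: np.ndarray,
                                preset_dev: np.ndarray,
                                ratio_threshold: float = 0.8) -> float:
    dev = np.abs(m1 - m2)                 # single-element comparison deviations
    within = dev <= preset_dev
    ratio = float(within.mean())          # dimension proportion coefficient
    if ratio >= ratio_threshold:
        return ratio                      # high-similarity branch
    over = dev[~within]                   # deviations beyond their preset values
    span = float(over.max() - over.min()) + 1e-12
    norm = (over - over.min()) / span     # unify formats (assumed min-max)
    return -float(norm.mean())            # opposite number of the mean
```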
S5: determining the participant identity information of the speaker according to the spectral feature similarity, and filling the speech-recognition content of the speaker recording information into the conference record table;

The participant identity information of the speaker is determined based on the spectral feature similarity: the participant voice sample with the highest spectral feature similarity to the speaker recording information is taken, and the user to whom that sample belongs is taken as the participant identity information of the speaker. The conference record table is then traversed to determine the speaker's filling position, and the speech-recognition content of the speaker recording information is filled in. As the conference proceeds, the speaker recording information is collected in real time, feature similarity analysis and speaker identity determination are performed, and the conference content is recorded in the conference record table until the conference ends.
S6: when the conference ends, performing semantic association on the text information entered in the conference record table according to the conference theme to generate first key conference content, sending the first key conference content to a preset user terminal for adjustment, and generating second key conference content;

S7: activating an archiving server to classify and archive the conference record table, the conference theme, the conference time and the second key conference content, and erasing the conference record table from the first temporary memory after archiving is completed.

Wherein, when the conference ends, performing semantic association on the text information entered in the conference record table according to the conference theme to generate the first key conference content, sending the first key conference content to the preset user terminal for adjustment, and generating the second key conference content, S6 further comprises:
s61: performing semantic recognition on the conference theme to generate meeting purpose information;

s62: according to the meeting purpose information, traversing the first user text, the second user text, and so on up to the Q-th user text of the entered text information for semantic association, and generating a plurality of key conference contents, including:

s621: performing historical semantic retrieval on the first user text to generate a plurality of pieces of text purpose information, and adding the first user texts whose text purpose information is the same as the meeting purpose information to the plurality of key conference contents;

s622: performing duplicate-semantics cleaning on the plurality of key conference contents to generate the first key conference content.
Specifically, semantic recognition is performed on the conference theme, for example with natural language processing techniques, to obtain the meeting purpose information. The conference record table is traversed, and the first user text and the second user text corresponding to each participating user are extracted, and so on up to the Q-th user text, where Q is the total number of participating users; each is semantically associated with the meeting purpose information.

Historical semantic retrieval is performed on the first user text to generate a plurality of pieces of text purpose information; for example, if a text is an advertisement, its purpose information is traffic acquisition, marketing and the like. The text purpose information is checked against the meeting purpose information, and the first user texts with the same purpose information are regarded as key text content and added to the plurality of key conference contents.

Likewise, the purpose information of the second user text through the Q-th user text is determined and checked, the user texts whose purpose is the same as the meeting purpose information are screened and added to the plurality of key conference contents, and each user text carries the identifier of the user it belongs to. Duplicate-semantics recognition is then performed on the plurality of key conference contents, for example cleaning away content that is worded differently but semantically similar, to keep the key conference content concise and free of information redundancy; the plurality of key conference contents after duplicate-semantics cleaning are taken as the first key conference content.
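As a simplified stand-in for the screening and cleaning just described, the sketch below keeps only texts whose purpose matches the meeting purpose and removes near-duplicates. Jaccard word overlap is assumed in place of a real semantic model, and all names and the 0.6 threshold are hypothetical:

```python
# Toy purpose matching and duplicate-semantics cleaning; a production system
# would use NLP-based semantic retrieval rather than word overlap.
def _similar(a: str, b: str, threshold: float = 0.6) -> bool:
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / max(len(wa | wb), 1) >= threshold

def first_key_content(user_texts: list[str], text_purposes: list[str],
                      meeting_purpose: str) -> list[str]:
    """Keep purpose-matching texts, then drop semantically duplicate ones."""
    kept = [t for t, p in zip(user_texts, text_purposes) if p == meeting_purpose]
    deduped: list[str] = []
    for text in kept:
        if not any(_similar(text, seen) for seen in deduped):
            deduped.append(text)
    return deduped
```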
The first key conference content is then sent to the preset user terminal, i.e. the mobile terminal of the conference decision user, where it is adjusted to generate the second key conference content, completing the conference record. The archiving server is then activated to archive the conference record: the conference record table, the conference theme, the conference time and the second key conference content are classified and archived in the archiving server, i.e. archived and stored under their respective categories, and after archiving is completed the conference record table is erased from the first temporary memory.
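To close the loop, here is a hedged sketch of the classified archiving and the temporary-memory erase; the JSON-file-per-category layout and the meeting_id key are illustrative assumptions, as the patent does not describe the archive format:

```python
# Classify-and-archive then erase the record table from temporary memory;
# storage layout is an assumption for illustration only.
import json
import pathlib

def archive_and_erase(temp_memory: dict, archive_root: str = "archive") -> None:
    categories = {"record_table": temp_memory["record_table"],
                  "theme": temp_memory["theme"],
                  "time": temp_memory["time"],
                  "second_key_content": temp_memory["second_key_content"]}
    for category, content in categories.items():
        folder = pathlib.Path(archive_root) / category   # one folder per category
        folder.mkdir(parents=True, exist_ok=True)
        (folder / f"{temp_memory['meeting_id']}.json").write_text(
            json.dumps(content, ensure_ascii=False, default=str))
    del temp_memory["record_table"]          # erase from the first temporary memory
```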
The paperless conference record archiving method provided by the application has the following technical effects:

1. For real-time speech information, feature processing and similarity judgment analysis are performed synchronously in combination with the voice recognition channel, and speaker identity location and conference recording are carried out accurately, so as to adapt to dynamically changing client groups.

2. A voice recognition channel is built and embedded in the recording server. Its built-in first and second spectral feature extraction base channels each contain feature extraction nodes of different processing dimensions; with the introduced loss functions, multidimensional feature extraction is performed on the speaker recording information and the participant voice samples respectively, ensuring extraction efficiency while improving the completeness and accuracy of the extracted features. The speaker identity is accurately located through feature similarity analysis and filled into the conference record table.

3. The conference content is semantically associated with the conference theme, the key content of the conference record is screened and cleaned, classified archival storage is completed, and intelligent paperless record archiving is realized.
Example Two

Based on the same inventive concept as the paperless conference record archiving method in the foregoing embodiment, as shown in FIG. 3, the present application provides a paperless conference record archiving system, including:
The information acquisition module 11 is used for acquiring the recording information of the speaker through recording software when the conference starts;
the first spectral feature matrix generating module 12, where the first spectral feature matrix generating module 12 is configured to activate a first spectral feature extraction base channel embedded in a voice recognition channel of a recording server to process the speaker recording information to generate a first spectral feature matrix;
the second spectral feature matrix generating module 13 is configured to activate a second spectral feature extraction base channel embedded in the voice recognition channel of the recording server, and process a participant voice sample to generate a second spectral feature matrix;
the similarity analysis module 14 is configured to perform similarity analysis on the first spectral feature matrix and the second spectral feature matrix, and generate a spectral feature similarity;
the conference recording module 15 is used for determining the participant identity information of the speaker according to the spectral feature similarity, and filling the speech-recognition content of the speaker recording information into the conference record table;

the conference content generation module 16 is configured to, when the conference ends, perform semantic association on the text information entered in the conference record table according to the conference theme, generate the first key conference content, send the first key conference content to the preset user terminal for adjustment, and generate the second key conference content;

the content classification archiving module 17 is configured to activate an archiving server to classify and archive the conference record table, the conference theme, the conference time and the second key conference content, and erase the conference record table from the first temporary memory after archiving is completed.
Further, the information collecting module 11 further includes:
the conference basic information acquisition module is used for acquiring conference basic information when receiving a conference request signal, wherein the conference basic information comprises a conference theme, a conference time and participants;
the conference record table generation module is used for activating the first temporary memory of the record server and generating the conference record table according to the conference theme, the conference time and the participants, wherein the conference record table comprises preset users, and the preset users are conference decision users;
the voice sample collection module is used for interacting with the participant user sides to send a voice test sample and collect the participant voice samples, wherein the voice test sample contains at least 15 Chinese characters.
Further, the first spectral feature matrix generating module 12 further includes:
the first spectral feature extraction base channel analysis module is used for the first spectral feature extraction base channel to comprise a Fourier transform feature extraction node, a sine harmonic feature extraction node and a time domain feature extraction node;
the first voiceprint description factor set generation module is used for activating the Fourier transform feature extraction node to carry out Fourier transform on the frequency spectrum acoustic signals of the speaker recording information so as to generate a first voiceprint description factor set;
the second voice print description factor set generation module is used for activating the sinusoidal harmonic feature extraction node to extract harmonic features of the frequency spectrum acoustic signals of the speaker recording information and generate a second voice print description factor set;
the third voiceprint description factor set generation module is used for activating the time domain feature extraction node to extract time domain features of the frequency spectrum acoustic signals of the speaker recording information and generate a third voiceprint description factor set;
wherein the first voiceprint description factor set and the second voiceprint description factor set comprise: spectral centroid, spectral peak, spectral energy, spectral decay rate, spectral slope and spectral flux; and the third voiceprint description factor set comprises: onset time, sound decay time, RMS energy envelope, zero-crossing rate and autocorrelation;
a matrix construction module for constructing the first spectral feature matrix from the first voiceprint description factor set, the second voiceprint description factor set and the third voiceprint description factor set:

$$M_1 = \begin{bmatrix} F_1 \\ F_2 \\ F_3 \end{bmatrix}$$

where $M_1$ is the first spectral feature matrix, and $F_1$, $F_2$ and $F_3$ characterize the first, second and third voiceprint description factor sets respectively.
Further, the first spectral feature extraction base channel analysis module further includes:
the first spectral feature extraction base channel construction module is used for constructing the first spectral feature extraction base channel according to a neural network structure, the first spectral feature extraction base channel comprises an input layer, the Fourier transform feature extraction nodes, the sine harmonic feature extraction nodes and the time domain feature extraction nodes are connected in parallel with the input layer, the Fourier transform feature extraction nodes comprise 6 output nodes, the sine harmonic feature extraction nodes comprise 6 output nodes, the time domain feature extraction nodes comprise 5 output nodes, and different output nodes output different voiceprint features;
The Fourier transform feature extraction node training module is used for calling a sound spectrum record data set and a first voiceprint description factor identification set to train the Fourier transform feature extraction node;
the sinusoidal harmonic feature extraction node training module is used for training the sinusoidal harmonic feature extraction node according to the sound spectrum record data set and the second voiceprint description factor identification set;
the time domain feature extraction node training module is used for training the time domain feature extraction node according to the voice frequency spectrum record data set and the third voiceprint description factor identification set;
wherein a sound spectrum record data set is collected;
the first voiceprint description factor identification set generation module is used for performing Fourier transform identification on the sound spectrum record data set by an expert group according to the spectral centroid, spectral peak, spectral energy, spectral decay rate, spectral slope and spectral flux to generate a first voiceprint description factor identification set;

the second voiceprint description factor identification set generation module is used for performing sinusoidal harmonic identification on the sound spectrum record data set by an expert group according to the spectral centroid, spectral peak, spectral energy, spectral decay rate, spectral slope and spectral flux to generate a second voiceprint description factor identification set;

and the third voiceprint description factor identification set generation module is used for performing time domain feature identification on the sound spectrum record data set by an expert group according to the onset time, sound decay time, RMS energy envelope, zero-crossing rate and autocorrelation to generate a third voiceprint description factor identification set.
Further, the Fourier transform feature extraction node training module further includes:

the single-node loss function construction module is used for constructing a single-node loss function:

$$L_{s} = \frac{1}{n}\sum_{i=1}^{n}\left(y_{i}-\hat{y}_{i}\right)^{2}$$

where $L_{s}$ characterizes the single-node loss function, $n$ is the number of training evaluations used to assess the loss of a single node, $y_{i}$ characterizes the output value of the node at the $i$-th training evaluation, and $\hat{y}_{i}$ characterizes the corresponding identification value;

the first loss function construction module is used for constructing a joint-node first loss function:

$$L_{J1} = \frac{1}{K}\sum_{k=1}^{K} L_{s,k}$$

where $L_{J1}$ characterizes the joint-node first loss function and $L_{s,k}$ characterizes the single-node loss function value of the $k$-th node;

the second loss function construction module is used for constructing a joint-node second loss function:

$$L_{J2} = \sum_{k=1}^{K} w_{k} L_{s,k}$$

where $L_{J2}$ characterizes the joint-node second loss function and the weights $w_{k}$ are known parameters;
the node generation module is used for generating the Fourier transform feature extraction node when each single-node loss function is smaller than or equal to a single-node loss function threshold, the joint node first loss function is smaller than or equal to a joint node first loss threshold, and the joint node second loss function is smaller than or equal to a joint node second loss threshold; otherwise, the Fourier transform feature extraction node continues to be trained in a loop (a hedged sketch of this loss gate follows).
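The three-tier convergence gate described by this module can be sketched as follows. Because the published loss formulas are rendered only as images, the mean-squared single-node loss and the mean/max forms of the two joint losses, as well as all threshold values, are assumptions rather than the patent's actual definitions.

```python
# A hedged sketch of the three-tier loss gate. The exact loss formulas are
# images in the published text, so MSE for the single-node loss, mean/max
# aggregation for the joint losses, and all thresholds are assumptions.
import numpy as np

def single_node_loss(outputs: np.ndarray, labels: np.ndarray) -> float:
    # Averaged squared deviation between output values and expert identification values.
    return float(np.mean((outputs - labels) ** 2))

def training_converged(node_outputs, node_labels,
                       single_thr=0.05, joint1_thr=0.04, joint2_thr=0.08) -> bool:
    # One single-node loss per output node of the extraction node under training.
    losses = [single_node_loss(o, y) for o, y in zip(node_outputs, node_labels)]
    joint_first = np.mean(losses)   # assumed joint-node first loss: mean over nodes
    joint_second = np.max(losses)   # assumed joint-node second loss: worst node
    return (all(l <= single_thr for l in losses)
            and joint_first <= joint1_thr
            and joint_second <= joint2_thr)

# Training loops until all three criteria hold, matching the generation step above.
```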
Further, the similarity analysis module 14 further includes:
the single element comparison deviation generation module is used for carrying out single element comparison on the first spectrum characteristic matrix and the second spectrum characteristic matrix and taking an absolute value to generate a plurality of single element comparison deviations;
The single element comparison preset deviation acquisition module is used for acquiring a plurality of single element comparison preset deviations, and the plurality of single element comparison preset deviations are in one-to-one correspondence with the plurality of single element comparison deviations;
the dimension proportionality coefficient acquisition module is used for comparing the single element comparison deviations with the single element comparison preset deviations to obtain a dimension proportionality coefficient from the single element comparison deviations that are smaller than or equal to the single element comparison preset deviations;
the spectrum feature similarity setting module is used for setting the dimension proportionality coefficient as the spectrum feature similarity when the dimension proportionality coefficient is larger than or equal to a dimension proportionality coefficient threshold value;
and the second-class spectrum feature similarity setting module is used for, when the dimension proportionality coefficient is smaller than the dimension proportionality coefficient threshold, counting the single element comparison deviations that are larger than the single element comparison preset deviations, normalizing them, calculating an average value, and taking the opposite number as the spectrum feature similarity (a hedged sketch of both branches follows).
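Both similarity branches can be written compactly, assuming the matrices and preset deviations are held in NumPy arrays; the max-normalization used for the out-of-tolerance deviations is one plausible reading of "normalization processing", not the patent's specified method.

```python
import numpy as np

def spectral_similarity(m1: np.ndarray, m2: np.ndarray,
                        preset_dev: np.ndarray, ratio_thr: float = 0.8) -> float:
    # Single-element comparison: absolute deviation per matrix element.
    dev = np.abs(m1 - m2)
    within = dev <= preset_dev
    ratio = within.mean()            # dimension proportionality coefficient
    if ratio >= ratio_thr:
        return float(ratio)          # first branch: the coefficient is the similarity
    # Second branch: normalize the out-of-tolerance deviations, average them,
    # and take the opposite number (a negative similarity score).
    excess = dev[~within]
    normalized = excess / excess.max()   # max-normalization is an assumption
    return float(-normalized.mean())

m1 = np.ones((3, 6)); m2 = m1 + 0.01
sim = spectral_similarity(m1, m2, preset_dev=np.full((3, 6), 0.05))  # -> 1.0
```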
Further, the conference content generating module 16 further includes:
the participant destination information generation module is used for performing semantic recognition on the conference theme to generate participant destination information;
the key conference content generation module is used for traversing the first user text, the second user text, and so on up to the Q-th user text of the input text information, performing semantic association according to the participant destination information, and generating a plurality of key conference contents, and comprises:
the user text adding module is used for carrying out historical semantic retrieval on the first user text, generating a plurality of text execution destination information, and adding the first user text with the same text execution destination information and the same participant destination information into the plurality of key conference contents;
and the repeated semantics cleaning module is used for performing repeated-semantics cleaning on the plurality of key conference contents to generate the first key conference content (a sketch of this selection and cleaning flow follows).
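The selection and cleaning flow of these modules can be outlined as below. The retrieval and same-meaning functions stand in for the unspecified semantic models, so every name here is a placeholder introduced purely for illustration.

```python
# Sketch of key-content selection and repeated-semantics cleaning; the
# retrieve_purposes and same_meaning callables are hypothetical stand-ins
# for the semantic retrieval and recognition models the patent leaves open.
from typing import Callable, List

def extract_key_contents(user_texts: List[str], meeting_purpose: str,
                         retrieve_purposes: Callable[[str], List[str]],
                         same_meaning: Callable[[str, str], bool]) -> List[str]:
    key_contents = []
    for text in user_texts:                  # traverse user 1 .. user Q
        purposes = retrieve_purposes(text)   # historical semantic retrieval
        if meeting_purpose in purposes:      # destination information matches
            key_contents.append(text)
    # Repeated-semantics cleaning: keep only one text per distinct meaning.
    cleaned: List[str] = []
    for text in key_contents:
        if not any(same_meaning(text, kept) for kept in cleaned):
            cleaned.append(text)
    return cleaned                           # the "first key conference content"
```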
Through the foregoing detailed description of the paperless conference record archiving method, those skilled in the art can clearly understand the paperless conference record archiving method and system of this embodiment. As for the apparatus disclosed in the embodiments, since it corresponds to the method disclosed in the embodiments, its description is relatively brief, and relevant details can be found in the description of the method section.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (3)

1. A paperless conference record archiving method, characterized in that the method is applied to a paperless conference record archiving system, the paperless conference record archiving system comprising a user side, a recording server and an archiving server, and the method comprising the following steps:
when a conference starts, recording information of a speaker is collected through recording software;
activating a first spectral feature extraction base channel of a voice recognition channel embedded in the recording server to process the speaker recording information and generate a first spectral feature matrix, comprising:
the first spectral feature extraction base channel comprises a Fourier transform feature extraction node, a sine harmonic feature extraction node and a time domain feature extraction node;
activating the Fourier transform feature extraction node to perform Fourier transform on the spectrum sound signal of the speaker recording information, generating a first voiceprint description factor set;
activating the sinusoidal harmonic feature extraction node to extract harmonic features of the spectrum sound signal of the speaker recording information, generating a second voiceprint description factor set;
activating the time domain feature extraction node to extract time domain features of the spectrum sound signal of the speaker recording information, generating a third voiceprint description factor set;
wherein the first voiceprint description factor set and the second voiceprint description factor set comprise: spectrum centroid, spectrum peak, spectrum energy, spectrum decay rate, spectrum slope, and spectrum flux; the third set of voiceprint descriptive factors includes: start sounding time, sound decay time, RMS energy envelope, zero crossing rate, autocorrelation;
constructing the first spectral feature matrix from the first voiceprint description factor set, the second voiceprint description factor set and the third voiceprint description factor set, where $M_1$ denotes the first spectral feature matrix assembled from $A_1$ (characterizing the first voiceprint description factor set), $A_2$ (characterizing the second voiceprint description factor set) and $A_3$ (characterizing the third voiceprint description factor set) (a hedged assembly sketch follows);
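The matrix formula itself appears only as an image in the published text; one reading consistent with the surrounding definitions stacks the three factor sets as rows, padding the 5-element time-domain set so the rows align. The sketch below encodes that assumption.

```python
import numpy as np

def build_spectral_matrix(a1, a2, a3) -> np.ndarray:
    # a1: Fourier factors (6), a2: harmonic factors (6), a3: time-domain factors (5).
    # Row-stacking with NaN padding is an assumption; the published formula is an image.
    width = max(len(a1), len(a2), len(a3))
    pad = lambda v: np.pad(np.asarray(v, float), (0, width - len(v)),
                           constant_values=np.nan)
    return np.vstack([pad(a1), pad(a2), pad(a3)])  # shape (3, 6)

m1 = build_spectral_matrix(np.ones(6), np.ones(6), np.ones(5))
```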
constructing the first spectral feature extraction base channel according to a neural network structure, wherein the first spectral feature extraction base channel comprises an input layer; the Fourier transform feature extraction node, the sinusoidal harmonic feature extraction node and the time domain feature extraction node are connected in parallel to the input layer; the Fourier transform feature extraction node comprises 6 output nodes, the sinusoidal harmonic feature extraction node comprises 6 output nodes, and the time domain feature extraction node comprises 5 output nodes, with different output nodes outputting different voiceprint features;
invoking the sound spectrum record data set and the first voiceprint description factor identification set to train the Fourier transform feature extraction node, comprising:
constructing a single-node loss function, where $L_s$ characterizes the single-node loss function, $n$ is the number of trainings evaluated for the single-node loss, $y_{ij}$ characterizes the $j$-th output value of the $i$-th dimension voiceprint description factor at the $n$-th training, and $\hat{y}_{ij}$ represents the output identification value corresponding to the $j$-th output value of the $i$-th dimension voiceprint description factor;
constructing a joint node first loss function, where $L_{J1}$ characterizes the joint node first loss function and $L_k$ characterizes the single-node loss function value of the $k$-th node;
constructing a joint node second loss function, where $L_{J2}$ characterizes the joint node second loss function;
when each single-node loss function is smaller than or equal to a single-node loss function threshold, the joint node first loss function is smaller than or equal to a joint node first loss threshold, and the joint node second loss function is smaller than or equal to a joint node second loss threshold, generating the Fourier transform feature extraction node; otherwise, continuing the training in a loop;
training the sinusoidal harmonic feature extraction node according to the sound spectrum record data set and a second voiceprint description factor identification set;
training the time domain feature extraction node according to the sound spectrum record data set and the third voiceprint description factor identification set;
wherein the sound spectrum record data set is collected;
performing Fourier transform identification on the sound spectrum record data set by an expert group according to the spectrum centroid, the spectrum peak, the spectrum energy, the spectrum attenuation speed, the spectrum slope and the spectrum flux, to generate a first voiceprint description factor identification set;
performing sinusoidal harmonic identification on the sound spectrum record data set according to the spectrum centroid, the spectrum peak value, the spectrum energy, the spectrum attenuation speed, the spectrum slope and the spectrum flux by an expert group to generate a second voiceprint description factor identification set;
performing time domain feature identification on the sound spectrum record data set by an expert group according to the start sounding time, the sound decay time, the RMS energy envelope, the zero-crossing rate and the autocorrelation, to generate a third voiceprint description factor identification set (a sketch of one possible data layout follows);
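One possible layout for these expert identification sets is sketched below; the record structure and field names are assumptions introduced purely for illustration, since the claim specifies only which factors each set covers.

```python
# Hypothetical layout for the expert-identified training sets; the dict
# structure and key names are assumptions, not the patent's data format.
FOURIER_FACTORS = ["spectrum_centroid", "spectrum_peak", "spectrum_energy",
                   "spectrum_attenuation_speed", "spectrum_slope", "spectrum_flux"]
TIME_FACTORS = ["start_sounding_time", "sound_decay_time",
                "rms_energy_envelope", "zero_crossing_rate", "autocorrelation"]

def make_identification_record(sample_id: str, expert_labels: dict) -> dict:
    # One record pairs a sound-spectrum sample with expert identification values
    # for every voiceprint description factor of one extraction node.
    return {"sample": sample_id,
            "labels": {k: float(v) for k, v in expert_labels.items()}}

first_set = [make_identification_record("rec_0001",
             {f: 0.0 for f in FOURIER_FACTORS})]  # Fourier transform identifications
third_set = [make_identification_record("rec_0001",
             {f: 0.0 for f in TIME_FACTORS})]     # time-domain identifications
```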
activating a second spectral feature extraction base channel embedded in the voice recognition channel of the recording server to process a reference voice sample and generate a second spectral feature matrix, wherein the second spectral feature extraction base channel is constructed in the same manner as the first spectral feature extraction base channel, with the same channel architecture and underlying execution logic but different specific construction data;
performing similarity analysis on the first spectrum feature matrix and the second spectrum feature matrix to generate spectrum feature similarity, including:
performing single-element comparison on the first spectrum feature matrix and the second spectrum feature matrix, and taking absolute values to generate a plurality of single-element comparison deviations;
acquiring a plurality of single element comparison preset deviations, wherein the plurality of single element comparison preset deviations correspond to the plurality of single element comparison deviations one by one;
comparing the single element comparison deviations with the single element comparison preset deviations to obtain a dimension proportionality coefficient from the single element comparison deviations that are smaller than or equal to the single element comparison preset deviations;
When the dimension proportionality coefficient is larger than or equal to a dimension proportionality coefficient threshold value, setting the dimension proportionality coefficient as the spectrum feature similarity;
when the dimension proportionality coefficient is smaller than the dimension proportionality coefficient threshold, counting the single element comparison deviations that are larger than the single element comparison preset deviations, normalizing them, calculating an average value, and taking the opposite number as the spectrum feature similarity; wherein the dimension proportionality coefficient is obtained by screening the number of single elements whose comparison deviation is smaller than or equal to the single element comparison preset deviation and calculating its ratio to the total number of single elements;
determining the participant identity information of the speaker according to the spectrum feature similarity, and filling the voice recognition content of the speaker recording information into a conference recording table;
when the conference ends, performing semantic association on the input text information of the conference record table according to the conference theme to generate first key conference content, sending the first key conference content to a preset user side for adjustment, and generating second key conference content, comprising:
performing semantic recognition on the conference theme to generate participant destination information;
traversing the first user text, the second user text, and so on up to the Q-th user text of the input text information, where Q is the total number of participant users, performing semantic association according to the participant destination information, and generating a plurality of key conference contents, comprising:
Performing historical semantic retrieval on the first user text, generating a plurality of text execution destination information, and adding the first user text with the same text execution destination information and the same participant destination information into the plurality of key conference contents;
performing repeated-semantics cleaning on the plurality of key conference contents to generate the first key conference content;
and activating the archiving server to classify and archive the conference record table, the conference theme, the conference time and the second key conference content, and erasing the conference record table from the first temporary memory after archiving is completed.
2. The method of claim 1, characterized in that before the speaker recording information is collected through the recording software when the conference begins, the method comprises:
when receiving a meeting request signal, acquiring meeting basic information, wherein the meeting basic information comprises a meeting theme, meeting time and meeting participants;
activating the first temporary memory of the recording server, and generating the conference recording table according to the conference theme, the conference time and the participants, wherein the conference recording table comprises preset users, and the preset users are conference decision users;
and interacting with the participant user sides to send a voice test sample and collect participant voice samples, wherein the voice test sample is greater than or equal to 15 Chinese characters.
3. A paperless conference record archiving system for performing the method of claim 1, characterized in that the system comprises a user side, a recording server and an archiving server, and comprises:
the information acquisition module is used for acquiring the recording information of the speaker through recording software when the conference starts;
the first spectral feature matrix generation module is used for activating a first spectral feature extraction base channel embedded in a voice recognition channel of the recording server to process the speaker recording information to generate a first spectral feature matrix;
the second spectral feature matrix generation module is used for activating a second spectral feature extraction base channel embedded in the voice recognition channel of the recording server to process a reference voice sample so as to generate a second spectral feature matrix;
the similarity analysis module is used for carrying out similarity analysis on the first spectrum feature matrix and the second spectrum feature matrix to generate spectrum feature similarity;
the conference recording module is used for determining the participant identity information of the speaker according to the spectrum feature similarity and filling the voice recognition content of the speaker recording information into a conference record table;
the conference content generation module is used for, when the conference ends, performing semantic association on the input text information of the conference record table according to the conference theme to generate first key conference content, sending the first key conference content to a preset user side for adjustment, and generating second key conference content;
the content classification archiving module is used for activating the archiving server to classify and archive the conference record table, the conference theme, the conference time and the second key conference content, and erasing the conference record table from the first temporary memory after archiving is completed;
the conference content generation module further includes:
the participant destination information generation module is used for performing semantic recognition on the conference theme to generate participant destination information;
the key conference content generation module is used for traversing the first user text and the second user text of the input text information to the Q user text for semantic association according to the meeting destination information to generate a plurality of key conference contents, and comprises the following steps:
The user text adding module is used for carrying out historical semantic retrieval on the first user text, generating a plurality of text execution destination information, and adding the first user text with the same text execution destination information and the same participant destination information into the plurality of key conference contents;
and the repeated semantics cleaning module is used for performing repeated-semantics cleaning on the plurality of key conference contents to generate the first key conference content.
CN202311158138.XA 2023-09-08 2023-09-08 Paperless conference record archiving method and paperless conference record archiving system Active CN116911817B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311158138.XA CN116911817B (en) 2023-09-08 2023-09-08 Paperless conference record archiving method and paperless conference record archiving system

Publications (2)

Publication Number Publication Date
CN116911817A CN116911817A (en) 2023-10-20
CN116911817B true CN116911817B (en) 2023-12-01

Family

ID=88351376

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311158138.XA Active CN116911817B (en) 2023-09-08 2023-09-08 Paperless conference record archiving method and paperless conference record archiving system

Country Status (1)

Country Link
CN (1) CN116911817B (en)

Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5970453A (en) * 1995-01-07 1999-10-19 International Business Machines Corporation Method and system for synthesizing speech
CN109388701A (en) * 2018-08-17 2019-02-26 深圳壹账通智能科技有限公司 Minutes generation method, device, equipment and computer storage medium
CN110265032A (en) * 2019-06-05 2019-09-20 平安科技(深圳)有限公司 Conferencing data analysis and processing method, device, computer equipment and storage medium
CN113139668A (en) * 2021-05-11 2021-07-20 中国工商银行股份有限公司 Intelligent conference management method, device, computer system and readable storage medium
CN113189571A (en) * 2020-01-14 2021-07-30 中国科学院声学研究所 Sound source passive ranging method based on tone feature extraction and deep learning
CN113763986A (en) * 2021-09-07 2021-12-07 山东大学 Air conditioner indoor unit abnormal sound detection method based on sound classification model
WO2022016994A1 (en) * 2020-07-23 2022-01-27 平安科技(深圳)有限公司 Ai recognition-based meeting minutes generation method and apparatus, device and medium
WO2022142610A1 (en) * 2020-12-28 2022-07-07 深圳壹账通智能科技有限公司 Speech recording method and apparatus, computer device, and readable storage medium
WO2023087287A1 (en) * 2021-11-19 2023-05-25 京东方科技集团股份有限公司 Conference content display method, conference system and conference device


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Duan Huarong; Wu Beilei; Zhang Leilei. Informatization management of innovation achievement refinement based on the OA collaborative office platform. Tractor & Farm Transporter, No. 02, full text. *
Liu Yi. Machine Learning and Python Practice. Xidian University Press, 2022, pp. 201-204. *



Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
PE01 Entry into force of the registration of the contract for pledge of patent right

Denomination of invention: A paperless meeting record archiving method and system
Granted publication date: 20231201
Pledgee: Zhejiang Tailong Commercial Bank Co.,Ltd. Ningbo Branch
Pledgor: Zhejiang Zhijia Information Technology Co.,Ltd.
Registration number: Y2024980010710