CN113691382A - Conference recording method, device, computer equipment and medium - Google Patents

Conference recording method, device, computer equipment and medium

Info

Publication number
CN113691382A
Authority
CN
China
Prior art keywords
conference
voice
speech
feature
audio
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110978838.8A
Other languages
Chinese (zh)
Inventor
He Chunmei (何春梅)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An International Smart City Technology Co Ltd
Original Assignee
Ping An International Smart City Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An International Smart City Technology Co Ltd filed Critical Ping An International Smart City Technology Co Ltd
Priority to CN202110978838.8A priority Critical patent/CN113691382A/en
Publication of CN113691382A publication Critical patent/CN113691382A/en
Pending legal-status Critical Current

Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 12/00 Data switching networks
    • H04L 12/02 Details
    • H04L 12/16 Arrangements for providing special services to substations
    • H04L 12/18 Arrangements for providing special services to substations for broadcast or conference, e.g. multicast
    • H04L 12/1813 Arrangements for providing special services to substations for broadcast or conference, e.g. multicast for computer conferences, e.g. chat rooms
    • H04L 12/1831 Tracking arrangements for later retrieval, e.g. recording contents, participants activities or behavior, network status
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/30 Semantic analysis
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 Speech recognition
    • G10L 15/26 Speech to text systems
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 Speaker identification or verification techniques
    • G10L 17/02 Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 17/00 Speaker identification or verification techniques
    • G10L 17/06 Decision making techniques; Pattern matching strategies
    • G10L 17/14 Use of phonemic categorisation or speech recognition prior to speaker recognition or verification
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 21/00 Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L 21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L 21/0272 Voice signal separating
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L 25/18 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being spectral information of each sub-band
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/03 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters
    • G10L 25/24 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 characterised by the type of extracted parameters the extracted parameters being the cepstrum
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L 25/48 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L 25/51 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L 25/63 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for estimating an emotional state

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • Computational Linguistics (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Game Theory and Decision Science (AREA)
  • Business, Economics & Management (AREA)
  • Spectroscopy & Molecular Physics (AREA)
  • Child & Adolescent Psychology (AREA)
  • Quality & Reliability (AREA)
  • Artificial Intelligence (AREA)
  • Hospice & Palliative Care (AREA)
  • General Physics & Mathematics (AREA)
  • Psychiatry (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Telephonic Communication Services (AREA)

Abstract


The present application is applicable to the field of artificial intelligence technology and provides a conference recording method, device, computer equipment and medium. The method includes: performing voiceprint recognition and emotion recognition on each conference voice to obtain the corresponding voice speaker and speaking emotion feature; marking the voice text corresponding to each conference voice according to its voice speaker and speaking emotion feature, and writing the marked voice text into the conference record; and determining the project information of the corresponding conference voice according to the voice text, and information-marking the conference record according to that project information. By performing voiceprint recognition and emotion recognition on each conference voice to determine its voice speaker and speaking emotion feature, automatically marking the voice text on this basis, and writing the marked voice text into the conference record, the present application eliminates manual marking of conference records and improves conference record generation efficiency.


Description

Conference recording method, conference recording device, computer equipment and medium
Technical Field
The present application relates to the field of artificial intelligence technologies, and in particular, to a conference recording method, apparatus, computer device, and medium.
Background
In the existing office meeting process, an automatic meeting recording function is generally provided: a sound pickup device is arranged in the meeting room to pick up the speech of the meeting personnel, and the meeting content is then recorded through speech-to-text conversion to form a meeting record for the meeting personnel to use.
However, the existing conference recording process records only the corresponding text, and the speakers must be marked in the conference record manually, so conference records are generated inefficiently.
Disclosure of Invention
In view of this, embodiments of the present application provide a conference recording method, an apparatus, a computer device and a medium, so as to solve the problem that conference record generation is inefficient because speakers are usually marked manually in the existing conference recording process.
A first aspect of an embodiment of the present application provides a conference recording method, including:
collecting voices of conference personnel in a conference room to obtain conference voices, and respectively carrying out voiceprint recognition and emotion recognition on each conference voice to obtain voice speakers and speaking emotion characteristics corresponding to the corresponding conference voices;
according to the voice speakers and speaking emotion characteristics corresponding to the conference voices respectively, marking voice texts corresponding to the corresponding conference voices, and writing the marked voice texts into conference records;
and determining project information of corresponding conference voice according to the voice text, and performing information marking on the conference record according to the determined project information, wherein the project information is used for representing the information of the project described by the corresponding conference voice.
Further, the performing voiceprint recognition and emotion recognition on each conference voice respectively to obtain the voice speakers and speaking emotion characteristics corresponding to the corresponding conference voices includes the following steps:
acquiring sample entropy characteristics of each conference voice, and performing silence detection according to the sample entropy characteristics;
performing voice filtering on the conference voices according to the silence detection result, and acquiring the audio features of the conference voices after the voice filtering;
determining the speaker type of the corresponding conference voice according to the audio characteristics, wherein the speaker type comprises a single-person speech type and a multi-person speech type, and performing voice separation on the corresponding conference voice according to the audio characteristics and the speaker type to obtain separated audio;
determining voice speakers of corresponding separated audios according to the audio features, and enabling the voice speakers corresponding to the same audio features to form corresponding relations with the separated audios;
and performing feature fusion on the audio features and the sample entropy features to obtain fusion features, and performing emotion classification on the fusion features to obtain the speaking emotion features.
Further, the determining the speaker type of the corresponding conference voice according to the audio feature includes:
and matching the audio features with a pre-stored feature query table to obtain the speaker type, wherein the feature query table stores different preset audio features and corresponding relations between preset feature combinations and corresponding speaker types.
Further, the performing voice separation on the corresponding conference voice according to the audio features and the speaker type to obtain a separated audio includes:
when the speaker type of the conference voice is a multi-person speaker type, acquiring a preset feature combination corresponding to the audio feature in the feature query table, and determining an audio sub-feature corresponding to the preset feature combination;
and respectively determining the voice position of each audio sub-feature in the corresponding conference voice, and performing voice separation on the conference voice according to the voice position to obtain the separated audio corresponding to each audio sub-feature.
Further, the obtaining of the audio features of each conference voice after the voice filtering includes:
and respectively extracting one or more combinations of frequency cepstrum coefficients, pitch periods, zero crossing rates, energy root-mean-square coefficients or spectrum flat coefficients of the conference voices after voice filtering to obtain the audio features.
Further, the information marking the meeting record according to the determined project information includes:
performing word segmentation on the voice text to obtain word segmentation vocabularies, and matching each word segmentation vocabulary with a pre-stored item query table, wherein the item query table stores the corresponding relation between a specified vocabulary and corresponding item information;
and if the word segmentation vocabulary is matched with the item query table, carrying out information marking on the corresponding voice text in the conference record by the matched item information.
Further, the marking the voice text corresponding to the corresponding conference voice according to the voice speaker and the speaking emotion characteristics respectively corresponding to each conference voice, and writing the marked voice text into the conference record includes:
respectively acquiring voice acquisition time of each conference voice, and sequencing voice texts corresponding to each conference voice according to the voice acquisition time to obtain the conference record;
and in the conference record, aiming at the same conference voice, carrying out speaker marking and emotion marking on a voice text corresponding to the corresponding conference voice according to the voice speaker and the speaking emotion characteristics.
A second aspect of an embodiment of the present application provides a conference recording apparatus, including:
the identification unit is used for collecting the voices of conference personnel in the conference room to obtain conference voices, and respectively carrying out voiceprint identification and emotion identification on each conference voice to obtain voice speakers and speaking emotion characteristics corresponding to the corresponding conference voices;
the text marking unit is used for marking the voice text corresponding to the corresponding conference voice according to the voice speaker and the speaking emotion characteristics respectively corresponding to each conference voice, and writing the marked voice text into the conference record;
and the item marking unit is used for determining item information of the corresponding conference voice according to the voice text and marking the conference record with information according to the determined item information, wherein the item information is used for representing the information of the item described by the corresponding conference voice.
A third aspect of embodiments of the present application provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor implements the steps of the conference recording method provided in the first aspect when executing the computer program.
A fourth aspect of embodiments of the present application provides a computer-readable storage medium, which stores a computer program, and the computer program, when executed by a processor, implements the steps of the conference recording method provided by the first aspect.
According to the conference recording method, the conference recording device, the computer equipment and the conference recording medium, voiceprint recognition and emotion recognition are carried out on each conference voice to determine voice speakers and speaking emotion characteristics of each conference voice, corresponding voice texts are automatically marked based on the voice speakers and the speaking emotion characteristics, the marked voice texts are written into a conference record, manual marking of the conference record is not needed, and conference record generation efficiency is improved.
Drawings
To more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present application, and those of ordinary skill in the art can obtain other drawings based on these drawings without inventive effort.
Fig. 1 is a flowchart of an implementation of a conference recording method according to an embodiment of the present application;
fig. 2 is a flowchart of an implementation of a conference recording method according to another embodiment of the present application;
fig. 3 is a block diagram of a structure of a conference recording apparatus according to an embodiment of the present application;
fig. 4 is a block diagram of a computer device according to an embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the present application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
The embodiments of the present application can acquire and process related data based on artificial intelligence technology. Artificial Intelligence (AI) is the theory, method, technology and application system that uses a digital computer, or a machine controlled by a digital computer, to simulate, extend and expand human intelligence, perceive the environment, acquire knowledge, and use knowledge to obtain the best results.
The artificial intelligence infrastructure generally includes technologies such as sensors, dedicated artificial intelligence chips, cloud computing, distributed storage, big data processing, operation/interaction systems and mechatronics. Artificial intelligence software technologies mainly include computer vision, robotics, biometric recognition, speech processing, natural language processing, and machine learning/deep learning.
In the embodiment of the application, the conference recording method is realized based on the artificial intelligence technology, and the conference recording is carried out in the conference process.
Referring to fig. 1, fig. 1 shows a flowchart of an implementation of a conference recording method provided in an embodiment of the present application, where the conference recording method is applied to any computer device, where the computer device may be a server, a mobile phone, a tablet, or a wearable smart device, and the conference recording method includes:
step S10, collecting the voices of conference personnel in the conference room to obtain conference voices, and respectively carrying out voiceprint recognition and emotion recognition on each conference voice to obtain voice speakers and speaking emotion characteristics corresponding to the corresponding conference voices;
wherein, be provided with pronunciation collection equipment in the meeting room, this pronunciation collection equipment is used for gathering meeting indoor meeting personnel's pronunciation, and is optional, in this embodiment, when the meeting adopted the mode of online meeting to develop, this pronunciation collection equipment still is used for gathering each meeting personnel's of online meeting pronunciation to obtain this meeting pronunciation.
Voiceprint recognition and emotion recognition are performed on each conference voice to determine the voice speaker and speaking emotion feature corresponding to each conference voice. The speaking emotion feature describes the emotion of the corresponding voice speaker when uttering the corresponding conference voice, and the emotions it describes include, for example, anger, kindliness, sarcasm, calmness or excitement.
Step S20, according to the voice speakers and speaking emotion characteristics corresponding to the conference voices respectively, marking the voice texts corresponding to the conference voices and writing the marked voice texts into conference records;
the voice recognition operation of the conference voices can be performed by adopting a voice recognizer or a voice recognition model based on deep learning, so that the text conversion effect of the conference voices is achieved, and voice texts corresponding to the conference voices are obtained.
In this step, the voice text corresponding to each conference voice is marked with the voice speaker and speaking emotion feature corresponding to that conference voice, and the marked voice text is written into the conference record, so that the correspondences between each conference voice and its voice text, voice speaker and speaking emotion feature are stored in the conference record. Speaker information therefore no longer needs to be marked manually, which improves conference record generation efficiency, and the speaker's emotion during the conference can be effectively learned from the speaking emotion feature.
Step S30, determining the project information of the corresponding conference voice according to the voice text, and marking the conference record with information according to the determined project information;
the method and the device have the advantages that the project information of the corresponding conference voice is determined through the voice text, and the conference record is subjected to information marking according to the determined project information, so that the method and the device are effectively convenient for the follow-up user to check the corresponding project information of the speech of each speaker when the conference record is checked, and the project information is used for representing the project information described by the corresponding conference voice.
For example, when the item information determined for the voice text a1 of a conference voice is item information b1, the voice text a1 in the conference record is marked according to item information b1. Preferably, the mark used for information-marking a voice text may be a text mark, a serial-number mark, an image mark or the like, each representing the corresponding project information, so that conference personnel, or other users who did not attend the conference, can see which project each conference voice's text discusses when viewing the conference record.
Optionally, in this step, the marking a voice text corresponding to the conference voice according to the voice speaker and the speaking emotion feature respectively corresponding to each conference voice, and writing the marked voice text into the conference record includes:
respectively acquiring voice acquisition time of each conference voice, and sequencing voice texts corresponding to each conference voice according to the voice acquisition time to obtain the conference record;
in the conference record, aiming at the same conference voice, carrying out speaker marking and emotion marking on a voice text corresponding to the corresponding conference voice according to the voice speaker and the speaking emotion characteristics;
the voice speakers and the speaking emotion characteristics are used for carrying out speaker marking and emotion marking on the voice texts corresponding to the corresponding conference voices, so that a user viewing the conference records can effectively know the voice speakers corresponding to the voice texts and the emotion of the voice speakers when the voice speakers send the conference voices corresponding to the voice texts based on the conference records.
Optionally, in this step, the information marking of the meeting record according to the determined item information includes:
performing word segmentation on the voice text to obtain word segmentation vocabularies, and matching each word segmentation vocabulary with a pre-stored item query table;
the item query table stores a corresponding relationship between a specified vocabulary and corresponding item information, and the specified vocabulary can be set according to user requirements, for example, the specified vocabulary can be set as an item name of an item related to a current conference;
and if the word segmentation vocabulary is matched with the item query table, carrying out information marking on the corresponding voice text in the conference record by the matched item information.
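The following sketch illustrates this matching step, assuming the jieba segmenter for Chinese word segmentation and a plain dictionary as the item query table; both the tool and the table contents are assumptions, as the embodiment names no specific segmenter or storage format.

```python
# Sketch: segment the voice text with jieba and look each word up in the
# item query table; jieba and the table contents are assumptions here.
import jieba

ITEM_TABLE = {"预算": "budget review item"}   # designated vocabulary -> item info

def mark_items(voice_text, item_table=ITEM_TABLE):
    words = jieba.lcut(voice_text)            # word segmentation
    return [item_table[w] for w in words if w in item_table]
```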
In the embodiment, the voice speaker and the speaking emotion characteristics of each conference voice are determined by performing voiceprint recognition and emotion recognition on each conference voice, the corresponding voice text is automatically marked based on the determined voice speaker and speaking emotion characteristics, a corresponding conference record is generated, manual marking of the conference record is not needed, and the conference record generation efficiency is improved.
Referring to fig. 2, fig. 2 is a flowchart illustrating an implementation of a conference recording method according to another embodiment of the present application. With respect to the embodiment of fig. 1, the conference recording method provided by this embodiment is used to further refine step S10 in the embodiment of fig. 1, and includes:
step S11, acquiring sample entropy characteristics of each conference voice, and carrying out silence detection according to the sample entropy characteristics;
the Sample Entropy (Sample Entropy) is similar to the physical meaning of the approximate Entropy, and the time series complexity is measured by measuring the probability of generating a new pattern in the signal. The silence detection is carried out through the sample entropy characteristics, the voice starting point and the voice starting point in each conference voice can be accurately identified, and the accuracy of voice filtering on each conference voice in the follow-up process is further improved.
Step S12, carrying out voice filtering on the conference voices according to the silence detection result, and acquiring the audio features of the conference voices after the voice filtering;
the voice filtering is carried out on the conference voices through the voice starting point and the voice starting point obtained from the silence detection result, so that the noise and silence in the conference voices can be effectively removed, and the accuracy of voice signals in the conference voices is improved.
Optionally, in this step, the obtaining of the audio features of the conference voices after the voice filtering includes:
and respectively extracting one or more combinations of frequency cepstrum coefficients, pitch periods, zero crossing rates, energy root-mean-square coefficients or spectrum flat coefficients of the conference voices after voice filtering to obtain the audio features.
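A sketch of this feature extraction using the librosa library follows; librosa is an assumed implementation choice, and the coefficient count and pitch-range bounds are illustrative defaults rather than values from the embodiment.

```python
# Sketch of the listed feature set with librosa (an assumed library choice;
# coefficient counts and pitch bounds are illustrative defaults).
import librosa
import numpy as np

def extract_audio_features(y, sr):
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=13).mean(axis=1)  # cepstral coefficients
    zcr = librosa.feature.zero_crossing_rate(y).mean()               # zero-crossing rate
    rms = librosa.feature.rms(y=y).mean()                            # energy root-mean-square
    flat = librosa.feature.spectral_flatness(y=y).mean()             # spectral flatness
    f0, _, _ = librosa.pyin(y, fmin=librosa.note_to_hz("C2"),
                            fmax=librosa.note_to_hz("C7"), sr=sr)
    voiced = f0[~np.isnan(f0)]
    pitch_period = 1.0 / voiced.mean() if voiced.size else 0.0       # pitch period (s)
    return np.concatenate([mfcc, [zcr, rms, flat, pitch_period]])
```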
Step S13, determining the speaker type of the corresponding conference voice according to the audio characteristics, and performing voice separation on the corresponding conference voice according to the audio characteristics and the speaker type to obtain separated audio;
the speaker type comprises a single speaker type and a multi-person speaker type, when the speaker type of the conference voice is the single speaker type, it is judged that only one speaker exists in the conference voice, and when the speaker type of the conference voice is the multi-person speaker type, it is judged that a plurality of speakers exist in the conference voice.
In this step, when the speaker type of the conference voice is the single-person speech type, the conference voice is directly taken as a separated audio. When the speaker type of the conference voice is the multi-person speech type, the audio corresponding to each speaker in the conference voice is determined, and the conference voice is separated according to the determination result to obtain the separated audios, where each separated audio contains the voice information of only one voice speaker.
Optionally, in this step, the determining, according to the audio feature, a speaker type of the corresponding conference voice includes:
matching the audio features with a pre-stored feature query table to obtain the speaker type;
the preset audio features can be set according to the audio features of the conference personnel participating in the current conference, the preset audio features are obtained by respectively obtaining the audio features of the conference personnel and combining the audio features of different conference personnel, and further, when the audio features of different conference personnel are combined, the number of the combined audio features can be set to be 2, 3 or 4 and the like.
For example, when the conference personnel of the current conference include conference personnel c1, c2, c3 and c4, the audio features of c1, c2, c3 and c4 are respectively acquired, yielding the audio features d1, d2, d3 and d4. The two-feature combinations d1+d2, d1+d3, d1+d4, d2+d3, d2+d4 and d3+d4 are set as the preset audio features e1 to e6, the three-feature combinations d1+d2+d3, d1+d2+d4, d1+d3+d4 and d2+d3+d4 are set as the preset audio features e7 to e10, and the single audio features d1, d2, d3 and d4 are set as the preset audio features e11 to e14, respectively. The audio feature of each conference voice is then matched against these preset audio features: when the audio feature of any conference voice matches one of e1 to e10, the speaker type of that conference voice is determined to be the multi-person speech type, and when it matches one of e11 to e14, the speaker type is the single-person speech type, that is, only one voice speaker is speaking in that conference voice.
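The construction and querying of such a feature query table can be sketched as below; the enrolled feature keys d1 to d4 and the type labels are illustrative assumptions standing in for the participants' actual audio features.

```python
# Sketch of the feature query table: pair and triple combinations map to
# the multi-person speech type, single features to the single-person type.
from itertools import combinations

def build_feature_table(feature_keys):
    table = {}
    for key in feature_keys:                      # e11..e14 in the example
        table[frozenset([key])] = "single-person"
    for size in (2, 3):                           # e1..e6 and e7..e10
        for combo in combinations(feature_keys, size):
            table[frozenset(combo)] = "multi-person"
    return table

table = build_feature_table(["d1", "d2", "d3", "d4"])
print(table[frozenset(["d1", "d4"])])             # multi-person (e3)
print(table[frozenset(["d2"])])                   # single-person (e12)
```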
Further, in this step, the performing voice separation on the corresponding conference voice according to the audio feature and the speaker type to obtain a separated audio includes:
when the speaker type of the conference voice is a multi-person speaker type, acquiring a preset feature combination corresponding to the audio feature in the feature query table, and determining an audio sub-feature corresponding to the preset feature combination;
when the speaker type of the conference voice is a multi-person speaker type, the audio sub-feature corresponding to the preset feature combination can be effectively queried by acquiring the preset feature combination corresponding to the audio feature in the feature lookup table, and the corresponding voice speaker can be determined based on the audio sub-feature, for example, when the audio feature of the conference voice matches with the preset audio feature e3, the determined audio sub-feature is the audio feature d1 and the audio feature d 4.
Respectively determining the voice position of each audio sub-feature in the corresponding conference voice, and performing voice separation on the conference voice according to the voice position to obtain a separated audio corresponding to each audio sub-feature;
the voice positions of the audio sub-features in the corresponding conference voice are respectively determined, and based on the determined voice positions, the effect of voice separation can be effectively achieved on the conference voice, so that the voice information (separated audio) of the audio sub-features corresponding to the voice speakers can be obtained.
Step S14, determining the voice speakers corresponding to the separated audios according to the audio features, and forming corresponding relations between the voice speakers corresponding to the same audio features and the separated audios;
the corresponding relation between the voice speakers corresponding to the same audio features and the separated audio is formed, so that the corresponding relation between the conference voices and the corresponding voice speakers is effectively determined conveniently.
Step S15, performing feature fusion on the audio features and the sample entropy features to obtain fusion features, and performing emotion classification on the fusion features to obtain the speaking emotion features;
the voice conference system comprises a voice conference system, a sample entropy characteristic acquisition system, a voice conference system and a voice recognition system, wherein the voice conference system is used for acquiring a speech emotion characteristic of each conference voice, the voice emotion characteristic acquisition system is used for acquiring a sample entropy characteristic of the voice conference voice, the sample entropy characteristic acquisition system is used for acquiring a fusion characteristic, emotion classification is performed based on the similarity between the fusion characteristic and a preset emotion characteristic, and the preset emotion characteristic can be set according to requirements and is used for representing the characteristic of a corresponding speech emotion on the voice frequency.
Optionally, in this embodiment, when the speaker type of the conference voice is a multi-person speaker type, the voice text corresponding to each separated audio is recorded in the conference record, and the voice speaker, the speaking emotion feature, and the item information are marked for each separated audio, so that the accuracy of the conference record is improved.
In this embodiment, the sample entropy feature of each conference voice is acquired and silence detection is performed according to it; based on the voice starting point and voice ending point obtained from the silence detection result, the noise and silence in each conference voice can be effectively removed, improving the accuracy of the voice signal in each conference voice. The speaker type of the corresponding conference voice is determined from the audio feature, and voice separation of each conference voice is made more accurate based on the speaker type; separating each conference voice yields the separated audio corresponding to each voice speaker, and forming the correspondence between the voice speaker and the separated audio that correspond to the same audio feature effectively facilitates determining the correspondence between each conference voice and its voice speaker. Finally, the audio features and the sample entropy features are fused to obtain the fusion features, and emotion classification based on the similarity between the fusion features and the preset emotion features yields the speaking emotion feature of each conference voice.
Referring to fig. 3, fig. 3 is a block diagram of a conference recording apparatus 100 according to an embodiment of the present application. The conference recording apparatus 100 in this embodiment includes units for executing the steps in the embodiments corresponding to fig. 1 and fig. 2; please refer to those figures and the related descriptions in their corresponding embodiments. For convenience of explanation, only the portions related to the present embodiment are shown. Referring to fig. 3, the conference recording apparatus 100 includes: a recognition unit 10, a text marking unit 11 and an item marking unit 12, wherein:
and the recognition unit 10 is configured to collect voices of conference persons in the conference room to obtain conference voices, and perform voiceprint recognition and emotion recognition on each conference voice to obtain voice speakers and speaking emotion characteristics corresponding to the corresponding conference voices.
Wherein the identification unit 10 is further configured to: acquiring sample entropy characteristics of each conference voice, and performing silence detection according to the sample entropy characteristics;
performing voice filtering on the conference voices according to the silence detection result, and acquiring the audio features of the conference voices after the voice filtering;
determining the speaker type of the corresponding conference voice according to the audio characteristics, wherein the speaker type comprises a single-person speech type and a multi-person speech type, and performing voice separation on the corresponding conference voice according to the audio characteristics and the speaker type to obtain separated audio;
determining voice speakers of corresponding separated audios according to the audio features, and enabling the voice speakers corresponding to the same audio features to form corresponding relations with the separated audios;
and performing feature fusion on the audio features and the sample entropy features to obtain fusion features, and performing emotion classification on the fusion features to obtain the speaking emotion features.
Optionally, the identification unit 10 is further configured to: and matching the audio features with a pre-stored feature query table to obtain the speaker type, wherein the feature query table stores different preset audio features and corresponding relations between preset feature combinations and corresponding speaker types.
Further, the identification unit 10 is further configured to: when the speaker type of the conference voice is a multi-person speaker type, acquiring a preset feature combination corresponding to the audio feature in the feature query table, and determining an audio sub-feature corresponding to the preset feature combination;
and respectively determining the voice position of each audio sub-feature in the corresponding conference voice, and performing voice separation on the conference voice according to the voice position to obtain the separated audio corresponding to each audio sub-feature.
Further, the identification unit 10 is further configured to: and respectively extracting one or more combinations of frequency cepstrum coefficients, pitch periods, zero crossing rates, energy root-mean-square coefficients or spectrum flat coefficients of the conference voices after voice filtering to obtain the audio features.
And the text marking unit 11 is configured to mark a voice text corresponding to the conference voice according to the voice speaker and the speaking emotion feature respectively corresponding to each conference voice, and write the marked voice text into the conference record.
And an item marking unit 12, configured to determine item information of the corresponding conference voice according to the voice text, and perform information marking on the conference record according to the determined item information, where the item information is used to represent information of an item described by the corresponding conference voice.
Wherein the item tagging unit 12 is further configured to: performing word segmentation on the voice text to obtain word segmentation vocabularies, and matching each word segmentation vocabulary with a pre-stored item query table, wherein the item query table stores the corresponding relation between a specified vocabulary and corresponding item information;
and if the word segmentation vocabulary is matched with the item query table, carrying out information marking on the corresponding voice text in the conference record by the matched item information.
Optionally, the item tagging unit 12 is further configured to: respectively acquiring voice acquisition time of each conference voice, and sequencing voice texts corresponding to each conference voice according to the voice acquisition time to obtain the conference record;
and in the conference record, aiming at the same conference voice, carrying out speaker marking and emotion marking on a voice text corresponding to the corresponding conference voice according to the voice speaker and the speaking emotion characteristics.
In the embodiment, the voice speaker and the speaking emotion characteristics of each conference voice are determined by performing voiceprint recognition and emotion recognition on each conference voice, the corresponding voice text is automatically marked based on the voice speaker and the speaking emotion characteristics, the marked voice text is written into the conference record, manual marking of the conference record is not needed, and the conference record generation efficiency is improved.
Fig. 4 is a block diagram of a computer device 2 according to another embodiment of the present application. As shown in fig. 4, the computer device 2 of this embodiment includes: a processor 20, a memory 21 and a computer program 22, such as a program of a conference recording method, stored in said memory 21 and executable on said processor 20. The processor 20, when executing the computer program 22, implements the steps in the embodiments of the conference recording methods described above, such as S10-S30 shown in fig. 1, or S11-S15 shown in fig. 2. Alternatively, when the processor 20 executes the computer program 22, the functions of the units in the embodiment corresponding to fig. 3, for example, the functions of the units 10 to 12 shown in fig. 3, are implemented, for which reference is specifically made to the relevant description in the embodiment corresponding to fig. 3, which is not repeated herein.
Illustratively, the computer program 22 may be divided into one or more units, which are stored in the memory 21 and executed by the processor 20 to accomplish the present application. The one or more units may be a series of computer program instruction segments capable of performing specific functions, which are used to describe the execution of the computer program 22 in the computer device 2. For example, the computer program 22 may be divided into a recognition unit 10, a text tagging unit 11 and an item tagging unit 12, each of which functions specifically as described above.
The computer device may include, but is not limited to, a processor 20, a memory 21. Those skilled in the art will appreciate that fig. 4 is merely an example of a computer device 2 and is not intended to limit the computer device 2 and may include more or fewer components than shown, or some components may be combined, or different components, e.g., the computer device may also include input output devices, network access devices, buses, etc.
The processor 20 may be a Central Processing Unit (CPU), another general-purpose processor, a Digital Signal Processor (DSP), an Application-Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 21 may be an internal storage unit of the computer device 2, such as a hard disk or a memory of the computer device 2. The memory 21 may also be an external storage device of the computer device 2, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card or a flash card provided on the computer device 2. Further, the memory 21 may include both an internal storage unit and an external storage device of the computer device 2. The memory 21 is used for storing the computer program and other programs and data required by the computer device, and may also be used to temporarily store data that has been output or is to be output.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
The integrated module, if implemented in the form of a software functional unit and sold or used as a separate product, may be stored in a computer-readable storage medium, which may be non-volatile or volatile. Based on this understanding, all or part of the flow in the methods of the embodiments described above can be realized by a computer program, which can be stored in a computer-readable storage medium and, when executed by a processor, realizes the steps of the method embodiments described above. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, some intermediate form, or the like. The computer-readable storage medium may include: any entity or device capable of carrying computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a Read-Only Memory (ROM), a Random Access Memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and so on. It should be noted that the content of the computer-readable storage medium may be appropriately increased or decreased as required by legislation and patent practice in a jurisdiction; for example, in some jurisdictions, computer-readable storage media do not include electrical carrier signals and telecommunications signals in accordance with legislation and patent practice.
The above-mentioned embodiments are only used for illustrating the technical solutions of the present application, and not for limiting the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; such modifications and substitutions do not substantially depart from the spirit and scope of the embodiments of the present application and are intended to be included within the scope of the present application.

Claims (10)

1. A conference recording method, comprising:
collecting the voices of conference personnel in a conference room to obtain conference voices, and respectively performing voiceprint recognition and emotion recognition on each conference voice to obtain the voice speaker and speaking emotion feature corresponding to the corresponding conference voice;
marking the voice text corresponding to the corresponding conference voice according to the voice speaker and speaking emotion feature respectively corresponding to each conference voice, and writing the marked voice text into a conference record;
determining project information of the corresponding conference voice according to the voice text, and performing information marking on the conference record according to the determined project information, wherein the project information is used for representing information of the project described by the corresponding conference voice.

2. The conference recording method according to claim 1, wherein the respectively performing voiceprint recognition and emotion recognition on each conference voice to obtain the voice speaker and speaking emotion feature corresponding to the corresponding conference voice comprises:
acquiring a sample entropy feature of each conference voice, and performing silence detection according to the sample entropy feature;
performing voice filtering on each conference voice according to the silence detection result, and acquiring the audio feature of each conference voice after voice filtering;
determining the speaker type of the corresponding conference voice according to the audio feature, the speaker type comprising a single-person speech type and a multi-person speech type, and performing voice separation on the corresponding conference voice according to the audio feature and the speaker type to obtain separated audio;
determining the voice speaker of the corresponding separated audio according to the audio feature, and forming a correspondence between the voice speaker and the separated audio corresponding to the same audio feature;
performing feature fusion on the audio feature and the sample entropy feature to obtain a fusion feature, and performing emotion classification on the fusion feature to obtain the speaking emotion feature.

3. The conference recording method according to claim 2, wherein the determining the speaker type of the corresponding conference voice according to the audio feature comprises:
matching the audio feature with a pre-stored feature query table to obtain the speaker type, wherein the feature query table stores correspondences between different preset audio features and preset feature combinations and the corresponding speaker types.

4. The conference recording method according to claim 2, wherein the performing voice separation on the corresponding conference voice according to the audio feature and the speaker type to obtain separated audio comprises:
when the speaker type of the conference voice is the multi-person speech type, acquiring the preset feature combination corresponding to the audio feature in the feature query table, and determining the audio sub-features corresponding to the preset feature combination;
respectively determining the voice position of each audio sub-feature in the corresponding conference voice, and performing voice separation on the conference voice according to the voice positions to obtain the separated audio corresponding to each audio sub-feature.

5. The conference recording method according to claim 2, wherein the acquiring the audio feature of each conference voice after voice filtering comprises:
respectively extracting one or a combination of more of the frequency cepstral coefficients, pitch period, zero-crossing rate, energy root-mean-square or spectral flatness coefficients of each conference voice after voice filtering to obtain the audio feature.

6. The conference recording method according to claim 1, wherein the performing information marking on the conference record according to the determined project information comprises:
performing word segmentation on the voice text to obtain segmented words, and matching each segmented word with a pre-stored item query table, wherein the item query table stores correspondences between designated words and the corresponding project information;
if a segmented word matches the item query table, performing information marking on the corresponding voice text in the conference record with the matched project information.

7. The conference recording method according to any one of claims 1 to 6, wherein the marking the voice text corresponding to the corresponding conference voice according to the voice speaker and speaking emotion feature respectively corresponding to each conference voice, and writing the marked voice text into the conference record comprises:
respectively acquiring the voice collection time of each conference voice, and sorting the voice texts corresponding to the conference voices according to the voice collection time to obtain the conference record;
in the conference record, for the same conference voice, performing speaker marking and emotion marking on the voice text corresponding to the corresponding conference voice according to the voice speaker and the speaking emotion feature.

8. A conference recording apparatus, comprising:
a recognition unit, configured to collect the voices of conference personnel in a conference room to obtain conference voices, and respectively perform voiceprint recognition and emotion recognition on each conference voice to obtain the voice speaker and speaking emotion feature corresponding to the corresponding conference voice;
a text marking unit, configured to mark the voice text corresponding to the corresponding conference voice according to the voice speaker and speaking emotion feature respectively corresponding to each conference voice, and write the marked voice text into a conference record;
an item marking unit, configured to determine project information of the corresponding conference voice according to the voice text, and perform information marking on the conference record according to the determined project information, wherein the project information is used for representing information of the project described by the corresponding conference voice.

9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the steps of the method according to any one of claims 1 to 7 when executing the computer program.

10. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
CN202110978838.8A 2021-08-25 2021-08-25 Conference recording method, device, computer equipment and medium Pending CN113691382A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110978838.8A CN113691382A (en) 2021-08-25 2021-08-25 Conference recording method, device, computer equipment and medium

Publications (1)

Publication Number Publication Date
CN113691382A true CN113691382A (en) 2021-11-23

Family

ID=78582285

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110978838.8A Pending CN113691382A (en) 2021-08-25 2021-08-25 Conference recording method, device, computer equipment and medium

Country Status (1)

Country Link
CN (1) CN113691382A (en)

Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101291239A (en) * 2008-05-20 2008-10-22 华为技术有限公司 Method and apparatus for enhancing effect of meeting
CN108922538A (en) * 2018-05-29 2018-11-30 平安科技(深圳)有限公司 Conferencing information recording method, device, computer equipment and storage medium
CN109388701A (en) * 2018-08-17 2019-02-26 深圳壹账通智能科技有限公司 Minutes generation method, device, equipment and computer storage medium
WO2020218664A1 (en) * 2019-04-25 2020-10-29 이봉규 Smart conference system based on 5g communication and conference support method using robotic processing automation
CN111243590A (en) * 2020-01-17 2020-06-05 中国平安人寿保险股份有限公司 Conference record generation method and device
CN111666746A (en) * 2020-06-05 2020-09-15 中国银行股份有限公司 Method and device for generating conference summary, electronic equipment and storage medium
CN112017632A (en) * 2020-09-02 2020-12-01 浪潮云信息技术股份公司 Automatic conference record generation method
CN111933144A (en) * 2020-10-09 2020-11-13 融智通科技(北京)股份有限公司 Conference voice transcription method and device for post-creation of voiceprint and storage medium
CN112489625A (en) * 2020-10-19 2021-03-12 厦门快商通科技股份有限公司 Voice emotion recognition method, system, mobile terminal and storage medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Li Gongfa, Jiang Guozhang, Kong Jianyi, Jiang Du, Tao Bo: "Human-Machine Interaction Technology of Robotic Dexterous Hands and Its Stable Control", 31 July 2020, Huazhong University of Science and Technology Press, page 13 *
Wang Yuanchang: "The Era of Artificial Intelligence: Research on Electronic Product Design and Production", 31 January 2019, Chengdu: University of Electronic Science and Technology of China Press, pages 124-125 *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115828907A (en) * 2023-02-16 2023-03-21 南昌航天广信科技有限责任公司 Intelligent conference management method, system, readable storage medium and computer equipment
CN115828907B (en) * 2023-02-16 2023-04-25 南昌航天广信科技有限责任公司 Intelligent conference management method, system, readable storage medium and computer device
CN118737165A (en) * 2024-08-30 2024-10-01 福州惠企信息科技有限公司 Intelligent management method of enterprise data based on speech analysis

Similar Documents

Publication Publication Date Title
Schuller et al. The INTERSPEECH 2021 computational paralinguistics challenge: COVID-19 cough, COVID-19 speech, escalation & primates
CN110557589B (en) System and method for integrating recorded content
CN109741732B (en) Named entity recognition method, named entity recognition device, equipment and medium
CN111785275A (en) Speech recognition method and device
CN110457432A (en) Interview methods of marking, device, equipment and storage medium
McKechnie et al. Automated speech analysis tools for children’s speech production: A systematic literature review
TW202008349A (en) Speech labeling method and apparatus, and device
CN103996155A (en) Intelligent interaction and psychological comfort robot service system
CN107767881B (en) Method and device for acquiring satisfaction degree of voice information
Ahsiah et al. Tajweed checking system to support recitation
CN109791616A (en) Automatic speech recognition
CN113923521B (en) Video scripting method
US20220238118A1 (en) Apparatus for processing an audio signal for the generation of a multimedia file with speech transcription
CN113409774A (en) Voice recognition method and device and electronic equipment
CN113691382A (en) Conference recording method, device, computer equipment and medium
Wagner et al. Applying cooperative machine learning to speed up the annotation of social signals in large multi-modal corpora
CN112735442A (en) Wetland ecology monitoring system with audio separation voiceprint recognition function and audio separation method thereof
KR20170086233A (en) Method for incremental training of acoustic and language model using life speech and image logs
Lanjewar et al. Speech emotion recognition: a review
Mane et al. Identification & Detection System for Animals from their Vocalization
CN118284932A (en) Method and apparatus for performing speaker segmentation clustering on mixed bandwidth speech signals
CN114724589A (en) Voice quality inspection method and device, electronic equipment and storage medium
Chimthankar Speech Emotion Recognition using Deep Learning
CN115700880A (en) Behavior monitoring method and device, electronic equipment and storage medium
CN108182946B (en) A method and device for selecting vocal music mode based on voiceprint recognition

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20211123)