CN111402892A - Conference recording template generation method based on voice recognition


Info

Publication number: CN111402892A
Application number: CN202010210036.8A
Authority: CN (China)
Prior art keywords: conference, target, template, voice, conference recording
Priority / filing date: 2020-03-23
Publication date: 2020-07-10
Legal status: Withdrawn
Other languages: Chinese (zh)
Inventor: 钱敏
Current assignee (also original assignee): Zhengzhou Zhilixin Information Technology Co ltd

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/02 - Feature extraction for speech recognition; Selection of recognition unit
    • G10L 15/04 - Segmentation; Word boundary detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention relates to a conference recording template generation method based on voice recognition. The method acquires an audio signal of a speaker and records the acquisition time of the audio signal; processes the audio signal and its acquisition time to obtain a conference recording blank template, a target conference item, conference keywords, and the target name and target face image of the speaker; fills the target conference item, the conference keywords, the target name and the target face image into the corresponding areas of the conference recording blank template to generate a target conference recording template; and finally displays the target conference recording template on a corresponding screen. Conference recording personnel therefore need neither prepare the conference recording template by hand nor fill this information into the blank template themselves, which reduces their workload, avoids recording errors caused by manual generation and filling, and improves the accuracy of the generated conference record.

Description

Conference recording template generation method based on voice recognition
Technical Field
The invention relates to a conference recording template generation method based on voice recognition.
Background
When a conference is held, especially an important one, dedicated conference recording personnel are needed to record its progress. A common practice is to prepare a conference recording template on a computer in advance and have the recording personnel fill the relevant content into it. Existing conference recording templates are blank tables: during the conference, preparatory content such as the conference time and speaker information must be filled into the corresponding areas. This content belongs to the conference recording template rather than to the conference record text itself, so filling it in imposes extra work on the recording personnel, increases their workload, and can lead to recording errors during rapid note-taking.
Disclosure of Invention
The invention aims to provide a conference recording template generation method based on voice recognition, to solve the problem that manually filling in the contents of the conference recording template imposes extra work on conference recording personnel and increases their workload.
To solve the above problems, the invention adopts the following technical scheme:
a conference recording template generation method based on voice recognition comprises the following steps:
acquiring an audio signal of a speaker, and recording the acquisition time of the audio signal;
generating a conference recording blank template according to the audio signal, wherein the conference recording blank template comprises a conference item filling area, a conference keyword filling area, a speaker name filling area, a speaker face image filling area and a conference recording text filling area;
determining a target conference item corresponding to the acquisition time of the audio signal according to the acquisition time of the audio signal and a preset conference flow;
recognizing the audio signal to obtain corresponding text data;
extracting the conference keywords contained in the text data;
acquiring a target voiceprint of the speaker according to the audio signal;
inputting the target voiceprint into a preset conference personnel database, and acquiring the target name and target face image of the speaker corresponding to the target voiceprint, wherein the conference personnel database comprises at least two groups of data, each group comprising a voiceprint and the name and face image of the person corresponding to that voiceprint;
filling the target conference item into a conference item filling area of the conference recording blank template, filling the conference keyword into a conference keyword filling area of the conference recording blank template, filling the target name of the speaker into a speaker name filling area of the conference recording blank template, filling the target face image of the speaker into a speaker face image filling area of the conference recording blank template, and generating a target conference recording template;
and displaying the target conference recording template on a corresponding screen.
Optionally, after the target conference recording template is displayed on a corresponding screen, the conference recording template generation method further includes:
and outputting the target conference recording template to a printer for printing.
Optionally, extracting the conference keywords contained in the text data includes:
inputting the text data into a preset conference keyword database to obtain the conference keywords in the text data.
Optionally, recognizing the audio signal to obtain corresponding text data includes:
generating a speech waveform of the audio signal in a preset speech coordinate system;
dividing the speech waveform based on a voice activity detection algorithm to obtain at least two valid speech segments;
extracting a speech characteristic curve corresponding to each valid speech segment through a speech feature recognition algorithm;
extracting a standard characteristic curve associated with each candidate character from a preset corpus;
drawing the standard characteristic curve and the speech characteristic curve on a preset characteristic coordinate system, and calculating the difference area of the intersection region between the two curves;
if the difference area for any candidate character is smaller than a preset area difference threshold, identifying that candidate character as text information contained in the corresponding valid speech segment;
and combining the pieces of text information in order, based on the sequence of the valid speech segments in the speech waveform, to generate the text data.
The invention has the following beneficial effects. A conference recording blank template is generated from the acquired audio signal of the speaker; this blank template is the initial conference recording template and comprises a conference item filling area, a conference keyword filling area, a speaker name filling area, a speaker face image filling area and a conference recording text filling area. The following processing is then carried out: the target conference item corresponding to the acquisition time of the audio signal is determined from that acquisition time and a preset conference flow; the audio signal is recognized to obtain corresponding text data, and the conference keywords contained in the text data are extracted; a target voiceprint of the speaker is obtained from the audio signal and looked up in a preset conference personnel database to obtain the target name and target face image of the speaker.

The obtained target conference item, conference keywords, target name and target face image are then filled into the corresponding filling areas of the blank template to generate the target conference recording template, which is finally displayed on a corresponding screen. The method thus generates the relevant data from the speaker's audio signal and fills it into the generated blank template automatically, so conference recording personnel need neither prepare the template nor fill in this information by hand, which reduces their workload, avoids errors caused by manual generation and filling, and improves the accuracy of the generated conference record.
Drawings
To illustrate the technical solution of the embodiment of the present invention more clearly, the drawings needed in the embodiment are briefly described below:
Fig. 1 is a flowchart of the conference recording template generation method based on voice recognition.
Detailed Description
This embodiment provides a conference recording template generation method based on voice recognition. The execution subject of the method may be an intelligent mobile terminal (such as a smartphone or a tablet computer), a computer (such as a notebook computer, a desktop computer or a computer host), a server device, and the like; the execution subject is not specifically limited in this application. A typical application scenario of the conference recording template generation method is a conference room.
As shown in fig. 1, the method for generating a conference recording template includes the following steps:
acquiring an audio signal of a speaker, and recording the acquisition time of the audio signal:
the audio signal of the speaker is captured by a microphone, which may be fixed to the speaking table in the conference room.
When the audio signal of the speaker is acquired, the acquisition time of the audio signal, that is, when the audio signal is acquired, is recorded.
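As an illustrative sketch only, the acquisition step might look as follows in Python, assuming the third-party sounddevice library and a fixed recording window; the names and parameters are hypothetical, not part of the patent's disclosure:

    from datetime import datetime

    import sounddevice as sd  # assumed audio-capture dependency

    SAMPLE_RATE = 16000  # Hz; a common sampling rate for speech


    def acquire_audio(duration_s=5.0):
        """Record a mono audio signal from the microphone and note the
        acquisition time, i.e. when the audio signal is acquired."""
        acquisition_time = datetime.now()
        signal = sd.rec(int(duration_s * SAMPLE_RATE),
                        samplerate=SAMPLE_RATE, channels=1)
        sd.wait()  # block until the recording window ends
        return signal.squeeze(), acquisition_time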
Four processing procedures are then performed on the acquired audio signal of the speaker or on its acquisition time, and subsequent processing combines their results. It should be understood that there is no strict order among the four procedures: the order may be set according to actual needs, or they may be performed simultaneously.
First processing procedure:
generating a conference recording blank template according to the audio signal, wherein the conference recording blank template comprises a conference item filling area, a conference keyword filling area, a speaker name filling area, a speaker face image filling area and a conference recording text filling area:
after the audio signal is acquired, a conference recording blank template is generated according to the audio signal, wherein the conference recording blank template is an initial conference recording template and is used for obtaining a target conference recording template according to the conference recording template and relevant data information obtained subsequently. The conference recording blank template comprises a conference item filling area, a conference keyword filling area, a speaker name filling area, a speaker face image filling area and a conference recording text filling area. The conference item filling area is used for filling conference items, the conference keyword filling area is used for filling conference keywords, the speaker name filling area is used for filling names of speakers, the speaker face image filling area is used for filling face images of the speakers, and the conference record text filling area is used for filling conference records. Table 1 shows a specific template structure of a conference recording blank template, where an area a is a conference item filling area, an area B is a conference keyword filling area, an area C is a speaker name filling area, an area D is a speaker face image filling area, and an area E is a conference recording text filling area.
TABLE 1
[Table 1: blank conference recording template; area A = conference item, area B = conference keywords, area C = speaker name, area D = speaker face image, area E = conference recording text]
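Purely as an illustration (this structure is not part of the patent), the blank template of Table 1 could be modeled as a simple data object whose fields mirror areas A to E; all names below are hypothetical:

    from dataclasses import dataclass, field


    @dataclass
    class BlankTemplate:
        """Blank conference recording template mirroring areas A-E of Table 1."""
        conference_item: str = ""                                 # area A
        conference_keywords: list = field(default_factory=list)  # area B
        speaker_name: str = ""                                    # area C
        speaker_face_image: str = ""                              # area D (image reference)
        record_text: str = ""                                     # area E, left for the recorder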
Second processing procedure:
determining a target conference item corresponding to the acquisition time of the audio signal according to the acquisition time of the audio signal and a preset conference flow:
A conference flow is preset; it comprises at least two conference time periods and the conference procedure (i.e., the conference item) corresponding to each period, for example: the conference item for 9:00-10:00 is the general manager's speech, the item for 10:00-11:00 is the department managers' speeches, and the item for 11:00-12:00 is the staff representatives' speeches.
The target conference item corresponding to the acquisition time of the audio signal can then be determined from that acquisition time and the preset conference flow. For example, if the acquisition time of the audio signal is 9:35, the target conference item is determined, by consulting the preset conference flow, to be the general manager's speech. A minimal sketch of this lookup follows.
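The preset flow can be modeled as a list of (start, end, item) periods; this sketch uses the example schedule above:

    from datetime import time

    # Preset conference flow taken from the example above
    CONFERENCE_FLOW = [
        (time(9, 0), time(10, 0), "general manager's speech"),
        (time(10, 0), time(11, 0), "department managers' speeches"),
        (time(11, 0), time(12, 0), "staff representatives' speeches"),
    ]


    def target_conference_item(acquisition_time):
        """Return the conference item whose period covers the acquisition time."""
        t = acquisition_time.time()
        for start, end, item in CONFERENCE_FLOW:
            if start <= t < end:
                return item
        return None  # acquired outside every scheduled period

An audio signal acquired at 9:35 thus maps to the general manager's speech.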
Third processing procedure:
Recognizing the audio signal to obtain corresponding text data:
and carrying out voice recognition on the audio signal to obtain corresponding text data. The voice recognition is performed on the audio signal to obtain the text data, which belongs to the conventional technical means, and a specific implementation process is provided in this embodiment. The specific implementation process steps given in this embodiment include:
(1) Generate a speech waveform of the audio signal in a preset speech coordinate system. The ordinate of the speech coordinate system may be the audio amplitude and the abscissa the acquisition time, yielding a time-domain speech waveform. In addition, before the waveform is generated, the audio signal may be filtered to remove environmental noise, and the noise-filtered signal may be lightly smoothed so that invalid noise bands are filtered out.
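A sketch of this step, assuming NumPy/SciPy and a speech-band filter as the noise-removal stage (the 300-3400 Hz cutoffs are an assumption, not from the patent):

    import numpy as np
    from scipy.signal import butter, filtfilt


    def speech_waveform(signal, fs=16000):
        """Band-pass the audio to suppress out-of-band noise, then return the
        time-domain waveform: abscissa = acquisition time, ordinate = amplitude."""
        b, a = butter(4, [300.0, 3400.0], btype="bandpass", fs=fs)
        filtered = filtfilt(b, a, signal)  # zero-phase filtering
        t = np.arange(len(filtered)) / fs
        return t, filtered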
(2) Divide the speech waveform into at least two valid speech segments based on a voice activity detection algorithm. A valid speech segment is one that contains speech content; correspondingly, an invalid segment contains none. A speech start amplitude and a speech end amplitude may be set, with the start amplitude greater than the end amplitude, i.e. the requirement for opening a valid speech segment is stricter than that for closing one. At the moment a speaker begins to speak, the volume and pitch are usually high, so the corresponding amplitude is large; during speech, some characters are pronounced weakly or softly and should not be taken as an interruption of speaking, so the end amplitude must be lowered appropriately to avoid misrecognition. The speech waveform is therefore segmented according to the start and end amplitudes, dividing it into at least two valid speech segments in which the amplitude at the start time is greater than or equal to the speech start amplitude and the amplitude at the end time is less than or equal to the speech end amplitude. Other implementations of this division are of course possible; a simplified sketch of the two-threshold rule follows.
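This illustrates only the two-threshold rule above; the threshold values are hypothetical, and a practical detector would also smooth the envelope and enforce minimum segment durations:

    import numpy as np


    def split_valid_segments(amplitude, start_amp=0.10, end_amp=0.03):
        """Divide the waveform into valid speech segments: a segment opens when
        |amplitude| rises to the speech start amplitude and closes only when it
        falls to the speech end amplitude (start threshold > end threshold)."""
        env = np.abs(amplitude)
        segments, seg_start, in_speech = [], 0, False
        for i, a in enumerate(env):
            if not in_speech and a >= start_amp:
                in_speech, seg_start = True, i
            elif in_speech and a <= end_amp:
                in_speech = False
                segments.append((seg_start, i))
        if in_speech:  # speech still open at the end of the recording
            segments.append((seg_start, len(env)))
        return segments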
(3) Extract the speech characteristic curve corresponding to each valid speech segment through a speech feature recognition algorithm. In this embodiment the algorithm may be a Fourier transform: each valid speech segment is converted from a time-domain curve into a frequency-domain waveform, which serves as the segment's speech characteristic curve. If the converted frequency-domain waveform is discrete, it can be fitted linearly and the fitted curve output as the speech characteristic curve.
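As a sketch, the Fourier-based feature extraction could be as simple as a magnitude spectrum (a production system would typically use richer features):

    import numpy as np


    def feature_curve(segment, fs=16000):
        """Convert a valid speech segment from the time domain into a
        frequency-domain magnitude curve via the Fourier transform."""
        spectrum = np.abs(np.fft.rfft(segment))
        freqs = np.fft.rfftfreq(len(segment), d=1.0 / fs)
        return freqs, spectrum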
(4) Extract the standard characteristic curve associated with each candidate character from a preset corpus. The corpus contains all recognizable candidate characters, each associated with a standard characteristic curve. A standard characteristic curve can be obtained by converting a speech signal of the character's standard pronunciation in at least one language. If several different languages are to be recognized, the speech signals of the standard pronunciations in each language are processed by the speech feature algorithm to obtain several different standard characteristic curves, all of which are associated with the candidate character.
(5) Draw the standard characteristic curve and the speech characteristic curve on a preset characteristic coordinate system and calculate the difference area of the intersection region between them. Drawing both curves on the same coordinate system makes the difference between them easy to compare; the degree of difference is determined mainly by the size of the area enclosed between the two curves (the difference area of the intersection region). The larger this area, the greater the difference between the curves and the higher the probability that the valid speech segment does not contain the candidate character; conversely, the smaller the area, the smaller the difference and the higher the probability that the segment does contain it. Furthermore, to improve recognition accuracy, the speech characteristic curve is normalized: the waveform of the valid speech segment is divided into several character segments according to its peak changes, each character segment containing at least one peak so that each corresponds to one character. Each character segment is then normalized in the time domain according to its length, i.e. its duration is scaled to a preset standard duration and its amplitude is adjusted in equal proportion to a preset maximum amplitude, after which the normalized character segment is converted to obtain its speech characteristic curve.
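One plausible reading of the "difference area" is the area enclosed between the two normalized curves on a shared axis; a sketch under that assumption:

    import numpy as np


    def difference_area(freqs_a, curve_a, freqs_b, curve_b, n_points=512):
        """Resample both characteristic curves onto a shared axis, normalize the
        amplitudes, and integrate the absolute difference between them;
        a smaller area means more similar curves."""
        lo = max(freqs_a.min(), freqs_b.min())
        hi = min(freqs_a.max(), freqs_b.max())
        axis = np.linspace(lo, hi, n_points)
        a = np.interp(axis, freqs_a, curve_a)
        b = np.interp(axis, freqs_b, curve_b)
        a = a / max(a.max(), 1e-12)  # amplitude normalization
        b = b / max(b.max(), 1e-12)
        return float(np.trapz(np.abs(a - b), axis))

A candidate character whose difference area falls below the preset area difference threshold would then be accepted as recognized text for the segment, as step (6) describes.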
(6) If the difference area for any candidate character is smaller than the preset area difference threshold, identify that candidate character as text information contained in the corresponding valid speech segment. That is, if the difference area between a candidate character's standard characteristic curve and the speech characteristic curve is below the threshold, the candidate character is taken to occur in the speech content of that segment; the order of the recognized candidate characters is determined by their positions of occurrence in the segment, and they are combined in that order to obtain the segment's text information. Comparing the standard characteristic curve of every candidate character against the speech characteristic curve in this way identifies the text contained in the valid speech segment and improves the accuracy of the generated text.
(7) Combine the pieces of text information in order, based on the sequence of the valid speech segments in the speech waveform, to generate the text data. Specifically, the punctuation mark joining two pieces of text information can be chosen according to the degree of association between the last character of the preceding valid speech segment and the first character of the following one, together with the interval between the two segments; the text data is then generated from the pieces of text information and the connecting punctuation, which improves its readability. By dividing the audio signal into several speech segments, this embodiment reduces the amount of data handled in each recognition pass, balances recognition accuracy against computation, and thereby improves the accuracy of the generated conference recording template. A toy version of the punctuation rule is sketched below.
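The sketch uses only the pause between neighbouring segments (the association-degree term is omitted for brevity); the pause threshold is hypothetical:

    def join_segments(texts, gaps_s, short_pause=0.5):
        """Combine per-segment text in temporal order, inserting a comma after
        a short pause and a period after a longer one."""
        parts = []
        for i, text in enumerate(texts):
            parts.append(text)
            if i < len(gaps_s):
                parts.append("," if gaps_s[i] < short_pause else ".")
        return "".join(parts)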
Extracting the conference keywords contained in the text data:
after the text data is acquired, the conference keywords related to the conference are extracted from the text data, and this embodiment provides an implementation manner: a conference key database is preset, the conference key database comprises at least one conference key, and the conference key in the conference key database is specifically set according to actual conditions, such as conference subjects. And inputting the text data into a preset conference keyword database, comparing the text data with each conference keyword in the conference keyword database one by one, and if the conference keyword in the conference keyword database exists in the text data, extracting the conference keyword to obtain the conference keyword in the text data. Further, in order to improve the recognition efficiency, the text data may be split into a plurality of words or single characters, each word or single character is respectively input into the conference keyword database, and the conference keywords in the text data are obtained through comparison.
As another embodiment, the text data may be analyzed by a semantic analysis algorithm and the conference keywords extracted from it on that basis. Alternatively, a conference topic can be preset and the conference keywords extracted from the text data according to that topic.
It should be understood that if the text data contains no conference keyword, none is output.
Fourth processing procedure:
acquiring a target voiceprint of the speaker according to the audio signal:
and identifying the voiceprint of the obtained audio signal through a voiceprint identification algorithm, wherein the voiceprint is the target voiceprint of the speaker. Voiceprint (Voiceprint) is a spectrum of sound waves carrying verbal information. The voiceprint is unique like a fingerprint and has the function of identity recognition (identification of an individual). Each person has a specific voiceprint, which varies from person to person. Regardless of how one intentionally simulates the voice and tone of another, even if the simulation is vivid, the voiceprint is still different.
To facilitate voiceprint recognition, the audio signal may be a common sentence, such as "hello, everyone". Voiceprint recognition algorithms are conventional, and obtaining a voiceprint from an audio signal likewise belongs to the conventional art, so it is not described in detail here.
Inputting the target voiceprint into a preset conference personnel database, and acquiring the target name and target face image of the speaker corresponding to the target voiceprint, wherein the conference personnel database comprises at least two groups of data, each group comprising a voiceprint and the name and face image of the person corresponding to that voiceprint:
the conference personnel database is preset, the conference personnel database comprises at least two groups of data, each group of data comprises a voiceprint, and the name and the face image of a personnel corresponding to the voiceprint, the conference personnel database can be stored in a data table mode, and a specific implementation mode of the conference personnel database is given in a table 2.
TABLE 2

  Voiceprint | Person name | Person face image
  -----------+-------------+------------------
  X1         | Y1          | Z1
  X2         | Y2          | Z2
  ...        | ...         | ...
Here voiceprint X1 corresponds to person name Y1 and person face image Z1; voiceprint X2 corresponds to person name Y2 and person face image Z2; and so on.
It should be understood that when the conference personnel database is established, the voiceprints, person names and face images are entered in advance: each face image is either captured beforehand by a camera or uploaded directly from an existing photo, after which the correspondence among voiceprint, name and face image is established.
By inputting the target voiceprint into the conference personnel database, the target name and target face image of the speaker corresponding to that voiceprint can be obtained. For example, if the target voiceprint is X2, the speaker's target name is Y2 and the target face image is Z2. A minimal lookup sketch follows.
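This is a sketch only: the patent does not prescribe how voiceprints are stored or matched, so it assumes feature vectors compared by cosine similarity with a hypothetical acceptance threshold:

    import numpy as np


    def cosine_similarity(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))


    def lookup_speaker(target_voiceprint, person_db, threshold=0.8):
        """person_db rows mirror Table 2: (voiceprint vector X, name Y, face image Z).
        Return the name and face image of the closest-matching person."""
        voiceprint, name, face_image = max(
            person_db, key=lambda row: cosine_similarity(target_voiceprint, row[0]))
        if cosine_similarity(target_voiceprint, voiceprint) < threshold:
            return None, None  # no sufficiently close match
        return name, face_image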
The four processing procedures respectively yield the target conference item, the conference keywords, and the target name and target face image of the speaker. Next:
filling the target conference item into a conference item filling area of the conference recording blank template, filling the conference keyword into a conference keyword filling area of the conference recording blank template, filling the target name of the speaker into a speaker name filling area of the conference recording blank template, filling the target face image of the speaker into a speaker face image filling area of the conference recording blank template, and generating a target conference recording template:
The obtained target conference item is filled into the conference item filling area of the conference recording blank template (area A in Table 1), the conference keywords into the conference keyword filling area (area B in Table 1), the target name of the speaker into the speaker name filling area (area C in Table 1), and the target face image of the speaker into the speaker face image filling area (area D in Table 1). The resulting file, with areas A, B, C and D all filled with the relevant data, is the target conference recording template.
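Reusing the hypothetical BlankTemplate sketched after Table 1, the filling step reduces to assigning areas A to D and leaving area E empty for the recorder:

    def fill_template(template, item, keywords, name, face_image):
        """Fill areas A-D of the blank template; area E stays empty for the
        conference recorder. The filled object is the target template."""
        template.conference_item = item           # area A
        template.conference_keywords = keywords   # area B
        template.speaker_name = name              # area C
        template.speaker_face_image = face_image  # area D
        return template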
Displaying the target conference recording template on a corresponding screen:
To facilitate conference recording against the target conference recording template, the template is displayed on a corresponding screen, for example a computer screen; the conference recording personnel can then take the record by operating the computer, i.e. by filling the specific content of the conference record into the conference recording text filling area.
In addition, if the specific content of the conference record is to be written into the conference recording text filling area by hand, the target conference recording template needs to be printed. That is, after the target conference recording template is displayed on the corresponding screen, the conference recording template generation method further comprises:
Outputting the target conference recording template to a printer for printing:
and outputting the generated target meeting record template to a printer for printing to obtain a paper target meeting record template.
The above embodiments merely illustrate the technical solutions of the present invention in specific cases; any equivalent substitution or modification, or partial substitution, that does not depart from the spirit and scope of the present invention shall be covered by the claims of the present invention.

Claims (4)

1. A conference recording template generation method based on voice recognition is characterized by comprising the following steps:
acquiring an audio signal of a speaker, and recording the acquisition time of the audio signal;
generating a conference recording blank template according to the audio signal, wherein the conference recording blank template comprises a conference item filling area, a conference keyword filling area, a speaker name filling area, a speaker face image filling area and a conference recording text filling area;
determining a target conference item corresponding to the acquisition time of the audio signal according to the acquisition time of the audio signal and a preset conference flow;
recognizing the audio signal to obtain corresponding text data;
extracting the conference keywords contained in the text data;
acquiring a target voiceprint of the speaker according to the audio signal;
inputting the target voiceprint into a preset conference personnel database, and acquiring the target name and target face image of the speaker corresponding to the target voiceprint, wherein the conference personnel database comprises at least two groups of data, each group comprising a voiceprint and the name and face image of the person corresponding to that voiceprint;
filling the target conference item into a conference item filling area of the conference recording blank template, filling the conference keyword into a conference keyword filling area of the conference recording blank template, filling the target name of the speaker into a speaker name filling area of the conference recording blank template, filling the target face image of the speaker into a speaker face image filling area of the conference recording blank template, and generating a target conference recording template;
and displaying the target conference recording template on a corresponding screen.
2. The method for generating a conference recording template based on voice recognition according to claim 1, wherein after the target conference recording template is displayed on a corresponding screen, the method further comprises:
and outputting the target conference recording template to a printer for printing.
3. The method for generating a conference recording template based on voice recognition according to claim 1, wherein extracting the conference keywords contained in the text data comprises:
and inputting the text data into a preset conference keyword database to obtain the conference keywords in the text data.
4. The method for generating a conference recording template based on voice recognition according to claim 1, wherein recognizing the audio signal to obtain corresponding text data comprises:
generating a speech waveform of the audio signal in a preset speech coordinate system;
dividing the speech waveform based on a voice activity detection algorithm to obtain at least two valid speech segments;
extracting a speech characteristic curve corresponding to each valid speech segment through a speech feature recognition algorithm;
extracting a standard characteristic curve associated with each candidate character from a preset corpus;
drawing the standard characteristic curve and the speech characteristic curve on a preset characteristic coordinate system, and calculating the difference area of the intersection region between the two curves;
if the difference area for any candidate character is smaller than a preset area difference threshold, identifying that candidate character as text information contained in the corresponding valid speech segment;
and combining the pieces of text information in order, based on the sequence of the valid speech segments in the speech waveform, to generate the text data.
Application CN202010210036.8A, filed 2020-03-23 (priority date 2020-03-23): Conference recording template generation method based on voice recognition. Withdrawn.

Priority Applications (1)

CN202010210036.8A (priority and filing date 2020-03-23): Conference recording template generation method based on voice recognition

Publications (1)

CN111402892A, published 2020-07-10

Family ID: 71431146

Cited By (6)

CN111931484A (priority 2020-07-31, published 2020-11-13), 于梦丽: Data transmission method based on big data
CN111931484B (priority 2020-07-31, published 2022-02-25), 贵州多彩宝互联网服务有限公司: Data transmission method based on big data
CN111818294A (priority 2020-08-03, published 2020-10-23), 上海依图信息技术有限公司: Method, medium and electronic device for multi-person conference real-time display combined with audio and video
WO2022062471A1 (priority 2020-09-25, published 2022-03-31), 华为技术有限公司: Audio data processing method, device and system
CN116993297A (priority 2023-08-16, published 2023-11-03), 华腾建信科技有限公司: Task data generation method and system based on electronic conference record
CN116993297B (priority 2023-08-16, published 2024-02-27), 华腾建信科技有限公司: Task data generation method and system based on electronic conference record


Legal Events

PB01: Publication
SE01: Entry into force of request for substantive examination
WW01: Invention patent application withdrawn after publication (application publication date: 2020-07-10)