CN112802480B - Voice data text conversion method based on multi-party communication - Google Patents
- Publication number: CN112802480B (application CN202110404363.1A)
- Authority
- CN
- China
- Prior art keywords
- data
- key
- voice data
- factor
- character data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Images
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/06—Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
- G10L15/063—Training
Abstract
The invention relates to the technical field of digital information transmission, in particular to a voice data text conversion method based on multi-party communication. The method recognizes preset passwords entered at multiple device ends, converts the voice data exchanged by each device end in the group chat into text, stores the voice data and the converted text data in a memory, and integrates the key text data with key titles. By integrating the key titles with the key text data, the invention consolidates the text converted from the voice data of multi-party communication, with the key titles formed through a preselected-marking mode. This solves the prior-art problem that voice data conversion lacks pertinence, and the resulting arrangement greatly improves the efficiency of later manual screening.
Description
Technical Field
The invention relates to the technical field of digital information transmission, in particular to a voice data text conversion method based on multi-party communication.
Background
Currently, with the continuous update of chat tools, there has been a conversion from previous text chat to voice chat, in which:
A chat tool, also called an IM tool or IM software, is an Internet-based client for real-time voice and text transmission. Technically, chat tools fall mainly into server-based IM tool software and P2P-based IM tool software. The biggest difference between real-time messaging and e-mail is that no waiting is involved: there is no need to click "send/receive" every couple of minutes. As long as two people are online at the same time, the chat tool can transmit text, files, sound, and images to the other party like a multimedia telephone; given a network connection, however far apart the two parties are, it is as if there were no distance between them.
Accordingly, many enterprises and schools apply this real-time-messaging digital information transmission technology to lectures, i.e., data is transmitted among several device ends by establishing a group chat. However, existing video and voice group chats simply convert all voice data produced during communication; the text conversion is not targeted enough, and people must later screen out the text that did not need to be converted.
Disclosure of Invention
The invention aims to provide a voice data text conversion method based on multi-party communication so as to solve the problems in the background technology.
In order to achieve the above object, the present invention provides a method for converting voice data text based on multi-party communication, which comprises the following steps:
firstly, a preset password entered at each device end of the multiple parties is identified, which yields two cases:
case 1: if the preset password is correct, the device end is marked, the mark of each device end is output, and the group chat is constructed according to the device-end marks;
case 2: if the preset password is incorrect, the input window continues to pop up;
performing text conversion on the voice data communicated by each device end in the group chat;
storing the voice data and the converted text data through a memory;
extracting from the memory the voice data output by a preselected marked device end and the text data converted from it, then identifying the key data information of the preselected marked device end from the extracted text data to form a key title, and then extracting the voice data output by the other marked device ends after that key title and before the next key title appears, together with the text data converted from it, to form the key text data;
and integrating the key text data with the key title; specifically, the key text data is screened according to the key title to pick out the valuable text data, and the valuable text data, the voice data, and the device-end marks are supplemented, in mutual correspondence, into the display frame of the group chat.
As a further improvement of the technical scheme, the key data information of the preselected marked device end comprises key character information, modal-particle information, and keyword extraction information.
As a further improvement of the technical scheme, the key data information extraction adopts a weighted extraction algorithm, and the algorithm steps are as follows:
sentences in the voice data are segmented and punctuated according to the pauses between sounds and the intonation of the sounds, the punctuation marks including periods, question marks, and exclamation marks;
the word-frequency, word-length, part-of-speech, position, and dictionary factors of the text data from the preselected marked device end are quantized with weighting coefficients, and a weight calculation is performed after quantization to obtain the total factor weight of each word;
and the words are sorted by total weight in descending order to obtain a keyword list, through which the key data information is acquired.
As a further improvement of the technical solution, the total weight calculation formula of the factors is as follows:
$$W_i = \alpha F_i + \beta L_i + \gamma P_i + \delta S_i + \varepsilon D_i$$
wherein $W_i$ is the total factor weight of word $i$ in the text data; $\alpha$ is the word-frequency weighting coefficient and $F_i$ the word-frequency factor; $\beta$ is the word-length weighting coefficient and $L_i$ the word-length factor; $\gamma$ is the part-of-speech weighting coefficient and $P_i$ the part-of-speech factor; $\delta$ is the position weighting coefficient and $S_i$ the position factor; $\varepsilon$ is the dictionary weighting coefficient and $D_i$ the dictionary factor, with $\alpha + \beta + \gamma + \delta + \varepsilon = 1$.
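For illustration, the weight calculation can be sketched as follows; this is a minimal sketch, not the patent's implementation — the per-factor values are hypothetical, and the coefficient values are taken from the embodiment below (0.4, 0.2, 0.15, 0.15, 0.1):

```python
# Minimal sketch of the weighted keyword-scoring step (illustrative only).
# The factor values F, L, P, S, D for each word are assumed to be already
# quantized to [0, 1]; how they are quantized is not fixed by this sketch.

WEIGHTS = {"freq": 0.40, "length": 0.20, "pos_tag": 0.15, "position": 0.15, "dict": 0.10}
assert abs(sum(WEIGHTS.values()) - 1.0) < 1e-9  # alpha+beta+gamma+delta+eps = 1

def total_weight(factors: dict) -> float:
    """Compute W_i = alpha*F_i + beta*L_i + gamma*P_i + delta*S_i + eps*D_i."""
    return sum(WEIGHTS[name] * factors[name] for name in WEIGHTS)

def keyword_list(words: dict) -> list:
    """Rank words by total weight, descending, as in the extraction step."""
    return sorted(words, key=lambda w: total_weight(words[w]), reverse=True)

# Hypothetical quantized factors for two words:
words = {
    "efficiency": {"freq": 0.9, "length": 0.5, "pos_tag": 1.0, "position": 0.8, "dict": 1.0},
    "the":        {"freq": 1.0, "length": 0.1, "pos_tag": 0.0, "position": 0.2, "dict": 0.0},
}
print(keyword_list(words))  # ['efficiency', 'the']
```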
As a further improvement of the technical scheme, the text conversion comprises the following specific steps:
firstly, the audio data output by a device end is extracted, and the audio data is then trained with a Gaussian mixture learning algorithm;
the harmonic-plus-noise model of the source audio output speech is decomposed, and the decomposed model is corrected using the average fundamental-frequency ratio to obtain the corresponding corrected harmonic amplitude and phase parameters; features are extracted from the harmonic amplitude and phase parameters to obtain line spectral frequency (LSF) parameters, the LSF parameters are mapped with a Gaussian mixture model, and the mapped LSF features are fused;
mixed output is then performed with the corrected harmonic amplitude and phase parameters, after which the text data of the source audio output speech is extracted.
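To make the correction step concrete, here is a minimal sketch, assuming the fundamental-frequency (F0) tracks have already been extracted by the harmonic-plus-noise decomposition; the function names and frame layout are illustrative assumptions, not the patent's method:

```python
import numpy as np

def average_f0_ratio(f0_source: np.ndarray, f0_target: np.ndarray) -> float:
    """Ratio of average fundamental frequencies over voiced frames (f0 > 0)."""
    src = f0_source[f0_source > 0].mean()
    tgt = f0_target[f0_target > 0].mean()
    return tgt / src

def correct_harmonics(f0_frame: float, amplitudes: np.ndarray, ratio: float):
    """Shift the harmonic structure of one frame by the average-F0 ratio.

    `amplitudes[k]` is the amplitude of harmonic k+1 of the current frame;
    phase parameters would be corrected analogously (omitted for brevity).
    """
    corrected_f0 = f0_frame * ratio
    # Harmonic frequencies after correction: k * corrected_f0, k = 1..K
    harmonic_freqs = corrected_f0 * np.arange(1, len(amplitudes) + 1)
    return corrected_f0, harmonic_freqs, amplitudes

# Hypothetical F0 tracks (Hz), 0 marking unvoiced frames:
f0_src = np.array([110.0, 112.0, 0.0, 115.0])
f0_tgt = np.array([0.0, 220.0, 225.0, 218.0])
print(average_f0_ratio(f0_src, f0_tgt))  # ~1.97
```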
As a further improvement of the technical solution, the gaussian mixture learning algorithm includes the following steps:
firstly, the source audio output speech and the target audio output speech are used for training, and the corresponding harmonic-plus-noise models are decomposed;
the average fundamental-frequency ratio of the fundamental-frequency tracks of the two output speeches is calculated, while features are extracted from the harmonic amplitude and phase parameters of both to obtain the corresponding line spectral frequency parameters;
dynamic time warping is then applied to the obtained line spectral frequency parameters, and the Gaussian mixture model is obtained with a variational Bayes estimation algorithm.
As a further improvement of the technical solution, a calculation formula of the variational bayes estimation algorithm is as follows:
$$\ln p(X) = \ln \int p(X \mid Y)\, p(Y)\, dY$$
wherein $\ln p(X)$ is the logarithmic marginal density; $X$ is the observed audio variable; $Y$ is the text variable of the source audio output speech; $p(Y \mid X)$ is the posterior probability of $Y$ given $X$; and $p(Y)$ is the prior probability of $Y$.
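For concreteness, a variational-Bayes Gaussian mixture of this kind can be sketched with scikit-learn's BayesianGaussianMixture fitted on joint (source, target) feature vectors. The data here is synthetic and the feature dimensions are assumptions; only the 32 initial mixture components come from the embodiment described later:

```python
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)

# Hypothetical DTW-aligned LSF parameter sequences: N frames, d dims each.
N, d = 1000, 16
X = rng.normal(size=(N, d))                        # source-speech LSF features
Y = X * 0.8 + rng.normal(scale=0.1, size=(N, d))   # aligned target features

Z = np.hstack([X, Y])                              # extended joint vectors z = [x; y]

# 32 initial mixture components; variational inference prunes unneeded
# components automatically, which is the "self-optimization" referred to
# in the embodiment below.
vb_gmm = BayesianGaussianMixture(
    n_components=32, covariance_type="full",
    weight_concentration_prior_type="dirichlet_process", random_state=0,
).fit(Z)

# Average per-sample variational lower bound on ln p(z) -- a surrogate for
# the logarithmic marginal density in the formula above.
print(vb_gmm.score(Z))
print("effective components:", (vb_gmm.weights_ > 1e-3).sum())
```

The `score` value is the variational lower bound on the average log marginal density, i.e. the quantity the formula above is estimating.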
Compared with the prior art, the invention has the beneficial effects that:
according to the voice data and character conversion method based on multi-party communication, the characters converted from the voice data of multi-party communication are integrated through the key titles and the key character data, and the key titles are integrated in a pre-selection marking mode, so that the problem that the voice data conversion pertinence is not enough in the prior art is solved, and the efficiency of later-stage manual screening is greatly improved after arrangement.
Drawings
FIG. 1 is an overall flow chart of the present invention;
FIG. 2 is a flowchart of the Gaussian mixture learning algorithm training steps of the present invention;
FIG. 3 is a flowchart of the Gaussian mixture learning algorithm transformation steps of the present invention;
FIG. 4 is a first schematic diagram of a display frame of the present invention;
FIG. 5 is a second schematic diagram of a display frame according to the present invention;
FIG. 6 is a line chart comparing the VB-GMM algorithm and the GMM algorithm of the present invention.
Detailed Description
Example 1
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, the present invention provides a technical solution:
the invention provides a voice data text conversion method based on multi-party communication, which comprises the following steps:
performing text conversion on the voice data communicated by each device end in the group chat;
the voice data and the text data converted by the voice data are stored through the memory, then the text data converted by the text data are displayed in the display frame, please refer to fig. 4, the displayed text data are convenient for memorizing in the process of arranging meeting records or learning notes at the later stage, and the problem that the text can not be extracted and memorized after the video meeting or the video learning is solved.
Example 2
In order to improve the security of the group chat and prevent outsiders from joining it, this embodiment differs from embodiment 1 in that a preset password entered at each device end of the multiple parties is first identified, which yields two cases:
case 1: if the preset password is correct, the device end is marked, the marks of the device ends are output, and the group chat is built according to the device-end marks. People in the group chat are thus distinguished and divided by marking, with a specific mark added for differentiation. For example, if the group chat is an enterprise group, the marks include "boss" and "employee"; if it is a study group, the marks include "teacher" and "student". This further improves how recognizable the members of the group chat are;
case 2: if the preset password is incorrect, the input window continues to pop up, and a device end whose preset password is incorrect cannot join the group chat. This greatly improves the security of the group chat and solves the problem of outsiders joining it.
Example 3
In order to improve the pertinence of the voice data conversion, this embodiment differs from embodiment 2 in that the voice data output by a preselected marked device end and the text data converted from it are extracted from the memory, the key data information of the preselected marked device end is then identified from the extracted text data to form a key title, and the voice data output by the other marked device ends after that key title and before the next key title appears, together with the text data converted from it, is extracted to form the key text data;
the key text data is then integrated with the key title; specifically, the key text data is screened according to the key title to pick out the valuable text data, and the valuable text data, the voice data, and the device-end marks are supplemented, in mutual correspondence, into the display frame of the group chat.
In addition, the key data information of the preselected marked device end comprises key character information, modal-particle information, and keyword extraction information.
In specific use, this embodiment takes an enterprise conference as an example. A group chat is constructed by password entry. Suppose the set of device ends in the group chat is S = (a, a1, a2, a3), which after marking becomes S = (Boss A, Employee a1, Employee a2, Employee a3), and "Boss A" is set as the preselected mark. When Boss A sends "What questions do you still have about the above description?" in the group chat, the interrogative word "what" is obtained, so "What questions do you still have about the above description?" is determined to be a key title. The voice data subsequently output by employees a1, a2, and a3, namely "Question 1: how to improve daily work efficiency", "Question 2: unknown", and "Question 3: how to realize mutual supervision among employees during work", is determined to be key text data. The text data that does not match the key title, "Question 2: unknown", is culled, and the remaining "Question 1" and "Question 3" are integrated with the key title "What questions do you still have about the above description?" and shown through the display box; see fig. 5, where:
Boss A: "What questions do you still have about the above description?";
Employee a1: "Question 1: how to improve daily work efficiency";
Employee a3: "Question 3: how to realize mutual supervision among employees during work".
In this way, the key titles and the key text data are integrated to consolidate the text converted from the voice data of multi-party communication, and the key titles are formed through the preselected-marking mode, which solves the prior-art problem that voice data conversion lacks pertinence and, after arrangement, greatly improves the efficiency of later manual screening.
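The screening in this example can be sketched as follows; the message format and the non-answer filter are simplified assumptions standing in for the key-title matching described above:

```python
# Sketch of screening key text data against a key title (illustrative only).
# A reply is kept as "valuable text data" only if it carries substantive
# content rather than a non-answer.

key_title = ("Boss A", "What questions do you still have about the above description?")

replies = [
    ("Employee a1", "Question 1: how to improve daily work efficiency"),
    ("Employee a2", "Question 2: unknown"),
    ("Employee a3", "Question 3: how to realize mutual supervision among employees during work"),
]

NON_SUBSTANTIVE = {"unknown", "none", "no"}  # assumed non-answer markers

def matches_title(reply_text: str) -> bool:
    """Keep replies that state an actual question rather than a non-answer."""
    _, _, body = reply_text.partition(":")
    return body.strip().lower() not in NON_SUBSTANTIVE

display_frame = [key_title] + [(who, text) for who, text in replies if matches_title(text)]
for who, text in display_frame:
    print(f'{who}: "{text}"')  # Employee a2's non-answer is culled
```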
Example 4
In order to improve the accuracy of extracting the key data information, the embodiment is different from embodiment 3 in that a weighted extraction algorithm is adopted for extracting the key data information, and the algorithm steps are as follows:
sentences in the voice data are segmented and punctuated according to the pauses between sounds and the intonation of the sounds, the punctuation marks including periods, question marks, and exclamation marks;
the word-frequency, word-length, part-of-speech, position, and dictionary factors of the text data from the preselected marked device end are quantized with weighting coefficients, and a weight calculation is performed after quantization to obtain the total factor weight of each word;
and the words are sorted by total weight in descending order to obtain a keyword list, through which the key data information is acquired.
Specifically, the total factor weight is calculated as follows:
$$W_i = \alpha F_i + \beta L_i + \gamma P_i + \delta S_i + \varepsilon D_i$$
wherein $W_i$ is the total factor weight of word $i$ in the text data; $\alpha$ is the word-frequency weighting coefficient and $F_i$ the word-frequency factor; $\beta$ is the word-length weighting coefficient and $L_i$ the word-length factor; $\gamma$ is the part-of-speech weighting coefficient and $P_i$ the part-of-speech factor; $\delta$ is the position weighting coefficient and $S_i$ the position factor; $\varepsilon$ is the dictionary weighting coefficient and $D_i$ the dictionary factor, with $\alpha + \beta + \gamma + \delta + \varepsilon = 1$. In this embodiment, the weighting coefficients are determined by reverse reasoning over a large-scale corpus, preferably with fuzzy processing. According to the relative importance of each factor in the results, $\alpha$ is assigned 0.4, $\beta$ is assigned 0.2, $\gamma$ and $\delta$ are each assigned 0.15, and $\varepsilon$ is assigned 0.1. A candidate keyword table A is then obtained through the weight calculation. The generation principle of the primary candidate keywords is as follows:
words of unspecific part of speech (noun, verb, adjective, idiom) or words which do not appear in the title sentence, the head of the paragraph, and the tail of the paragraph and have the word frequency of 1 are filtered out.
If the total word count of the text is Total and the number of extracted keywords is k, then k satisfies:
k = Total × 35% if Total × 35% < 20; if Total × 35% ≥ 20, k is capped at 20. The k keywords extracted through the above two steps are used as the primary candidate keywords, which greatly improves the accuracy of the key data information obtained through the weight calculation.
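A short sketch of this keyword-count rule; the rounding behavior and the cap of 20 are our reading of the threshold above, not stated explicitly in the source:

```python
import math

def keyword_count(total_words: int, cap: int = 20) -> int:
    """Number of keywords k: 35% of the word count, capped at `cap`."""
    k = math.ceil(total_words * 0.35)  # rounding up is an assumption
    return min(k, cap)

print(keyword_count(30))   # 11  (30 * 35% = 10.5 < 20)
print(keyword_count(200))  # 20  (200 * 35% = 70 >= 20 -> capped)
```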
Example 5
In order to improve the robustness of voice conversion in a data-scarce environment, this embodiment differs from embodiment 1; please refer to fig. 2 and fig. 3, wherein:
the text conversion comprises the following specific steps:
firstly, the audio data output by a device end is extracted, and the audio data is then trained with a Gaussian mixture learning algorithm;
the harmonic-plus-noise model of the source audio output speech is decomposed, and the decomposed model is corrected using the average fundamental-frequency ratio to obtain the corresponding corrected harmonic amplitude and phase parameters; features are extracted from the harmonic amplitude and phase parameters to obtain line spectral frequency (LSF) parameters, the LSF parameters are mapped with a Gaussian mixture model, and the mapped LSF features are fused;
mixed output is then performed with the corrected harmonic amplitude and phase parameters, after which the text data of the source audio output speech is extracted.
In addition, the Gaussian mixture learning algorithm comprises the following steps:
firstly, the source audio output speech and the target audio output speech are used for training, and the corresponding harmonic-plus-noise models are decomposed;
the average fundamental-frequency ratio of the fundamental-frequency tracks of the two output speeches is calculated, while features are extracted from the harmonic amplitude and phase parameters of both to obtain the corresponding line spectral frequency parameters;
dynamic time warping is then applied to the obtained line spectral frequency parameters, and the Gaussian mixture model is obtained with a variational Bayes estimation algorithm.
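As an illustration of the alignment step, here is a minimal dynamic-time-warping sketch in pure NumPy for pairing frames of the two line-spectral-frequency sequences before the joint vectors are formed; it uses Euclidean frame distance and is a simplified stand-in, not the patent's specific DTW:

```python
import numpy as np

def dtw_path(A: np.ndarray, B: np.ndarray):
    """Return index pairs aligning frame sequences A (n, d) and B (m, d)."""
    n, m = len(A), len(B)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = np.linalg.norm(A[i - 1] - B[j - 1])
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    # Backtrack from (n, m) toward (0, 0)
    path, i, j = [], n, m
    while i > 0 and j > 0:
        path.append((i - 1, j - 1))
        step = np.argmin([D[i - 1, j - 1], D[i - 1, j], D[i, j - 1]])
        if step == 0:
            i, j = i - 1, j - 1
        elif step == 1:
            i -= 1
        else:
            j -= 1
    return path[::-1]

# Hypothetical LSF sequences of unequal length:
A = np.random.default_rng(1).normal(size=(8, 4))
B = np.vstack([A[:5], A[4:]])  # a time-stretched variant of A
pairs = dtw_path(A, B)
aligned = np.hstack([A[[i for i, _ in pairs]], B[[j for _, j in pairs]]])
print(aligned.shape)  # joint vectors ready for the VB-GMM step
```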
Specifically, in use the Gaussian mixture model adopts the VB-GMM algorithm. The source audio output speech x and the target audio output speech y are first combined into an extended vector z, and the model is trained on z. Referring to fig. 6, the horizontal axis is the number of mixture components and the vertical axis is the logarithmic distortion (unit: dB). The conversion error at point (1) decreases as the amount of data increases, showing that the more sufficient the data, the more fully the model is trained and the better the conversion effect. Point (2) shows that when data is scarce (fewer than 500 training frames), VB-GMM outperforms the standard GMM shown by the solid line, with error distortion about 0.5 dB lower; the performance gap narrows as the data amount grows, and when the data is relatively sufficient (more than 5000 training frames) the two tend to balance, the difference in error distortion being about 0.23 dB. This agrees with the theory above: as the data amount tends to infinity, the VB-GMM estimate is approximately equal to the maximum-likelihood estimate. At point (3), with about 3000 training frames, both curves reach a local low point, because the correlation between those 3000 frames of training data and the test data is strong, so the conversion effect is very good.
It is worth noting that the markers at points (1)-(6) refer to the optimal number of mixture components of the two algorithms under a given amount of training data (the standard GMM and the VB-GMM both use the same optimal number), and the optimal value is obtained automatically by the VB-GMM algorithm (for different amounts of data, we initialize 32 mixture components, and the finally obtained number of components is the result of the VB-GMM algorithm's self-optimization). This avoids the problem of overfitting and improves the robustness of voice conversion in data-scarce environments.
Further, the variational Bayes estimation algorithm is calculated as follows:
$$\ln p(X) = \ln \int p(X \mid Y)\, p(Y)\, dY$$
wherein $\ln p(X)$ is the logarithmic marginal density; $X$ is the observed audio variable; $Y$ is the text variable of the source audio output speech; $p(Y \mid X)$ is the posterior probability of $Y$ given $X$; and $p(Y)$ is the prior probability of $Y$. Specifically, $\ln p(X)$ is estimated by integrating over all possible values of $Y$.
The foregoing shows and describes the general principles, essential features, and advantages of the invention. It will be understood by those skilled in the art that the invention is not limited to the embodiments described above; the embodiments and the description above merely illustrate preferred forms of the invention and do not limit it. The scope of the invention is defined by the appended claims and their equivalents.
Claims (2)
1. The voice data text conversion method based on multi-party communication is characterized by comprising the following steps:
firstly, identifying a preset password entered at each device end of the multiple parties, which yields two cases:
case 1: if the preset password is correct, marking the device end, outputting the mark of each device end, and constructing the group chat according to the device-end marks;
case 2: if the preset password is incorrect, continuing to pop up the input window;
performing text conversion on the voice data communicated by each device end in the group chat;
storing the voice data and the converted text data through a memory;
extracting from the memory the voice data output by a preselected marked device end and the text data converted from it, then identifying the key data information of the preselected marked device end from the extracted text data to form a key title, and then extracting the voice data output by the other marked device ends after that key title and before the next key title appears, together with the text data converted from it, to form key text data;
integrating the key text data with the key title; specifically, screening the key text data according to the key title to pick out the valuable text data, and supplementing the valuable text data, the voice data, and the device-end marks, in mutual correspondence, into a display frame of the group chat;
the key data information is extracted by adopting a weighted extraction algorithm, and the algorithm comprises the following steps:
segmenting and punctuating sentences according to the pauses between sounds and the intonation of the sounds in the voice data;
quantizing the word-frequency, word-length, part-of-speech, position, and dictionary factors of the text data from the preselected marked device end with weighting coefficients, and performing a weight calculation after quantization to obtain the total factor weight of each word;
sorting the words by total weight in descending order to obtain a keyword list, and acquiring the key data information through the keyword list;
the total factor weight calculation formula is as follows:
$$W_i = \alpha F_i + \beta L_i + \gamma P_i + \delta S_i + \varepsilon D_i$$
wherein $W_i$ is the total factor weight of word $i$ in the text data; $\alpha$ is the word-frequency weighting coefficient and $F_i$ the word-frequency factor; $\beta$ is the word-length weighting coefficient and $L_i$ the word-length factor; $\gamma$ is the part-of-speech weighting coefficient and $P_i$ the part-of-speech factor; $\delta$ is the position weighting coefficient and $S_i$ the position factor; $\varepsilon$ is the dictionary weighting coefficient and $D_i$ the dictionary factor, and $\alpha + \beta + \gamma + \delta + \varepsilon = 1$.
2. The voice data text conversion method based on multi-party communication according to claim 1, wherein: the key data information of the preselected marked device end comprises key character information, modal-particle information, and keyword extraction information.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110404363.1A CN112802480B (en) | 2021-04-15 | 2021-04-15 | Voice data text conversion method based on multi-party communication |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112802480A CN112802480A (en) | 2021-05-14 |
CN112802480B true CN112802480B (en) | 2021-07-13 |
Family
ID=75811438
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110404363.1A Active CN112802480B (en) | 2021-04-15 | 2021-04-15 | Voice data text conversion method based on multi-party communication |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112802480B (en) |
Family Cites Families (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN100438542C (en) * | 2004-05-27 | 2008-11-26 | 华为技术有限公司 | Method for implementing telephone conference |
US7831427B2 (en) * | 2007-06-20 | 2010-11-09 | Microsoft Corporation | Concept monitoring in spoken-word audio |
US8797380B2 (en) * | 2010-04-30 | 2014-08-05 | Microsoft Corporation | Accelerated instant replay for co-present and distributed meetings |
CN102215238B (en) * | 2011-07-27 | 2013-12-18 | 中国电信股份有限公司 | Service processing method and system fused with video conference and user terminal |
CN103631780B (en) * | 2012-08-21 | 2016-11-23 | 重庆文润科技有限公司 | Multimedia recording systems and method |
CN106802885A (en) * | 2016-12-06 | 2017-06-06 | 乐视控股(北京)有限公司 | A kind of meeting summary automatic record method, device and electronic equipment |
CN106791584A (en) * | 2017-02-07 | 2017-05-31 | 上海与德信息技术有限公司 | The implementation method of video conference, cut-in method and related device |
US10510346B2 (en) * | 2017-11-09 | 2019-12-17 | Microsoft Technology Licensing, Llc | Systems, methods, and computer-readable storage device for generating notes for a meeting based on participant actions and machine learning |
CN109302578B (en) * | 2018-10-23 | 2021-03-26 | 视联动力信息技术股份有限公司 | Method and system for logging in conference terminal and video conference |
CN110489979A (en) * | 2019-07-10 | 2019-11-22 | 平安科技(深圳)有限公司 | Conferencing information methods of exhibiting, device, computer equipment and storage medium |
CN111986677A (en) * | 2020-09-02 | 2020-11-24 | 深圳壹账通智能科技有限公司 | Conference summary generation method and device, computer equipment and storage medium |
- 2021-04-15: Application CN202110404363.1A filed; granted as CN112802480B (status: Active)
Patent Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103150388A (en) * | 2013-03-21 | 2013-06-12 | 天脉聚源(北京)传媒科技有限公司 | Method and device for extracting key words |
CN103326929A (en) * | 2013-06-24 | 2013-09-25 | 北京小米科技有限责任公司 | Method and device for transmitting messages |
CN106487757A (en) * | 2015-08-28 | 2017-03-08 | 华为技术有限公司 | Carry out method, conference client and the system of voice conferencing |
CN106652995A (en) * | 2016-12-31 | 2017-05-10 | 深圳市优必选科技有限公司 | Text voice broadcasting method and system |
CN109508214A (en) * | 2017-09-15 | 2019-03-22 | 夏普株式会社 | The recording medium of display control unit, display control method and non-transitory |
CN107733666A (en) * | 2017-10-31 | 2018-02-23 | 珠海格力电器股份有限公司 | Conference implementation method and device and electronic equipment |
CN110019744A (en) * | 2018-08-17 | 2019-07-16 | 深圳壹账通智能科技有限公司 | Auxiliary generates method, apparatus, equipment and the computer storage medium of meeting summary |
CN109474763A (en) * | 2018-12-21 | 2019-03-15 | 深圳市智搜信息技术有限公司 | A kind of AI intelligent meeting system and its implementation based on voice, semanteme |
CN111415128A (en) * | 2019-01-07 | 2020-07-14 | 阿里巴巴集团控股有限公司 | Method, system, apparatus, device and medium for controlling conference |
CN111859006A (en) * | 2019-04-17 | 2020-10-30 | 上海颐为网络科技有限公司 | Method, system, electronic device and storage medium for establishing voice entry tree |
CN110505201A (en) * | 2019-07-10 | 2019-11-26 | 平安科技(深圳)有限公司 | Conferencing information processing method, device, computer equipment and storage medium |
CN110517689A (en) * | 2019-08-28 | 2019-11-29 | 腾讯科技(深圳)有限公司 | A kind of voice data processing method, device and storage medium |
CN110889266A (en) * | 2019-11-21 | 2020-03-17 | 北京明略软件系统有限公司 | Conference record integration method and device |
CN111627446A (en) * | 2020-05-29 | 2020-09-04 | 国网浙江省电力有限公司信息通信分公司 | Communication conference system based on intelligent voice recognition technology |
CN112468665A (en) * | 2020-11-05 | 2021-03-09 | 中国建设银行股份有限公司 | Method, device, equipment and storage medium for generating conference summary |
Non-Patent Citations (2)
Title |
---|
A voice conversion algorithm under conditions of scarce training data; Xu Ning et al.; Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition); October 2010; pp. 1-7 *
Keyword extraction algorithm for Chinese text; Zhang Hongying; Computer Systems & Applications; December 2009; pp. 73-76 *
Also Published As
Publication number | Publication date |
---|---|
CN112802480A (en) | 2021-05-14 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109326283B (en) | Many-to-many voice conversion method based on text encoder under non-parallel text condition | |
CN108763284A (en) | A kind of question answering system implementation method based on deep learning and topic model | |
WO2023273170A1 (en) | Welcoming robot conversation method | |
CN109388701A (en) | Minutes generation method, device, equipment and computer storage medium | |
CN107346340A (en) | A kind of user view recognition methods and system | |
CN112417134B (en) | Automatic abstract generation system and method based on voice text deep fusion features | |
CN110609896B (en) | Military scenario text event information extraction method and device based on secondary decoding | |
CN107544956B (en) | Text key point detection method and system | |
CN110009025B (en) | Semi-supervised additive noise self-encoder for voice lie detection | |
CN110060691B (en) | Many-to-many voice conversion method based on i-vector and VARSGAN | |
CN112017632A (en) | Automatic conference record generation method | |
WO2010030742A1 (en) | Method for creating a speech model | |
CN103885924A (en) | Field-adaptive automatic open class subtitle generating system and field-adaptive automatic open class subtitle generating method | |
CN114328817A (en) | Text processing method and device | |
CN113887883A (en) | Course teaching evaluation implementation method based on voice recognition technology | |
CN114091469B (en) | Network public opinion analysis method based on sample expansion | |
CN116303930A (en) | Session intelligent generation method based on semantic matching and generation model | |
CN115062139A (en) | Automatic searching method for dialogue text abstract model | |
CN118378148A (en) | Training method of multi-label classification model, multi-label classification method and related device | |
CN112802480B (en) | Voice data text conversion method based on multi-party communication | |
CN114444481A (en) | Sentiment analysis and generation method of news comments | |
CN108629019A (en) | A kind of Question sentence parsing computational methods containing name towards question and answer field | |
CN112115707A (en) | Emotion dictionary construction method for bullet screen emotion analysis and based on expressions and tone | |
Li et al. | Intelligibility enhancement via normal-to-lombard speech conversion with long short-term memory network and bayesian Gaussian mixture model | |
CN112102847B (en) | Audio and slide content alignment method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| PB01 | Publication | |
| SE01 | Entry into force of request for substantive examination | |
| GR01 | Patent grant | |