CN112802480B - Voice data text conversion method based on multi-party communication - Google Patents

Voice data text conversion method based on multi-party communication

Info

Publication number
CN112802480B
CN112802480B
Authority
CN
China
Prior art keywords
data
key
voice data
factor
character data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110404363.1A
Other languages
Chinese (zh)
Other versions
CN112802480A (en)
Inventor
江合文
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong International Science And Technology Co ltd
Original Assignee
Guangdong International Science And Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong International Science And Technology Co ltd filed Critical Guangdong International Science And Technology Co ltd
Priority to CN202110404363.1A priority Critical patent/CN112802480B/en
Publication of CN112802480A publication Critical patent/CN112802480A/en
Application granted granted Critical
Publication of CN112802480B publication Critical patent/CN112802480B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/06 Creation of reference templates; Training of speech recognition systems, e.g. adaptation to the characteristics of the speaker's voice
    • G10L15/063 Training

Landscapes

  • Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Telephonic Communication Services (AREA)

Abstract

The invention relates to the technical field of digital information transmission, and in particular to a voice data text conversion method based on multi-party communication. The method recognizes the preset passwords input at multiple device ends, converts the voice data exchanged by each device end in the group chat into text, stores the voice data and the converted text data in a memory, and integrates key text data with key titles. According to the invention, the text converted from the voice data of the multi-party communication is organized by integrating the key titles with the key text data, and the key titles are formed through a preselected-marking mode, which solves the prior-art problem that voice data conversion lacks pertinence and, after the arrangement, greatly improves the efficiency of later manual screening.

Description

Voice data text conversion method based on multi-party communication
Technical Field
The invention relates to the technical field of digital information transmission, in particular to a voice data text conversion method based on multi-party communication.
Background
Currently, as chat tools are continuously updated, communication has shifted from the earlier text chat to voice chat, in which:
A chat tool, also called IM software or an IM tool, provides an Internet-based client for real-time voice and text transmission. Technically, chat tools are mainly divided into server-based IM tool software and P2P-based IM tool software. The biggest difference between real-time messaging and e-mail is that no waiting is required: there is no need to press "send and receive" every two minutes. As long as both parties are online at the same time, a chat tool can transmit text, files, sound and images to the other party like a multimedia telephone, and as long as a network connection exists, the two parties can communicate as if there were no distance between them, no matter how far apart they are.
Therefore, many enterprises and schools apply this real-time-messaging digital information transmission technology in teaching, that is, data are transmitted among multiple device ends by establishing a group chat. However, existing video group chats and voice group chats simply convert all voice data in the communication process; the text conversion is not targeted enough, and people must later screen out text that did not need to be converted.
Disclosure of Invention
The invention aims to provide a voice data text conversion method based on multi-party communication so as to solve the problems raised in the background art.
In order to achieve the above object, the present invention provides a voice data text conversion method based on multi-party communication, which comprises the following steps:
firstly, recognizing the preset password input at each of the multiple device ends, which involves two cases:
case 1: if the preset password is correct, marking the device end, outputting the mark of each device end, and constructing the group chat according to the device-end marks;
case 2: if the preset password is incorrect, continuing to pop up the input window;
performing text conversion on the voice data exchanged by each device end in the group chat;
storing the voice data and the converted text data in a memory;
extracting from the memory the voice data output by a preselected-mark device end and the text data converted therefrom, identifying key data information of the preselected-mark device end from the extracted text data to form a key title, and then extracting the voice data output by the other marked device ends after that key title and before the next key title appears, together with the text data converted therefrom, to form key text data;
and integrating the key text data with the key title; specifically, screening the key text data according to the key title to screen out the valuable text data, and supplementing the valuable text data, the voice data and the device-end marks, in mutual correspondence, into a display frame of the group chat.
As a further improvement of the technical scheme, the key data information of the preselected-mark device end comprises key character information, modal-particle information and keyword extraction information.
As a further improvement of the technical scheme, the key data information is extracted with a weighted extraction algorithm, whose steps are as follows:
performing punctuation-based sentence segmentation according to the pauses and tones in the voice data, the punctuation marks including periods, question marks and exclamation marks;
quantizing the word-frequency, word-length, part-of-speech, position and dictionary factors of the text data from the preselected-mark device end using weighting factors, and performing the weight calculation after quantization to obtain the total weight of each word;
and sorting the words by weight in descending order to obtain a keyword list, through which the key data information is acquired.
As a further improvement of the technical solution, the factor total weight is calculated as follows:

$$W_i = \alpha F_{freq}(i) + \beta F_{len}(i) + \gamma F_{pos}(i) + \delta F_{loc}(i) + \varepsilon F_{dict}(i)$$

wherein $W_i$ is the factor total weight of word $i$ in the text data; $\alpha$ is the word-frequency factor ratio and $F_{freq}$ is the word-frequency factor; $\beta$ is the word-length factor ratio and $F_{len}$ is the word-length factor; $\gamma$ is the part-of-speech factor ratio and $F_{pos}$ is the part-of-speech factor; $\delta$ is the position factor ratio and $F_{loc}$ is the position factor; $\varepsilon$ is the dictionary factor ratio and $F_{dict}$ is the dictionary factor, and $\alpha + \beta + \gamma + \delta + \varepsilon = 1$.
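To make the weighted sum concrete, here is a minimal Python sketch of the calculation; the ratio values follow the assignment given in embodiment 4 below, while the per-word factor scores are invented placeholders assumed to be pre-quantized into [0, 1].

```python
# Illustrative sketch of the factor total-weight calculation; the ratio values
# follow embodiment 4 below, while the factor scores are invented placeholders.
RATIOS = {
    "freq": 0.40,  # word-frequency factor ratio (alpha)
    "len":  0.10,  # word-length factor ratio (beta)
    "pos":  0.20,  # part-of-speech factor ratio (gamma)
    "loc":  0.15,  # position factor ratio (delta)
    "dict": 0.15,  # dictionary factor ratio (epsilon)
}
assert abs(sum(RATIOS.values()) - 1.0) < 1e-9  # the ratios must sum to 1

def total_weight(factors):
    """Weighted sum W_i over the five quantized factor scores of one word."""
    return sum(RATIOS[name] * factors[name] for name in RATIOS)

# A frequent noun appearing in the title sentence and found in the dictionary:
print(total_weight({"freq": 0.9, "len": 0.3, "pos": 0.8, "loc": 1.0, "dict": 1.0}))
```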
as a further improvement of the technical scheme, the Chinese character conversion comprises the following specific steps:
firstly, extracting audio data output by an equipment end, and then training the audio data by using a Gaussian mixture learning algorithm;
decomposing a harmonic plus noise model of the audio output voice of the extraction source, correcting the decomposed model by using average fundamental frequency comparison to obtain corresponding corrected harmonic amplitude and phase parameters, extracting the characteristics of the harmonic amplitude and phase parameters to obtain linear spectral rate parameters, mapping the linear spectral rate parameters by using a Gaussian mixture model, and fusing the mapped linear spectral rate parameter characteristics;
and performing mixed output by using the corrected harmonic amplitude and phase parameters, and then extracting text data of the source audio output voice.
As a further improvement of the technical solution, the Gaussian mixture learning algorithm comprises the following steps:
firstly, training on the source audio output speech and the target audio output speech, and decomposing each with the corresponding harmonic-plus-noise model;
calculating the average fundamental-frequency ratio of the fundamental-frequency tracks of the two output speeches, and meanwhile performing feature extraction on the harmonic amplitude and phase parameters of both to obtain the corresponding line spectral frequency parameters;
and performing dynamic time warping on the obtained line spectral frequency parameters, and obtaining the Gaussian mixture model with a variational Bayes estimation algorithm; a toy sketch of the average fundamental-frequency ratio follows.
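As an illustration only, the average fundamental-frequency ratio of the two fundamental-frequency tracks can be sketched as below; the convention that zeros mark unvoiced frames is an assumption, not from the patent.

```python
# Toy sketch of the average fundamental-frequency ratio used to correct the
# decomposed harmonic-plus-noise model: mean F0 of target over mean F0 of source.
import numpy as np

def average_f0_ratio(src_f0_track, tgt_f0_track):
    """F0 tracks in Hz; zeros are assumed to mark unvoiced frames and are excluded."""
    src = np.asarray(src_f0_track, dtype=float)
    tgt = np.asarray(tgt_f0_track, dtype=float)
    return tgt[tgt > 0].mean() / src[src > 0].mean()

# e.g. a deeper-voiced source speaker mapped to a higher-pitched target speaker
print(average_f0_ratio([110, 0, 115, 120], [210, 220, 0, 230]))
```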
As a further improvement of the technical solution, the variational Bayes estimation algorithm is calculated as follows:

$$\ln p(X) = \ln \int p(X, Z)\,dZ$$

wherein $\ln p(X)$ is the log marginal density; $X$ is the observed audio variable; $Z$ is the text variable of the source audio output speech; $p(Z \mid X)$ is the posterior probability of $Z$ given $X$; and $p(X, Z)$ is the prior probability over the pair $(X, Z)$.
Compared with the prior art, the invention has the following beneficial effects:
in this voice data text conversion method based on multi-party communication, the text converted from the voice data of the multi-party communication is organized by integrating the key titles with the key text data, and the key titles are formed through the preselected-marking mode, which solves the prior-art problem that voice data conversion lacks pertinence and, after the arrangement, greatly improves the efficiency of later manual screening.
Drawings
FIG. 1 is an overall flow chart of the present invention;
FIG. 2 is a flowchart of the Gaussian mixture learning algorithm training steps of the present invention;
FIG. 3 is a flowchart of the Gaussian mixture learning algorithm transformation steps of the present invention;
FIG. 4 is a first schematic diagram of a display frame of the present invention;
FIG. 5 is a second schematic diagram of a display frame of the present invention;
FIG. 6 is a line chart comparing the VB-GMM algorithm and the standard GMM algorithm of the present invention.
Detailed Description
Example 1
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to fig. 1, the present invention provides a technical solution:
the invention provides a voice data text conversion method based on multi-party communication, which comprises the following steps:
performing text conversion on the voice data exchanged by each device end in the group chat;
the voice data and the text data converted therefrom are stored in the memory, and the converted text data are then shown in the display frame; please refer to fig. 4. The displayed text data make later review convenient when meeting minutes or study notes are compiled, which solves the problem that text cannot be extracted and reviewed after a video conference or a video lesson. A minimal sketch of this storage-and-display step follows.
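The sketch below assumes a simple in-memory record per utterance; the class and field names are illustrative, not from the patent.

```python
# Hypothetical in-memory store pairing each utterance's voice data with its
# converted text, so both can be supplemented into the display frame later.
from dataclasses import dataclass

@dataclass
class ChatRecord:
    device_mark: str   # e.g. "boss A" or "employee a1" (marks per embodiment 2)
    voice_data: bytes  # raw audio as received from the device end
    text_data: str     # text produced by the voice-to-text conversion

class GroupChatMemory:
    def __init__(self):
        self.records = []

    def store(self, record):
        self.records.append(record)

    def display_frame(self):
        """Lines shown in the group-chat display frame (cf. fig. 4)."""
        return [f'{r.device_mark}: "{r.text_data}"' for r in self.records]

memory = GroupChatMemory()
memory.store(ChatRecord("boss A", b"\x00\x01", "What questions do you still have?"))
print("\n".join(memory.display_frame()))
```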
Example 2
In order to improve the security of the group chat and prevent non-members from joining it, this embodiment differs from embodiment 1 in that the preset password input at each of the multiple device ends is recognized first, which involves two cases:
case 1: if the preset password is correct, the device end is marked, the marks of the device ends are output, and the group chat is constructed according to the device-end marks. People in the group chat are thereby distinguished and divided by way of marking, with a specific mark added for each role, for example: if the group chat is an enterprise group, the marks include boss and employee; if the group chat is a study group, the marks include teacher and student, which further improves the recognizability of the members in the group chat;
case 2: if the preset password is incorrect, the input window keeps popping up, and a device end with an incorrect preset password cannot join the group chat, which greatly improves the security of the group chat and solves the problem of non-members joining it. A minimal sketch of this admission flow follows.
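The sketch below assumes one shared preset password per role; the passwords and role names are invented for illustration.

```python
# Hypothetical admission check for embodiment 2: a device end joins the group
# chat only when its preset password is correct, and is then given a role mark.
PRESET_PASSWORDS = {"b0ss-2021": "boss", "staff-2021": "employee"}  # invented

def try_join(device_id, entered_password, group):
    """Case 1: mark the device end and admit it. Case 2: reject it."""
    role = PRESET_PASSWORDS.get(entered_password)
    if role is None:
        return False                           # incorrect: window pops up again
    group[device_id] = f"{role} {device_id}"   # e.g. "employee a1"
    return True

group = {}
for device, pwd in [("A", "b0ss-2021"), ("a1", "staff-2021"), ("x9", "wrong")]:
    joined = try_join(device, pwd, group)
    print(device, "joined" if joined else "input window pops up again")
print(group)   # the group chat is constructed from the device-end marks
```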
Example 3
In order to improve the pertinence of voice data conversion, this embodiment differs from embodiment 2 in that the voice data output by the preselected-mark device end and the text data converted therefrom are extracted from the memory; key data information of the preselected-mark device end is then identified from the extracted text data to form a key title; and the voice data output by the other marked device ends after that key title and before the next key title appears, together with the text data converted therefrom, are extracted to form key text data;
the key text data are integrated with the key title; specifically, the key text data are screened according to the key title to screen out the valuable text data, and the valuable text data, the voice data and the device-end marks are supplemented, in mutual correspondence, into the display frame of the group chat.
In addition, the key data information of the preselected-mark device end comprises key character information, modal-particle information and keyword extraction information.
In specific use, this embodiment is illustrated with an enterprise conference. A group chat is constructed by means of password input. Suppose the set of device ends in the group chat is S = (A, a1, a2, a3), which after marking becomes S = (boss A, employee a1, employee a2, employee a3), and "boss A" is set as the preselected mark. When boss A says in the group chat "What questions do you still have about the above description?", the modal auxiliary word in the sentence is recognized, and "What questions do you still have about the above description?" is determined as a key title. The employees a1, a2 and a3 then output the voice data "Question 1: how to improve everyday work efficiency", "Question 2: I don't know" and "Question 3: how to realize mutual supervision among employees during work", which are determined as key text data. The text data that does not conform to the key title, "Question 2: I don't know", is culled, leaving "Question 1" and "Question 3" to be integrated with "What questions do you still have about the above description?" in the display frame; see fig. 5, wherein:
boss A: "What questions do you still have about the above description?";
employee a1: "Question 1: how to improve everyday work efficiency";
employee a3: "Question 3: how to realize mutual supervision among employees during work".
In this way, the key titles and the key text data are integrated to organize the text converted from the voice data of multi-party communication, and the key titles are formed through the preselected-marking mode, which solves the prior-art problem that voice data conversion lacks pertinence and greatly improves the efficiency of later manual screening. A minimal sketch of this integration logic follows.
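In the simplified sketch below, detecting a key title from modal particles and question marks is reduced to a toy test; the patent's key data information (key characters, modal particles, keyword extraction) is richer, and the message format is an assumption.

```python
# Simplified sketch of the integration of embodiment 3: group each key title
# with the key text data that follows it until the next key title appears.
QUESTION_MARKERS = ("?", "？", "吗", "呢")   # toy stand-in for modal-particle detection

def is_key_title(text):
    return text.rstrip().endswith(QUESTION_MARKERS)

def integrate(messages, preselected_mark):
    """messages: [(device_mark, text)] in chat order -> {key title: key text data}."""
    integrated, current_title = {}, None
    for mark, text in messages:
        if mark == preselected_mark and is_key_title(text):
            current_title = text                 # a new key title opens a new group
            integrated[current_title] = []
        elif current_title is not None:
            integrated[current_title].append((mark, text))  # key text data
    return integrated

chat = [("boss A", "What questions do you still have about the above description?"),
        ("employee a1", "Question 1: how to improve everyday work efficiency"),
        ("employee a2", "Question 2: I don't know"),
        ("employee a3", "Question 3: how to realize mutual supervision among employees")]
for title, answers in integrate(chat, "boss A").items():
    # screening against the key title would cull "I don't know" at this point
    print(title, answers)
```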
Example 4
In order to improve the accuracy of key-data-information extraction, this embodiment differs from embodiment 3 in that the key data information is extracted with a weighted extraction algorithm, whose steps are as follows:
performing punctuation-based sentence segmentation according to the pauses and tones in the voice data, the punctuation marks including periods, question marks and exclamation marks;
quantizing the word-frequency, word-length, part-of-speech, position and dictionary factors of the text data from the preselected-mark device end using weighting factors, and performing the weight calculation after quantization to obtain the total weight of each word;
and sorting the words by weight in descending order to obtain a keyword list, through which the key data information is acquired.
Specifically, the factor total weight is calculated as follows:

$$W_i = \alpha F_{freq}(i) + \beta F_{len}(i) + \gamma F_{pos}(i) + \delta F_{loc}(i) + \varepsilon F_{dict}(i)$$

wherein $W_i$ is the factor total weight of word $i$ in the text data; $\alpha$ is the word-frequency factor ratio and $F_{freq}$ is the word-frequency factor; $\beta$ is the word-length factor ratio and $F_{len}$ is the word-length factor; $\gamma$ is the part-of-speech factor ratio and $F_{pos}$ is the part-of-speech factor; $\delta$ is the position factor ratio and $F_{loc}$ is the position factor; $\varepsilon$ is the dictionary factor ratio and $F_{dict}$ is the dictionary factor, and $\alpha + \beta + \gamma + \delta + \varepsilon = 1$.
In the present embodiment, the ratios $(\alpha, \beta, \gamma, \delta, \varepsilon)$ are determined by reverse reasoning over a large-scale corpus, preferably with fuzzy processing. According to the importance of each factor to the result, the word-frequency ratio $\alpha$ is assigned 0.4, the part-of-speech ratio $\gamma$ is assigned 0.2, the position ratio $\delta$ and the dictionary ratio $\varepsilon$ are each assigned 0.15, and the word-length ratio $\beta$ is assigned 0.1. A candidate keyword table A is then obtained through the weight calculation, and the generation principle of the primary candidate keywords is as follows:
words whose part of speech is not among the specified ones (noun, verb, adjective, idiom), and words with a word frequency of 1 that appear in none of the title sentence, the paragraph head and the paragraph tail, are filtered out.
If the total word count of the article is Total and the number of keywords to extract is k, then k should satisfy: k = Total × 35% if Total × 35% < 20, and k = 20 if Total × 35% ≥ 20. The k keywords extracted through the above two steps are used as the primary candidate keywords, which greatly improves the accuracy of the key data information obtained through the weight calculation. A sketch of this rule follows.
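The sketch below follows the reading that Total × 35% is capped at 20 extracted keywords; the data structures are illustrative assumptions.

```python
# Sketch of the primary-candidate rule of embodiment 4 (cap of 20 assumed).
ALLOWED_POS = {"noun", "verb", "adjective", "idiom"}

def extraction_count(total_words):
    k = int(total_words * 0.35)
    return k if k < 20 else 20

def primary_candidates(ranked, total_words):
    """ranked: [(word, pos_tag, freq, positions, weight)] sorted by weight desc."""
    kept = []
    for word, pos_tag, freq, positions, weight in ranked:
        if pos_tag not in ALLOWED_POS:
            continue                                   # filter by part of speech
        if freq == 1 and not positions & {"title", "head", "tail"}:
            continue                                   # rare word in no key position
        kept.append(word)
    return kept[: extraction_count(total_words)]

ranked = [("efficiency", "noun", 5, {"title"}, 0.82),
          ("the", "particle", 40, {"head"}, 0.60),
          ("supervision", "noun", 1, {"tail"}, 0.45),
          ("said", "verb", 1, set(), 0.30)]
print(primary_candidates(ranked, total_words=120))   # int(120*0.35)=42, capped at 20
```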
Example 5
In order to improve the robustness of voice conversion in a scarce-data environment, this embodiment differs from embodiment 1 as follows; please refer to fig. 2 and fig. 3, wherein:
the Chinese character conversion method comprises the following specific steps:
firstly, extracting audio data output by an equipment end, and then training the audio data by using a Gaussian mixture learning algorithm;
decomposing a harmonic plus noise model of the audio output voice of the extraction source, correcting the decomposed model by using average fundamental frequency comparison to obtain corresponding corrected harmonic amplitude and phase parameters, extracting the characteristics of the harmonic amplitude and phase parameters to obtain linear spectral rate parameters, mapping the linear spectral rate parameters by using a Gaussian mixture model, and fusing the mapped linear spectral rate parameter characteristics;
and performing mixed output by using the corrected harmonic amplitude and phase parameters, and then extracting text data of the source audio output voice.
In addition, the Gaussian mixture learning algorithm comprises the following steps:
firstly, training on the source audio output speech and the target audio output speech, and decomposing each with the corresponding harmonic-plus-noise model;
calculating the average fundamental-frequency ratio of the fundamental-frequency tracks of the two output speeches, and meanwhile performing feature extraction on the harmonic amplitude and phase parameters of both to obtain the corresponding line spectral frequency parameters;
and performing dynamic time warping on the obtained line spectral frequency parameters, and obtaining the Gaussian mixture model with a variational Bayes estimation algorithm.
Specifically, in use the Gaussian mixture model adopts the VB-GMM algorithm: the source audio output speech $x$ and the target audio output speech $y$ are first combined into an extended vector $z = [x^{T}, y^{T}]^{T}$, and the model is then trained on $z$. Referring to fig. 6, the horizontal axis is the mixing degree (number of mixture components) and the vertical axis is the logarithmic distortion (unit: dB). At point (1), the conversion error decreases as the amount of data grows, showing that the more sufficient the data, the more sufficient the model training and the better the conversion effect. Point (2) shows that with little data (fewer than 500 training frames) the VB-GMM outperforms the standard GMM of the solid-line portion (error distortion about 0.5 dB lower); the performance gap narrows as the amount of data increases, and when the data are relatively sufficient (more than 5000 training frames) the two tend to balance (error distortion differing by about 0.23 dB), consistent with the theory that as the amount of data tends to infinity, the VB-GMM estimate approximately equals the maximum-likelihood estimate. At point (3), with about 3000 training frames, both reach a local low point, because the correlation between those 3000 training frames and the test data is strong, so the conversion effect is very good.
It is worth noting that the marks at points (1)-(6) refer to the optimal mixing degree of the two models under a given amount of training data (the standard GMM and the VB-GMM both adopt the same optimal mixing degree), and the optimal value is obtained automatically by the VB-GMM algorithm (for different amounts of data, 32 mixture components are initialized, and the finally obtained mixing degree is the self-optimization result of the VB-GMM algorithm), which avoids the problem of overfitting and improves the robustness of voice conversion in a scarce-data environment. A minimal training sketch follows.
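In the sketch below, the frames are random stand-ins for DTW-aligned line spectral frequency parameters, and scikit-learn's BayesianGaussianMixture plays the role of the VB-GMM, including the automatic pruning of the 32 initial mixture components described above; it is a sketch under these assumptions, not the patent's implementation.

```python
# Sketch only: joint vectors z = [x; y] from aligned source/target frames are
# fitted with a variational-Bayes GMM. Real use would supply DTW-aligned line
# spectral frequency frames instead of random stand-in data.
import numpy as np
from sklearn.mixture import BayesianGaussianMixture

rng = np.random.default_rng(0)
src_lsf = rng.normal(size=(3000, 16))      # stand-in source LSF frames (aligned)
tgt_lsf = rng.normal(size=(3000, 16))      # stand-in target LSF frames (aligned)
z = np.hstack([src_lsf, tgt_lsf])          # extended vectors z = [x; y]

# Initialize 32 mixture components; variational inference drives the weights of
# unneeded components toward zero, yielding the self-optimized mixing degree.
vb_gmm = BayesianGaussianMixture(
    n_components=32,
    covariance_type="diag",
    weight_concentration_prior_type="dirichlet_process",
    max_iter=200,
    random_state=0,
).fit(z)

effective = int(np.sum(vb_gmm.weights_ > 1e-2))   # components that survived pruning
print(f"effective mixing degree: {effective} of 32")
```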
Further, the variational Bayes estimation algorithm is calculated as follows:

$$\ln p(X) = \ln \int p(X, Z)\,dZ$$

wherein $\ln p(X)$ is the log marginal density; $X$ is the observed audio variable; $Z$ is the text variable of the source audio output speech; $p(Z \mid X)$ is the posterior probability of $Z$ given $X$; and $p(X, Z)$ is the prior probability over all possible pairs; in particular, $p(X)$ is estimated by integrating $p(X, Z)$ over all possible $Z$.
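As a toy numeric illustration only, with a discrete text variable standing in for the continuous integral (all probabilities invented), the log marginal density is the log of the prior-weighted sum of likelihoods, and the posterior falls out of the same quantities:

```python
# Toy illustration of ln p(X) = ln sum_Z p(X|Z) p(Z) with a discrete text
# variable Z; the candidate texts and all probabilities are invented.
import math

prior = {"Question 1": 0.5, "Question 2": 0.3, "Question 3": 0.2}          # p(Z)
likelihood = {"Question 1": 0.10, "Question 2": 0.02, "Question 3": 0.05}  # p(X|Z)

marginal = sum(likelihood[z] * prior[z] for z in prior)   # p(X): sum over all Z
log_marginal = math.log(marginal)                         # the log marginal density
posterior = {z: likelihood[z] * prior[z] / marginal for z in prior}  # p(Z|X)
print(round(log_marginal, 4), posterior)
```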
The foregoing shows and describes the general principles, essential features and advantages of the invention. It will be understood by those skilled in the art that the present invention is not limited to the embodiments described above; the embodiments and the description above merely illustrate preferred forms of the invention and are not intended to limit it. The scope of the invention is defined by the appended claims and their equivalents.

Claims (2)

1. A voice data text conversion method based on multi-party communication, characterized by comprising the following steps:
firstly, recognizing the preset password input at each of the multiple device ends, which involves two cases:
case 1: if the preset password is correct, marking the device end, outputting the mark of each device end, and constructing the group chat according to the device-end marks;
case 2: if the preset password is incorrect, continuing to pop up the input window;
performing text conversion on the voice data exchanged by each device end in the group chat;
storing the voice data and the converted text data in a memory;
extracting from the memory the voice data output by a preselected-mark device end and the text data converted therefrom, identifying key data information of the preselected-mark device end from the extracted text data to form a key title, and then extracting the voice data output by the other marked device ends after that key title and before the next key title appears, together with the text data converted therefrom, to form key text data;
integrating the key text data with the key title; specifically, screening the key text data according to the key title to screen out the valuable text data, and supplementing the valuable text data, the voice data and the device-end marks, in mutual correspondence, into a display frame of the group chat;
the key data information is extracted with a weighted extraction algorithm, whose steps are:
performing punctuation-based sentence segmentation according to the pauses and tones in the voice data;
quantizing the word-frequency, word-length, part-of-speech, position and dictionary factors of the text data from the preselected-mark device end using weighting factors, and performing the weight calculation after quantization to obtain the total weight of each word;
sorting the words by weight in descending order to obtain a keyword list, and acquiring the key data information through the keyword list;
the factor total weight value calculation formula is as follows:
Figure DEST_PATH_IMAGE002
wherein,
Figure DEST_PATH_IMAGE003
as words and phrases
Figure DEST_PATH_IMAGE004
The factor total weight of the text data;
Figure DEST_PATH_IMAGE005
the word frequency factor is the ratio;
Figure DEST_PATH_IMAGE006
is a word frequency factor;
Figure DEST_PATH_IMAGE007
is the word length factor ratio;
Figure DEST_PATH_IMAGE008
is a word length factor;
Figure DEST_PATH_IMAGE009
is the ratio of parts-of-speech factors;
Figure DEST_PATH_IMAGE010
is a part of speech;
Figure DEST_PATH_IMAGE011
is the ratio of the position factors;
Figure DEST_PATH_IMAGE012
is the ratio of the positions;
Figure DEST_PATH_IMAGE013
is the ratio of dictionary factors;
Figure DEST_PATH_IMAGE014
is a dictionary factor, and
Figure DEST_PATH_IMAGE015
2. The voice data text conversion method based on multi-party communication according to claim 1, characterized in that the key data information of the preselected-mark device end comprises key character information, modal-particle information and keyword extraction information.
CN202110404363.1A 2021-04-15 2021-04-15 Voice data text conversion method based on multi-party communication Active CN112802480B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110404363.1A CN112802480B (en) 2021-04-15 2021-04-15 Voice data text conversion method based on multi-party communication

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110404363.1A CN112802480B (en) 2021-04-15 2021-04-15 Voice data text conversion method based on multi-party communication

Publications (2)

Publication Number Publication Date
CN112802480A CN112802480A (en) 2021-05-14
CN112802480B (en) 2021-07-13

Family

ID=75811438

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110404363.1A Active CN112802480B (en) 2021-04-15 2021-04-15 Voice data text conversion method based on multi-party communication

Country Status (1)

Country Link
CN (1) CN112802480B (en)


Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN100438542C (en) * 2004-05-27 2008-11-26 华为技术有限公司 Method for implementing telephone conference
US7831427B2 (en) * 2007-06-20 2010-11-09 Microsoft Corporation Concept monitoring in spoken-word audio
US8797380B2 (en) * 2010-04-30 2014-08-05 Microsoft Corporation Accelerated instant replay for co-present and distributed meetings
CN102215238B * 2011-07-27 2013-12-18 中国电信股份有限公司 Service processing method and system fusing a video conference with the user terminal
CN103631780B * 2012-08-21 2016-11-23 重庆文润科技有限公司 Multimedia recording system and method
CN106802885A * 2016-12-06 2017-06-06 乐视控股(北京)有限公司 Automatic meeting-minutes recording method, device and electronic equipment
CN106791584A * 2017-02-07 2017-05-31 上海与德信息技术有限公司 Video conference implementation method, access method and related device
US10510346B2 * 2017-11-09 2019-12-17 Microsoft Technology Licensing, Llc Systems, methods, and computer-readable storage device for generating notes for a meeting based on participant actions and machine learning
CN109302578B * 2018-10-23 2021-03-26 视联动力信息技术股份有限公司 Method and system for logging a conference terminal into a video conference
CN110489979A * 2019-07-10 2019-11-22 平安科技(深圳)有限公司 Conference information display method, device, computer equipment and storage medium
CN111986677A * 2020-09-02 2020-11-24 深圳壹账通智能科技有限公司 Conference summary generation method and device, computer equipment and storage medium

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103150388A * 2013-03-21 2013-06-12 天脉聚源(北京)传媒科技有限公司 Method and device for extracting keywords
CN103326929A * 2013-06-24 2013-09-25 北京小米科技有限责任公司 Method and device for transmitting messages
CN106487757A * 2015-08-28 2017-03-08 华为技术有限公司 Method, conference client and system for conducting a voice conference
CN106652995A * 2016-12-31 2017-05-10 深圳市优必选科技有限公司 Text-to-voice broadcasting method and system
CN109508214A * 2017-09-15 2019-03-22 夏普株式会社 Display control device, display control method and non-transitory recording medium
CN107733666A * 2017-10-31 2018-02-23 珠海格力电器股份有限公司 Conference implementation method and device and electronic equipment
CN110019744A * 2018-08-17 2019-07-16 深圳壹账通智能科技有限公司 Method, apparatus, device and computer storage medium for assisted generation of meeting minutes
CN109474763A * 2018-12-21 2019-03-15 深圳市智搜信息技术有限公司 AI intelligent meeting system based on voice and semantics and its implementation method
CN111415128A * 2019-01-07 2020-07-14 阿里巴巴集团控股有限公司 Method, system, apparatus, device and medium for controlling a conference
CN111859006A * 2019-04-17 2020-10-30 上海颐为网络科技有限公司 Method, system, electronic device and storage medium for establishing a voice entry tree
CN110505201A * 2019-07-10 2019-11-26 平安科技(深圳)有限公司 Conference information processing method, device, computer equipment and storage medium
CN110517689A * 2019-08-28 2019-11-29 腾讯科技(深圳)有限公司 Voice data processing method, device and storage medium
CN110889266A * 2019-11-21 2020-03-17 北京明略软件系统有限公司 Conference record integration method and device
CN111627446A * 2020-05-29 2020-09-04 国网浙江省电力有限公司信息通信分公司 Communication conference system based on intelligent voice recognition technology
CN112468665A * 2020-11-05 2021-03-09 中国建设银行股份有限公司 Method, device, equipment and storage medium for generating a conference summary

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
A voice conversion algorithm under scarce training data conditions; Xu Ning et al.; Journal of Nanjing University of Posts and Telecommunications (Natural Science Edition); 2010-10-31; pp. 1-7 *
Keyword extraction algorithm for Chinese text; Zhang Hongying; Computer Systems & Applications; 2009-12-31; pp. 73-76 *

Also Published As

Publication number Publication date
CN112802480A (en) 2021-05-14

Similar Documents

Publication Publication Date Title
CN109326283B Many-to-many voice conversion method based on a text encoder under non-parallel text conditions
CN108763284A Question answering system implementation method based on deep learning and a topic model
WO2023273170A1 Welcoming robot conversation method
CN109388701A Meeting minutes generation method, device, equipment and computer storage medium
CN107346340A User intention recognition method and system
CN112417134B Automatic abstract generation system and method based on deeply fused voice-text features
CN110609896B Military-scenario text event information extraction method and device based on secondary decoding
CN107544956B Text key point detection method and system
CN110009025B Semi-supervised additive-noise autoencoder for voice lie detection
CN110060691B Many-to-many voice conversion method based on i-vector and VARSGAN
CN112017632A Automatic conference record generation method
WO2010030742A1 Method for creating a speech model
CN103885924A Field-adaptive automatic subtitle generation system and method for open classes
CN114328817A Text processing method and device
CN113887883A Course teaching evaluation implementation method based on voice recognition technology
CN114091469B Network public opinion analysis method based on sample expansion
CN116303930A Intelligent session generation method based on semantic matching and a generative model
CN115062139A Automatic search method for dialogue text summarization models
CN118378148A Training method of a multi-label classification model, multi-label classification method and related device
CN112802480B Voice data text conversion method based on multi-party communication
CN114444481A Sentiment analysis and generation method for news comments
CN108629019A Calculation method for question sentences containing names, oriented to the question answering field
CN112115707A Emotion dictionary construction method based on emoticons and tone for bullet-screen sentiment analysis
Li et al. Intelligibility enhancement via normal-to-Lombard speech conversion with long short-term memory network and Bayesian Gaussian mixture model
CN112102847B Audio and slide content alignment method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant