CN112885356A - Voice recognition method based on voiceprint - Google Patents
- Publication number
- CN112885356A (application CN202110124834.3A)
- Authority
- CN
- China
- Prior art keywords
- individual
- information
- voice
- group
- voiceprint
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L17/00—Speaker identification or verification techniques
  - G10L17/02—Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
  - G10L17/18—Artificial neural networks; Connectionist approaches
  - G10L17/20—Pattern transformations or operations aimed at increasing system robustness, e.g. against channel noise or different working conditions
  - G10L17/22—Interactive procedures; Man-machine interfaces
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
  - G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
    - G10L21/0272—Voice signal separating
Landscapes
- Engineering & Computer Science (AREA)
- Multimedia (AREA)
- Acoustics & Sound (AREA)
- Physics & Mathematics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Signal Processing (AREA)
- Quality & Reliability (AREA)
- Computational Linguistics (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Artificial Intelligence (AREA)
- Evolutionary Computation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
The invention discloses a voice recognition method based on voiceprints, which comprises the following steps: S1, acquiring audio information at a plurality of different positions in the environment; S2, separating individual sounds according to the voiceprint features in the plurality of audio information, grouping the individual sounds and recording time information, and fusing the individual sounds in each group to obtain individual enhanced audio information; S3, calculating the position of the individual according to the time information in the individual sounds in each group and the positions of the audio acquisition modules, so as to assist the video information in locating the individual; S4, forming discussion groups according to the individual position information, the sound intensity distribution, the video information, and the semantics; and S5, displaying and playing the individual enhanced audio information in each discussion group.
Description
Technical Field
The invention relates to the technical field of voice recognition, in particular to a voice recognition method based on voiceprints.
Background
With the development of remote and intelligent classrooms, the ways in which teaching and students are evaluated are also changing. At the same time, new demands are placed on the quality education of students, such as cultivating teamwork and communication skills. Existing technologies based on speech recognition include the following:
Patent CN201911342652.2 discloses a data processing method, apparatus, electronic device, and storage medium. The method is as follows: data to be processed is acquired, the data relating to behaviors generated by a user in at least one scene; the data is processed with a multi-fusion model to obtain at least two first parameters, the multi-fusion model comprising at least a first model for voice recognition, a second model for image recognition, and a third model for speaker recognition; each first parameter represents a score obtained by evaluating a corresponding behavior of the user in at least one scene; a second parameter, representing a total score over at least two behaviors of the user, is determined from the at least two first parameters and used for teaching evaluation. In effect, the user's behaviors in aspects such as morality, intelligence, and physical constitution are evaluated comprehensively and accurately by combining voice recognition, speaker recognition, and image recognition models, and the resulting total score is used for teaching evaluation. However, this prior art is limited to post-class evaluation: the voice and images are analyzed only after they have been collected, so no real-time analysis is possible. Moreover, the three models of the multi-fusion model cannot process multiple audio signals in real time in a multi-threaded manner.
Patent CN201911418872.9 discloses an audio signal processing method, apparatus, and electronic device in the field of speech processing. The implementation is as follows: the audio signal is processed by a plurality of threads to obtain the audio information corresponding to each thread, each thread corresponding to one audio function; the audio information of each thread is then sent to the application program of the corresponding audio function for processing. Processing the audio signal with multiple threads simultaneously allows several audio functions to execute in parallel, which helps improve speech signal processing in a variety of application scenarios.
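As a hedged illustration only, the sketch below shows what such thread-per-audio-function processing can look like; the function names and the thread-pool hand-off are assumptions for the sketch, not details taken from CN201911418872.9.

```python
# Hedged illustration of the thread-per-audio-function scheme summarized
# above: each audio function runs on its own worker thread over the same
# captured signal. The function names and the thread-pool hand-off are
# assumptions for this sketch, not details taken from CN201911418872.9.
from concurrent.futures import ThreadPoolExecutor

def run_audio_functions(signal, functions: dict) -> dict:
    """Run each audio function (e.g. 'asr', 'voiceprint') on its own thread."""
    with ThreadPoolExecutor(max_workers=max(len(functions), 1)) as pool:
        futures = {name: pool.submit(fn, signal) for name, fn in functions.items()}
        # each result would then be forwarded to the application that owns
        # the corresponding audio function
        return {name: f.result() for name, f in futures.items()}
```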
In teaching, however, not all students can be attended to at the same time, so teaching evaluation cannot be performed accurately for every student.
Disclosure of Invention
The invention aims to provide a voiceprint-based voice recognition method that meets the need for multipoint monitoring and evaluation in a classroom, improves the voice recognition effect, and solves the technical problem that teachers cannot pay attention to all students or groups at the same time.
In order to achieve the above object, the present invention provides a voice recognition method based on voiceprint, which comprises the following steps:
S1, acquiring audio information at a plurality of different positions in the environment;
S2, separating individual sounds according to the voiceprint features in the plurality of audio information, grouping the individual sounds and recording time information, and fusing the individual sounds in each group to obtain individual enhanced audio information;
S3, calculating the position of the individual according to the time information in the individual sounds in each group and the positions of the audio acquisition modules, so as to assist the video information in locating the individual;
S4, forming discussion groups according to the individual position information, the sound intensity distribution, the video information, and the semantics;
S5, displaying and playing the individual enhanced audio information in each discussion group.
Preferably, the method for separating individual sounds in step S2 may specifically be: individual sound information is collected in advance, stored, and subjected to voice modeling to form a separation model; the individual sound is then discriminated by comparing it with the separation model through a similarity operation, after which the individual sound is extracted from the environmental sound. The extraction method may specifically be a neural-network filtering algorithm trained on frequency-spectrum information.
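A minimal sketch of this similarity comparison follows, assuming each enrolled individual's separation model is summarized as a mean log-magnitude spectrum and mixture frames are attributed by cosine similarity; the frame sizes, the 0.75 threshold, and the spectral-signature representation are illustrative assumptions rather than details from the patent, which specifies only a similarity operation followed by neural-network filtering.

```python
# Minimal sketch of the similarity comparison in step S2, assuming each
# enrolled individual's separation model is summarized as a mean log-magnitude
# spectrum. FRAME/HOP sizes, the 0.75 threshold, and this spectral-signature
# representation are illustrative assumptions, not details from the patent.
import numpy as np

FRAME, HOP = 512, 256

def log_spectrum_frames(signal: np.ndarray) -> np.ndarray:
    """Split a mono signal into overlapping frames; return log-magnitude spectra."""
    n = (len(signal) - FRAME) // HOP + 1
    frames = np.stack([signal[i * HOP:i * HOP + FRAME] for i in range(n)])
    return np.log1p(np.abs(np.fft.rfft(frames * np.hanning(FRAME), axis=1)))

def enroll(samples: list) -> np.ndarray:
    """Build a separation model: the mean spectral signature of one individual."""
    return np.mean([log_spectrum_frames(s).mean(axis=0) for s in samples], axis=0)

def assign_frames(mixture: np.ndarray, models: dict, threshold: float = 0.75) -> dict:
    """Attribute each frame of the mixture to enrolled individuals by cosine similarity."""
    spectra = log_spectrum_frames(mixture)
    assigned = {name: [] for name in models}
    for i, frame in enumerate(spectra):
        for name, model in models.items():
            sim = frame @ model / (np.linalg.norm(frame) * np.linalg.norm(model) + 1e-9)
            if sim > threshold:
                assigned[name].append(i)  # frame i attributed to this individual
    return assigned
```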
Preferably, the individual sound grouping in step S2 consists of grouping the separated individual sounds and assigning time stamps to them.
Preferably, the individual sound fusion in step S2 specifically consists of fusing the same individual's sounds in each group according to their spectral information.
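The following sketch illustrates one plausible reading of this fusion step: time-aligned copies of one individual's signal from several capture channels are averaged in the frequency domain, with each channel weighted by its spectral energy as a rough quality proxy. The energy weighting is an assumption; the patent specifies only that fusion uses spectral information.

```python
# Sketch of the per-individual fusion in step S2, assuming the same
# individual's sound has been extracted from several capture channels and
# time-aligned using the recorded time stamps (equal-length arrays). Energy
# weighting as a rough signal-quality proxy is an assumption; the patent only
# specifies that fusion uses spectral information.
import numpy as np

def fuse_channels(aligned: list) -> np.ndarray:
    """Fuse time-aligned copies of one individual's signal from several channels."""
    spectra = [np.fft.rfft(sig) for sig in aligned]
    energy = np.array([np.sum(np.abs(s) ** 2) for s in spectra])
    weights = energy / energy.sum()                  # louder channel -> larger weight
    fused = sum(w * s for w, s in zip(weights, spectra))
    return np.fft.irfft(fused, n=len(aligned[0]))    # back to the time domain
```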
Preferably, the individual position in step S3 is obtained by calculating the flight times of the received individual sound, after which the identity of the individual sound is further verified by the face recognition module using the video at the located position, so as to improve recognition accuracy.
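A hedged sketch of such time-of-flight positioning follows: the position is recovered from differences in arrival times at microphones with known positions, solved as a small least-squares problem. The two-dimensional layout, the 343 m/s speed of sound, and the choice of scipy's least_squares solver are assumptions for illustration.

```python
# Hedged sketch of time-of-flight positioning for step S3: the individual's
# position is estimated from differences in arrival times at microphones with
# known positions. The 2-D layout, the 343 m/s speed of sound, and the use of
# scipy.optimize.least_squares are assumptions for illustration.
import numpy as np
from scipy.optimize import least_squares

SPEED_OF_SOUND = 343.0  # m/s in room-temperature air

def locate(mic_pos: np.ndarray, arrival_t: np.ndarray) -> np.ndarray:
    """Estimate the (x, y) of a speaker from arrival times at >= 3 microphones."""
    t_rel = arrival_t - arrival_t.min()          # delays relative to the earliest mic
    ref = mic_pos[np.argmin(arrival_t)]

    def residual(p: np.ndarray) -> np.ndarray:
        d = np.linalg.norm(mic_pos - p, axis=1)  # distance from candidate p to each mic
        return (d - np.linalg.norm(ref - p)) - SPEED_OF_SOUND * t_rel

    return least_squares(residual, x0=mic_pos.mean(axis=0)).x

# Example: four microphones in the corners of a 6 m x 6 m classroom.
mics = np.array([[0.0, 0.0], [6.0, 0.0], [0.0, 6.0], [6.0, 6.0]])
```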
Preferably, the specific method for forming discussion groups in step S4 is as follows:
1) establishing an individual position distribution map;
2) establishing the sound intensity distribution and, in combination with the individual position distribution map, determining candidate group members according to the positions the sound intensity can reach;
3) identifying speakers and listeners in the video information, and determining the discussion-group individuals by combining the individual sound semantics of the candidates from step 2).
Based on the discussion grouping realized in step S4, the association between discussion participants and discussion content can be determined dynamically: the groups can be identified in real time, and each group's discussion content can be delimited for subsequent processing.
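The sketch below illustrates steps 1) to 3) under stated assumptions: individuals are linked whenever one's sound is loud enough at the other's position, assuming free-field inverse-square falloff, and connected components of the resulting graph become candidate discussion groups; the 30 dB hearing threshold is an invented value, and the video and semantic confirmation of step 3) is left as a stub.

```python
# Illustrative sketch of steps 1) to 3): individuals are linked when one's
# sound is loud enough at the other's position (free-field inverse-square
# falloff assumed), and connected components of that graph become candidate
# discussion groups. The 30 dB hearing threshold is an invented value, and the
# video/semantic confirmation of step 3) is left as a stub.
import numpy as np

def candidate_groups(positions: np.ndarray, source_db: np.ndarray,
                     hear_db: float = 30.0) -> list:
    """Group individuals whose speech plausibly reaches one another."""
    n = len(positions)
    adj = np.zeros((n, n), dtype=bool)
    for i in range(n):
        for j in range(n):
            if i != j:
                dist = max(np.linalg.norm(positions[i] - positions[j]), 0.1)
                # free-field attenuation: about 20*log10(d) dB relative to 1 m
                adj[i, j] = source_db[i] - 20 * np.log10(dist) >= hear_db
    groups, seen = [], set()
    for i in range(n):                       # connected components over mutual links
        if i in seen:
            continue
        stack, comp = [i], set()
        while stack:
            k = stack.pop()
            if k not in comp:
                comp.add(k)
                stack.extend(j for j in range(n) if adj[k, j] and adj[j, k])
        seen |= comp
        groups.append(sorted(comp))
    return groups
```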
Preferably, the method for displaying and playing the individual enhanced audio information of a discussion group in step S5 is specifically: the discussion groups of step S4 are displayed on the teaching terminal according to their discussion content, and clicking a discussion group opens that group's discussion content, which comprises the individual enhanced audio information obtained in step S2 for each group member. Group discussion information can thus be recorded and played back, so that teaching staff can obtain the discussion information of different groups at the same time; this facilitates the teaching of discussion-based courses and ensures that no group or individual is overlooked.
Preferably, in order to better recognize teaching voices, the invention also discloses a voiceprint-based voice recognition system, which comprises: a plurality of audio acquisition modules, which simultaneously acquire a plurality of audio information in the environment; a plurality of video acquisition modules, which acquire a plurality of video information in the environment; an audio processing module, which receives the plurality of audio information, separates individual sounds according to the voiceprint features therein, groups the individual sounds and records time information, and fuses the individual sounds in each group to obtain individual enhanced audio information; a positioning module, which calculates the position of the individual according to the time information in the individual sounds in each group and the positions of the audio acquisition modules, so as to assist the video in locating the individual; a grouping module, which forms discussion groups according to individual position information, sound intensity distribution, video information, and semantics, the voice in each discussion group being the individual enhanced audio information; and a teaching interaction module, which displays and plays the discussion-group voice.
Preferably, the audio acquisition modules are arranged at different locations in the environment.
Preferably, the audio processing module includes: a distributed audio receiving module, which can receive the audio information from the plurality of audio acquisition modules in parallel; a plurality of voiceprint recognition modules, each of which recognizes the audio information of one audio acquisition module and separates individual sounds; and a voice fusion module, which fuses the individual sounds of the same individual.
Preferably, the grouping module further comprises a semantic recognition module, which includes a semantic understanding module and an extended semantic database updated in real time, and which is trained on the semantic information in historically recorded individual sounds to obtain individual semantics.
Preferably, the semantic recognition module is further configured to synthesize a plurality of individual sounds so as to recognize two or more dialog participants. The priority of the synthesized individual sounds is determined by the individual position information, the sound intensity, and the video information: whether individuals face each other and whether the sound intensity can reach a dialog partner is determined from the time-tag signals in the audio and video, and semantic analysis is then performed on the individual sounds in the voice recognition module to identify the two or more dialog participants.
Preferably, the grouping module further comprises a sound intensity distribution calculating module, which can calculate the sound intensity distribution according to the individual sounds in different groups.
Preferably, the teaching voice recognition system further comprises a video processing module, which is used for portrait recognition and gesture recognition and transmits data carrying tag information to the grouping module.
Drawings
FIG. 1 is a schematic diagram of a voiceprint based speech recognition system of the present invention.
FIG. 2 is a schematic diagram of an audio processing module according to the present invention.
FIG. 3 is a flowchart of voiceprint based speech recognition according to the present invention.
FIG. 4 is a discussion packet flow diagram for an implementation of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Example 1
As shown in fig. 3, the embodiment of the present invention discloses a voice recognition method based on voiceprint, which comprises the following steps:
S1, acquiring audio information at a plurality of different positions in the environment;
S2, separating individual sounds according to the voiceprint features in the plurality of audio information, grouping the individual sounds and recording time information, and fusing the individual sounds in each group to obtain individual enhanced audio information;
S3, calculating the position of the individual according to the time information in the individual sounds in each group and the positions of the audio acquisition modules, so as to assist the video information in locating the individual;
S4, forming discussion groups according to the individual position information, the sound intensity distribution, the video information, and the semantics;
S5, displaying and playing the individual enhanced audio information in each discussion group.
In one embodiment, the method for separating individual sounds in step S2 may specifically be: individual sound information is collected in advance, stored, and subjected to voice modeling to form a separation model; the individual sound is then discriminated by comparing it with the separation model through a similarity operation, after which the individual sound is extracted from the environmental sound. The extraction method may specifically be a neural-network filtering algorithm trained on frequency-spectrum information.
In one embodiment, the individual sound grouping in S2 consists of grouping the separated individual sounds and assigning time stamps to them.
In one embodiment, the individual sound fusion in S2 specifically consists of fusing the same individual's sounds in each group according to their spectral information.
In one embodiment, the individual position in S3 is calculated from the flight times of the received individual sound, after which the identity of the individual sound is further verified by the face recognition module using the video at the located position, so as to improve recognition accuracy.
In one embodiment, as shown in fig. 4, the specific method for forming discussion groups in S4 is as follows:
S4-1, establishing an individual position distribution map;
S4-2, establishing the sound intensity distribution and, in combination with the individual position distribution map, determining candidate group members according to the positions the sound intensity can reach;
S4-3, identifying speakers and listeners in the video information, and determining the discussion-group individuals by combining the individual sound semantics of the candidates from S4-2.
The discussion grouping realized in this way can dynamically determine the association between discussion participants and discussion content: the groups can be identified in real time, and each group's discussion content can be delimited for subsequent processing.
In one embodiment, the method for displaying and playing the individual enhanced audio information of a discussion group in S5 is specifically: the discussion groups of S4 are displayed on the teaching terminal according to their discussion content, and clicking a discussion group opens that group's discussion content, which comprises the individual enhanced audio information obtained in S2 for each group member. Group discussion information can thus be recorded and played back, so that teaching staff can obtain the discussion information of different groups at the same time; this facilitates the teaching of discussion-based courses and ensures that no group or individual is overlooked.
Example 2
Referring to fig. 1, an embodiment of the present application provides a voice recognition system based on voiceprint, including:
the system comprises a plurality of audio acquisition modules 1, a plurality of audio acquisition modules and a plurality of audio processing modules, wherein the audio acquisition modules simultaneously acquire a plurality of audio information in the environment;
the plurality of video acquisition modules 2 are used for acquiring a plurality of video information in the environment; the audio processing module 3 is used for receiving the plurality of audio information, separating individual sounds according to voiceprint characteristics in the plurality of audio information, grouping the individual sounds and recording time information, and fusing the individual sounds in each group to obtain individual enhanced audio information; the positioning module 4 is used for calculating the position of the individual according to the time information in the individual sound in each group and the position of the audio acquisition module so as to assist the video in positioning the individual; the grouping module 5 is used for realizing discussion grouping according to individual position information, sound intensity distribution, video information and semantics, wherein the individual enhanced audio information is adopted by voice in the discussion grouping; and the teaching interaction module 6 is used for displaying and playing the discussion grouping voice.
Further, the audio acquisition modules are disposed at different positions in the environment.
In one embodiment, as shown in fig. 2, the audio processing module 3 comprises a distributed audio receiving module 3-1, a distributed voiceprint recognition module 3-2 composed of a plurality of voiceprint recognition modules, and a voice fusion module 3-3. The distributed audio receiving module 3-1 can receive the audio information from the plurality of audio acquisition modules 1 in parallel; each voiceprint recognition module recognizes the audio information of one audio acquisition module 1 and separates individual sounds; and the voice fusion module 3-3 fuses the individual sounds of the same individual.
In one embodiment, the grouping module 5 further comprises a semantic recognition module, which includes a semantic understanding module and an extended semantic database updated in real time, and which is trained on the semantic information in historically recorded individual sounds to obtain individual semantics. The semantic recognition module is also used to synthesize a plurality of individual sounds so as to recognize two or more dialog participants: the priority of the synthesized individual sounds is determined by the individual position information, the sound intensity, and the video information; whether individuals face each other and whether the sound intensity can reach a dialog partner is determined from the time-tag signals in the audio and video, and semantic analysis is then performed on the individual sounds in the voice recognition module to identify the two or more dialog participants.
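A minimal sketch of this dialog-partner check is given below; the cue weights and the 30 dB hearing threshold are illustrative assumptions, and the final semantic analysis is omitted.

```python
# Minimal sketch of the dialog-partner check described above: a candidate pair
# is scored on mutual facing (from video), on whether the speaker's sound
# intensity reaches the listener, and on proximity. The weights and the 30 dB
# threshold are illustrative assumptions; the final semantic analysis is omitted.
import numpy as np

def dialog_score(pos_a, pos_b, facing_ab: bool, facing_ba: bool,
                 intensity_at_b: float, hear_db: float = 30.0) -> float:
    """Heuristic priority for treating individuals A and B as dialog partners."""
    score = 0.4 * float(facing_ab and facing_ba)          # video: mutual facing
    score += 0.4 * float(intensity_at_b >= hear_db)       # A's sound reaches B
    score += 0.2 / (1.0 + np.linalg.norm(np.asarray(pos_a) - np.asarray(pos_b)))
    return score
```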
In one embodiment, the grouping module 5 further includes a sound intensity distribution calculating module, which can calculate the sound intensity distribution according to the individual sounds in different groups.
In one embodiment, the teaching voice recognition system further comprises a video processing module 7, which is used for portrait recognition and gesture recognition and transmits data carrying tag information to the grouping module 5.
In one embodiment, the teaching interaction module can display the discussion grouping situation through a touch screen, and the teacher can select and play back individual enhanced audio information in the discussion grouping through the touch screen.
In one embodiment, the audio processing module 3 and the video processing module 7 may be implemented by a DSP or an FPGA running audio and video processing algorithms; audio and video information obtained at the same moment is processed in parallel to improve computational efficiency and save processing time.
In one embodiment, the positioning module 4 and the grouping module 5 can be implemented by a CPU to realize high-speed calculation, and the CPU is connected to a touch screen through a peripheral circuit to realize the teaching interaction module.
Although embodiments of the present invention have been shown and described, it will be appreciated by those skilled in the art that changes, modifications, substitutions and alterations can be made in these embodiments without departing from the principles and spirit of the invention, the scope of which is defined in the appended claims and their equivalents.
Claims (6)
1. A voice recognition method based on voiceprint, characterized in that the method comprises the following steps:
S1, acquiring audio information at a plurality of different positions in the environment;
S2, separating individual sounds according to the voiceprint features in the plurality of audio information, grouping the individual sounds and recording time information, and fusing the individual sounds in each group to obtain individual enhanced audio information;
S3, calculating the position of the individual according to the time information in the individual sounds in each group and the positions of the audio acquisition modules to assist the video information in locating the individual;
S4, forming discussion groups according to the individual position information, the sound intensity distribution, the video information, and the semantics;
and S5, displaying and playing the individual enhanced audio information in each discussion group.
2. The voiceprint based speech recognition method according to claim 1, wherein the method of separating individual sounds in step S2 is: individual voice information is collected in advance, stored and subjected to voice modeling to form a separation model.
3. The voiceprint based speech recognition method according to claim 1, wherein the grouping of the individual voices in step S2 is to group the separated individual voices and to assign time stamps thereto.
4. The voiceprint based speech recognition method according to claim 1, wherein the individual sound fusion method in step S2 is to fuse the same individual sound in each group according to the spectrum information.
5. The voiceprint based speech recognition method according to claim 1, wherein the individual position locating method in step S3 is calculated from the flight time of the received individual voice.
6. The voiceprint based speech recognition method according to claim 2, wherein the similarity operation is performed based on comparison of the individual voice with the separation model to achieve individual voice discrimination, and then the individual voice is extracted from the environmental voice.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202110124834.3A | 2021-01-29 | 2021-01-29 | Voice recognition method based on voiceprint
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title
---|---|---|---
CN202110124834.3A | 2021-01-29 | 2021-01-29 | Voice recognition method based on voiceprint
Publications (2)
Publication Number | Publication Date
---|---
CN112885356A (en) | 2021-06-01
CN112885356B (en) | 2021-09-24
Family
ID=76053524
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110124834.3A | Voice recognition method based on voiceprint | 2021-01-29 | 2021-01-29
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112885356B (en) |
- 2021-01-29: application CN202110124834.3A filed; granted as CN112885356B (status: active)
Patent Citations (15)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20130198635A1 (en) * | 2010-04-30 | 2013-08-01 | American Teleconferencing Services, Ltd. | Managing Multiple Participants at the Same Location in an Online Conference |
US20150029901A1 (en) * | 2013-07-24 | 2015-01-29 | Vonage Network Llc | Method and Apparatus for Providing Bridgeless Conferencing Services |
CN105551502A (en) * | 2015-12-18 | 2016-05-04 | 合肥寰景信息技术有限公司 | Network-teaching real-time voice analysis system |
CN107918821A (en) * | 2017-03-23 | 2018-04-17 | 广州思涵信息科技有限公司 | Teachers ' classroom teaching process analysis method and system based on artificial intelligence technology |
CN107862060A (en) * | 2017-11-15 | 2018-03-30 | 吉林大学 | A kind of semantic recognition device for following the trail of target person and recognition methods |
CN110136032A (en) * | 2018-02-09 | 2019-08-16 | 北京新唐思创教育科技有限公司 | Classroom interaction data processing method and computer storage medium based on courseware |
US20200194000A1 (en) * | 2018-04-11 | 2020-06-18 | Google Llc | Low latency nearby group translation |
US20190333520A1 (en) * | 2018-04-30 | 2019-10-31 | International Business Machines Corporation | Cognitive print speaker modeler |
WO2019217101A1 (en) * | 2018-05-06 | 2019-11-14 | Microsoft Technology Licensing, Llc | Multi-modal speech attribution among n speakers |
CN109150556A (en) * | 2018-07-31 | 2019-01-04 | 何镝 | More people's teleconferences based on speech recognition record system |
CN108921521A (en) * | 2018-08-17 | 2018-11-30 | 四川网道科技发展有限公司 | A kind of conference service management system |
CN110444211A (en) * | 2019-08-23 | 2019-11-12 | 青岛海信电器股份有限公司 | A kind of audio recognition method and equipment |
CN110544481A (en) * | 2019-08-27 | 2019-12-06 | 华中师范大学 | S-T classification method and device based on voiceprint recognition and equipment terminal |
CN111507581A (en) * | 2020-03-26 | 2020-08-07 | 威比网络科技(上海)有限公司 | Course matching method, system, equipment and storage medium based on speech speed |
CN111818294A (en) * | 2020-08-03 | 2020-10-23 | 上海依图信息技术有限公司 | Method, medium and electronic device for multi-person conference real-time display combined with audio and video |
Non-Patent Citations (5)
Title |
---|
BOHONG YANG ET AL.: "In-classroom learning analytics based on student behavior, topic and teaching characteristic mining", Pattern Recognition Letters *
MUHAMMAD SALMAN KHAN ET AL.: "Video-Aided Model-Based Source Separation in Real Reverberant Rooms", IEEE Transactions on Audio, Speech, and Language Processing *
张军翔: "Research on an interactive teaching system based on video streaming media technology" (in Chinese), Journal of Hubei Normal University (Natural Science Edition) *
张明源: "Speech enhancement technology and DSP system design for an intelligent conference environment" (in Chinese), China Master's Theses Full-text Database (Information Science and Technology) *
闫晶: "Research on the teaching mode of a survey-of-English-speaking-countries course in a network environment" (in Chinese), Journal of Anyang Institute of Technology *
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116866783A (en) * | 2023-09-04 | 2023-10-10 | 广州乐庚信息科技有限公司 | Intelligent classroom audio control system, method and storage medium |
CN116866783B (en) * | 2023-09-04 | 2023-11-28 | 广州乐庚信息科技有限公司 | Intelligent classroom audio control system, method and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN112885356B (en) | 2021-09-24 |
Similar Documents
Publication | Title
---|---
CN111915148B (en) | Classroom teaching evaluation method and system based on information technology
CN110147726A (en) | Service quality detection method and device, storage medium, and electronic device
CN107918821A (en) | Teacher classroom teaching process analysis method and system based on artificial intelligence technology
CN110517689A (en) | Voice data processing method, device, and storage medium
CN106294774A (en) | User-personalized data processing method and device based on dialogue service
CN110796098B (en) | Method, device, equipment, and storage medium for training a content review model and reviewing content
CN108154304A (en) | Server with a teaching quality assessment function
CN111709358A (en) | Teacher-student behavior analysis system based on classroom video
CN107240047A (en) | Credit evaluation method and device for an instructional video
CN111046819A (en) | Behavior recognition processing method and device
CN114465737B (en) | Data processing method and device, computer equipment and storage medium
CN103700370A (en) | Broadcast television voice recognition method and system
CN107609736A (en) | Teaching diagnostic analysis system and method integrating artificial intelligence technology
KR20160043865A (en) | Method and apparatus for providing a combined summary in an imaging apparatus
CN109271533A (en) | Multimedia document retrieval method
CN113840109B (en) | Intelligent classroom audio and video note-taking method
CN111048095A (en) | Voice transcription method, device, and computer-readable storage medium
CN109286848B (en) | Terminal video information interaction method, device, and storage medium
CN108109446A (en) | Classroom emotion monitoring system for teaching
CN112885356B (en) | Voice recognition method based on voiceprint
CN110148418B (en) | Scene record analysis system, method and device
CN112102129A (en) | Intelligent examination cheating identification system based on student terminal data processing
CN116050892A (en) | Intelligent education evaluation supervision method based on artificial intelligence
CN114065720A (en) | Conference summary generation method and device, storage medium and electronic equipment
CN113837907A (en) | Human-machine interaction system and method for English teaching
Legal Events
Code | Title
---|---
PB01 | Publication
SE01 | Entry into force of request for substantive examination
GR01 | Patent grant