CN105957531B - Speech content extraction method and device based on cloud platform - Google Patents


Info

Publication number
CN105957531B
CN105957531B (application CN201610260647.7A)
Authority
CN
China
Prior art keywords
audio
voice
speech
video
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610260647.7A
Other languages
Chinese (zh)
Other versions
CN105957531A (en)
Inventor
俞凯
谢其哲
吴学阳
李文博
郭运奇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sipic Technology Co Ltd
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN201610260647.7A priority Critical patent/CN105957531B/en
Publication of CN105957531A publication Critical patent/CN105957531A/en
Application granted granted Critical
Publication of CN105957531B publication Critical patent/CN105957531B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/28: Constructional details of speech recognition systems
    • G10L15/30: Distributed recognition, e.g. in client-server systems, for mobile phones or network applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Electrically Operated Instructional Devices (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

A speech content extraction method and device based on a cloud platform comprise the following steps: collecting the audio and video of a speech, caching the collected audio and video to a PC (personal computer), and preprocessing them; sending the preprocessed audio and video, together with related data including lecture slides and related reading materials, to a server; the server performing voice segmentation on the received audio and segmenting the audio by speaker; performing automatic speech recognition to convert the segmented audio into text, the recognition using acoustic adaptation and language model adaptation; and extracting keywords from the recognized text and generating content notes. The method converts the audio into a repeatedly readable text form through speech recognition and improves recognition accuracy using language model adaptation and acoustic model adaptation. Knowledge integration is also performed, so that time is not spent reading redundant information. The invention also discloses a speech content extraction device based on the cloud platform, which comprises a speech recording module, a material sending module, a voice segmentation module, a voice recognition module, and a keyword and content note extraction module.

Description

Speech content extraction method and device based on cloud platform
Technical Field
The invention relates to a technology in the field of text processing, in particular to a method and a device for extracting speech content based on a cloud platform.
Background
In the information age, technological progress has made it possible to obtain information from all over the world, ancient and modern, in quantities far exceeding what any person can take in. To help people acquire information more efficiently, speech signal processing and natural language processing technologies can automatically process this massive amount of information and extract its key content for quick reading.
In daily life, everyone takes in a large amount of information through channels such as the media and classes, so extracting this information into a text form that can be read repeatedly becomes important, allowing people to read and learn quickly. Language model adaptation and acoustic model adaptation improve the accuracy of speech recognition, and knowledge integration prevents time from being spent reading redundant information.
A search of the prior art found that Chinese patent document CN102292766B discloses a "method and apparatus for speech processing", relating to a method, apparatus, and computer program product that provide an architecture of composite models for speech recognition adaptation, selecting a model based on the speech characteristics of a specific speaker to improve recognition accuracy. However, this approach does not involve adapting the language model to improve accuracy on professional vocabulary.
Further search found that Chinese patent document CN102122506A discloses a speech recognition method in which the system uses a search engine to retrieve relevant text for training language models, thereby improving the speech recognition rate and reducing the workload of manual proofreading. However, the method requires an external search engine, is time-consuming, and is ill-suited to processing large amounts of speech.
Disclosure of Invention
Aiming at the above defects in the prior art, the invention provides a speech content extraction method and device based on a cloud platform, which recognize audio into a repeatedly readable text form through speech recognition and improve recognition accuracy using language model adaptation and acoustic model adaptation. Knowledge integration is also performed, so that time is not spent reading redundant information.
The invention is realized by the following technical scheme:
the invention relates to a speech content extraction method based on a cloud platform, which comprises the following steps:
step 1) collecting audio and video of a speech, caching the collected audio and video into a PC (personal computer), and preprocessing the audio and video;
step 2) sending the preprocessed audio and video, together with related data including lecture slides and related reading materials, to a server;
step 3) the server performs voice segmentation on the received audio and segments the audio according to speakers;
step 4) performing automatic speech recognition to convert the segmented audio into text, the recognition using acoustic adaptation and language model adaptation;
and 5) extracting keywords from the recognized text and generating content notes.
Preferably, the collection uses devices such as a microphone and a camera to record the audio and video of the speech, which are simultaneously cached to the PC over a wired or wireless network;
the PC performs speech enhancement on the audio to remove noise, and compresses the audio and video.
The voice segmentation means that the server performs voice activity detection on the received audio and segments it at speech pauses; segmenting by speaker means identifying the speaker of each speech segment and dividing the audio accordingly.
The acoustic adaptation includes adaptation to the recording environment, the noise type, the speaker type, and the like;
the language model adaptation includes adaptation to professional vocabulary in courseware and related reading materials.
The extraction comprises: extracting keywords related to the speech content from the speech-recognized text, and extracting notes related to the speech according to the relevance of each sentence in the text to the speech content.
The invention also relates to a speech content extraction device implementing the above method, comprising: a speech recording module, a material sending module, a voice segmentation module, a voice recognition module, and a keyword and content note extraction module. The speech recording module collects the speech audio and video, caches them to a personal computer (PC) in the classroom, and preprocesses them; the material sending module sends the preprocessed audio and video, together with related data including lecture slides and related reading materials, to a server; the voice segmentation module performs voice segmentation on the received audio and segments it by speaker; the voice recognition module performs automatic speech recognition, using acoustic adaptation and language model adaptation, to convert the segmented audio into text; and the keyword and content note extraction module extracts keywords from the recognized text and generates content notes.
The speech recording module collects the audio and video of the speech with devices such as a microphone and a camera, simultaneously caches them to a PC (personal computer) over a wired or wireless network, performs speech enhancement on the audio with the PC to remove noise, and compresses the audio and video.
The voice segmentation performs voice activity detection on the received audio and segments it at speech pauses; the speaker-based segmentation identifies the speaker of each speech segment and divides the audio accordingly.
The voice recognition module obtains the text corresponding to each sentence of audio using automatic speech recognition; the acoustic adaptation adapts to the recording environment, the noise type, the speaker type, and the like, while the language model adaptation adapts to professional vocabulary in lecture slides and related reading materials.
The keyword and content note extraction module extracts keywords related to the speech content from the speech-recognized text, and extracts notes related to the speech according to the relevance of each sentence in the text to the speech content.
Technical effects
Compared with the prior art, the invention recognizes audio into a repeatedly readable text form through speech recognition and improves recognition accuracy using language model adaptation and acoustic model adaptation. Knowledge integration is also performed, so that time is not spent reading redundant information.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of the structure of the device of the present invention.
Detailed Description
Example 1
The embodiment comprises the following steps:
101. collecting audio and video of a speech, caching the collected audio and video into a PC (personal computer), and preprocessing the audio and video;
In the embodiment of the invention, collecting the audio and video of the speech, caching them to the PC, and preprocessing comprise: recording the audio and video of the speech with devices such as a microphone and a camera, and simultaneously caching them to the PC over a wired or wireless network; the PC then performs speech enhancement on the audio to remove noise, and compresses the audio and video.
102. Sending the preprocessed audio and video, together with related data including lecture slides and related reading materials, to a server;
103. the server performs voice segmentation on the received audio and segments the audio according to speakers;
In the embodiment of the invention, the voice segmentation means that the server performs voice activity detection on the received audio and segments it at speech pauses; segmenting by speaker means identifying the speaker of each speech segment and dividing the audio accordingly.
104. Performing automatic speech recognition to convert the segmented audio into text, the recognition using acoustic adaptation and language model adaptation;
In the embodiment of the invention, the acoustic adaptation includes adaptation to the recording environment, the noise type, the speaker type, and the like; the language model adaptation includes adaptation to professional vocabulary in lecture slides and related reading materials.
105. Extracting keywords from the recognized text and generating content notes.
In this embodiment, extracting keywords and generating content notes from the recognized text comprises: extracting keywords related to the speech content from the speech-recognized text, and extracting notes related to the speech according to the relevance of each sentence in the text to the speech content.
Example 2
As shown in fig. 2, a schematic structural diagram of a speech content extraction device based on a cloud platform according to an embodiment of the present invention is provided, where the device includes: a lecture recording module 21, a material sending module 22, a voice segmentation module 23, a voice recognition module 24, and a keyword and content note extraction module 25.
The speech recording module 21 is used for acquiring speech audio and video, caching the acquired audio and video into a PC (personal computer) in a classroom, and preprocessing the audio and video;
the speech recording module 21 is used for acquiring audio and video of the speech by using a microphone, a camera and other devices, simultaneously caching the audio and video into a PC (personal computer) by using a wired or wireless network, performing voice enhancement on the audio by using the PC to remove noise, and compressing the audio and video.
For example, a video camera records a deep learning course; the teacher wears a bodypack microphone, and students answering questions use a wireless microphone. The recorded video and audio are cached to a personal computer (PC) in the classroom, background sounds such as air-conditioner noise and construction noise are removed with filtering methods such as adaptive cancellation, and the audio and video are compressed so that the file size is suitable for network transmission.
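By way of illustration, a minimal sketch of the adaptive-cancellation idea in Python (the patent names the technique but not a concrete algorithm; the LMS filter, filter length, and step size below are assumptions):

```python
import numpy as np

def lms_denoise(primary, reference, filter_len=64, mu=0.01):
    """Adaptive noise cancellation with an LMS filter (illustrative sketch).

    primary   -- microphone signal containing speech plus noise
    reference -- noise-only reference signal (e.g. a second microphone)
    Returns an estimate of the enhanced speech.
    """
    w = np.zeros(filter_len)                     # adaptive filter weights
    out = np.zeros(len(primary))
    for n in range(filter_len, len(primary)):
        x = reference[n - filter_len:n][::-1]    # most recent reference samples
        noise_est = w @ x                        # estimated noise component
        e = primary[n] - noise_est               # error signal = enhanced speech
        w += 2 * mu * e * x                      # LMS weight update
        out[n] = e
    return out
```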
The material sending module 22 is configured to send the preprocessed audio/video and the relevant data including the lecture slides and the relevant reading materials to the server.
Specifically, the enhanced and compressed audio and video, the deep learning slides, the deep learning reading materials, and the like are transmitted to an HTTP server.
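A minimal sketch of this transmission step, assuming a hypothetical endpoint URL and form-field names (the patent states only that the materials are sent to an HTTP server):

```python
import requests

# Hypothetical endpoint and field names, for illustration only.
SERVER_URL = "http://lecture-server.example/upload"

paths = {
    "audio": "lecture_audio.mp3",
    "video": "lecture_video.mp4",
    "slides": "deep_learning_slides.pdf",
    "reading": "deep_learning_reading.pdf",
}
files = {field: open(path, "rb") for field, path in paths.items()}
try:
    response = requests.post(SERVER_URL, files=files)  # multipart upload
    response.raise_for_status()
finally:
    for f in files.values():
        f.close()
```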
The voice segmentation module 23 is configured to perform voice segmentation on the received audio and segment the audio according to the speaker.
The voice segmentation in the voice segmentation module 23 performs voice activity detection on the received audio and segments it at speech pauses; the speaker-based segmentation identifies the speaker of each speech segment and divides the audio accordingly.
Specifically, the parts containing speech are cut out according to short-time energy and zero-crossing rate detection, and an i-vector is extracted for each speech segment to identify whether the speaker is the teacher or one of the students.
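A simplified sketch of this detection step (frame length and thresholds are illustrative assumptions; the rule below keeps high-energy, low-zero-crossing frames, i.e. it targets voiced speech, whereas a production detector would also handle unvoiced sounds):

```python
import numpy as np

def vad_segments(signal, sr, frame_ms=25, energy_thresh=0.01, zcr_thresh=0.15):
    """Return (start, end) sample ranges that contain speech.

    Assumes a float waveform normalized to [-1, 1].
    """
    frame_len = int(sr * frame_ms / 1000)
    n_frames = len(signal) // frame_len
    flags = []
    for i in range(n_frames):
        frame = signal[i * frame_len:(i + 1) * frame_len]
        energy = np.mean(frame ** 2)                         # short-time energy
        zcr = np.mean(np.abs(np.diff(np.sign(frame)))) / 2   # zero-crossing rate
        flags.append(energy > energy_thresh and zcr < zcr_thresh)
    segments, start = [], None                # merge consecutive speech frames
    for i, is_speech in enumerate(flags):
        if is_speech and start is None:
            start = i * frame_len
        elif not is_speech and start is not None:
            segments.append((start, i * frame_len))
            start = None
    if start is not None:
        segments.append((start, n_frames * frame_len))
    return segments
```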
The speech recognition module 24 is configured to perform automatic speech recognition to convert the segmented audio into text, where the speech recognition uses acoustic adaptation and language model adaptation.
The voice recognition module 24 is configured to obtain the text corresponding to each sentence of audio using automatic speech recognition; the acoustic adaptation adapts to the recording environment, the noise type, the speaker type, and the like, while the language model adaptation adapts to professional vocabulary in lecture slides and related reading materials.
Specifically, during acoustic model training the audio is clustered by i-vector and a deep-neural-network-based acoustic model is trained on the audio of each cluster; during recognition, the cluster closest to the utterance's i-vector is found and that cluster's acoustic model is used.
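A minimal sketch of the cluster-selection step at recognition time, assuming cosine similarity between i-vectors (the patent does not fix the distance measure):

```python
import numpy as np

def cosine(a, b):
    return float(a @ b) / (np.linalg.norm(a) * np.linalg.norm(b))

def pick_acoustic_model(utt_ivector, centroids, models):
    """Return the DNN acoustic model of the i-vector cluster closest
    to the utterance; `centroids` and `models` are parallel lists."""
    best = max(range(len(centroids)),
               key=lambda k: cosine(utt_ivector, centroids[k]))
    return models[best]
```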
The inverse document frequency of each word is estimated from massive amounts of text, and TF-IDF is used to find the keywords in the deep learning courseware and extended reading. For example, given the extended reading "Gradient descent (GD) is a common method to minimize the risk function and the loss function; stochastic gradient descent and batch gradient descent are two iterative solution ideas. Batch gradient descent minimizes the loss function over all training samples, so the final solution is a global optimum, i.e., the solved parameters minimize the risk function. Stochastic gradient descent minimizes the loss function of each sample; although the loss function of a single iteration does not head straight toward the global optimum, the overall direction does, and the final result usually lies near the global optimum.", the keywords "gradient descent", "stochastic gradient descent", "batch gradient descent", "loss function", and so on can be extracted, while common words such as "common method", "one", and "minimize" are not listed as keywords because their TF-IDF weights are too low.
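A small sketch of this keyword extraction under stated assumptions (tokenization, smoothing, and the background corpus used to estimate inverse document frequency are illustrative choices not fixed by the patent):

```python
import math
from collections import Counter

def tfidf_keywords(doc_tokens, background_docs, top_k=10):
    """Rank the words of doc_tokens by TF-IDF; IDF is estimated from a
    large background corpus so that everyday words score low."""
    n_docs = len(background_docs)
    df = Counter()                       # document frequency of each word
    for doc in background_docs:
        df.update(set(doc))
    tf = Counter(doc_tokens)
    scores = {
        w: (c / len(doc_tokens)) * math.log((n_docs + 1) / (df[w] + 1))
        for w, c in tf.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_k]
```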
When a recurrent-neural-network-based language model is used to calculate the perplexity of a sentence, assuming that the model parameter is θ, the original calculation formula of the perplexity is: $\mathrm{PPL}(W)=\exp\big(-\frac{1}{N}\sum_{i=1}^{N}\ln P(w_i\mid w_1,\dots,w_{i-1};\theta)\big)$, wherein: N is the length of the sentence. For the keywords in this field, the perplexity can be rewritten as: $\mathrm{PPL}'(W)=\exp\big(-\frac{1}{N}\sum_{i=1}^{N}\ln\,[\,P(w_i\mid w_1,\dots,w_{i-1};\theta)+\lambda\,q(w_i)\,]\big)$
When w_i is a keyword of this field, q(w_i) is 1; otherwise it is 0. λ is a hyperparameter. This method improves the recognition rate of professional vocabulary.
The keyword and content note extraction module 25 is used by the server to extract keywords from the text and generate content notes. Specifically, it extracts keywords related to the speech content from the speech-recognized text, and extracts notes related to the speech according to the relevance of each sentence in the text to the speech content.
For example, suppose the speech-recognized text is: "For many machine learning algorithms, including linear regression, logistic regression, neural networks and so on, the algorithm is implemented by defining some cost function or optimization target and then using gradient descent as the optimization algorithm to find the minimum of the cost function. When our training set is large, the batch gradient descent algorithm becomes very computationally expensive. Suppose you have ten million pictures of cats: running batch gradient descent once is equivalent to looking through all ten million pictures, so we need a less time-consuming way to find the characteristics of most cats. In this course we introduce an approach different from batch gradient descent: stochastic gradient descent."
Similarly, through TF-IDF analysis, the words "gradient descent", "stochastic gradient descent", and "neural network", which rarely appear in everyday text but appear often in the recognition result, are taken as keywords, and their TF-IDF weights are obtained.
Then the weight of each sentence is computed as the average TF-IDF weight of the words in the sentence, and the highest-weighted sentences are output as the content note: "For many machine learning algorithms, including linear regression, logistic regression, neural networks and so on, the algorithm is implemented by defining some cost function or optimization target and then using gradient descent as the optimization algorithm to find the minimum of the cost function. When our training set is large, the batch gradient descent algorithm becomes very computationally expensive. In this course we introduce an approach different from batch gradient descent: stochastic gradient descent."
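A minimal sketch of this note-generation step (whitespace tokenization and the number of sentences kept are illustrative assumptions):

```python
def extract_notes(sentences, tfidf_weight, top_n=3):
    """Score each sentence by the mean TF-IDF weight of its words and
    return the top_n sentences, kept in their original order."""
    def score(sentence):
        words = sentence.split()
        return sum(tfidf_weight.get(w, 0.0) for w in words) / max(len(words), 1)
    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)[:top_n]
    return [sentences[i] for i in sorted(ranked)]
```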
The device provided by the embodiment of the invention recognizes audio into a repeatedly readable text form through speech recognition and improves recognition accuracy using language model adaptation and acoustic model adaptation. Knowledge integration is also performed, so that time is not spent reading redundant information.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, and the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The foregoing embodiments may be modified in many different ways by those skilled in the art without departing from the spirit and scope of the invention, which is defined by the appended claims and all changes that come within the meaning and range of equivalency of the claims are therefore intended to be embraced therein.

Claims (1)

1. A speech content extraction device based on a cloud platform, characterized by comprising: a speech recording module, a material sending module, a voice segmentation module, a voice recognition module, and a keyword and content note extraction module, wherein the speech recording module is used for collecting the speech audio and video, caching the collected audio and video to a personal computer (PC) in a classroom, and preprocessing them;
the acquisition comprises the following steps: collecting audio and video of a speech by using a microphone and a camera, and simultaneously caching the audio and video into a PC (personal computer) by using a wired or wireless network; carrying out voice enhancement on the audio by using a PC (personal computer) to remove noise, and compressing the audio and the video;
the voice segmentation mode is that the server detects voice activity of the received audio and segments the audio according to voice pause; the voice is divided according to the speaker, namely, the speaker of each section of voice is identified, and the audio is divided according to the speaker;
the acoustic self-adaptation comprises adaptation to a recording environment, a noise type and a speaker type; the language model self-adaptation comprises the adaptation to professional vocabularies in lecture slides and related reading materials;
the extraction comprises the following steps: extracting keywords related to the speech content in the text subjected to voice recognition, and extracting notes related to the speech according to the relevance of each sentence in the text to the speech content;
the speech recording module collects the audio and video of the speech through a microphone and a camera, the audio and video are simultaneously cached in a PC (personal computer) by utilizing a wired or wireless network, the PC is used for carrying out voice enhancement on the audio to remove noise, and the audio and video are compressed;
the voice segmentation detects voice activity of the received audio and performs segmentation according to voice pause; the voice is divided according to the speaker and is used for identifying the speaker of each section of voice, and the audio is divided according to the speaker, and the method specifically comprises the following steps: detecting and cutting out a part with voice according to the short-time energy and the zero crossing rate, and extracting an i-vector of each section of voice to identify a speaker as a teacher and different students;
the voice recognition module is used for obtaining a text corresponding to each sentence of audio by using automatic voice recognition, and the acoustic self-adaption is used for adapting to a recording environment, a noise type and a speaker type; the language model self-adaptation is used for adapting to professional vocabularies in lecture slides and related reading materials, and specifically comprises the following steps: clustering the audio according to i-vector during model training, training an acoustic model based on a deep neural network for the audio of each cluster, finding the closest cluster of the i-vector during audio identification, and using the acoustic model of the cluster;
the keyword and content note extraction module is used for extracting keywords related to the speech content in the speech recognition text, and extracting notes related to the speech according to the relevance of each sentence in the text to the speech content, and specifically comprises the following steps: the TF-IDF is used for counting keywords in deep learning courseware and extended reading, and the recursive neural network-based language model is used for calculating the complexity of the keywords in the field:wherein: the model parameter is theta, N is the length of the sentence, when wiThe key word for this field, then q (w)i) Is 1, otherwise 0, λ is a hyperparameter.
CN201610260647.7A 2016-04-25 2016-04-25 Speech content extraction method and device based on cloud platform Active CN105957531B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610260647.7A CN105957531B (en) 2016-04-25 2016-04-25 Speech content extraction method and device based on cloud platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610260647.7A CN105957531B (en) 2016-04-25 2016-04-25 Speech content extraction method and device based on cloud platform

Publications (2)

Publication Number Publication Date
CN105957531A CN105957531A (en) 2016-09-21
CN105957531B true CN105957531B (en) 2019-12-31

Family

ID=56915289

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610260647.7A Active CN105957531B (en) 2016-04-25 2016-04-25 Speech content extraction method and device based on cloud platform

Country Status (1)

Country Link
CN (1) CN105957531B (en)

Families Citing this family (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108335693B (en) * 2017-01-17 2022-02-25 腾讯科技(深圳)有限公司 Language identification method and language identification equipment
CN108022583A (en) * 2017-11-17 2018-05-11 平安科技(深圳)有限公司 Meeting summary generation method, application server and computer-readable recording medium
CN107818797B (en) * 2017-12-07 2021-07-06 苏州科达科技股份有限公司 Voice quality evaluation method, device and system
CN108256512A (en) * 2018-03-22 2018-07-06 长春大学 Listen the raw inclusive education classroom auxiliary system of barrier and device
CN108597521A (en) * 2018-05-04 2018-09-28 徐涌 Audio role divides interactive system, method, terminal and the medium with identification word
CN108921284B (en) * 2018-06-15 2020-11-17 山东大学 Interpersonal interaction limb language automatic generation method and system based on deep learning
CN109582823A (en) * 2018-11-21 2019-04-05 平安科技(深圳)有限公司 Video information chain type storage method, device, computer equipment and storage medium
CN109934188B (en) * 2019-03-19 2020-10-30 上海大学 Slide switching detection method, system, terminal and storage medium
CN111723816B (en) * 2020-06-28 2023-10-27 北京联想软件有限公司 Acquisition method of teaching notes and electronic equipment
CN111897918A (en) * 2020-07-28 2020-11-06 扬州大学 Online teaching classroom note generation method
CN111932964A (en) * 2020-08-21 2020-11-13 扬州大学 Online live broadcast teaching method
CN112767753B (en) * 2021-01-08 2022-07-22 中国石油大学胜利学院 Supervision type intelligent online teaching system and action method thereof
CN113409632A (en) * 2021-07-20 2021-09-17 国网安徽省电力有限公司培训中心 Classroom is teaching machine for speech recognition
CN114501112B (en) * 2022-01-24 2024-03-22 北京百度网讯科技有限公司 Method, apparatus, device, medium, and article for generating video notes

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP4127668B2 (en) * 2003-08-15 2008-07-30 株式会社東芝 Information processing apparatus, information processing method, and program
CN102280106A (en) * 2010-06-12 2011-12-14 三星电子株式会社 VWS method and apparatus used for mobile communication terminal
CN101923854B (en) * 2010-08-31 2012-03-28 中国科学院计算技术研究所 Interactive speech recognition system and method
CN103165130B (en) * 2013-02-06 2015-07-29 程戈 Speech text coupling cloud system
CN105159870B (en) * 2015-06-26 2018-06-29 徐信 A kind of accurate processing system and method for completing continuous natural-sounding textual
CN105427858B (en) * 2015-11-06 2019-09-03 科大讯飞股份有限公司 Realize the method and system that voice is classified automatically

Also Published As

Publication number Publication date
CN105957531A (en) 2016-09-21

Similar Documents

Publication Publication Date Title
CN105957531B (en) Speech content extraction method and device based on cloud platform
CN110674339B (en) Chinese song emotion classification method based on multi-mode fusion
CN106328147B (en) Speech recognition method and device
CN108319666B (en) Power supply service assessment method based on multi-modal public opinion analysis
CN110852215B (en) Multi-mode emotion recognition method and system and storage medium
CN107154264A (en) The method that online teaching wonderful is extracted
CN111785275A (en) Voice recognition method and device
CN107943786B (en) Chinese named entity recognition method and system
Boishakhi et al. Multi-modal hate speech detection using machine learning
Ding et al. Audio-visual keyword spotting based on multidimensional convolutional neural network
CN111145903A (en) Method and device for acquiring vertigo inquiry text, electronic equipment and inquiry system
CN111180025A (en) Method and device for representing medical record text vector and inquiry system
CN111510765A (en) Audio label intelligent labeling method and device based on teaching video
Chen et al. Towards unsupervised automatic speech recognition trained by unaligned speech and text only
Zhu et al. Siamese Recurrent Auto-Encoder Representation for Query-by-Example Spoken Term Detection.
CN112951237B (en) Automatic voice recognition method and system based on artificial intelligence
CN113555133A (en) Medical inquiry data processing method and device
KR20170086233A (en) Method for incremental training of acoustic and language model using life speech and image logs
CN117116251A (en) Repayment probability assessment method and device based on collection-accelerating record
Singh et al. Speaker Recognition Assessment in a Continuous System for Speaker Identification
Dua et al. Gujarati language automatic speech recognition using integrated feature extraction and hybrid acoustic model
Yu Research on music emotion classification based on CNN-LSTM network
Jeyasheeli et al. Deep learning methods for suicide prediction using audio classification
CN114822557A (en) Method, device, equipment and storage medium for distinguishing different sounds in classroom
Xiao A comparative study on speaker gender identification using mfcc and statistical learning methods

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20200617

Address after: Room 105G, 199 GuoShoujing Road, Pudong New Area, Shanghai, 200120

Patentee after: Shanghai Jiaotong University Intellectual Property Management Co.,Ltd.

Address before: No. 800 Dongchuan Road, Shanghai, 200240

Patentee before: SHANGHAI JIAO TONG University

TR01 Transfer of patent right
TR01 Transfer of patent right

Effective date of registration: 20201028

Address after: Building 14, Tengfei Innovation Park, no.388 Xinping street, Suzhou Industrial Park, Jiangsu Province, 215000

Patentee after: AI SPEECH Ltd.

Address before: Room 105G, 199 GuoShoujing Road, Pudong New Area, Shanghai, 200120

Patentee before: Shanghai Jiaotong University Intellectual Property Management Co.,Ltd.

TR01 Transfer of patent right
CP01 Change in the name or title of a patent holder

Address after: 215000 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province

Patentee after: Sipic Technology Co.,Ltd.

Address before: 215000 building 14, Tengfei Innovation Park, 388 Xinping street, Suzhou Industrial Park, Suzhou City, Jiangsu Province

Patentee before: AI SPEECH Ltd.

CP01 Change in the name or title of a patent holder