CN105957531A - Speech content extracting method and speech content extracting device based on cloud platform - Google Patents

Speech content extracting method and speech content extracting device based on cloud platform

Info

Publication number
CN105957531A
Authority
CN
China
Prior art keywords
speech
audio frequency
voice
video
module
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610260647.7A
Other languages
Chinese (zh)
Other versions
CN105957531B (en)
Inventor
俞凯
谢其哲
吴学阳
李文博
郭运奇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Sipic Technology Co Ltd
Original Assignee
Shanghai Jiaotong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanghai Jiaotong University filed Critical Shanghai Jiaotong University
Priority to CN201610260647.7A priority Critical patent/CN105957531B/en
Publication of CN105957531A publication Critical patent/CN105957531A/en
Application granted granted Critical
Publication of CN105957531B publication Critical patent/CN105957531B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 - Speech recognition
    • G10L15/28 - Constructional details of speech recognition systems
    • G10L15/30 - Distributed recognition, e.g. in client-server systems, for mobile phones or network applications

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Electrically Operated Instructional Devices (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention discloses a speech content extraction method and a speech content extraction device based on a cloud platform. In the method, the audio and video of a lecture are captured and cached on a PC for preprocessing; the preprocessed audio and video, together with related materials including the lecture slides and related reading material, are sent to a server; the server performs voice segmentation on the received audio, splitting it by speaker; automatic speech recognition, employing acoustic adaptation and language model adaptation, converts the segmented audio into text; and keywords are extracted from the recognized text to generate content notes. The method converts audio into text that can be read repeatedly, uses language model adaptation and acoustic model adaptation to improve recognition accuracy, and, through knowledge integration, saves the time otherwise spent reading redundant information. The invention also discloses a cloud-platform-based speech content extraction device comprising a lecture recording module, a material sending module, a voice segmentation module, a speech recognition module, and a keyword and content note extraction module.

Description

Speech content extraction method and device based on cloud platform
Technical field
The present invention relates to a technology in the field of text processing, specifically a speech content extraction method and device based on a cloud platform.
Background technology
In the information age, the development and progress of technology let us receive information from all over the world every day, and the quantity of this information far exceeds what people can listen to and digest. To help people acquire information more efficiently, speech signal processing and natural language processing techniques can automatically process massive amounts of information and extract its key messages and content for quick reading.
In daily life, everyone receives large amounts of information through channels such as the media and the classroom. Converting this information into text that can be read repeatedly is therefore essential: it lets people read and learn quickly, language model adaptation and acoustic model adaptation improve the accuracy of speech recognition, and knowledge integration avoids spending time reading redundant information.
A search of the prior art finds Chinese patent CN102292766B, "Method and apparatus for speech processing", which provides a method, apparatus, and computer program for a framework of adaptive composite models for speech recognition; selecting a model based on the speech features of a specific speaker improves recognition accuracy. However, that method does not use language model adaptation to improve accuracy on specialized vocabulary.
A further search finds Chinese patent CN102122506A, "A method of speech recognition", in which the system uses text retrieved by a search engine to train the language model, which can improve the recognition rate and reduce the manual proofreading workload. However, that method relies on an external search engine, is time-consuming, and is ill-suited to processing large amounts of speech.
Summary of the invention
To address the above deficiencies of the prior art, the present invention proposes a speech content extraction method and device based on a cloud platform: speech recognition converts audio into text that can be read repeatedly; language model adaptation and acoustic model adaptation improve recognition accuracy; and knowledge integration avoids spending time reading redundant information.
The present invention is achieved by the following technical solutions:
The present invention relates to a speech content extraction method based on a cloud platform, comprising:
Step 1) capture the audio and video of the lecture, cache them on a PC, and preprocess them;
Step 2) send the preprocessed audio and video together with related materials, including the lecture slides and related reading material, to a server;
Step 3) the server performs voice segmentation on the received audio and splits the audio by speaker;
Step 4) perform automatic speech recognition to convert the segmented audio into text, the recognition employing acoustic adaptation and language model adaptation;
Step 5) extract keywords from the recognized text and generate content notes.
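As a toy illustration, steps 1 through 5 can be sketched as a chain of functions. Every function below is a simplified stand-in for illustration only (step 2, the upload, is omitted), not the disclosed implementation:

```python
def preprocess(recording):
    """Step 1 stand-in: drop noise-only segments (the real system performs
    speech enhancement and compression)."""
    return [seg for seg in recording if seg is not None]

def split_by_speaker(segments):
    """Step 3 stand-in: group utterances by their speaker label."""
    by_speaker = {}
    for speaker, words in segments:
        by_speaker.setdefault(speaker, []).append(words)
    return by_speaker

def recognize(by_speaker):
    """Step 4 stand-in: an 'ASR' that simply joins the cached transcripts."""
    return {spk: " ".join(parts) for spk, parts in by_speaker.items()}

def extract_notes(texts, keywords):
    """Step 5 stand-in: keep only utterances that contain a keyword."""
    return [t for t in texts.values() if any(k in t for k in keywords)]

recording = [("teacher", "gradient descent minimizes the loss"),
             None,  # a noise-only segment, removed in step 1
             ("student", "can you repeat that")]
notes = extract_notes(recognize(split_by_speaker(preprocess(recording))),
                      keywords=["gradient"])
```

Running the chain on the toy recording keeps only the teacher's keyword-bearing utterance as the note.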
The capture preferably uses devices such as microphones and cameras to record the audio and video of the lecture, which are simultaneously cached on the PC over a wired or wireless network;
the PC performs speech enhancement on the audio to remove noise, and compresses the audio and video.
In the voice segmentation, the server performs voice activity detection on the received audio and cuts it at pauses in the speech; splitting by speaker means identifying the speaker of each speech segment and partitioning the audio by speaker.
The acoustic adaptation includes adaptation to the recording environment, noise type, speaker type, and so on;
the language model adaptation includes adaptation to specialized vocabulary in the courseware and related reading material.
The extraction comprises: extracting keywords related to the lecture content from the recognized text, and extracting lecture-relevant notes according to each sentence's relevance to the lecture content.
The present invention also relates to a speech content extraction device implementing the above method, comprising: a lecture recording module that captures the audio and video of the lecture, caches them on the classroom PC, and preprocesses them; a material sending module that sends the preprocessed audio and video together with related materials, including the lecture slides and related reading material, to the server; a voice segmentation module that performs voice segmentation on the received audio and splits the audio by speaker; a speech recognition module that performs automatic speech recognition to convert the segmented audio into text, using acoustic adaptation and language model adaptation; and a keyword and content note extraction module with which the server extracts keywords from the text and generates content notes.
The lecture recording module records the audio and video of the lecture with devices such as microphones and cameras, caches them on the PC over a wired or wireless network, performs speech enhancement on the audio with the PC to remove noise, and compresses the audio and video.
The voice segmentation performs voice activity detection on the received audio and cuts it at pauses in the speech; the splitting by speaker identifies the speaker of each speech segment and partitions the audio by speaker.
The speech recognition module uses automatic speech recognition to obtain the text corresponding to each audio segment; the acoustic adaptation adapts to the recording environment, noise type, speaker type, and so on; the language model adaptation adapts to specialized vocabulary in the lecture slides and related reading material.
The keyword and content note extraction module extracts keywords related to the lecture content from the recognized text, and extracts lecture-relevant notes according to each sentence's relevance to the lecture content.
Technical effect
Compared with the prior art, the present invention converts audio into text that can be read repeatedly through speech recognition, uses language model adaptation and acoustic model adaptation to improve recognition accuracy, and performs knowledge integration to avoid spending time reading redundant information.
Brief description of the drawings
Fig. 1 is a flow chart of the method of the present invention;
Fig. 2 is a structural diagram of the device of the present invention.
Detailed description of the invention
Embodiment 1
The present embodiment comprises the following steps:
101. Capture the audio and video of the lecture, cache them on a PC, and preprocess them.
In this embodiment, this includes using devices such as microphones and cameras to record the audio and video of the lecture, caching them on the PC over a wired or wireless network, performing speech enhancement on the audio with the PC to remove noise, and compressing the audio and video.
102. Send the preprocessed audio and video together with related materials, including the lecture slides and related reading material, to the server.
103. The server performs voice segmentation on the received audio and splits the audio by speaker.
In this embodiment, the voice segmentation consists of the server performing voice activity detection on the received audio and cutting it at pauses in the speech; splitting by speaker consists of identifying the speaker of each speech segment and partitioning the audio by speaker.
104. Perform automatic speech recognition to convert the segmented audio into text, using acoustic adaptation and language model adaptation.
In this embodiment, the acoustic adaptation includes adaptation to the recording environment, noise type, speaker type, and so on; the language model adaptation includes adaptation to specialized vocabulary in the lecture slides and related reading material.
105. Extract keywords from the recognized text and generate content notes.
In this embodiment, this includes extracting keywords related to the lecture content from the recognized text, and extracting lecture-relevant notes according to each sentence's relevance to the lecture content.
Embodiment 2
As shown in Fig. 2, which is a structural diagram of the speech content extraction device provided by this embodiment of the present invention, the device comprises: a lecture recording module 21, a material sending module 22, a voice segmentation module 23, a speech recognition module 24, and a keyword and content note extraction module 25.
The lecture recording module 21 captures the audio and video of the lecture, caches them on the classroom PC, and preprocesses them.
Specifically, the lecture recording module 21 records the audio and video of the lecture with devices such as microphones and cameras, caches them on the PC over a wired or wireless network, performs speech enhancement on the audio with the PC to remove noise, and compresses the audio and video.
For example, a camera records a deep learning lesson; the teacher wears a clip-on microphone, and students answering questions use a wireless microphone. The recorded video and audio are cached on the classroom PC; filtering methods such as adaptive cancellation remove background sounds such as air conditioner noise and construction noise; and the video and audio are compressed to a file size suitable for network transmission.
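The adaptive cancellation mentioned here can be illustrated with a least-mean-squares (LMS) canceller, a standard adaptive filter. This is a minimal sketch under the assumption of a separate reference microphone that picks up a correlated copy of the background noise; it is not the patent's implementation:

```python
import numpy as np

def lms_cancel(primary, reference, n_taps=4, mu=0.005):
    """Adaptive noise cancellation with an LMS filter: the filter learns to
    predict the noise in `primary` from the correlated `reference` signal,
    and the prediction error is the cleaned output."""
    w = np.zeros(n_taps)
    cleaned = np.zeros(len(primary))
    for n in range(n_taps - 1, len(primary)):
        x = reference[n - n_taps + 1:n + 1][::-1]  # most recent reference samples
        e = primary[n] - w @ x                     # error = speech estimate
        w += 2 * mu * e * x                        # LMS weight update
        cleaned[n] = e
    return cleaned

# Toy demo: a sine 'speech' buried in noise that also reaches the reference mic.
rng = np.random.default_rng(0)
noise = rng.standard_normal(4000)
speech = np.sin(2 * np.pi * 0.01 * np.arange(4000))
noisy = speech + 0.5 * noise
clean = lms_cancel(noisy, noise)
```

After the filter converges (a few hundred samples here), the residual error against the clean sine is much smaller than the original noise power.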
The material sending module 22 sends the preprocessed audio and video together with related materials, including the lecture slides and related reading material, to the server.
Specifically, the enhanced and compressed audio and video, the deep learning slides, and the deep learning reading material are sent to an HTTP server.
The voice segmentation module 23 performs voice segmentation on the received audio and splits the audio by speaker.
In the voice segmentation module 23, the voice segmentation performs voice activity detection on the received audio and cuts it at pauses in the speech; the splitting by speaker identifies the speaker of each speech segment and partitions the audio by speaker.
Specifically, the speech portions are detected and cut out based on short-time energy and zero-crossing rate, and an i-vector is extracted for each speech segment to identify whether the speaker is the teacher or one of the students.
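A minimal sketch of the detection step using the two features named here, short-time energy and zero-crossing rate; the threshold values are illustrative assumptions, since the patent does not specify them:

```python
import numpy as np

def detect_speech_frames(signal, frame_len=256,
                         energy_thresh=0.01, zcr_thresh=0.3):
    """Frame-level voice activity detection: a frame is marked as speech when
    its short-time energy is high and its zero-crossing rate is low (typical
    of voiced speech). Thresholds here are illustrative, not from the patent."""
    flags = []
    for start in range(0, len(signal) - frame_len + 1, frame_len):
        frame = signal[start:start + frame_len]
        energy = np.mean(frame ** 2)                           # short-time energy
        zcr = np.mean(np.abs(np.diff(np.sign(frame))) > 0)     # zero-crossing rate
        flags.append(bool(energy > energy_thresh and zcr < zcr_thresh))
    return flags

# Toy demo: two near-silent frames followed by two voiced (sine) frames.
rng = np.random.default_rng(0)
silence = 0.001 * rng.standard_normal(512)
voiced = np.sin(2 * np.pi * 0.01 * np.arange(512))
flags = detect_speech_frames(np.concatenate([silence, voiced]))
```

On the toy signal, the two low-energy frames are rejected and the two sine frames (high energy, few zero crossings) are accepted.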
The speech recognition module 24 performs automatic speech recognition to convert the segmented audio into text, the recognition employing acoustic adaptation and language model adaptation.
The speech recognition module 24 uses automatic speech recognition to obtain the text corresponding to each audio segment; the acoustic adaptation adapts to the recording environment, noise type, speaker type, and so on; the language model adaptation adapts to specialized vocabulary in the lecture slides and related reading material.
Specifically, when the acoustic models are trained, the audio is clustered by i-vector and one deep-neural-network-based acoustic model is trained for each cluster; when audio is recognized, the cluster whose i-vector is nearest is found and that cluster's acoustic model is used.
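The cluster-then-select scheme can be sketched with a plain k-means over toy 2-D vectors standing in for i-vectors; the real i-vector extractor and the per-cluster DNN acoustic models are omitted, so this only illustrates the selection logic:

```python
import numpy as np

def kmeans(vectors, k, iters=20):
    """Minimal k-means with deterministic farthest-first initialization,
    standing in for the i-vector clustering step."""
    centers = [vectors[0]]
    for _ in range(1, k):
        d = np.min([np.linalg.norm(vectors - c, axis=1) for c in centers], axis=0)
        centers.append(vectors[int(d.argmax())])   # farthest point from chosen centers
    centers = np.array(centers, dtype=float)
    for _ in range(iters):
        labels = np.linalg.norm(vectors[:, None, :] - centers[None, :, :],
                                axis=2).argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = vectors[labels == j].mean(axis=0)
    return centers

def pick_acoustic_model(ivector, centers):
    """At recognition time, select the cluster (and hence the acoustic model
    trained on it) whose center is nearest to the utterance's i-vector."""
    return int(np.linalg.norm(centers - ivector, axis=1).argmin())

# Toy demo: 2-D 'i-vectors' forming two well-separated groups.
rng = np.random.default_rng(1)
group_a = rng.normal(0.0, 0.5, size=(10, 2))
group_b = rng.normal(10.0, 0.5, size=(10, 2))
centers = kmeans(np.vstack([group_a, group_b]), k=2)
model_a = pick_acoustic_model(np.array([0.0, 0.0]), centers)
model_b = pick_acoustic_model(np.array([10.0, 10.0]), centers)
```

Utterances near each group are routed to different cluster-specific models, as the embodiment describes.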
A large text corpus is used to compute the inverse document frequency of each word, and TF-IDF statistics are used to identify the keywords in the deep learning courseware and extended reading. For example, given the extended-reading passage "Gradient descent (GD) is a common method for minimizing the risk function and the loss function; stochastic gradient descent and batch gradient descent are two iterative approaches. Batch gradient descent minimizes the loss function over all training samples, so the final solution is the global optimum, that is, the solved parameters minimize the risk function. Stochastic gradient descent minimizes the loss function of each individual sample; although the loss obtained at each iteration does not always move toward the global optimum, the overall direction is toward the globally optimal solution, and the final result is usually near it.", the keywords "gradient descent", "stochastic gradient descent", "batch gradient descent", "loss function", and so on can be extracted from the extended reading, while common words such as "common method", "a kind of", and "minimize" are not listed as keywords because their TF-IDF weights are too low.
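The TF-IDF keyword selection just described can be sketched minimally; the tokenization and the tiny background corpus below are toy placeholders for the "mass text" the patent refers to:

```python
import math
from collections import Counter

def tfidf_keywords(doc_tokens, background_docs, top_n=3):
    """Rank the words of one document by TF-IDF: term frequency in the
    document times log inverse document frequency over a background corpus.
    Common words occur in most background documents and therefore receive
    near-zero weight, which is why they are not listed as keywords."""
    n_docs = len(background_docs)
    df = Counter()
    for doc in background_docs:
        df.update(set(doc))                    # document frequency per word
    tf = Counter(doc_tokens)
    scores = {
        w: (count / len(doc_tokens)) * math.log((1 + n_docs) / (1 + df[w]))
        for w, count in tf.items()
    }
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

background = [["the", "cat", "sat", "on", "the", "mat"],
              ["the", "dog", "ran", "in", "the", "park"],
              ["a", "bird", "flew", "over", "the", "lake"]]
lecture = ["the", "gradient", "descent", "minimizes", "the", "loss", "function"]
keywords = tfidf_keywords(lecture, background)
```

"the" appears in every background document, so its IDF (and TF-IDF score) is zero, while the domain words rise to the top.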
When a language model based on a recurrent neural network is used to compute the perplexity of a sentence, let the model parameters be θ. The original perplexity formula is
perplexity(θ) = exp( -(1/N) Σ_{i=1}^{N} log p(w_i | w_1, ..., w_{i-1}; θ) )
where N is the length of the sentence. To account for the keywords of the field, the perplexity can be rewritten as
perplexity'(θ) = exp( -(1/N) Σ_{i=1}^{N} [ log p(w_i | w_1, ..., w_{i-1}; θ) + λ·q(w_i) ] )
where q(w_i) is 1 when w_i is a keyword of the field and 0 otherwise, and λ is a hyperparameter. Using this method improves the recognition rate for specialized vocabulary.
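A minimal sketch of keyword-weighted perplexity, under the assumption that the per-word keyword bonus λ·q(w_i) is added to the word's log-probability; this is consistent with the description of q(w_i) and λ above, but the exact formula in the original is not recoverable, so treat the weighting as illustrative:

```python
import math

def perplexity(word_log_probs, keyword_mask=None, lam=0.0):
    """Sentence perplexity from per-word log-probabilities produced by a
    language model. With a keyword mask and lam > 0, each in-domain keyword
    contributes an extra lam to its log-probability, lowering the perplexity
    of keyword-bearing hypotheses."""
    n = len(word_log_probs)
    mask = keyword_mask if keyword_mask is not None else [0] * n
    total = sum(lp + lam * q for lp, q in zip(word_log_probs, mask))
    return math.exp(-total / n)

# Four words, each with probability 0.25 -> plain perplexity is exactly 4.
logp = [math.log(0.25)] * 4
plain = perplexity(logp)
boosted = perplexity(logp, keyword_mask=[1, 0, 0, 0], lam=math.log(2))
```

Marking one of the four words as a keyword with λ = ln 2 scales the perplexity by 2^(-1/4), so the keyword-bearing hypothesis scores better.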
The keyword and content note extraction module 25 is used by the server to extract keywords from the text and generate content notes.
The keyword and content note extraction module 25 extracts keywords related to the lecture content from the recognized text, and extracts lecture-relevant notes according to each sentence's relevance to the lecture content.
In this example, suppose the text obtained by speech recognition is: "For many machine learning algorithms, including linear regression, logistic regression, neural networks, and so on, the algorithm is implemented by deriving some cost function or optimization target, and then using a method such as gradient descent as the optimization algorithm to find the minimum of the cost function. When our training set is large, batch gradient descent becomes very computationally expensive. Suppose you have ten million pictures of cats; one pass of batch gradient descent amounts to reading through all ten million photos, so we need some computationally cheaper method to find the characteristics of most cats. In this lesson, I want to introduce a method different from batch gradient descent: stochastic gradient descent."
Similarly, TF-IDF analysis yields the words that rarely occur in everyday text but occur frequently in this recognition result, namely "gradient descent", "stochastic gradient descent", and "neural network", as keywords, together with their TF-IDF weights.
Afterwards, the weight of each sentence is computed as the mean of the TF-IDF weights of the words in the sentence, and the highest-weighted sentences are output as the content notes: "For many machine learning algorithms, including linear regression, logistic regression, neural networks, and so on, the algorithm is implemented by deriving some cost function or optimization target, and then using a method such as gradient descent as the optimization algorithm to find the minimum of the cost function. When our training set is large, batch gradient descent becomes very computationally expensive. In this lesson, I want to introduce a method different from batch gradient descent: stochastic gradient descent."
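The sentence-scoring rule just described, mean TF-IDF weight per sentence, is easy to sketch; the weight table and tokenized sentences below are toy values, not the patent's actual data:

```python
def top_note_sentences(sentences, word_weights, top_n=1):
    """Score each tokenized sentence by the mean TF-IDF weight of its words
    (words absent from the weight table count as 0) and return the
    highest-scoring sentences as the content notes."""
    def mean_weight(tokens):
        return sum(word_weights.get(t, 0.0) for t in tokens) / len(tokens)
    return sorted(sentences, key=mean_weight, reverse=True)[:top_n]

weights = {"gradient": 2.0, "descent": 2.0, "stochastic": 1.5}
sentences = [
    ["stochastic", "gradient", "descent", "minimizes", "each", "sample"],
    ["suppose", "you", "have", "ten", "million", "pictures", "of", "cats"],
]
notes = top_note_sentences(sentences, weights)
```

The keyword-dense sentence wins; the anecdote about the cat photos, containing no weighted words, is dropped from the notes.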
The device provided by this embodiment of the present invention converts audio into text that can be read repeatedly through speech recognition, uses language model adaptation and acoustic model adaptation to improve recognition accuracy, and performs knowledge integration to avoid spending time reading redundant information.
Those of ordinary skill in the art will appreciate that all or part of the steps of the above embodiments can be implemented in hardware, or by a program instructing the relevant hardware; such a program may be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disc.
Those skilled in the art may locally adjust the above embodiments in different ways without departing from the principle and purpose of the present invention; the scope of protection of the present invention is defined by the claims and is not limited by the above embodiments, and every implementation within that scope is bound by the present invention.

Claims (10)

1. A speech content extraction method based on a cloud platform, characterized by comprising:
Step 1) capturing the audio and video of a lecture, caching them on a PC, and preprocessing them;
Step 2) sending the preprocessed audio and video together with related materials, including the lecture slides and related reading material, to a server;
Step 3) the server performing voice segmentation on the received audio and splitting the audio by speaker;
Step 4) performing automatic speech recognition to convert the segmented audio into text, the recognition employing acoustic adaptation and language model adaptation;
Step 5) extracting keywords from the recognized text and generating content notes.
2. The method according to claim 1, characterized in that the capture comprises: using devices such as microphones and cameras to record the audio and video of the lecture, which are simultaneously cached on a PC over a wired or wireless network; and using the PC to perform speech enhancement on the audio to remove noise and to compress the audio and video.
3. The method according to claim 1, characterized in that the voice segmentation comprises the server performing voice activity detection on the received audio and cutting it at pauses in the speech; and the splitting by speaker comprises identifying the speaker of each speech segment and partitioning the audio by speaker.
4. The method according to claim 1, characterized in that the acoustic adaptation includes adaptation to the recording environment, noise type, speaker type, and so on; and the language model adaptation includes adaptation to specialized vocabulary in the lecture slides and related reading material.
5. The method according to claim 1, characterized in that the extraction comprises: extracting keywords related to the lecture content from the recognized text, and extracting lecture-relevant notes according to each sentence's relevance to the lecture content.
6. A speech content extraction device implementing the method of any one of the preceding claims, characterized by comprising:
a lecture recording module, configured to capture the audio and video of the lecture, cache them on a classroom PC, and preprocess them;
a material sending module, configured to send the preprocessed audio and video, the lecture slides, and the related reading material to the server;
a voice segmentation module, configured to perform voice segmentation on the received audio and split the audio by speaker;
a speech recognition module, configured to perform automatic speech recognition to convert the segmented audio into text, the recognition employing acoustic adaptation and language model adaptation; and
a keyword and content note extraction module, with which the server extracts keywords from the text and generates content notes.
7. The device according to claim 6, characterized in that the lecture recording module records the audio and video of the lecture with a microphone and a camera, caches them on a PC over a wired or wireless network, uses the PC to perform speech enhancement on the audio to remove noise, and compresses the audio and video.
8. The device according to claim 6, characterized in that the voice segmentation performs voice activity detection on the received audio and cuts it at pauses in the speech; and the splitting by speaker identifies the speaker of each speech segment and partitions the audio by speaker.
9. The device according to claim 6, characterized in that the speech recognition module uses automatic speech recognition to obtain the text corresponding to each audio segment; the acoustic adaptation adapts to the recording environment, noise type, and speaker type; and the language model adaptation adapts to specialized vocabulary in the lecture slides and related reading material.
10. The device according to claim 6, characterized in that the keyword and content note extraction module extracts keywords related to the lecture content from the recognized text, and extracts lecture-relevant notes according to each sentence's relevance to the lecture content.
CN201610260647.7A 2016-04-25 2016-04-25 Speech content extraction method and device based on cloud platform Active CN105957531B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610260647.7A CN105957531B (en) 2016-04-25 2016-04-25 Speech content extraction method and device based on cloud platform

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610260647.7A CN105957531B (en) 2016-04-25 2016-04-25 Speech content extraction method and device based on cloud platform

Publications (2)

Publication Number Publication Date
CN105957531A true CN105957531A (en) 2016-09-21
CN105957531B CN105957531B (en) 2019-12-31

Family

ID=56915289

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610260647.7A Active CN105957531B (en) 2016-04-25 2016-04-25 Speech content extraction method and device based on cloud platform

Country Status (1)

Country Link
CN (1) CN105957531B (en)

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107818797A (en) * 2017-12-07 2018-03-20 苏州科达科技股份有限公司 Voice quality assessment method, apparatus and its system
CN108022583A (en) * 2017-11-17 2018-05-11 平安科技(深圳)有限公司 Meeting summary generation method, application server and computer-readable recording medium
CN108256512A (en) * 2018-03-22 2018-07-06 长春大学 Listen the raw inclusive education classroom auxiliary system of barrier and device
CN108335693A (en) * 2017-01-17 2018-07-27 腾讯科技(深圳)有限公司 A kind of Language Identification and languages identification equipment
CN108597521A (en) * 2018-05-04 2018-09-28 徐涌 Audio role divides interactive system, method, terminal and the medium with identification word
CN109934188A (en) * 2019-03-19 2019-06-25 上海大学 A kind of lantern slide switching detection method, system, terminal and storage medium
WO2019237708A1 (en) * 2018-06-15 2019-12-19 山东大学 Interpersonal interaction body language automatic generation method and system based on deep learning
WO2020103447A1 (en) * 2018-11-21 2020-05-28 平安科技(深圳)有限公司 Link-type storage method and apparatus for video information, computer device and storage medium
CN111723816A (en) * 2020-06-28 2020-09-29 北京联想软件有限公司 Teaching note acquisition method and electronic equipment
CN111897918A (en) * 2020-07-28 2020-11-06 扬州大学 Online teaching classroom note generation method
CN111932964A (en) * 2020-08-21 2020-11-13 扬州大学 Online live broadcast teaching method
CN112767753A (en) * 2021-01-08 2021-05-07 中国石油大学胜利学院 Supervision type intelligent online teaching system and action method thereof
CN114501112A (en) * 2022-01-24 2022-05-13 北京百度网讯科技有限公司 Method, apparatus, device, medium and product for generating video notes

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1881415A (en) * 2003-08-15 2006-12-20 株式会社东芝 Information processing apparatus and method therefor
CN101923854A (en) * 2010-08-31 2010-12-22 中国科学院计算技术研究所 Interactive speech recognition system and method
CN102280106A (en) * 2010-06-12 2011-12-14 三星电子株式会社 VWS method and apparatus used for mobile communication terminal
CN103165130A (en) * 2013-02-06 2013-06-19 湘潭安道致胜信息科技有限公司 Voice text matching cloud system
CN105159870A (en) * 2015-06-26 2015-12-16 徐信 Processing system for precisely completing continuous natural speech textualization and method for precisely completing continuous natural speech textualization
CN105427858A (en) * 2015-11-06 2016-03-23 科大讯飞股份有限公司 Method and system for achieving automatic voice classification

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1881415A (en) * 2003-08-15 2006-12-20 株式会社东芝 Information processing apparatus and method therefor
CN102280106A (en) * 2010-06-12 2011-12-14 三星电子株式会社 VWS method and apparatus used for mobile communication terminal
CN101923854A (en) * 2010-08-31 2010-12-22 中国科学院计算技术研究所 Interactive speech recognition system and method
CN103165130A (en) * 2013-02-06 2013-06-19 湘潭安道致胜信息科技有限公司 Voice text matching cloud system
CN105159870A (en) * 2015-06-26 2015-12-16 徐信 Processing system for precisely completing continuous natural speech textualization and method for precisely completing continuous natural speech textualization
CN105427858A (en) * 2015-11-06 2016-03-23 科大讯飞股份有限公司 Method and system for achieving automatic voice classification

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108335693B (en) * 2017-01-17 2022-02-25 Tencent Technology (Shenzhen) Co., Ltd. Language identification method and language identification device
CN108335693A (en) * 2017-01-17 2018-07-27 Tencent Technology (Shenzhen) Co., Ltd. Language identification method and language identification device
CN108022583A (en) * 2017-11-17 2018-05-11 Ping An Technology (Shenzhen) Co., Ltd. Meeting summary generation method, application server and computer-readable recording medium
CN107818797A (en) * 2017-12-07 2018-03-20 Suzhou Keda Technology Co., Ltd. Voice quality assessment method, apparatus and system
CN108256512A (en) * 2018-03-22 2018-07-06 Changchun University Classroom auxiliary system and device for inclusive education of hearing-impaired students
CN108597521A (en) * 2018-05-04 2018-09-28 Xu Yong Interactive system, method, terminal and medium for audio role segmentation and text recognition
WO2019237708A1 (en) * 2018-06-15 2019-12-19 Shandong University Method and system for automatic generation of interpersonal interaction body language based on deep learning
WO2020103447A1 (en) * 2018-11-21 2020-05-28 Ping An Technology (Shenzhen) Co., Ltd. Link-type storage method and apparatus for video information, computer device and storage medium
CN109934188A (en) * 2019-03-19 2019-06-25 Shanghai University Slide switching detection method, system, terminal and storage medium
CN109934188B (en) * 2019-03-19 2020-10-30 Shanghai University Slide switching detection method, system, terminal and storage medium
CN111723816A (en) * 2020-06-28 2020-09-29 Beijing Lenovo Software Co., Ltd. Teaching note acquisition method and electronic equipment
CN111723816B (en) * 2020-06-28 2023-10-27 Beijing Lenovo Software Co., Ltd. Teaching note acquisition method and electronic equipment
CN111897918A (en) * 2020-07-28 2020-11-06 Yangzhou University Online teaching classroom note generation method
CN111932964A (en) * 2020-08-21 2020-11-13 Yangzhou University Online live broadcast teaching method
CN112767753A (en) * 2021-01-08 2021-05-07 Shengli College, China University of Petroleum Supervised intelligent online teaching system and operation method thereof
CN112767753B (en) * 2021-01-08 2022-07-22 Shengli College, China University of Petroleum Supervised intelligent online teaching system and operation method thereof
CN114501112A (en) * 2022-01-24 2022-05-13 Beijing Baidu Netcom Science and Technology Co., Ltd. Method, apparatus, device, medium and product for generating video notes
CN114501112B (en) * 2022-01-24 2024-03-22 Beijing Baidu Netcom Science and Technology Co., Ltd. Method, apparatus, device, medium and product for generating video notes

Also Published As

Publication number Publication date
CN105957531B (en) 2019-12-31

Similar Documents

Publication Publication Date Title
CN105957531A (en) Speech content extracting method and speech content extracting device based on cloud platform
CN108399923B (en) Speaker recognition method and device in multi-person speech
CN102436812B (en) Conference recording device and conference recording method using same
CN108305632A (en) Method and system for forming a voice abstract of a meeting
US20100057452A1 (en) Speech interfaces
CN109192224A (en) Speech evaluation method, device, equipment and readable storage medium
Ding et al. Audio-visual keyword spotting based on multidimensional convolutional neural network
CN110728991B (en) Improved recording equipment identification algorithm
WO2017166483A1 (en) Method and system for processing dynamic picture
CN109712612A (en) Voice keyword detection method and device
CN110970036A (en) Voiceprint recognition method and device, computer storage medium and electronic equipment
CN109452932A (en) Sound-based constitution identification method and apparatus
CN116246610A (en) Conference record generation method and system based on multi-mode identification
CN206672635U (en) Voice interaction device based on a book service robot
CN108364655A (en) Speech processing method, medium, device and computing device
CN113053361B (en) Speech recognition method, model training method, device, equipment and medium
CN114330454A (en) Live pig cough sound identification method based on DS evidence theory fusion characteristics
KR20170086233A (en) Method for incremental training of acoustic and language model using life speech and image logs
CN117078094A (en) Teacher comprehensive ability assessment method based on artificial intelligence
Wan Research on speech separation and recognition algorithm based on deep learning
Blunt et al. A model for incorporating an automatic speech recognition system in a noisy educational environment
KR102429365B1 (en) System and method for analyzing emotion of speech
CN112837688B (en) Voice transcription method, device, related system and equipment
CN116524910B (en) Manuscript prefabrication method and system based on microphone
Donai et al. Classification of indexical and segmental features of human speech using low-and high-frequency energy

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20200617

Address after: Room 105G, 199 GuoShoujing Road, Pudong New Area, Shanghai, 200120

Patentee after: Shanghai Jiaotong University Intellectual Property Management Co.,Ltd.

Address before: No. 800 Dongchuan Road, Shanghai, 200240

Patentee before: SHANGHAI JIAO TONG University

TR01 Transfer of patent right

Effective date of registration: 20201028

Address after: Building 14, Tengfei Innovation Park, No. 388 Xinping Street, Suzhou Industrial Park, Jiangsu Province, 215000

Patentee after: AI SPEECH Ltd.

Address before: Room 105G, 199 GuoShoujing Road, Pudong New Area, Shanghai, 200120

Patentee before: Shanghai Jiaotong University Intellectual Property Management Co.,Ltd.

CP01 Change in the name or title of a patent holder

Address after: Building 14, Tengfei Innovation Park, No. 388 Xinping Street, Suzhou Industrial Park, Suzhou City, Jiangsu Province, 215000

Patentee after: Sipic Technology Co.,Ltd.

Address before: Building 14, Tengfei Innovation Park, No. 388 Xinping Street, Suzhou Industrial Park, Suzhou City, Jiangsu Province, 215000

Patentee before: AI SPEECH Ltd.
