CN103491429A - Audio processing method and audio processing equipment - Google Patents
- Publication number
- CN103491429A (application number CN201310397999.3A)
- Authority
- CN
- China
- Prior art keywords
- content
- voice data
- language form
- video
- object language
- Prior art date
- Legal status (assumed, not a legal conclusion)
- Pending
Landscapes
- Telephonic Communication Services (AREA)
- Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)
Abstract
The invention provides an audio processing method and an audio processing device. The audio processing device comprises a first extraction unit, a recognition unit, a second extraction unit, a conversion unit and a substitution unit. The first extraction unit extracts, through a mobile terminal, the audio data carrying the content to be converted from an audio stream; the recognition unit recognizes the text content corresponding to the audio data; the second extraction unit acquires the user's preferred language as the target language; the conversion unit converts the text content into text content in the target-language format, the text content in the target-language format being text content described in the target language; the substitution unit converts the text content in the target-language format into audio data in the target-language format, which replaces the original audio data. With the audio processing method and device, audio content can be converted according to the user's preferences, improving user satisfaction.
Description
Technical field
The present invention relates to the field of audio signal processing, and in particular to an audio processing method and an audio processing device.
Background technology
At present, the mobile terminal has become an extremely popular communication tool that allows people to communicate on the move in real time; the introduction of third-party applications has enriched the functions of mobile terminals and broadened their range of applications.
With a suitable player installed, a mobile terminal can download music and video for listening and viewing, or play them online. The spread of networks has promoted the dissemination of the cultures of different countries and regions. However, if the music or video uses a language unfamiliar to the user, the user can hardly follow or understand it, which greatly limits the applicability of such network, music and video content.
Summary of the invention
To this end, the present invention proposes an audio processing method and an audio processing device that can substantially eliminate one or more of the problems caused by the limitations and defects of the prior art.
Additional advantages, objects and features of the present invention will be set forth in part in the description that follows, and in part will become apparent to those of ordinary skill in the art upon examination of the following, or may be learned from practice of the invention. The objects and advantages of the invention may be realized and attained by the structure particularly pointed out in the written description, the claims and the accompanying drawings.
The present invention provides an audio processing device, characterized in that the audio processing device comprises:
a first extraction unit, configured to extract, through a mobile terminal, the audio data carrying the content to be converted from an audio stream;
a recognition unit, configured to recognize the text content corresponding to said audio data;
a second extraction unit, configured to acquire the user's preferred language as the target language;
a conversion unit, configured to convert said text content into text content in the target-language format, the text content in the target-language format being text content described in the target language;
a substitution unit, configured to convert the text content in the target-language format into audio data in the target-language format, so as to replace said audio data to be converted.
Preferably, said recognition unit uses speech recognition technology to recognize the text content corresponding to said audio data.
Preferably, said audio processing device further comprises:
a video extraction unit, configured to extract, through the mobile terminal, the video data related to subtitles from a video stream;
a video recognition unit, configured to recognize the subtitle content from the subtitle-related video data.
Preferably, said audio processing device further comprises:
a video conversion unit, configured to convert said subtitle content into subtitle content in the target-language format, the subtitle content in the target-language format being subtitle content described in the target language;
a video substitution unit, configured to convert the subtitle content in the target-language format into video data in the target-language format, so as to replace said subtitle-related video data.
Preferably, said audio processing device further comprises:
a timestamp unit, configured to obtain in advance the synchronization timestamps of said audio data and said video data;
a synchronization unit, configured to keep, by means of said synchronization timestamps, the audio data in the target-language format synchronized with the video data in the target-language format.
The present invention also provides an audio processing method, characterized in that the method comprises:
extracting, through a mobile terminal, the audio data carrying the content to be translated from an audio stream;
recognizing the text content corresponding to said audio data;
acquiring the user's preferred language as the target language;
converting said text content into text content in the target-language format, the text content in the target-language format being text content described in the target language;
converting the text content in the target-language format into audio data in the target-language format, so as to replace said audio data to be converted.
Preferably, speech recognition technology is used to recognize the text content corresponding to said audio data.
Preferably, said method further comprises:
extracting, through the mobile terminal, the video data related to subtitles from a video stream;
recognizing the subtitle content from the subtitle-related video data;
converting said subtitle content into subtitle content in the target-language format, the subtitle content in the target-language format being subtitle content described in the target language;
converting the subtitle content in the target-language format into video data in the target-language format, so as to replace said subtitle-related video data.
The present invention converts an audio stream in an unfamiliar language into an audio stream in the user's preferred language and presents the content in that preferred language, which is more user-friendly and more universally applicable.
Brief description of the drawings
Fig. 1 is a flowchart of an audio processing method according to an embodiment of the present invention.
Fig. 2 is a structural diagram of an audio processing device according to an embodiment of the present invention.
Embodiment
Fig. 1 shows a flowchart of an audio processing method according to an embodiment of the present invention; the steps are detailed as follows.
Step S101: extract, through the mobile terminal, the audio data carrying the content to be translated from the audio stream.
Player software plays an audio stream that contains audio data recording background music and spoken or sung content. When needed, the audio data carrying the content to be translated can be extracted from the audio stream. For example, when a user listens to music on a mobile terminal and wants the music rendered in a voice of the user's choosing, the audio stream is first extracted from the music file; after removing the background music, the voice-related audio data is extracted from the audio stream — for example, the vocals of a song.
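The extraction step can be pictured, under the simplifying assumption that the stream has already been segmented into frames tagged as background music or voice (the `Frame` structure and its tag names below are hypothetical, invented only for illustration), as:

```python
from dataclasses import dataclass

@dataclass
class Frame:
    timestamp: float   # playback position in seconds
    kind: str          # "voice" or "music" (hypothetical tags)
    payload: bytes     # raw audio samples

def extract_voice(stream):
    """Keep only the frames carrying speech/vocals, preserving their timestamps."""
    return [f for f in stream if f.kind == "voice"]

stream = [
    Frame(0.0, "music", b"\x01"),
    Frame(0.5, "voice", b"\x02"),
    Frame(1.0, "voice", b"\x03"),
]
voice = extract_voice(stream)
print([f.timestamp for f in voice])  # [0.5, 1.0]
```

The timestamps are carried along with each extracted frame because, as step S104 below relies on, the translated audio must later be substituted back at exactly these positions.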
As another embodiment of the present invention, before the step of extracting, through the mobile terminal, the audio data carrying the content to be translated from the audio stream, the method further comprises:
acquiring the user's preferred language as the translation language.
The preferred language may be any dialect or any national mother tongue from around the world.
First, upon receiving a user instruction to set the translation language, the mobile terminal pops up a language selection dialog whose language list contains all language categories available locally and/or on the server. The user can choose one or more preferred languages, each selected language is set as a translation language, and a preference order is set according to the user's choices — for example, Chinese as the first translation language, Sichuan dialect as the second, and English as the third. After the translation language settings are confirmed, when the text content corresponding to the audio data is to be translated into the first translation language and no word-mapping table for the first translation language is found either locally or on the server, the word-mapping table for the second translation language is looked up according to the preference order; if the lookup succeeds, the text content corresponding to the audio data is translated into the second translation language according to that table. A word-mapping table records the mappings between source words and their translations. The lookup continues down the preference order in the same way; if no table is found for any translation language, the original audio stream is retained and played.
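The preference-ordered fallback described above can be sketched as follows; the table names and contents are invented for illustration only:

```python
def pick_mapping_table(preferred, available):
    """Return (language, table) for the first preferred language that has a
    word-mapping table, or None if no table exists for any of them —
    in which case the original audio stream is kept and played as-is."""
    for lang in preferred:
        if lang in available:
            return lang, available[lang]
    return None

# Hypothetical word-mapping tables (source word -> translated word);
# only Sichuan dialect has one in this example.
tables = {"sichuan": {"hello": "ni hao sa"}}

choice = pick_mapping_table(["chinese", "sichuan", "english"], tables)
print(choice[0])  # sichuan — no table for chinese, so fall back to the second
```

Note that the fallback is purely over table availability; the user's preference order itself is never reordered.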
Preferably, while the video and/or audio is playing, the user can change the translation language according to his or her preference. Specifically, after a change instruction is received, the language selection dialog is invoked to change the translation language.
Preferably, the microphone built into the mobile terminal can capture speech recorded by the user, and the language category of that speech is identified against a language library; the identified language is then used as the translation language. Of course, the user may also record several different languages, and a preference order is then set for all the translation languages obtained.
Step S102: use speech recognition technology to recognize the text content corresponding to said audio data.
The binary audio data is fed into a speech recognition device, which uses speech recognition technology to recognize the text content corresponding to the audio data.
Step S103: translate said text content into text content in the translation-language format, the text content in the translation-language format being text content described in the translation language.
Existing language translation software is used to translate said text content into text content in the translation-language format.
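As a toy stand-in for such translation software, a word-mapping table of the kind introduced above can be applied word by word (a real translation system would of course handle grammar and context; the English-to-French entries below are purely illustrative):

```python
def translate(text, table):
    """Translate text word by word using a word-mapping table;
    words missing from the table are passed through unchanged."""
    return " ".join(table.get(word, word) for word in text.split())

table = {"good": "bon", "morning": "matin"}  # hypothetical mapping entries
print(translate("good morning world", table))  # bon matin world
```

Passing unknown words through unchanged mirrors the document's fallback behaviour: content that cannot be translated is retained rather than dropped.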
Step S104: convert the text content in the translation-language format into audio data in the translation-language format, so as to replace said audio data to be translated.
The audio data in the translation-language format is audio data recorded and formed in the translation language.
According to the timestamps of the audio data carrying the content to be translated recorded in the audio stream, the text content in the translation-language format is re-recorded as audio data in the translation language; the audio data in the translation-language format then replaces said audio data carrying the content to be translated. Specifically, while keeping the synchronization timestamps of the audio data carrying the content to be translated unchanged, the audio data in the translation-language format is substituted for it, so that the audio stream still plays in sync and the spoken language of the audio is transformed.
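The timestamp-preserving substitution can be sketched as follows, again over a hypothetical representation of the stream as (timestamp, payload) pairs:

```python
def substitute(stream, translated):
    """Replace each frame's payload with the translated audio recorded for the
    same timestamp, leaving the timestamps — and thus playback sync — untouched.
    Frames with no translated counterpart (e.g. background music) are kept."""
    return [(ts, translated.get(ts, payload)) for ts, payload in stream]

stream = [(0.0, b"music"), (0.5, b"hello"), (1.0, b"world")]
translated = {0.5: b"bonjour", 1.0: b"monde"}  # translation-language audio per timestamp
print(substitute(stream, translated))
# [(0.0, b'music'), (0.5, b'bonjour'), (1.0, b'monde')]
```

Because only payloads change, the stream's timeline is identical before and after substitution, which is exactly the "keep the synchronization timestamps unchanged" requirement above.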
As another embodiment of the present invention, the method further comprises:
extracting, through the mobile terminal, the video data related to subtitles from a video stream;
recognizing the subtitle content from the subtitle-related video data;
translating said subtitle content into subtitle content in the translation-language format, the subtitle content in the translation-language format being subtitle content described in the translation language;
converting the subtitle content in the translation-language format into video data in the translation-language format, so as to replace said subtitle-related video data.
The mobile terminal plays a video file through video software; the video file comprises a video stream and/or an audio stream. After the video stream is obtained, the subtitle-related video data — that is, the video data carrying the text content contained in the subtitles — is extracted from it, and the subtitle timestamps are extracted at the same time. After the subtitle content is recognized, it is translated into subtitle content in the translation-language format, which is then converted into video data in the translation-language format. Finally, according to the subtitle timestamps, the video data in the translation-language format replaces the subtitle-related video data. When the translated video file is replayed, the subtitles are displayed in the translation language.
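Modelling subtitles as timed cues, the recognize-translate-replace loop above might look like this (the cue structure and the trivial dictionary-based translator are assumptions made for illustration):

```python
def retitle(cues, translate):
    """Produce subtitle cues in the translation language, keeping each cue's
    original start/end times so it still lines up with the video."""
    return [(start, end, translate(text)) for start, end, text in cues]

cues = [(0.0, 2.0, "hello"), (2.0, 4.0, "goodbye")]
toy = {"hello": "hola", "goodbye": "adios"}  # stand-in translation table
print(retitle(cues, lambda t: toy.get(t, t)))
# [(0.0, 2.0, 'hola'), (2.0, 4.0, 'adios')]
```

As with the audio substitution, only the cue text changes; the timing fields are what the subtitle timestamps in the passage above preserve.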
As another embodiment of the present invention, the method further comprises:
obtaining in advance the synchronization timestamps of said audio data and said video data;
keeping, by means of said synchronization timestamps, the audio data in the translation-language format synchronized with the video data in the translation-language format.
When a video is watched, in order to translate and display it well, the video stream and the audio stream must stay synchronized. The synchronization timestamps of the audio data and the video data are therefore obtained in advance; they comprise the timestamps of the audio data, the timestamps of the subtitles, and the synchronization timestamps between the audio data in the translation-language format and the video data in the translation-language format. The following synchronization controls are carried out simultaneously through these three kinds of timestamps:
by the timestamps of the audio data, the audio data in the translation-language format is controlled to replace the audio data carrying the content to be translated;
by the timestamps of the subtitles, the video data in the translation-language format is controlled to replace the original subtitle-related video data;
by the synchronization timestamps between the audio data in the translation-language format and the video data in the translation-language format, the two are controlled to stay synchronized.
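A minimal sketch of the third control — verifying that the translated audio and the translated subtitle video stay within a tolerance of each other (the pairing of timestamps and the tolerance value are assumptions, not taken from the patent):

```python
def in_sync(audio_ts, video_ts, tolerance=0.1):
    """True if every paired audio/video timestamp differs by at most
    `tolerance` seconds, i.e. the two translated streams still play together."""
    return all(abs(a - v) <= tolerance for a, v in zip(audio_ts, video_ts))

print(in_sync([0.0, 0.5, 1.0], [0.02, 0.51, 1.0]))  # True
print(in_sync([0.0, 0.5], [0.0, 0.9]))              # False — subtitle drifted
```

In a real player, a failed check of this kind would trigger a resynchronization (delaying or advancing one stream) rather than merely reporting a mismatch.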
This embodiment provides an audio processing method based on a mobile terminal. When the user listens through the mobile terminal, the user's preferred language is obtained in advance as the translation language. When translation is needed, the audio data carrying the content to be translated, together with its timestamps, is extracted from the audio stream; speech recognition technology recognizes the corresponding text content, which is translated into text content in the translation-language format; that text content is converted into audio data in the translation-language format, which replaces the audio data to be translated. Further, when the medium being played is a video, while the speech content is translated, the subtitle-related video data and the synchronization timestamps are extracted from the video stream; the audio data in the translation-language format replaces the audio data to be translated, and the video data in the translation-language format replaces the subtitle-related video data. Further still, through the synchronization timestamps, the audio data in the translation-language format is kept synchronized with the video data in the translation-language format. Audio and/or video in an unfamiliar language is thus converted into the preferred-language format and presented to the user, which is more user-friendly and more universally applicable.
Embodiment 2:
Fig. 2 shows the structure of the mobile-terminal-based audio processing device provided by the embodiment of the present invention; for ease of description, only the parts relevant to the embodiment are shown.
The mobile-terminal-based audio processing device may be a software unit, a hardware unit, or a combined software/hardware unit running in a mobile terminal device, or it may be integrated as an independent component into the terminal device or run in an application system of the terminal device.
A mobile-terminal-based audio processing device may comprise an extraction unit 21, a recognition unit 22, a translation unit 23 and a replacement unit 24; the specific function of each functional unit is described below.
Player software plays an audio stream that contains audio data recording background music and spoken or sung content. When needed, the extraction unit 21 can extract the audio data carrying the content to be translated from the audio stream. For example, when a user listens to music on a mobile terminal and wants the music rendered in a voice of the user's choosing, the audio stream is first extracted from the music file; after removing the background music, the extraction unit 21 extracts the voice-related audio data from the audio stream — for example, the vocals of a song.
As another embodiment of the present invention, the device further comprises:
an acquisition unit 25, configured to acquire the user's preferred language as the translation language.
The preferred language may be any dialect or any national mother tongue from around the world.
First, upon receiving a user instruction to set the translation language, the acquisition unit 25 pops up a language selection dialog whose language list contains all language categories available locally and/or on the server. The user can choose one or more preferred languages; the acquisition unit 25 sets each selected language as a translation language and sets a preference order according to the user's choices — for example, the acquisition unit 25 sets Chinese as the first translation language, Sichuan dialect as the second, and English as the third. After the translation language settings are confirmed, when the text content corresponding to the audio data is to be translated into the first translation language and no word-mapping table for the first translation language is found either locally or on the server, the word-mapping table for the second translation language is looked up according to the preference order; if the lookup succeeds, the text content corresponding to the audio data is translated into the second translation language according to that table. A word-mapping table records the mappings between source words and their translations. The lookup continues down the preference order in the same way; if no table is found for any translation language, the original audio stream is retained and played.
Preferably, while the video and/or audio is playing, the user can change the translation language according to his or her preference. Specifically, after a change instruction is received, the acquisition unit 25 invokes the language selection dialog to change the translation language.
Preferably, the microphone built into the mobile terminal can capture speech recorded by the user, and the language category of that speech is identified against a language library; the identified language is then used as the translation language. Of course, the user may also record several different languages, and a preference order is then set for all the translation languages obtained.
The translation unit (i.e. conversion unit) 23 is configured to translate said text content into text content in the translation-language format, the text content in the translation-language format being text content described in the translation language.
The audio data in the translation-language format is audio data recorded and formed in the translation language.
According to the timestamps of the audio data carrying the content to be translated recorded in the audio stream, the replacement unit 24 re-records the text content in the translation-language format as audio data in the translation language, and substitutes the audio data in the translation-language format for said audio data carrying the content to be translated. Specifically, while keeping the synchronization timestamps of the audio data carrying the content to be translated unchanged, the replacement unit 24 substitutes the audio data in the translation-language format for it, so that the audio stream still plays in sync and the spoken language of the audio is transformed.
As another embodiment of the present invention, the device further comprises:
The mobile terminal plays a video file through video software; the video file comprises a video stream and/or an audio stream. After the video stream is obtained, the video extraction unit 26 extracts the subtitle-related video data — that is, the video data carrying the text content contained in the subtitles — from it, and extracts the subtitle timestamps at the same time. After the video recognition unit 27 recognizes the subtitle content, the video translation unit 28 translates it into subtitle content in the translation-language format; the video replacement unit 29 converts the subtitle content in the translation-language format into video data in the translation-language format and then, according to the subtitle timestamps, replaces the subtitle-related video data with it. When the translated video file is replayed, the subtitles are displayed in the translation language.
As another embodiment of the present invention, the device further comprises:
When a video is watched, in order to translate and display it well, the video stream and the audio stream must stay synchronized. The timestamp unit 30 therefore obtains the synchronization timestamps of the audio data and the video data in advance; they comprise the timestamps of the audio data, the timestamps of the subtitles, and the synchronization timestamps between the audio data in the translation-language format and the video data in the translation-language format. The following synchronization controls are carried out simultaneously through these three kinds of timestamps:
by the timestamps of the audio data, the replacement unit 24 controls the audio data in the translation-language format to replace the audio data carrying the content to be translated;
by the timestamps of the subtitles, the video replacement unit 29 controls the video data in the translation-language format to replace the original subtitle-related video data;
by the synchronization timestamps between the audio data in the translation-language format and the video data in the translation-language format, the synchronization unit 31 controls the two to stay synchronized.
In this way, the playback timing of the speech or video before and after language translation is kept correct.
This embodiment provides a mobile-terminal-based audio processing device. When the user listens through the mobile terminal, the acquisition unit obtains the user's preferred language in advance as the translation language. When translation is needed, the extraction unit extracts from the audio stream the audio data carrying the content to be translated, together with its timestamps; the recognition unit uses speech recognition technology to recognize the corresponding text content so that it can be translated into text content in the translation-language format; the translation unit converts the text content in the translation-language format into audio data in the translation-language format, and the replacement unit replaces the audio data to be translated with it. Further, when the medium being played is a video, while the speech content is translated, the timestamp unit extracts the subtitle-related video data and the synchronization timestamps from the video stream; the audio data in the translation-language format replaces the audio data to be translated, and the video data in the translation-language format replaces the subtitle-related video data. Further still, through the synchronization timestamps, the synchronization unit keeps the audio data in the translation-language format synchronized with the video data in the translation-language format. Audio and/or video in an unfamiliar language is thus converted into the preferred-language format and presented to the user, which is more user-friendly and more universally applicable.
As an embodiment of the invention, the invention also provides a mobile terminal comprising the above mobile-terminal-based audio processing device.
The mobile terminal may be, but is not limited to, a smartphone, an iPad, or the like.
The embodiment of the present invention provides a mobile-terminal-based audio processing method and device. When the user listens through the mobile terminal, the user's preferred language is obtained in advance as the translation language. When translation is needed, the audio data carrying the content to be translated, together with its timestamps, is extracted from the audio stream; speech recognition technology recognizes the corresponding text content, which is translated into text content in the translation-language format; that text content is converted into audio data in the translation-language format, which replaces the audio data to be translated. Further, when the medium being played is a video, while the speech content is translated, the subtitle-related video data and the synchronization timestamps are extracted from the video stream; the audio data in the translation-language format replaces the audio data to be translated, and the video data in the translation-language format replaces the subtitle-related video data. Further still, through the synchronization timestamps, the audio data in the translation-language format is kept synchronized with the video data in the translation-language format. Audio and/or video in an unfamiliar language is thus converted into the preferred-language format and presented to the user, which is more user-friendly and more universally applicable.
It will be appreciated by those skilled in the art that the units included in Embodiment 2 above are divided merely according to functional logic, and the division is not limited thereto as long as the corresponding functions can be realized; in addition, the specific names of the functional units are merely for ease of distinguishing them from one another and do not limit the protection scope of the present invention.
Those of ordinary skill in the art will appreciate that all or part of the steps of the methods of the above embodiments may be completed by instructing the relevant hardware through a program, and the program may be stored in a computer-readable storage medium such as a ROM/RAM, a magnetic disk or an optical disc.
The above is merely a preferred embodiment of the present invention. For those of ordinary skill in the art, changes may be made in specific implementations and application scope according to the idea of the present invention, and this description should not be construed as limiting the present invention.
Claims (8)
1. An audio processing device, characterized in that the audio processing device comprises:
a first extraction unit, configured to extract, through a mobile terminal, the audio data carrying the content to be converted from an audio stream;
a recognition unit, configured to recognize the text content corresponding to said audio data;
a second extraction unit, configured to acquire the user's preferred language as the target language;
a conversion unit, configured to convert said text content into text content in the target-language format, the text content in the target-language format being text content described in the target language;
a substitution unit, configured to convert the text content in the target-language format into audio data in the target-language format, so as to replace said audio data to be converted.
2. The audio processing device according to claim 1, characterized in that said recognition unit uses speech recognition technology to recognize the text content corresponding to said audio data.
3. The audio processing device according to claim 1, characterized in that the audio processing device further comprises:
a video extraction unit, configured to extract, through the mobile terminal, the video data related to subtitles from a video stream;
a video recognition unit, configured to recognize the subtitle content from the subtitle-related video data.
4. The audio processing device according to claim 3, characterized in that the audio processing device further comprises:
a video conversion unit, configured to convert said subtitle content into subtitle content in the target-language format, the subtitle content in the target-language format being subtitle content described in the target language;
a video substitution unit, configured to convert the subtitle content in the target-language format into video data in the target-language format, so as to replace said subtitle-related video data.
5. The audio processing device according to any one of claims 1-4, characterized in that the audio processing device further comprises:
a timestamp unit, configured to obtain in advance a synchronization timestamp of the audio data and the video data; and
a synchronization unit, configured to control, through the synchronization timestamp, the audio data in the target-language format to be synchronized with the video data in the target-language format.
6. An audio processing method, characterized in that the method comprises:
extracting, by a mobile terminal, audio data carrying content to be translated from an audio stream;
recognizing text content corresponding to the audio data;
obtaining a preferred language of a user as a target language;
converting the text content into text content in a target-language format, wherein the text content in the target-language format is text content described in the target language; and
converting the text content in the target-language format into audio data in the target-language format, so as to replace the audio data to be converted.
7. The audio processing method according to claim 6, characterized in that the text content corresponding to the audio data is recognized by using speech recognition technology.
8. The audio processing method according to claim 6, characterized in that the method further comprises:
extracting, by the mobile terminal, video data related to subtitles from a video stream;
recognizing subtitle content according to the video data related to the subtitles;
converting the subtitle content into subtitle content in the target-language format, wherein the subtitle content in the target-language format is subtitle content described in the target language; and
converting the subtitle content in the target-language format into video data in the target-language format, so as to replace the video data related to the subtitles.
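Taken together, the device claims and method claims describe a recognize-translate-resynthesize pipeline for audio, a parallel branch for subtitles, and timestamp-based synchronization. The minimal sketch below illustrates that claimed flow only; all helper names, data structures, and the toy translation table are this sketch's own assumptions (a real device would call ASR, machine-translation, text-to-speech, and subtitle-recognition engines at these points), not anything specified by the patent.

```python
# Hypothetical stand-ins for the units named in the claims.

def recognize_text(audio_data):
    # Recognition unit: recover the text content carried by the audio data.
    return audio_data["transcript"]

def translate(text, target_language):
    # Conversion unit: a toy lookup table stands in for real machine
    # translation into the target language.
    table = {("hello", "fr"): "bonjour", ("good morning", "fr"): "bonjour"}
    return table.get((text, target_language), text)

def synthesize_audio(text, target_language):
    # Substitution unit: text-to-speech producing target-language audio data.
    return {"transcript": text, "language": target_language}

def process_audio(audio_data, preferred_language):
    """Claims 1 and 6: replace audio data carrying content to be converted
    with audio data in the target-language format."""
    text = recognize_text(audio_data)                        # recognition unit
    converted = translate(text, preferred_language)          # conversion unit
    return synthesize_audio(converted, preferred_language)   # substitution unit

def process_subtitles(video_stream, target_language):
    """Claims 3, 4, and 8: extract subtitle-related video data, recognize the
    subtitle content, translate it, and produce replacement video data."""
    frames = [f for f in video_stream if "subtitle" in f]    # video extraction unit
    text = " ".join(f["subtitle"] for f in frames)           # video recognition unit
    converted = translate(text, target_language)             # video conversion unit
    return {"subtitle": converted, "language": target_language}  # video substitution unit

def synchronize(audio_data, video_data, sync_timestamp):
    """Claim 5: align the converted audio with the video through a
    synchronization timestamp obtained in advance."""
    audio_data["start"] = sync_timestamp
    video_data["start"] = sync_timestamp
    return audio_data, video_data

new_audio = process_audio({"transcript": "hello"}, "fr")
new_subs = process_subtitles(
    [{"subtitle": "good"}, {"subtitle": "morning"}, {"other": 1}], "fr")
```

The sketch keeps the claims' division of labor: recognition, conversion, and substitution are separate stages, so any one engine (for example the translation step) can be swapped without touching the others.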
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201310397999.3A CN103491429A (en) | 2013-09-04 | 2013-09-04 | Audio processing method and audio processing equipment |
Publications (1)
Publication Number | Publication Date |
---|---|
CN103491429A (en) | 2014-01-01 |
Family
ID=49831341
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201310397999.3A Pending CN103491429A (en) | 2013-09-04 | 2013-09-04 | Audio processing method and audio processing equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN103491429A (en) |
Cited By (24)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10885918B2 (en) | 2013-09-19 | 2021-01-05 | Microsoft Technology Licensing, Llc | Speech recognition using phoneme matching |
CN105917405A (en) * | 2014-01-17 | 2016-08-31 | 微软技术许可有限责任公司 | Incorporating an exogenous large-vocabulary model into rule-based speech recognition |
CN105917405B (en) * | 2014-01-17 | 2019-11-05 | 微软技术许可有限责任公司 | Merging of the exogenous large vocabulary model to rule-based speech recognition |
US10311878B2 (en) | 2014-01-17 | 2019-06-04 | Microsoft Technology Licensing, Llc | Incorporating an exogenous large-vocabulary model into rule-based speech recognition |
US10749989B2 (en) | 2014-04-01 | 2020-08-18 | Microsoft Technology Licensing Llc | Hybrid client/server architecture for parallel processing |
CN103997657A (en) * | 2014-06-06 | 2014-08-20 | 福建天晴数码有限公司 | Converting method and device of audio in video |
CN105244026B (en) * | 2015-08-24 | 2019-09-20 | 北京意匠文枢科技有限公司 | A kind of method of speech processing and device |
CN105244026A (en) * | 2015-08-24 | 2016-01-13 | 陈娟 | Voice processing method and device |
CN105609106A (en) * | 2015-12-16 | 2016-05-25 | 魅族科技(中国)有限公司 | Event recording document generation method and apparatus |
CN105828101A (en) * | 2016-03-29 | 2016-08-03 | 北京小米移动软件有限公司 | Method and device for generation of subtitles files |
CN105828101B (en) * | 2016-03-29 | 2019-03-08 | 北京小米移动软件有限公司 | Generate the method and device of subtitle file |
WO2018121001A1 (en) * | 2016-12-30 | 2018-07-05 | 深圳市九洲电器有限公司 | Method and system for outputting simultaneous interpretation of digital television program, and smart terminal |
CN106791913A (en) * | 2016-12-30 | 2017-05-31 | 深圳市九洲电器有限公司 | Digital television program simultaneous interpretation output intent and system |
CN109830239A (en) * | 2017-11-21 | 2019-05-31 | 群光电子股份有限公司 | Voice processing apparatus, voice recognition input systems and voice recognition input method |
CN109830239B (en) * | 2017-11-21 | 2021-07-06 | 群光电子股份有限公司 | Speech processing device, speech recognition input system, and speech recognition input method |
WO2019205870A1 (en) * | 2018-04-24 | 2019-10-31 | 腾讯科技(深圳)有限公司 | Video stream processing method, apparatus, computer device, and storage medium |
US11252444B2 (en) | 2018-04-24 | 2022-02-15 | Tencent Technology (Shenzhen) Company Limited | Video stream processing method, computer device, and storage medium |
CN109274900A (en) * | 2018-09-05 | 2019-01-25 | 浙江工业大学 | A kind of video dubbing method |
CN110767233A (en) * | 2019-10-30 | 2020-02-07 | 合肥名阳信息技术有限公司 | Voice conversion system and method |
CN111787155A (en) * | 2020-06-30 | 2020-10-16 | 深圳传音控股股份有限公司 | Audio data processing method, terminal device and medium |
CN111800543A (en) * | 2020-06-30 | 2020-10-20 | 深圳传音控股股份有限公司 | Audio file processing method, terminal device and storage medium |
WO2022000829A1 (en) * | 2020-06-30 | 2022-01-06 | 深圳传音控股股份有限公司 | Audio data processing method, terminal device, and computer-readable storage medium |
CN112786025A (en) * | 2020-12-28 | 2021-05-11 | 腾讯音乐娱乐科技(深圳)有限公司 | Method for determining lyric timestamp information and training method of acoustic model |
CN112786025B (en) * | 2020-12-28 | 2023-11-14 | 腾讯音乐娱乐科技(深圳)有限公司 | Method for determining lyric timestamp information and training method of acoustic model |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN103491429A (en) | Audio processing method and audio processing equipment | |
CN103226947B (en) | A kind of audio-frequency processing method based on mobile terminal and device | |
CN105245917B (en) | A kind of system and method for multi-media voice subtitle generation | |
US9799375B2 (en) | Method and device for adjusting playback progress of video file | |
CN110035326A (en) | Subtitle generation, the video retrieval method based on subtitle, device and electronic equipment | |
US10529340B2 (en) | Voiceprint registration method, server and storage medium | |
CN104252861A (en) | Video voice conversion method, video voice conversion device and server | |
CN105704538A (en) | Method and system for generating audio and video subtitles | |
US20140372100A1 (en) | Translation system comprising display apparatus and server and display apparatus controlling method | |
CN107644637B (en) | Phoneme synthesizing method and device | |
CN102568478A (en) | Video play control method and system based on voice recognition | |
CN103067775A (en) | Subtitle display method for audio/video terminal, audio/video terminal and server | |
WO2014141054A1 (en) | Method, apparatus and system for regenerating voice intonation in automatically dubbed videos | |
CN104078044A (en) | Mobile terminal and sound recording search method and device of mobile terminal | |
CN105635782A (en) | Subtitle output method and device | |
CN110781328A (en) | Video generation method, system, device and storage medium based on voice recognition | |
CN105489072A (en) | Method for the determination of supplementary content in an electronic device | |
CN111050201A (en) | Data processing method and device, electronic equipment and storage medium | |
CN105224581A (en) | The method and apparatus of picture is presented when playing music | |
US11714973B2 (en) | Methods and systems for control of content in an alternate language or accent | |
US9905221B2 (en) | Automatic generation of a database for speech recognition from video captions | |
Pleva et al. | TUKE-BNews-SK: Slovak Broadcast News Corpus Construction and Evaluation. | |
KR20150088564A (en) | E-Book Apparatus Capable of Playing Animation on the Basis of Voice Recognition and Method thereof | |
CN110324702A (en) | Information-pushing method and device in video display process | |
CN102955809A (en) | Method and system for editing and playing media files |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
WD01 | Invention patent application deemed withdrawn after publication | | Application publication date: 20140101 |