CN103491429A - Audio processing method and audio processing equipment - Google Patents


Info

Publication number: CN103491429A
Application number: CN201310397999.3A
Authority: CN (China)
Prior art keywords: content, voice data, language form, video, target language
Legal status: Pending (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Other languages: Chinese (zh)
Inventor: 黄家旺
Current assignee: Zhangjiagang Free Trade Zone Runtong Electronic Technology R & D Co Ltd (the listed assignees may be inaccurate; Google has not performed a legal analysis)
Original assignee: Zhangjiagang Free Trade Zone Runtong Electronic Technology R & D Co Ltd
Application filed by Zhangjiagang Free Trade Zone Runtong Electronic Technology R & D Co Ltd
Priority claimed from application CN201310397999.3A

Landscapes

  • Telephonic Communication Services (AREA)
  • Two-Way Televisions, Distribution Of Moving Picture Or The Like (AREA)

Abstract

The invention provides an audio processing method and an audio processing device. The audio processing device comprises a first extraction unit, a recognition unit, a second extraction unit, a conversion unit and a substitution unit. The first extraction unit extracts, via a mobile terminal, the audio data carrying the content to be converted from an audio stream; the recognition unit recognizes the text content corresponding to the audio data; the second extraction unit obtains the user's preferred language as the target language; the conversion unit converts the text content into text content in the target-language form, the text content in the target-language form being the text content expressed in the target language; the substitution unit converts the text content in the target-language form into audio data in the target-language form so as to replace the audio data to be converted. With the audio processing method and the audio processing device, audio content can be converted according to the user's preferences, improving user satisfaction.

Description

Audio processing method and audio processing device
Technical field
The present invention relates to the field of audio signal processing technology, and in particular to an audio processing method and an audio processing device.
Background technology
At present, mobile terminals have become indispensable communication tools that allow people to communicate in real time while on the move, and the introduction of third-party applications has enriched the functions of mobile terminals and broadened their range of uses.
With a suitable player installed, a mobile terminal can download music and videos for offline listening and viewing, or play music and videos online. The spread of the network has promoted the dissemination of the cultures of different countries and regions; however, if music or video is in a language the user finds difficult to follow or understand, the reach of such network content, music and video is greatly limited.
Summary of the invention
To this end, the present invention proposes an audio processing method and an audio processing device that can substantially eliminate one or more of the problems caused by the limitations and defects of the prior art.
Additional advantages, objects and features of the invention will be set forth in part in the description that follows, and in part will become apparent to those of ordinary skill in the art upon examination of the following, or may be learned from practice of the invention. The objects and advantages of the invention may be realized and attained by the structure particularly pointed out in the written description, the claims and the accompanying drawings.
The invention provides an audio processing device, characterized in that the audio processing device comprises:
a first extraction unit, configured to extract, via a mobile terminal, the audio data carrying the content to be converted from an audio stream;
a recognition unit, configured to recognize the text content corresponding to the audio data;
a second extraction unit, configured to obtain the user's preferred language as the target language;
a conversion unit, configured to convert the text content into text content in the target-language form, the text content in the target-language form being the text content expressed in the target language;
a substitution unit, configured to convert the text content in the target-language form into audio data in the target-language form, so as to replace the audio data to be converted.
Preferably, the recognition unit uses speech recognition technology to recognize the text content corresponding to the audio data.
Preferably, the audio processing device further comprises:
a video extraction unit, configured to extract, via the mobile terminal, the subtitle-related video data from a video stream;
a video recognition unit, configured to recognize the subtitle content from the subtitle-related video data.
Preferably, the audio processing device further comprises:
a video conversion unit, configured to convert the subtitle content into subtitle content in the target-language form, the subtitle content in the target-language form being the subtitle content expressed in the target language;
a video substitution unit, configured to convert the subtitle content in the target-language form into video data in the target-language form, so as to replace the subtitle-related video data.
Preferably, the audio processing device further comprises:
a timestamp unit, configured to obtain in advance the synchronization timestamps of the audio data and the video data;
a synchronization unit, configured to use the synchronization timestamps to keep the audio data in the target-language form synchronized with the video data in the target-language form.
The present invention also provides an audio processing method, characterized in that the method comprises:
extracting, via a mobile terminal, the audio data carrying the content to be translated from an audio stream;
recognizing the text content corresponding to the audio data;
obtaining the user's preferred language as the target language;
converting the text content into text content in the target-language form, the text content in the target-language form being the text content expressed in the target language;
converting the text content in the target-language form into audio data in the target-language form, so as to replace the audio data to be converted.
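The method steps above can be sketched as a simple pipeline. This is a minimal illustration only; the patent does not specify any concrete speech components, so the recognize, translate and synthesize callables below are hypothetical stand-ins backed by toy dictionaries.

```python
def process_audio(audio_data, target_language, recognize, translate, synthesize):
    """Convert audio carrying speech into audio in the target language.

    recognize/translate/synthesize are pluggable stand-ins for the
    recognition, conversion and substitution units described above.
    """
    text = recognize(audio_data)                      # recognition step
    target_text = translate(text, target_language)    # conversion step
    return synthesize(target_text, target_language)   # substitution step


# Toy demonstration with dictionary-backed stubs (hypothetical).
def toy_recognize(audio):
    return audio["speech"]

def toy_translate(text, lang):
    table = {("hello", "zh"): "你好"}
    return table.get((text, lang), text)

def toy_synthesize(text, lang):
    return {"speech": text, "lang": lang}

result = process_audio({"speech": "hello"}, "zh",
                       toy_recognize, toy_translate, toy_synthesize)
```

The real units would wrap actual speech recognition, machine translation and text-to-speech engines, but the control flow is the same.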
Preferably, speech recognition technology is used to recognize the text content corresponding to the audio data.
Preferably, the method further comprises:
extracting, via the mobile terminal, the subtitle-related video data from a video stream;
recognizing the subtitle content from the subtitle-related video data;
converting the subtitle content into subtitle content in the target-language form, the subtitle content in the target-language form being the subtitle content expressed in the target language;
converting the subtitle content in the target-language form into video data in the target-language form, so as to replace the subtitle-related video data.
The present invention converts an audio stream in an unfamiliar language into an audio stream in the user's preferred language and presents the content in that language, which is both more user-friendly and more widely applicable.
Brief description of the drawings
Fig. 1 is a flowchart of an audio processing method according to an embodiment of the present invention.
Fig. 2 is a schematic structural diagram of an audio processing device according to an embodiment of the present invention.
Detailed description
Fig. 1 shows a flowchart of an audio processing method according to an embodiment of the present invention; the steps are detailed as follows:
Step S101: extracting, via a mobile terminal, the audio data carrying the content to be translated from an audio stream.
Playback software plays an audio stream that contains audio data recording both background music and voice content. When needed, the audio data carrying the content to be translated can be extracted from the audio stream. For example, when a user listens to music on a mobile terminal and wishes to have the music played in a voice of the user's choosing, the audio stream is first extracted from the music file; after the background music is removed, the voice-related audio data, such as the vocal track of a song, is extracted from the audio stream.
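The patent does not describe how background music is separated from voice. As a toy illustration only, suppose the stream is already modeled as a hypothetical sequence of tagged frames; extracting the voice data then reduces to filtering:

```python
def extract_voice_data(frames):
    """Keep only the speech payloads from a tagged frame sequence.

    `frames` is a hypothetical model of the audio stream as
    (kind, payload) pairs; a real implementation would need an actual
    source-separation step, which the patent leaves unspecified.
    """
    return [payload for kind, payload in frames if kind == "voice"]


stream = [("music", b"\x01"), ("voice", b"\x02"),
          ("music", b"\x03"), ("voice", b"\x04")]
vocals = extract_voice_data(stream)
```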
As another embodiment of the present invention, before the step of extracting, via the mobile terminal, the audio data carrying the content to be translated from the audio stream, the method further comprises:
obtaining the user's preferred language as the translation language.
The preferred language may be any regional dialect or any national language.
First, upon receiving a user instruction to set the translation language, the mobile terminal pops up a language selection dialog box whose language column lists all the language categories available locally and/or on the server. The user may choose one or more preferred languages according to preference; each selected preferred language is set as a translation language, and a preference order is set according to the user's choices. For example, Chinese may be set as the first translation language, Sichuan dialect as the second translation language, and English as the third translation language. After the translation-language settings are confirmed, when the text content corresponding to the audio data is to be translated into text content in the first translation language, if no dictionary for the first translation language is found either locally or on the server, the dictionary for the second translation language is looked up according to the preference order; if the lookup succeeds, the text content corresponding to the audio data is translated into text content in the second translation language according to that dictionary, where a dictionary holds the mappings between source words and translated words. The search proceeds in this way down the preference order; if no dictionary is found for any of the translation languages, the original audio stream is retained and played.
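The preference-ordered fallback described above can be sketched as follows. The word-by-word dictionary lookup is a deliberate simplification matching the patent's notion of a word table; the language codes and dictionary shape are assumptions for illustration.

```python
def translate_with_fallback(text, language_prefs, dictionaries):
    """Translate word by word using the first preferred language whose
    dictionary is available; otherwise keep the original text.

    `dictionaries` maps language -> {source word: translated word},
    standing in for the local/server word tables described above.
    Returns (language used, translated text); (None, original text)
    when no dictionary exists for any preferred language.
    """
    for lang in language_prefs:
        table = dictionaries.get(lang)
        if table is None:
            continue  # no dictionary for this language: try the next one
        return lang, " ".join(table.get(word, word) for word in text.split())
    return None, text  # no dictionary at all: keep the original


prefs = ["zh", "zh-sichuan", "en"]          # first, second, third choice
dicts = {"en": {"bonjour": "hello", "monde": "world"}}
lang, translated = translate_with_fallback("bonjour monde", prefs, dicts)
```

Here the first two preferred languages have no dictionary, so the lookup falls through to English, exactly the fallback order the embodiment describes.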
Preferably, while video and/or audio is playing, the user may change the translation language according to personal preference. Specifically, upon receiving a change instruction, the language selection dialog box is invoked to change the translation language.
Preferably, the user's voice may be recorded through the microphone carried by the mobile terminal, and the language of the recorded voice is identified against a language library. The identified language is then used as the translation language; of course, different languages may also be recorded multiple times, and a preference order may then be set over all the translation languages obtained.
Step S102: using speech recognition technology, recognizing the text content corresponding to the audio data.
The binary audio data is fed into a speech recognition device, which uses speech recognition technology to recognize the text content corresponding to this audio data.
Step S103: translating the text content into text content in the translation-language form, the text content in the translation-language form being the text content expressed in the translation language.
Existing language translation software is used to translate the text content into text content in the translation-language form.
Step S104: converting the text content in the translation-language form into audio data in the translation-language form, so as to replace the audio data to be translated.
The audio data in the translation-language form is audio data recorded, and thereby formed, in the translation language.
According to the timestamp of the audio data carrying the content to be translated, as recorded in the audio stream, the text content in the translation-language form is re-recorded as audio data in the translation language; the audio data in the translation-language form then replaces the audio data carrying the content to be translated. Specifically, with the synchronization timestamp of the audio data carrying the content to be translated kept unchanged, the audio data in the translation-language form replaces the audio data carrying the content to be translated, so that the audio stream continues to play in synchronization and the voice of the audio is transformed.
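The key point of step S104 is that only the payload changes while every timestamp stays fixed. A minimal sketch, modeling the stream as hypothetical (start_ms, end_ms, payload) segments:

```python
def replace_audio_segment(stream, segment_ts, translated_payload):
    """Swap in the translated audio while leaving every timestamp intact.

    `stream` is a simplified model of the audio stream: a list of
    (start_ms, end_ms, payload) entries. Only the payload of the segment
    whose (start, end) pair matches `segment_ts` is replaced, so the
    stream keeps playing in synchronization.
    """
    return [(start, end,
             translated_payload if (start, end) == segment_ts else payload)
            for start, end, payload in stream]


stream = [(0, 1000, "intro music"),
          (1000, 4000, "speech (source language)"),
          (4000, 5000, "outro music")]
new_stream = replace_audio_segment(stream, (1000, 4000), "speech (translated)")
```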
As another embodiment of the present invention, the method further comprises:
extracting, via the mobile terminal, the subtitle-related video data from a video stream;
recognizing the subtitle content from the subtitle-related video data;
translating the subtitle content into subtitle content in the translation-language form, the subtitle content in the translation-language form being the subtitle content expressed in the translation language;
converting the subtitle content in the translation-language form into video data in the translation-language form, so as to replace the subtitle-related video data.
The mobile terminal plays a video file through video software, the video file comprising a video stream and/or an audio stream. After the video stream is obtained, the subtitle-related video data is extracted from the video stream; specifically, the subtitle-related video data is the video data carrying the text content contained in the subtitles, and at the same time the timestamps of these subtitles are extracted. After the subtitle content is recognized, it is translated into subtitle content in the translation-language form, which is then converted into video data in the translation-language form. Then, according to the subtitle timestamps, the video data in the translation-language form is controlled to replace the subtitle-related video data. When the translated video file is replayed, the subtitles display the subtitle content in the translation language.
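The subtitle path mirrors the audio path: translate each recognized subtitle entry while preserving its timestamp. A minimal sketch; the (timestamp, text) pair representation and the `translate` callable are assumptions standing in for the subtitle recognition and translation steps above.

```python
def translate_subtitles(subtitles, translate):
    """Translate each subtitle entry while preserving its timestamp,
    so the translated subtitle track stays aligned with the video.

    `subtitles` is a list of (timestamp_ms, text) pairs; `translate`
    is a hypothetical text-translation callable.
    """
    return [(timestamp, translate(text)) for timestamp, text in subtitles]


subs = [(0, "bonjour"), (2000, "monde")]
table = {"bonjour": "hello", "monde": "world"}   # toy dictionary
translated = translate_subtitles(subs, lambda t: table.get(t, t))
```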
As another embodiment of the present invention, the method further comprises:
obtaining in advance the synchronization timestamps of the audio data and the video data;
using the synchronization timestamps to keep the audio data in the translation-language form synchronized with the video data in the translation-language form.
When a video is watched, in order to translate and display it better and to keep the video stream and the audio stream synchronized, the synchronization timestamps of the audio data and the video data are obtained in advance. These comprise: the timestamp of the audio data, the timestamps of the subtitles, and the synchronization timestamp between the audio data in the translation-language form and the video data in the translation-language form. With these three timestamps, the following synchronization controls are carried out simultaneously:
using the timestamp of the audio data, the audio data in the translation-language form is controlled to replace the audio data carrying the content to be translated;
using the timestamps of the subtitles, the video data in the translation-language form is controlled to replace the original subtitle-related video data;
using the synchronization timestamp between the audio data in the translation-language form and the video data in the translation-language form, the audio data in the translation-language form is controlled to stay synchronized with the video data in the translation-language form.
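The three synchronization controls above amount to replacing entries in both tracks keyed on a shared clock. A minimal sketch under a strong simplification: each track is a hypothetical dict mapping synchronization timestamp to payload, and an entry is replaced only when a translated version exists at that timestamp.

```python
def apply_synchronized_replacement(audio_track, subtitle_track,
                                   translated_audio, translated_subs):
    """Replace audio and subtitle entries keyed on their timestamps so
    that both tracks stay aligned on the shared clock.

    All four arguments are dicts keyed by synchronization timestamp;
    this dict-per-track representation is an illustration only, not
    the patent's actual data layout.
    """
    audio = {ts: translated_audio.get(ts, payload)
             for ts, payload in audio_track.items()}
    subs = {ts: translated_subs.get(ts, payload)
            for ts, payload in subtitle_track.items()}
    return audio, subs


audio_track = {0: "src speech A", 3000: "src speech B"}
subtitle_track = {0: "src sub A", 3000: "src sub B"}
audio, subs = apply_synchronized_replacement(
    audio_track, subtitle_track,
    {0: "dst speech A"},   # translated audio exists only for ts=0
    {0: "dst sub A"})      # translated subtitle exists only for ts=0
```

Because both tracks are keyed on the same timestamps, a segment is either replaced in both tracks or kept in both, which is the alignment property the synchronization unit enforces.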
This embodiment provides a mobile-terminal-based audio processing method. When the user listens through a mobile terminal, the user's preferred language is obtained in advance as the translation language. When translation is needed, the audio data carrying the content to be translated, together with its timestamp, is extracted from the audio stream; speech recognition technology is used to recognize the corresponding text content, which is translated into text content in the translation-language form; this text content is then converted into audio data in the translation-language form to replace the audio data to be translated. Further, if the medium being played is a video, the subtitle-related video data and the synchronization timestamps are extracted from the video stream while the voice content is translated; the audio data in the translation-language form replaces the audio data to be translated, and the video data in the translation-language form replaces the subtitle-related video data. Further, the synchronization timestamps are used to keep the audio data in the translation-language form synchronized with the video data in the translation-language form. In this way, audio and/or video in an unfamiliar language is converted into the preferred-language form and presented to the user, which is both more user-friendly and more widely applicable.
Embodiment 2:
Fig. 2 shows the structure of the mobile-terminal-based audio processing device provided by the embodiment of the present invention; for convenience of description, only the parts relevant to the embodiment of the present invention are shown.
The mobile-terminal-based audio processing device may be a software unit, a hardware unit, or a unit combining software and hardware running in a mobile terminal device, or it may be integrated as an independent component into the terminal device or run in an application system of the terminal device.
The mobile-terminal-based audio processing device may comprise an extraction unit 21, a recognition unit 22, a translation unit 23 and a replacement unit 24; the specific functions of each functional unit are described below.
Extraction unit 21 is configured to extract, via the mobile terminal, the audio data carrying the content to be translated from an audio stream.
Playback software plays an audio stream that contains audio data recording both background music and voice content. When needed, extraction unit 21 can extract the audio data carrying the content to be translated from the audio stream. For example, when a user listens to music on a mobile terminal and wishes to have the music played in a voice of the user's choosing, the audio stream is first extracted from the music file; after the background music is removed, extraction unit 21 extracts the voice-related audio data, such as the vocal track of a song, from the audio stream.
As another embodiment of the present invention, the device further comprises:
an acquiring unit 25, configured to obtain the user's preferred language as the translation language.
The preferred language may be any regional dialect or any national language.
First, upon receiving a user instruction to set the translation language, acquiring unit 25 pops up a language selection dialog box whose language column lists all the language categories available locally and/or on the server. The user may choose one or more preferred languages according to preference; acquiring unit 25 sets each selected preferred language as a translation language and sets a preference order according to the user's choices. For example, acquiring unit 25 may set Chinese as the first translation language, Sichuan dialect as the second translation language, and English as the third translation language. After the translation-language settings are confirmed, when the text content corresponding to the audio data is to be translated into text content in the first translation language, if no dictionary for the first translation language is found either locally or on the server, the dictionary for the second translation language is looked up according to the preference order; if the lookup succeeds, the text content is translated into text content in the second translation language according to that dictionary, where a dictionary holds the mappings between source words and translated words. The search proceeds in this way down the preference order; if no dictionary is found for any of the translation languages, the original audio stream is retained and played.
Preferably, while video and/or audio is playing, the user may change the translation language according to personal preference. Specifically, upon receiving a change instruction, acquiring unit 25 invokes the language selection dialog box to change the translation language.
Preferably, the user's voice may be recorded through the microphone carried by the mobile terminal, and the language of the recorded voice is identified against a language library. The identified language is then used as the translation language; of course, different languages may also be recorded multiple times, and a preference order may then be set over all the translation languages obtained.
Recognition unit 22 is configured to use speech recognition technology to recognize the text content corresponding to the audio data.
Recognition unit 22 feeds the binary audio data into a speech recognition device, which uses speech recognition technology to recognize the text content corresponding to this audio data.
Translation unit (that is, conversion unit) 23 is configured to translate the text content into text content in the translation-language form, the text content in the translation-language form being the text content expressed in the translation language.
Translation unit 23 uses existing language translation software to translate the text content into text content in the translation-language form.
Replacement unit 24 is configured to convert the text content in the translation-language form into audio data in the translation-language form, so as to replace the audio data to be translated.
The audio data in the translation-language form is audio data recorded, and thereby formed, in the translation language.
According to the timestamp of the audio data carrying the content to be translated, as recorded in the audio stream, replacement unit 24 re-records the text content in the translation-language form as audio data in the translation language and replaces the audio data carrying the content to be translated with the audio data in the translation-language form. Specifically, with the synchronization timestamp of the audio data carrying the content to be translated kept unchanged, replacement unit 24 replaces the audio data carrying the content to be translated with the audio data in the translation-language form, so that the audio stream continues to play in synchronization and the voice of the audio is transformed.
As another embodiment of the present invention, the device further comprises:
a video extraction unit 26, configured to extract, via the mobile terminal, the subtitle-related video data from a video stream;
a video recognition unit 27, configured to recognize the subtitle content from the subtitle-related video data;
a video translation unit 28, configured to translate the subtitle content into subtitle content in the translation-language form, the subtitle content in the translation-language form being the subtitle content expressed in the translation language;
a video replacement unit 29, configured to convert the subtitle content in the translation-language form into video data in the translation-language form, so as to replace the subtitle-related video data.
The mobile terminal plays a video file through video software, the video file comprising a video stream and/or an audio stream. After the video stream is obtained, video extraction unit 26 extracts the subtitle-related video data from the video stream; specifically, the subtitle-related video data is the video data carrying the text content contained in the subtitles, and at the same time the timestamps of these subtitles are extracted. After video recognition unit 27 recognizes the subtitle content, video translation unit 28 translates it into subtitle content in the translation-language form, and video replacement unit 29 converts it into video data in the translation-language form. Then, according to the subtitle timestamps, video replacement unit 29 controls the video data in the translation-language form to replace the subtitle-related video data. When the translated video file is replayed, the subtitles display the subtitle content in the translation language.
As another embodiment of the present invention, the device further comprises:
a timestamp unit 30, configured to obtain in advance the synchronization timestamps of the audio data and the video data;
a synchronization unit 31, configured to use the synchronization timestamps to keep the audio data in the translation-language form synchronized with the video data in the translation-language form.
When a video is watched, in order to translate and display it better and to keep the video stream and the audio stream synchronized, timestamp unit 30 obtains the synchronization timestamps of the audio data and the video data in advance. These comprise: the timestamp of the audio data, the timestamps of the subtitles, and the synchronization timestamp between the audio data in the translation-language form and the video data in the translation-language form. With these three timestamps, the following synchronization controls are carried out simultaneously:
using the timestamp of the audio data, replacement unit 24 controls the audio data in the translation-language form to replace the audio data carrying the content to be translated;
using the timestamps of the subtitles, video replacement unit 29 controls the video data in the translation-language form to replace the original subtitle-related video data;
using the synchronization timestamp between the audio data in the translation-language form and the video data in the translation-language form, synchronization unit 31 controls the audio data in the translation-language form to stay synchronized with the video data in the translation-language form.
In this way, the playback timing of the voice or video is kept correct before and after language translation.
This embodiment provides a mobile-terminal-based audio processing device. When the user listens through a mobile terminal, the acquiring unit obtains the user's preferred language in advance as the translation language. When translation is needed, the extraction unit extracts the audio data carrying the content to be translated, together with its timestamp, from the audio stream; the recognition unit uses speech recognition technology to recognize the corresponding text content, which is translated into text content in the translation-language form; the translation unit converts this text content into audio data in the translation-language form, and the replacement unit replaces the audio data to be translated with it. Further, if the medium being played is a video, the timestamp unit extracts the subtitle-related video data and the synchronization timestamps from the video stream while the voice content is translated; the audio data in the translation-language form replaces the audio data to be translated, and the video data in the translation-language form replaces the subtitle-related video data. Further, the synchronization unit uses the synchronization timestamps to keep the audio data in the translation-language form synchronized with the video data in the translation-language form. In this way, audio and/or video in an unfamiliar language is converted into the preferred-language form and presented to the user, which is both more user-friendly and more widely applicable.
As one embodiment of the invention, the invention provides a kind of mobile terminal, the audio processing equipment of the movement-based terminal that described mobile terminal is above-mentioned.
Described mobile terminal can for but be not limited to smart mobile phone and IPAD etc.
The embodiment of the present invention provides a kind of audio-frequency processing method and device of movement-based terminal, when the user uses mobile terminal to listen to, obtain in advance user's preferred language, using as interpreter language, when needs are translated, extract the voice data that carries content to be translated and the timestamp that carries the voice data of content to be translated from audio stream, utilize speech recognition technology, identify word content that described voice data is corresponding to translate into the word content of interpreter language form, word content by described interpreter language form, be converted to the voice data of interpreter language form, to replace described voice data to be translated, more optimizedly, when if the broadcasting media are video, in the translated speech content, extract video data and the synchronized timestamp relevant to captions from video flowing, the voice data of interpreter language form is replaced to described voice data to be translated, the video data of interpreter language form is replaced to the described video data relevant to captions, more optimizedly, by described synchronized timestamp, the voice data of controlling described interpreter language form is synchronizeed with the video data with described interpreter language form, thereby, realize that the audio frequency of strange language and/or video are converted to the preferred language form presents to the user, have more hommization, have more versatility.
It will be appreciated by those skilled in the art that the units included in the above embodiment two are divided merely according to functional logic, and the division is not limited to the above as long as the corresponding functions can be realized; in addition, the specific names of the functional units are merely for ease of mutual distinction and do not limit the protection scope of the present invention.
One of ordinary skill in the art will appreciate that all or part of the steps of the methods in the above embodiments can be completed by instructing the relevant hardware through a program, and the program may be stored in a computer-readable storage medium, the storage medium being, for example, a ROM/RAM, a magnetic disk, or an optical disc.
The above content is merely a preferred embodiment of the present invention. For those of ordinary skill in the art, changes may be made in specific implementations and application scope according to the idea of the present invention, and the content of this description should not be construed as limiting the present invention.

Claims (8)

1. An audio processing device, characterized in that the audio processing device comprises:
a first extraction unit, configured to extract, by means of a mobile terminal, audio data carrying content to be processed from an audio stream;
a recognition unit, configured to identify text content corresponding to the audio data;
a second extraction unit, configured to acquire a preferred language of a user to serve as a target language;
a conversion unit, configured to convert the text content into text content in a target-language format, the text content in the target-language format being text content described in the target language;
a substitution unit, configured to convert the text content in the target-language format into audio data in the target-language format so as to replace the audio data to be processed.
2. The audio processing device according to claim 1, characterized in that the recognition unit uses speech recognition technology to identify the text content corresponding to the audio data.
3. The audio processing device according to claim 1, characterized in that the audio processing device further comprises:
a video extraction unit, configured to extract, by means of the mobile terminal, subtitle-related video data from a video stream;
a video recognition unit, configured to identify subtitle content according to the subtitle-related video data.
4. The audio processing device according to claim 3, characterized in that the audio processing device further comprises:
a video conversion unit, configured to convert the subtitle content into subtitle content in the target-language format, the subtitle content in the target-language format being subtitle content described in the target language;
a video substitution unit, configured to convert the subtitle content in the target-language format into video data in the target-language format so as to replace the subtitle-related video data.
5. The audio processing device according to any one of claims 1-4, characterized in that the audio processing device further comprises:
a timestamp unit, configured to acquire in advance synchronization timestamps of the audio data and the video data;
a synchronization unit, configured to keep, by means of the synchronization timestamps, the audio data in the target-language format synchronized with the video data in the target-language format.
6. An audio processing method, characterized in that the method comprises:
extracting, by means of a mobile terminal, audio data carrying content to be translated from an audio stream;
identifying text content corresponding to the audio data;
acquiring a preferred language of a user to serve as a target language;
converting the text content into text content in a target-language format, the text content in the target-language format being text content described in the target language;
converting the text content in the target-language format into audio data in the target-language format so as to replace the audio data to be translated.
7. The audio processing method according to claim 6, characterized in that speech recognition technology is used to identify the text content corresponding to the audio data.
8. The audio processing method according to claim 6, characterized in that the method further comprises:
extracting, by means of the mobile terminal, subtitle-related video data from a video stream;
identifying subtitle content according to the subtitle-related video data;
converting the subtitle content into subtitle content in the target-language format, the subtitle content in the target-language format being subtitle content described in the target language;
converting the subtitle content in the target-language format into video data in the target-language format so as to replace the subtitle-related video data.
CN201310397999.3A 2013-09-04 2013-09-04 Audio processing method and audio processing equipment Pending CN103491429A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201310397999.3A CN103491429A (en) 2013-09-04 2013-09-04 Audio processing method and audio processing equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201310397999.3A CN103491429A (en) 2013-09-04 2013-09-04 Audio processing method and audio processing equipment

Publications (1)

Publication Number Publication Date
CN103491429A true CN103491429A (en) 2014-01-01

Family

ID=49831341

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201310397999.3A Pending CN103491429A (en) 2013-09-04 2013-09-04 Audio processing method and audio processing equipment

Country Status (1)

Country Link
CN (1) CN103491429A (en)


Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10885918B2 (en) 2013-09-19 2021-01-05 Microsoft Technology Licensing, Llc Speech recognition using phoneme matching
CN105917405A (en) * 2014-01-17 2016-08-31 微软技术许可有限责任公司 Incorporating an exogenous large-vocabulary model into rule-based speech recognition
CN105917405B (en) * 2014-01-17 2019-11-05 微软技术许可有限责任公司 Merging of the exogenous large vocabulary model to rule-based speech recognition
US10311878B2 (en) 2014-01-17 2019-06-04 Microsoft Technology Licensing, Llc Incorporating an exogenous large-vocabulary model into rule-based speech recognition
US10749989B2 (en) 2014-04-01 2020-08-18 Microsoft Technology Licensing Llc Hybrid client/server architecture for parallel processing
CN103997657A (en) * 2014-06-06 2014-08-20 福建天晴数码有限公司 Converting method and device of audio in video
CN105244026B (en) * 2015-08-24 2019-09-20 北京意匠文枢科技有限公司 A kind of method of speech processing and device
CN105244026A (en) * 2015-08-24 2016-01-13 陈娟 Voice processing method and device
CN105609106A (en) * 2015-12-16 2016-05-25 魅族科技(中国)有限公司 Event recording document generation method and apparatus
CN105828101A (en) * 2016-03-29 2016-08-03 北京小米移动软件有限公司 Method and device for generation of subtitles files
CN105828101B (en) * 2016-03-29 2019-03-08 北京小米移动软件有限公司 Generate the method and device of subtitle file
WO2018121001A1 (en) * 2016-12-30 2018-07-05 深圳市九洲电器有限公司 Method and system for outputting simultaneous interpretation of digital television program, and smart terminal
CN106791913A (en) * 2016-12-30 2017-05-31 深圳市九洲电器有限公司 Digital television program simultaneous interpretation output intent and system
CN109830239A (en) * 2017-11-21 2019-05-31 群光电子股份有限公司 Voice processing apparatus, voice recognition input systems and voice recognition input method
CN109830239B (en) * 2017-11-21 2021-07-06 群光电子股份有限公司 Speech processing device, speech recognition input system, and speech recognition input method
WO2019205870A1 (en) * 2018-04-24 2019-10-31 腾讯科技(深圳)有限公司 Video stream processing method, apparatus, computer device, and storage medium
US11252444B2 (en) 2018-04-24 2022-02-15 Tencent Technology (Shenzhen) Company Limited Video stream processing method, computer device, and storage medium
CN109274900A (en) * 2018-09-05 2019-01-25 浙江工业大学 A kind of video dubbing method
CN110767233A (en) * 2019-10-30 2020-02-07 合肥名阳信息技术有限公司 Voice conversion system and method
CN111787155A (en) * 2020-06-30 2020-10-16 深圳传音控股股份有限公司 Audio data processing method, terminal device and medium
CN111800543A (en) * 2020-06-30 2020-10-20 深圳传音控股股份有限公司 Audio file processing method, terminal device and storage medium
WO2022000829A1 (en) * 2020-06-30 2022-01-06 深圳传音控股股份有限公司 Audio data processing method, terminal device, and computer-readable storage medium
CN112786025A (en) * 2020-12-28 2021-05-11 腾讯音乐娱乐科技(深圳)有限公司 Method for determining lyric timestamp information and training method of acoustic model
CN112786025B (en) * 2020-12-28 2023-11-14 腾讯音乐娱乐科技(深圳)有限公司 Method for determining lyric timestamp information and training method of acoustic model

Similar Documents

Publication Publication Date Title
CN103491429A (en) Audio processing method and audio processing equipment
CN103226947B (en) A kind of audio-frequency processing method based on mobile terminal and device
CN105245917B (en) A kind of system and method for multi-media voice subtitle generation
US9799375B2 (en) Method and device for adjusting playback progress of video file
CN110035326A (en) Subtitle generation, the video retrieval method based on subtitle, device and electronic equipment
US10529340B2 (en) Voiceprint registration method, server and storage medium
CN104252861A (en) Video voice conversion method, video voice conversion device and server
CN105704538A (en) Method and system for generating audio and video subtitles
US20140372100A1 (en) Translation system comprising display apparatus and server and display apparatus controlling method
CN107644637B (en) Phoneme synthesizing method and device
CN102568478A (en) Video play control method and system based on voice recognition
CN103067775A (en) Subtitle display method for audio/video terminal, audio/video terminal and server
WO2014141054A1 (en) Method, apparatus and system for regenerating voice intonation in automatically dubbed videos
CN104078044A (en) Mobile terminal and sound recording search method and device of mobile terminal
CN105635782A (en) Subtitle output method and device
CN110781328A (en) Video generation method, system, device and storage medium based on voice recognition
CN105489072A (en) Method for the determination of supplementary content in an electronic device
CN111050201A (en) Data processing method and device, electronic equipment and storage medium
CN105224581A (en) The method and apparatus of picture is presented when playing music
US11714973B2 (en) Methods and systems for control of content in an alternate language or accent
US9905221B2 (en) Automatic generation of a database for speech recognition from video captions
Pleva et al. TUKE-BNews-SK: Slovak Broadcast News Corpus Construction and Evaluation.
KR20150088564A (en) E-Book Apparatus Capable of Playing Animation on the Basis of Voice Recognition and Method thereof
CN110324702A (en) Information-pushing method and device in video display process
CN102955809A (en) Method and system for editing and playing media files

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20140101

WD01 Invention patent application deemed withdrawn after publication