WO2016119370A1 - Method, apparatus and mobile terminal for implementing recording - Google Patents

Method, apparatus and mobile terminal for implementing recording

Info

Publication number
WO2016119370A1
Authority
WO
WIPO (PCT)
Prior art keywords
recording
information
file
voice
track
Prior art date
Application number
PCT/CN2015/081454
Other languages
English (en)
French (fr)
Inventor
奚黎明 (Xi Liming)
Original Assignee
中兴通讯股份有限公司 (ZTE Corporation)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Publication of WO2016119370A1 publication Critical patent/WO2016119370A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0487Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser
    • G06F3/0488Interaction techniques based on graphical user interfaces [GUI] using specific features provided by the input device, e.g. functions controlled by the rotation of a mouse with dual sensing arrangements, or of the nature of the input device, e.g. tap gestures based on pressure sensed by a digitiser using a touch-screen or digitiser, e.g. input of commands through traced gestures
    • GPHYSICS
    • G11INFORMATION STORAGE
    • G11BINFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031Electronic editing of digitised analogue information signals, e.g. audio or video signals

Definitions

  • This document relates to mobile terminal recording and playback technology, and in particular to a method, apparatus, and mobile terminal for implementing recording.
  • Meeting content is often recorded during meetings.
  • While a user records with a mobile phone, the phone must stay on the recording interface the whole time, so the screen remains lit and consumes a great deal of power.
  • The recorded content cannot be marked during recording; marks such as speaker identity and other notes have to be added to the recording file after recording is finished.
  • Such mark information can only be viewed on the phone's screen, which is very inconvenient.
  • If the recording file is shared to another device for playback, the mark information cannot be viewed there at all.
  • In view of this, an embodiment of the present invention provides a method, an apparatus, and a mobile terminal for implementing recording, which can mark a recording file simply and quickly and play the mark information back while the recording file is played.
  • An embodiment of the present invention provides a method for implementing recording, including: acquiring, by a mobile terminal, recording mark information; converting the acquired recording mark information into voice information; and inserting the converted voice information into the recording file to form a composite recording file.
  • Acquiring the recording mark information includes: when the screen of the mobile terminal is in a black screen state, recognizing gesture information and converting the recognized gesture information into the recording mark information according to a preset correspondence.
  • Acquiring the recording mark information may instead include: when the screen of the mobile terminal is in a bright screen state, invoking and displaying a floating input interface of the mobile terminal and acquiring information entered through that interface as the recording mark information.
  • Converting the acquired mark information into a voice file includes: recording the time point at which the mark information was acquired and marking that time point on a first track of the recording; establishing a correspondence between the recorded time point and the acquired mark information; and converting the acquired mark information into voice information and adding the resulting voice information to the correspondence.
  • Forming the composite recording file includes: inserting the voice information, in the order of the recorded time points, onto the first track at the corresponding marked time points; and, when recording is finished, saving the first track and a second track used to record the recording file as the composite recording file.
  • The method also includes: playing the voice file at the marked time point on a first channel corresponding to the first track, and playing the recording file on a second channel.
  • An embodiment of the present invention further provides an apparatus for implementing recording, which includes at least an acquiring module, a conversion module, and a synthesis processing module, wherein:
  • the acquiring module is configured to acquire recording mark information;
  • the conversion module is configured to convert the acquired recording mark information into voice information;
  • the synthesis processing module is configured to insert the converted voice information into the recording file to form a composite recording file.
  • The acquiring module is configured to: when the screen of the mobile terminal is in a black screen state, recognize gesture information and convert the recognized gesture information into recording mark information according to a preset correspondence; or,
  • the acquiring module is configured to: when the screen of the mobile terminal is in a bright screen state, invoke and display a floating input interface of the mobile terminal and acquire information entered through that interface as the recording mark information.
  • The conversion module is configured to: record the time point at which the mark information was acquired and mark that time point on the first track of the recording; establish a correspondence between the recorded time point and the acquired recording mark information; and convert the acquired mark information into a voice file and add the resulting voice file to the correspondence.
  • The synthesis processing module is configured to: insert the voice files, in the order of the recorded time points, onto the first track at the corresponding marked time points; and, when recording is finished, save the first track and the second track used to record the recording file as a composite recording file.
  • The synthesis processing module is further configured to: play the voice file at the marked time point on the first channel corresponding to the first track, and play the recording file on the second channel.
  • An embodiment of the present invention further provides a mobile terminal, including at least a display screen and a processor, wherein
  • the processor is configured to acquire recording mark information while the display screen is in a black screen state or a bright screen state, convert the acquired recording mark information into voice information, and insert the converted voice information into the recording file to form a composite recording file.
  • An embodiment of the present invention further provides a computer storage medium storing computer-executable instructions for performing the above method.
  • Compared with the related art, the technical solution of the present application includes acquiring mark information, converting the acquired mark information into a voice file, and inserting the converted voice file into the recording file to form a composite recording file.
  • By making the mark information part of the final recording file, the recording file can be marked at any time during recording, simply and quickly; and when the recording file is played back, the mark information is played back synchronously, without the user having to look it up manually.
  • Because the recording file and the mark information in the embodiments of the present invention are stored together in the same composite recording file, sharing the recording file with another device also shares its mark information with that device.
  • FIG. 1 is a flowchart of a method for implementing recording according to an embodiment of the present invention;
  • FIG. 2 is a schematic structural diagram of an apparatus for implementing recording according to an embodiment of the present invention;
  • FIG. 3 is a schematic flowchart of a first embodiment of implementing recording according to an embodiment of the present invention;
  • FIG. 4 is a schematic flowchart of a second embodiment of implementing recording according to an embodiment of the present invention.
  • FIG. 1 is a flowchart of a method for implementing recording according to an embodiment of the present invention. As shown in FIG. 1, the recording process includes the following steps:
  • Step 100: The mobile terminal acquires recording mark information and converts it into voice information.
  • When the screen of the mobile terminal is in a black screen state, acquiring the recording mark information in this step includes:
  • recognizing gesture information and converting the recognized gesture information into recording mark information according to a preset correspondence between gesture information and recording mark information.
  • The gesture information may be preset motion information that can represent speaker identity or other notes; for example, gestures for the 26 letters may be used to represent the initials of a speaker's surname and given name, so as to distinguish different speakers. Suppose, for instance, that the correspondence between the speaker name "Zhang Ming" and the letter gestures "Z" and "M" is stored in advance; when Zhang Ming speaks, the user draws the letter gestures "Z" and "M" on the screen, and the mobile terminal converts that gesture information into the corresponding speaker information, namely Zhang Ming.
  • If the recording mark information corresponding to a recognized gesture is wrong, the mobile terminal may indicate an input error, take no action, and so on; this is not limited here.
  • If no mark information corresponding to a recognized gesture is found, the screen of the mobile terminal may also be lit.
  • When the screen of the mobile terminal is in a bright screen state, acquiring the recording mark information in this step includes: invoking and displaying the mobile terminal's floating input interface and acquiring the information entered through it as the recording mark information.
  • For example, the preset floating input interface is brought up and mark information is entered in the input area (by handwriting, pinyin, stroke input, etc.); the mark information is not limited to text and may also include digits, letters, symbols, and so on.
  • For instance, the user can enter "Zhang Ming" in the input area and tap a preset mark button to confirm the input.
  • Converting the acquired recording mark information into voice information in this step includes:
  • recording the time point at which the recording mark information was acquired and marking that time point on the first track of the recording; establishing a correspondence between the recorded time point and the acquired recording mark information; and converting the acquired recording mark information into voice information and adding the resulting voice information to the correspondence.
  • A database of recording mark information and voice information may be built in advance and kept up to date; databases mapping text information to voice information already exist in the related art and may be used by embodiments of the present invention.
  • The voice information obtained in embodiments of the present invention may be one or more voice files, each corresponding to a different time point.
  • Step 101: Insert the converted voice information into the recording file to form a composite recording file. This step includes:
  • inserting the voice information, in the order of the recorded time points, onto the first track at the corresponding marked time points. The first track is only responsible for the inserted voice information matching the recording mark information and does not record the speaker's voice; the speaker's speech is recorded normally on the second track. When recording is finished, the first track and the second track are saved as the composite recording file.
  • The first track and the second track are kept separate, so that the first track corresponds to one channel and the second track to the other.
  • During playback, one channel plays the recording mark information at each marked time point, while the other channel continues to play the speaker's recorded content.
  • What embodiments of the present invention emphasize is that, by making the recording mark information part of the final recording file, the recording file can be marked at any time during recording, simply and quickly. When the recording file is later played back, the mark information is played back synchronously, without the user having to look it up manually. Further, because the recording file and the mark information are stored together in the same composite recording file, sharing the recording file with another device also shares its mark information with that device.
  • FIG. 2 is a schematic structural diagram of an apparatus for implementing recording according to an embodiment of the present invention. As shown in FIG. 2, the apparatus includes at least an acquiring module, a conversion module, and a synthesis processing module, wherein:
  • the acquiring module is configured to acquire recording mark information;
  • the conversion module is configured to convert the acquired recording mark information into voice information;
  • the synthesis processing module is configured to insert the converted voice information into the recording file to form a composite recording file.
  • The acquiring module is configured to: when the screen of the mobile terminal is in a black screen state, recognize gesture information and convert the recognized gesture information into recording mark information according to the preset correspondence;
  • or the acquiring module is configured to: when the screen of the mobile terminal is in a bright screen state, invoke and display the floating input interface of the mobile terminal and acquire the information entered through it as the recording mark information.
  • The conversion module is configured to: record the time point at which the mark information was acquired and mark that time point on the first track of the recording; establish a correspondence between the recorded time point and the acquired recording mark information; and convert the acquired mark information into a voice file and add the resulting voice file to the correspondence.
  • The synthesis processing module is configured to: insert the voice files, in the order of the recorded time points, onto the first track at the corresponding marked time points; and, when recording is finished, save the first track and the second track used to record the speaker's speech as a composite recording file.
  • The synthesis processing module is further configured to: play the voice file at the marked time point on the first channel corresponding to the first track, and play the recording file on the second channel.
  • The apparatus of this embodiment of the present invention may be provided in a mobile terminal.
  • The mobile terminal includes at least a display screen and a processor, wherein
  • the processor is configured to acquire recording mark information while the display screen is in a black screen state or a bright screen state, convert the acquired recording mark information into voice information, and insert the converted voice information into the recording file to form a composite recording file.
  • All or some of the steps of the above method may be performed by related hardware under the instruction of a program stored in a computer-readable storage medium; when executed, the program acquires the mark information, converts the acquired mark information into a voice file, and inserts the converted voice file into the recording file to form a composite recording file.
  • The storage medium is, for example, a ROM/RAM, a magnetic disk, or an optical disc.
  • FIG. 3 is a schematic flowchart of implementing recording according to the first embodiment of the present invention. As shown in FIG. 3, the first embodiment assumes that the phone screen is off and the recording function has been started in the background; the flow includes:
  • Steps 300 to 301: The phone screen is off and recording runs in the background; that is, the user has started the phone recorder and the recording application runs in the background, and after the phone locks and the screen backlight goes out, the phone's display can still be powered. It is determined whether a letter gesture is entered on the screen, i.e., whether the user is recognized as drawing a specific letter gesture on the display; if so, proceed to the next step, otherwise end the flow.
  • Step 302: Recognize the gesture image.
  • The touch screen extracts key points from the edge information of the letter-gesture image to recognize the gesture, and the corresponding letter-gesture image is shown on the display without lighting the screen backlight. For example, when "Zhang Ming" speaks, the user can draw the letter gestures "Z" and "M" on the screen to indicate this.
  • Step 303: Record the recognized gesture signal, i.e., the recording mark information, including the letter-gesture image and the input time point.
  • The gesture input time point is recorded as the moment at which the gesture input is completed.
  • Step 304: Record the time point of the gesture input and mark that time point on track 1.
  • Step 305: Insert the matched voice file onto track 1 at the corresponding time point: record the input time point and the gesture image and generate a correspondence list.
  • The recorded "Z" and "M" gesture-image information is matched against the corresponding voice files in the local voice library, and the matching information is added to the correspondence list.
  • The established list is then looked up and, in the order of the recorded time points, the matched voice files are inserted onto track 1 at the corresponding marked times.
  • Step 306: Meanwhile, normal background recording continues on track 2.
  • Track 1 only records the inserted voice files matching the mark information and does not record the speaker's voice; track 2 continues to record normally.
  • Step 307: When recording is finished, track 1 and track 2 are saved to generate a new recording file.
  • Step 308: Determine whether a multi-channel device is to be used to play the recorded file; if so, go to step 309; if not, end the flow.
  • Step 309: The left channel plays the track-1 gesture voice and the right channel plays the recording normally.
  • Track 1 and track 2 are kept separate so that track 1 corresponds to the left channel and track 2 to the right channel.
  • When the user plugs in earphones to play the recording, the left channel plays the voice corresponding to the gesture information "Z" and "M" at each marked time point, while the right channel plays the speaker's recorded content.
  • FIG. 4 is a schematic flowchart of implementing recording according to the second embodiment of the present invention. As shown in FIG. 4, the second embodiment assumes that the phone screen is lit and the recording function has been started in the background; the flow includes:
  • Step 400: The phone screen is lit and recording runs in the background; that is, the user has started the phone recorder and the recording application runs in the background, and the phone is on the standby interface with the screen backlight still on.
  • Step 401: Invoke the recording-mark input interface, enter the recording mark information, and confirm. Optionally, the user brings up the floating mark-input interface on the phone's standby interface and enters mark information in the input area by handwriting (or by pinyin, stroke input, etc.); the mark information is not limited to text and may also include digits, letters, symbols, and so on. For example, when "Zhang Ming" speaks, the user can enter "Zhang Ming" in the input area and tap the preset mark button to confirm.
  • Step 402: Record the acquired mark information and convert it into a voice file. Optionally, record the time point at which the mark information was entered, mark that time point on track 1 of the recording, and generate a correspondence list between time points and mark text. At the same time, convert the recorded "Zhang Ming" mark text into the corresponding voice file and add the converted voice-file information to the established list.
  • Step 403: Insert the converted voice file onto track 1 at the corresponding time point; that is, look up the established list and, in the order of the recorded time points, insert the converted voice files onto track 1 at the corresponding marked times.
  • Step 404: Meanwhile, normal background recording continues on track 2.
  • Track 1 only records the inserted voice files matching the mark information and does not record the speaker's voice; track 2 continues to record normally.
  • Step 405: When recording is finished, track 1 and track 2 are saved to generate a new recording file.
  • Step 406: Determine whether a multi-channel device is to be used to play the recorded file; if so, go to step 407; if not, end the flow.
  • Step 407: The left channel plays the track-1 mark voice and the right channel plays the recording normally.
  • Track 1 and track 2 are kept separate so that track 1 corresponds to the left channel and track 2 to the right channel.
  • When the user plugs in earphones to play the recording, the left channel plays the voice content corresponding to the "Zhang Ming" mark text at each marked time point, while the right channel plays the speaker's recorded content.
  • All or some of the steps of the above embodiments may also be implemented with integrated circuits; these steps may be made into individual integrated-circuit modules, or several of the modules or steps among them may be made into a single integrated-circuit module.
  • The devices/function modules/functional units in the above embodiments may be implemented with general-purpose computing devices; they may be concentrated on a single computing device or distributed over a network of computing devices.
  • When the devices/function modules/functional units in the above embodiments are implemented in the form of software function modules and sold or used as stand-alone products, they may be stored in a computer-readable storage medium.
  • The computer-readable storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
  • The above technical solution allows the recording file to be marked at any time during recording, simply and quickly; and when the recording file is played back, the mark information is played back synchronously, without the user having to look it up manually.
  • The above technical solution also shares the mark information with another device when the recording file is shared with that device.

Abstract

A method, apparatus, and mobile terminal for implementing recording. The method includes: a mobile terminal acquires recording mark information; converts the acquired recording mark information into voice information; and inserts the converted voice information into a recording file to form a composite recording file. By making the mark information part of the final recording file, the above technical solution allows the recording file to be marked at any time during recording, simply and quickly; moreover, when the recording file is later played back, the mark information is played back synchronously, without the user having to look it up manually. Further, because the recording file and the mark information are stored together in the same composite recording file, sharing the recording file with another device also shares its mark information with that device.

Description

Method, Apparatus and Mobile Terminal for Implementing Recording
Technical Field
This document relates to recording and playback technology for mobile terminals, and in particular to a method, an apparatus, and a mobile terminal for implementing recording.
Background
As smart terminals become more and more widely used, in many scenarios people use them in place of traditional devices such as cameras, MP3 players, MP4 players, and voice recorders, conveniently taking photos and recording or playing back audio and video.
At present, meeting content is often recorded at meetings. For example, while a user records with a mobile phone, the phone must stay on the recording interface the whole time, so the screen remains lit and consumes a great deal of power. Moreover, the recorded content cannot be marked during recording; marks such as speaker identity and other notes have to be added against the content of the recording file after recording is finished. Furthermore, such mark information can only be viewed on the phone's screen, which is very inconvenient. In addition, if the user wants to share the recording file to another device for playback, the mark information cannot be viewed normally there.
Summary
To solve the above technical problem, embodiments of the present invention provide a method, an apparatus, and a mobile terminal for implementing recording, which can mark a recording file simply and quickly and play the mark information back while the recording file is played.
To achieve the object of the present invention, an embodiment of the present invention provides a method for implementing recording, including: acquiring, by a mobile terminal, recording mark information;
converting the acquired recording mark information into voice information; and
inserting the converted voice information into a recording file to form a composite recording file.
Acquiring the recording mark information includes:
when the screen of the mobile terminal is in a black screen state, recognizing gesture information and converting the recognized gesture information into the recording mark information according to a preset correspondence.
Acquiring the recording mark information includes:
when the screen of the mobile terminal is in a bright screen state, invoking and displaying a floating input interface of the mobile terminal and acquiring information entered through the floating input interface as the recording mark information.
Converting the acquired mark information into a voice file includes:
recording the time point at which the mark information was acquired and marking that time point on a first track of the recording;
establishing a correspondence between the recorded time point and the acquired mark information; and
converting the acquired mark information into voice information and adding the resulting voice information to the correspondence.
Forming the composite recording file includes:
inserting the voice information, in the order of the recorded time points, onto the first track at the corresponding marked time points; and
when recording is finished, saving the first track and a second track used to record the recording file as the composite recording file.
The method further includes:
playing the voice file at the time point on a first channel corresponding to the first track, and playing the recording file on a second channel.
An embodiment of the present invention further provides an apparatus for implementing recording, including at least an acquiring module, a conversion module, and a synthesis processing module, wherein
the acquiring module is configured to acquire recording mark information;
the conversion module is configured to convert the acquired recording mark information into voice information; and
the synthesis processing module is configured to insert the converted voice information into a recording file to form a composite recording file.
The acquiring module is configured to: when the screen of the mobile terminal is in a black screen state, recognize gesture information and convert the recognized gesture information into recording mark information according to a preset correspondence; or
the acquiring module is configured to: when the screen of the mobile terminal is in a bright screen state, invoke and display the floating input interface of the mobile terminal and acquire information entered through the floating input interface as the recording mark information.
The conversion module is configured to: record the time point at which the mark information was acquired and mark that time point on the first track of the recording; establish a correspondence between the recorded time point and the acquired recording mark information; and convert the acquired mark information into a voice file and add the resulting voice file to the correspondence.
The synthesis processing module is configured to: insert the voice files, in the order of the recorded time points, onto the first track at the corresponding marked time points; and, when recording is finished, save the first track and the second track used to record the recording file as a composite recording file.
The synthesis processing module is further configured to: play the voice file at the time point on the first channel corresponding to the first track, and play the recording file on the second channel.
An embodiment of the present invention further provides a mobile terminal, including at least a display screen and a processor, wherein
the processor is configured to acquire recording mark information while the display screen is in a black screen state or a bright screen state, convert the acquired recording mark information into voice information, and insert the converted voice information into a recording file to form a composite recording file.
An embodiment of the present invention further provides a computer storage medium storing computer-executable instructions for performing the method described above.
Compared with the related art, the technical solution of the present application includes acquiring mark information, converting the acquired mark information into a voice file, and inserting the converted voice file into a recording file to form a composite recording file. By making the mark information part of the final recording file, embodiments of the present invention allow the recording file to be marked at any time during recording, simply and quickly; moreover, when the recording file is later played back, the mark information is played back synchronously, without the user having to look it up manually.
Further, because the recording file and the mark information in embodiments of the present invention are stored together in the same composite recording file, sharing the recording file with another device also shares its mark information with that device.
Brief Description of the Drawings
FIG. 1 is a flowchart of a method for implementing recording according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of an apparatus for implementing recording according to an embodiment of the present invention;
FIG. 3 is a schematic flowchart of a first embodiment of implementing recording according to an embodiment of the present invention;
FIG. 4 is a schematic flowchart of a second embodiment of implementing recording according to an embodiment of the present invention.
Preferred Embodiments of the Invention
Embodiments of the present invention are described in detail below with reference to the accompanying drawings. It should be noted that, provided there is no conflict, the embodiments of the present application and the features in the embodiments may be combined with one another in any manner.
FIG. 1 is a flowchart of a method for implementing recording according to an embodiment of the present invention. As shown in FIG. 1, the recording process includes the following steps:
Step 100: The mobile terminal acquires recording mark information and converts it into voice information.
When the screen of the mobile terminal is in a black screen state, acquiring the recording mark information in this step includes:
recognizing gesture information and converting the recognized gesture information into recording mark information according to a preset correspondence between gesture information and recording mark information. The gesture information may be preset motion information that can represent speaker identity or other notes; for example, gestures for the 26 letters may be used to represent the initials of a speaker's surname and given name, so as to distinguish the identities of different speakers. For example, suppose the correspondence between the speaker name "Zhang Ming" and the letter gestures "Z" and "M" is stored in advance; when Zhang Ming speaks, the user draws the letter gestures "Z" and "M" on the screen, and the mobile terminal converts that gesture information into the corresponding speaker information, namely Zhang Ming.
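As a concrete illustration of the preset correspondence between letter gestures and recording mark information, the sketch below shows one minimal way such a lookup could be organized. The dictionary contents, the function name, and the assumption that a separate gesture recognizer already yields letter strings such as "Z" and "M" are illustrative only and are not part of the patent.

```python
# Minimal sketch of the preset gesture-letter -> recording-mark correspondence.
# Assumes a gesture recognizer elsewhere has already turned the touch input
# into letter strings such as "Z" and "M"; that recognizer is not shown here.

GESTURE_TO_MARK = {
    ("Z", "M"): "Zhang Ming",   # initials of surname + given name
    ("L", "H"): "Li Hua",       # hypothetical second speaker
}

def gesture_to_mark(letters):
    """Return the recording mark for a recognized letter sequence, or None."""
    return GESTURE_TO_MARK.get(tuple(letters))

# Example: the user draws "Z" then "M" while the screen is off.
mark = gesture_to_mark(["Z", "M"])
if mark is None:
    # Per the description, the terminal may report an input error,
    # do nothing, or light the screen so the user can enter the mark by hand.
    print("no matching mark; fall back to lit-screen input")
else:
    print("recording mark:", mark)   # -> "Zhang Ming"
```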
It should be noted that how to recognize gesture information is a routine technique for those skilled in the art; its specific implementation is not intended to limit the scope of the present invention and is not described further here.
If the recording mark information corresponding to a recognized gesture is wrong, the mobile terminal may indicate an input error, take no action, and so on; this is not limited here.
If no mark information corresponding to a recognized gesture is found, the screen of the mobile terminal may also be lit.
When the screen of the mobile terminal is lit, acquiring the recording mark information in this step includes: invoking and displaying the mobile terminal's floating input interface and acquiring the information entered through it as the recording mark information. For example, the preset floating input interface is brought up and mark information is entered in the input area (by handwriting, pinyin, stroke input, etc.); the mark information is not limited to text and may also include digits, letters, symbols, and so on. For instance, when "Zhang Ming" speaks, the user can enter "Zhang Ming" in the input area and tap the preset mark button to confirm the input.
It should be noted that setting up a floating input interface is a routine technique for those skilled in the art; its specific implementation is not intended to limit the scope of the present invention and is not described further here.
Converting the acquired recording mark information into voice information in this step includes:
recording the time point at which the recording mark information was acquired and marking that time point on the first track of the recording;
establishing a correspondence between the recorded time point and the acquired recording mark information; and
converting the acquired recording mark information into voice information and adding the resulting voice information to the correspondence. Optionally, a database of recording mark information and voice information may be built in advance and kept up to date; databases mapping text information to voice information already exist in the related art and may be used by embodiments of the present invention. It should be noted that the voice information obtained in embodiments of the present invention may be one or more voice files, each corresponding to a different time point.
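The correspondence among time points, mark information, and the matched voice files could be kept as a simple ordered list, as sketched below. The record layout, the VOICE_LIBRARY mapping standing in for the pre-built mark-to-voice database, and the file paths are illustrative assumptions rather than the patent's required implementation.

```python
# Sketch of the time point / mark / voice-file correspondence list.
# VOICE_LIBRARY stands in for the pre-built database mapping mark text to
# voice files; the paths and helper names are hypothetical.

from dataclasses import dataclass, field
from typing import List, Optional

VOICE_LIBRARY = {
    "Zhang Ming": "voices/zhang_ming.wav",
    "Li Hua": "voices/li_hua.wav",
}

@dataclass
class MarkEntry:
    time_s: float                      # seconds from the start of the recording
    mark_text: str                     # e.g. speaker name recovered from the gesture
    voice_file: Optional[str] = None   # filled in when a match is found

@dataclass
class Correspondence:
    entries: List[MarkEntry] = field(default_factory=list)

    def add_mark(self, time_s: float, mark_text: str) -> None:
        # Record the acquisition time point and the mark, then attach the
        # matching voice file from the library (if any).
        entry = MarkEntry(time_s, mark_text, VOICE_LIBRARY.get(mark_text))
        self.entries.append(entry)
        self.entries.sort(key=lambda e: e.time_s)   # keep time-point order

marks = Correspondence()
marks.add_mark(65.2, "Zhang Ming")    # gesture "Z"+"M" recognized at 65.2 s
marks.add_mark(310.8, "Li Hua")
```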
Step 101: Insert the converted voice information into the recording file to form the composite recording file. This step includes:
inserting the voice information, in the order of the recorded time points, onto the first track at the corresponding marked time points. Here the first track is only responsible for the inserted voice information matching the recording mark information and does not record the speaker's voice; the speaker's speech is recorded normally on the second track. When recording is finished, the first track and the second track are saved as the composite recording file.
Optionally, the first track and the second track are kept separate, so that the first track corresponds to one channel and the second track to the other. In this way, during playback one channel plays the recording mark information at each marked time point, while the other channel continues to play the speaker's recorded content.
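One way to realize the composite file of step 101 is to keep track 1 as a silent buffer the same length as the live recording, mix each mark's voice clip into it at the marked offset, and save track 1 and track 2 as the left and right channels of a single stereo file. The sketch below does this for 16-bit mono WAV inputs sharing one sample rate; the file names and the choice of the WAV container are assumptions for illustration only.

```python
# Sketch: build the composite file, with track 1 (mark voices) on the left
# channel and track 2 (the live recording) on the right channel.
# Assumes all inputs are 16-bit mono WAV files at the same sample rate.

import wave
import numpy as np

def read_wav(path):
    """Read a 16-bit mono WAV file; returns (sample_rate, samples)."""
    with wave.open(path, "rb") as w:
        rate = w.getframerate()
        data = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)
    return rate, data.copy()

def build_composite(recording_path, marks, out_path):
    """marks is the correspondence list: (time_in_seconds, voice_wav_path) pairs."""
    rate, track2 = read_wav(recording_path)      # track 2: the speaker's speech
    track1 = np.zeros_like(track2)               # track 1: silent except for marks

    for time_s, voice_path in marks:
        _, clip = read_wav(voice_path)
        start = int(time_s * rate)
        if start >= len(track1):
            continue                             # mark lies beyond the recording
        end = min(start + len(clip), len(track1))
        track1[start:end] = clip[: end - start]  # insert the mark voice here

    stereo = np.column_stack((track1, track2))   # left = marks, right = speech
    with wave.open(out_path, "wb") as w:
        w.setnchannels(2)
        w.setsampwidth(2)                        # 16-bit samples
        w.setframerate(rate)
        w.writeframes(stereo.astype(np.int16).tobytes())

build_composite("meeting.wav",
                [(65.2, "voices/zhang_ming.wav")],
                "meeting_composite.wav")
```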
The use and mixing of different tracks is a routine technique for those skilled in the art; its specific implementation is not intended to limit the scope of the present invention and is not described further here. What the embodiments of the present invention emphasize is that, by making the recording mark information part of the final recording file, the recording file can be marked at any time during recording, simply and quickly. Moreover, when the recording file is later played back, the mark information is played back synchronously, without the user having to look it up manually. Further, because the recording file and the mark information in the embodiments of the present invention are stored together in the same composite recording file, sharing the recording file with another device also shares its mark information with that device.
FIG. 2 is a schematic structural diagram of an apparatus for implementing recording according to an embodiment of the present invention. As shown in FIG. 2, the apparatus includes at least an acquiring module, a conversion module, and a synthesis processing module, wherein
the acquiring module is configured to acquire recording mark information;
the conversion module is configured to convert the acquired recording mark information into voice information; and
the synthesis processing module is configured to insert the converted voice information into the recording file to form the composite recording file.
Optionally,
the acquiring module is configured to: when the screen of the mobile terminal is in a black screen state, recognize gesture information and convert the recognized gesture information into recording mark information according to the preset correspondence;
or the acquiring module is configured to: when the screen of the mobile terminal is lit, invoke and display the floating input interface of the mobile terminal and acquire the information entered through it as the recording mark information.
The conversion module is configured to: record the time point at which the mark information was acquired and mark that time point on the first track of the recording; establish a correspondence between the recorded time point and the acquired recording mark information; and convert the acquired mark information into a voice file and add the resulting voice file to the correspondence.
The synthesis processing module is configured to: insert the voice files, in the order of the recorded time points, onto the first track at the corresponding marked time points; and, when recording is finished, save the first track and the second track used to record the speaker's speech as the composite recording file.
Optionally, the synthesis processing module is further configured to: play the voice file at the marked time point on the first channel corresponding to the first track, and play the recording file on the second channel.
The apparatus of this embodiment of the present invention may be provided in a mobile terminal. The mobile terminal includes at least a display screen and a processor, wherein
the processor is configured to acquire recording mark information while the display screen is in a black screen state or a lit state, convert the acquired recording mark information into voice information, and insert the converted voice information into the recording file to form the composite recording file.
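Viewed as software, the acquiring module, conversion module, and synthesis processing module described above, hosted on the terminal's processor, are three cooperating components. The skeleton below is a purely illustrative sketch of that structure; the class and method names are assumptions and nothing here is mandated by the patent.

```python
# Illustrative skeleton of the FIG. 2 module structure; names are assumptions.

class AcquiringModule:
    def acquire(self, screen_on: bool):
        # Black screen: recognize a letter gesture; lit screen: read the
        # floating input interface. Both are stubbed here.
        return self._from_floating_input() if screen_on else self._from_gesture()

    def _from_gesture(self): ...
    def _from_floating_input(self): ...

class ConversionModule:
    def to_voice(self, mark_text: str, time_s: float):
        # Record the time point, extend the correspondence list, and return
        # the voice information matched from the pre-built database.
        ...

class SynthesisProcessingModule:
    def insert(self, voice_clip, time_s: float):
        # Place the clip on track 1 at the marked time; track 2 keeps recording.
        ...

    def save_composite(self, out_path: str):
        # Save track 1 and track 2 together as the composite recording file.
        ...
```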
Those of ordinary skill in the art will understand that all or some of the steps of the above method embodiments may be performed by related hardware under the instruction of a program, and the program may be stored in a computer-readable storage medium. When executed, the program includes the following steps: acquiring the mark information; converting the acquired mark information into a voice file; and inserting the converted voice file into the recording file to form the composite recording file. The storage medium may be, for example, a ROM/RAM, a magnetic disk, or an optical disc.
The method of the present invention is described in detail below with reference to specific embodiments.
FIG. 3 is a schematic flowchart of implementing recording according to the first embodiment of the present invention. As shown in FIG. 3, the first embodiment assumes that the phone screen is off and the recording function has been started in the background; the flow includes:
Steps 300 to 301: The phone screen is off and recording runs in the background; that is, the user has started the phone recorder and the recording application runs in the background, and after the phone locks and the screen backlight goes out, the phone's display can still be powered. It is determined whether a letter gesture is entered on the screen, i.e., whether the user is recognized as drawing a specific letter gesture on the display; if so, proceed to the next step, otherwise end this flow.
Step 302: Recognize the gesture image. The touch screen extracts key points from the edge information of the letter-gesture image to recognize the gesture, and the corresponding letter-gesture image is shown on the display without lighting the screen backlight. For example, when "Zhang Ming" speaks, the user can draw the letter gestures "Z" and "M" on the screen to indicate this.
Step 303: Record the recognized gesture signal, i.e., the recording mark information, including the letter-gesture image and the input time point.
Optionally, the gesture input time point is recorded as the moment at which the gesture input is completed.
Step 304: Record the time point of the gesture input and mark that time point on track 1.
Step 305: Insert the matched voice file onto track 1 at the corresponding time point: record the input time point and the gesture image and generate a correspondence list. At the same time, match the recorded "Z" and "M" gesture-image information against the corresponding voice files in the local voice library and add the matching information to the correspondence list. Then look up the established list and, in the order of the recorded time points, insert the matched voice files onto track 1 at the corresponding marked times.
Step 306: Meanwhile, normal background recording continues on track 2.
Here track 1 only records the inserted voice files matching the mark information and does not record the speaker's voice; track 2 continues to record normally.
Step 307: When recording is finished, save track 1 and track 2 to generate a new recording file.
Step 308: Determine whether a multi-channel device is to be used to play the recorded file; if so, go to step 309; if not, end this flow.
Step 309: The left channel plays the track-1 gesture voice and the right channel plays the recording normally. Optionally, track 1 and track 2 are kept separate so that track 1 corresponds to the left channel and track 2 to the right channel.
In this way, when the user plugs in earphones to play the recording, the left channel of the earphones plays the voice corresponding to the gesture information "Z" and "M" at each marked time point, while the right channel plays the speaker's recorded content.
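On a multi-channel device the mark voices and the speech are therefore already separated by channel. The sketch below simply reads such a stereo composite back and splits it, which is one way a player or a quick check script could confirm that the left channel carries the marks and the right channel carries the speech; the file name and the stereo-WAV layout carry over from the earlier sketch and are assumptions.

```python
# Sketch: split the composite stereo WAV back into its two tracks.

import wave
import numpy as np

with wave.open("meeting_composite.wav", "rb") as w:
    frames = np.frombuffer(w.readframes(w.getnframes()), dtype=np.int16)

stereo = frames.reshape(-1, 2)      # interleaved left/right samples
left_marks = stereo[:, 0]           # track 1: mark voices ("Z"+"M" -> "Zhang Ming")
right_speech = stereo[:, 1]         # track 2: the speaker's recording

print("mark-track samples:", len(left_marks),
      "speech-track samples:", len(right_speech))
```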
FIG. 4 is a schematic flowchart of implementing recording according to the second embodiment of the present invention. As shown in FIG. 4, the second embodiment assumes that the phone screen is lit and the recording function has been started in the background; the flow includes:
Step 400: The phone screen is lit and recording runs in the background; that is, the user has started the phone recorder and the recording application runs in the background, and the phone is on the standby interface with the screen backlight still on.
Step 401: Invoke the recording-mark input interface, enter the recording mark information, and confirm. Optionally, the user brings up the floating mark-input interface on the phone's standby interface and enters mark information in the input area by handwriting (or by pinyin, stroke input, etc.); the mark information is not limited to text and may also include digits, letters, symbols, and so on. For example, when "Zhang Ming" speaks, the user can enter "Zhang Ming" in the input area and tap the preset mark button to confirm.
Step 402: Record the acquired mark information and convert it into a voice file. Optionally, record the time point at which the mark information was entered, mark that time point on track 1 of the recording, and generate a correspondence list between time points and mark text. At the same time, convert the recorded "Zhang Ming" mark text into the corresponding voice file and add the converted voice-file information to the established list.
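Step 402's text-to-voice conversion could be backed either by a pre-built voice library (as in the first embodiment) or by a text-to-speech engine. The sketch below uses pyttsx3, an offline TTS library, purely as an example; the patent does not name any particular engine, and the function name and file path are assumptions. The resulting file is what step 403 below then inserts onto track 1.

```python
# Sketch: turn the text entered on the floating interface into a voice file.
# pyttsx3 is only one possible offline text-to-speech engine; its use here is
# an assumption, not something the patent specifies. The actual output format
# (WAV/AIFF) depends on the platform backend.

import pyttsx3

def mark_text_to_voice(mark_text: str, out_path: str) -> str:
    engine = pyttsx3.init()
    engine.save_to_file(mark_text, out_path)   # e.g. "Zhang Ming" -> audio file
    engine.runAndWait()
    return out_path

voice_file = mark_text_to_voice("Zhang Ming", "voices/zhang_ming_mark.wav")
# The returned path is added to the time-point correspondence list and later
# mixed onto track 1 at the marked time.
```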
Step 403: Insert the converted voice file onto track 1 at the corresponding time point; that is, look up the established list and, in the order of the recorded time points, insert the converted voice files onto track 1 at the corresponding marked times.
Step 404: Meanwhile, normal background recording continues on track 2.
Here track 1 only records the inserted voice files matching the mark information and does not record the speaker's voice; track 2 continues to record normally.
Step 405: When recording is finished, save track 1 and track 2 to generate a new recording file.
Step 406: Determine whether a multi-channel device is to be used to play the recorded file; if so, go to step 407; if not, end this flow.
Step 407: The left channel plays the track-1 mark voice and the right channel plays the recording normally. Optionally, track 1 and track 2 are kept separate so that track 1 corresponds to the left channel and track 2 to the right channel.
In this way, when the user plugs in earphones to play the recording, the left channel of the earphones plays the voice content corresponding to the "Zhang Ming" mark text at each marked time point, while the right channel plays the speaker's recorded content.
The above are merely preferred examples of the present invention and are not intended to limit its scope. Any modification, equivalent substitution, improvement, and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Those of ordinary skill in the art will understand that all or some of the steps of the above embodiments may be implemented as a computer program flow; the computer program may be stored in a computer-readable storage medium and executed on a corresponding hardware platform (such as a system, device, apparatus, or component), and, when executed, includes one of the steps of the method embodiments or a combination thereof.
Optionally, all or some of the steps of the above embodiments may also be implemented with integrated circuits; these steps may be made into individual integrated-circuit modules, or several of the modules or steps among them may be made into a single integrated-circuit module.
The devices/function modules/functional units in the above embodiments may be implemented with general-purpose computing devices; they may be concentrated on a single computing device or distributed over a network formed by multiple computing devices.
When the devices/function modules/functional units in the above embodiments are implemented in the form of software function modules and sold or used as stand-alone products, they may be stored in a computer-readable storage medium. The computer-readable storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
Industrial Applicability
The above technical solution allows the recording file to be marked at any time during recording, simply and quickly; moreover, when the recording file is later played back, the mark information is played back synchronously, without the user having to look it up manually. In addition, when the recording file is shared with another device, the above technical solution also shares its mark information with that device.

Claims (13)

  1. A method for implementing recording, comprising:
    acquiring, by a mobile terminal, recording mark information;
    converting the acquired recording mark information into voice information; and
    inserting the converted voice information into a recording file to form a composite recording file.
  2. The method according to claim 1, wherein acquiring the recording mark information comprises:
    when the screen of the mobile terminal is in a black screen state, recognizing gesture information and converting the recognized gesture information into the recording mark information according to a preset correspondence between gesture information and recording mark information.
  3. The method according to claim 1, wherein acquiring the recording mark information comprises:
    when the screen of the mobile terminal is in a bright screen state, invoking and displaying a floating input interface of the mobile terminal, and acquiring information entered through the floating input interface as the recording mark information.
  4. The method according to claim 1, 2 or 3, wherein converting the acquired mark information into a voice file comprises:
    recording the time point at which the mark information was acquired, and marking that time point on a first track of the recording;
    establishing a correspondence between the recorded time point and the acquired mark information; and
    converting the acquired mark information into voice information, and adding the resulting voice information to the correspondence.
  5. The method according to claim 4, wherein forming the composite recording file comprises:
    inserting the voice information, in the order of the recorded time points, onto the first track at the corresponding marked time points; and
    when recording is finished, saving the first track and a second track used to record the recording file as the composite recording file.
  6. The method according to claim 5, further comprising:
    playing the voice file at the time point on a first channel corresponding to the first track, and playing the recording file on a second channel.
  7. An apparatus for implementing recording, comprising an acquiring module, a conversion module, and a synthesis processing module, wherein
    the acquiring module is configured to acquire recording mark information;
    the conversion module is configured to convert the acquired recording mark information into voice information; and
    the synthesis processing module is configured to insert the converted voice information into a recording file to form a composite recording file.
  8. The apparatus according to claim 7, wherein
    the acquiring module is configured to: when the screen of the mobile terminal is in a black screen state, recognize gesture information and convert the recognized gesture information into recording mark information according to a preset correspondence between gesture information and recording mark information; or
    the acquiring module is configured to: when the screen of the mobile terminal is in a bright screen state, invoke and display a floating input interface of the mobile terminal, and acquire information entered through the floating input interface as the recording mark information.
  9. The apparatus according to claim 7 or 8, wherein the conversion module is configured to: record the time point at which the mark information was acquired, and mark that time point on a first track of the recording; establish a correspondence between the recorded time point and the acquired recording mark information; and convert the acquired mark information into a voice file and add the resulting voice file to the correspondence.
  10. The apparatus according to claim 9, wherein the synthesis processing module is configured to: insert the voice files, in the order of the recorded time points, onto the first track at the corresponding marked time points; and, when recording is finished, save the first track and the second track used to record the recording file as a composite recording file.
  11. The apparatus according to claim 10, wherein the synthesis processing module is further configured to: play the voice file at the time point on a first channel corresponding to the first track, and play the recording file on a second channel.
  12. A mobile terminal, comprising a display screen and a processor, wherein
    the processor is configured to: acquire recording mark information while the display screen is in a black screen state or a bright screen state; convert the acquired recording mark information into voice information; and insert the converted voice information into a recording file to form a composite recording file.
  13. A computer storage medium storing computer-executable instructions, the computer-executable instructions being configured to perform the method according to any one of claims 1 to 6.
PCT/CN2015/081454 2015-01-27 2015-06-15 一种实现录音的方法、装置和移动终端 WO2016119370A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510042030.3 2015-01-27
CN201510042030.3A CN104657074A (zh) 2015-01-27 2015-01-27 一种实现录音的方法、装置和移动终端

Publications (1)

Publication Number Publication Date
WO2016119370A1 (zh)

Family

ID=53248272

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/081454 WO2016119370A1 (zh) 2015-01-27 2015-06-15 一种实现录音的方法、装置和移动终端

Country Status (2)

Country Link
CN (2) CN104978145A (zh)
WO (1) WO2016119370A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110636175A (zh) * 2019-10-18 2019-12-31 深圳传音控股股份有限公司 通讯录制方法、终端设备及计算机可读存储介质
CN113660446A (zh) * 2021-08-17 2021-11-16 深圳市唐为电子有限公司 一种智能手机对讲通话优化系统

Families Citing this family (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104978145A (zh) * 2015-01-27 2015-10-14 中兴通讯股份有限公司 一种实现录音的方法、装置和移动终端
CN106406718A (zh) * 2015-07-29 2017-02-15 中兴通讯股份有限公司 信息记录方法和装置
CN107025913A (zh) * 2016-02-02 2017-08-08 西安中兴新软件有限责任公司 一种录音方法及终端
CN107026931A (zh) * 2016-02-02 2017-08-08 中兴通讯股份有限公司 一种音频数据处理方法和终端
CN105721710A (zh) 2016-03-28 2016-06-29 联想(北京)有限公司 一种录音方法及装置、电子设备
CN105702278A (zh) * 2016-04-19 2016-06-22 珠海格力电器股份有限公司 一种会议的录音方法、装置及终端
CN107591166B (zh) * 2016-07-07 2021-02-23 中兴通讯股份有限公司 一种录音标记与回放的方法和装置
CN106653077A (zh) * 2016-12-30 2017-05-10 网易(杭州)网络有限公司 用于记录语音笔记的方法和装置及可读存储介质
CN107147803A (zh) * 2017-06-21 2017-09-08 努比亚技术有限公司 一种录音方法、终端设备及计算机可读存储介质
CN111435600B (zh) * 2019-01-15 2021-05-18 北京字节跳动网络技术有限公司 用于处理音频的方法和装置
CN115242747A (zh) * 2022-07-21 2022-10-25 维沃移动通信有限公司 语音消息处理方法、装置、电子设备和可读存储介质

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030177010A1 (en) * 2002-03-11 2003-09-18 John Locke Voice enabled personalized documents
CN1822189A (zh) * 2006-03-02 2006-08-23 无敌科技(西安)有限公司 一种数字录音文件的内容识别方法
US20090319275A1 (en) * 2007-03-20 2009-12-24 Fujitsu Limited Speech synthesizing device, speech synthesizing system, language processing device, speech synthesizing method and recording medium
CN103377203A (zh) * 2012-04-18 2013-10-30 宇龙计算机通信科技(深圳)有限公司 终端和录音管理方法
WO2013176366A1 (en) * 2012-05-21 2013-11-28 Lg Electronics Inc. Method and electronic device for easy search during voice record
CN104184870A (zh) * 2014-07-29 2014-12-03 小米科技有限责任公司 通话记录标记方法、装置及电子设备
CN104657074A (zh) * 2015-01-27 2015-05-27 中兴通讯股份有限公司 一种实现录音的方法、装置和移动终端

Family Cites Families (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1099898A (zh) * 1993-09-03 1995-03-08 侯培庄 语言学习助记录音带的制作方法
CN101169954A (zh) * 2006-10-27 2008-04-30 智易科技股份有限公司 录制流式音频的方法及装置
US8296681B2 (en) * 2007-08-24 2012-10-23 Nokia Corporation Searching a list based upon user input
CN103024221B (zh) * 2012-11-27 2015-06-17 华为终端有限公司 一种对群组通话进行录音的方法、终端和服务器
CN103049209B (zh) * 2012-12-31 2016-04-06 广东欧珀移动通信有限公司 在手机熄屏状态下启动手机相机的方法和装置
CN103345360B (zh) * 2013-06-21 2016-02-10 广东欧珀移动通信有限公司 一种智能终端触摸屏手势识别方法
CN103369305B (zh) * 2013-06-28 2016-02-24 武汉烽火众智数字技术有限责任公司 实现视频监控系统中语音对讲同步录音及回放的方法
CN103400592A (zh) * 2013-07-30 2013-11-20 北京小米科技有限责任公司 录音方法、播放方法、装置、终端及系统
CN104092809A (zh) * 2014-07-24 2014-10-08 广东欧珀移动通信有限公司 通话录音方法、通话录音播放方法及其相应装置

Also Published As

Publication number Publication date
CN104657074A (zh) 2015-05-27
CN104978145A (zh) 2015-10-14

Similar Documents

Publication Publication Date Title
WO2016119370A1 (zh) 一种实现录音的方法、装置和移动终端
CN106024009B (zh) 音频处理方法及装置
WO2016197708A1 (zh) 一种录音方法及终端
US8315866B2 (en) Generating representations of group interactions
US20210243528A1 (en) Spatial Audio Signal Filtering
US11281707B2 (en) System, summarization apparatus, summarization system, and method of controlling summarization apparatus, for acquiring summary information
WO2018130173A1 (zh) 配音方法、终端设备、服务器及存储介质
CN111527746B (zh) 一种控制电子设备的方法及电子设备
WO2014154097A1 (en) Automatic page content reading-aloud method and device thereof
KR20160129787A (ko) 디지털 녹취 파일 녹취록 생성 방법
JP6624476B2 (ja) 翻訳装置および翻訳システム
US10181312B2 (en) Acoustic system, communication device, and program
US9612519B2 (en) Method and system for organising image recordings and sound recordings
US20140297285A1 (en) Automatic page content reading-aloud method and device thereof
Koenig et al. Forensic authentication of digital audio and video files
CN105373585B (zh) 歌曲收藏方法和装置
TW201409259A (zh) 多媒體記錄系統及方法
KR20160129203A (ko) 무결성 디지털 녹취 파일 생성 방법
CN113056908A (zh) 视频字幕合成方法、装置、存储介质及电子设备
JP7288491B2 (ja) 情報処理装置、及び制御方法
US9542922B2 (en) Method for inserting watermark to image and electronic device thereof
KR101562901B1 (ko) 대화 지원 서비스 제공 시스템 및 방법
WO2016197755A1 (zh) 一种音频数据处理方法和终端
JP7423164B2 (ja) カラオケ装置
JP6689705B2 (ja) カラオケ歌唱サポート装置、カラオケ歌唱サポートプログラム

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15879576

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15879576

Country of ref document: EP

Kind code of ref document: A1