WO2016201765A1 - Recording control method and apparatus - Google Patents

Recording control method and apparatus

Info

Publication number
WO2016201765A1
WO2016201765A1 (application PCT/CN2015/084954)
Authority
WO
WIPO (PCT)
Prior art keywords
voice
state
target object
recording
mouth
Prior art date
Application number
PCT/CN2015/084954
Other languages
English (en)
French (fr)
Inventor
李百玲
Original Assignee
中兴通讯股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Publication of WO2016201765A1

Classifications

    • H: ELECTRICITY
    • H03: ELECTRONIC CIRCUITRY
    • H03K: PULSE TECHNIQUE
    • H03K17/00: Electronic switching or gating, i.e. not by contact-making and -breaking
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10: Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16: Human faces, e.g. facial parts, sketches or expressions
    • G06V40/174: Facial expression recognition
    • G06V40/176: Dynamic expression
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00: Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20: Movements or behaviour, e.g. gesture recognition
    • G: PHYSICS
    • G11: INFORMATION STORAGE
    • G11C: STATIC STORES
    • G11C7/00: Arrangements for writing information into, or reading information out from, a digital store
    • G11C7/16: Storage of analogue signals in digital stores using an arrangement comprising analogue/digital [A/D] converters, digital memories and digital/analogue [D/A] converters

Definitions

  • the present invention relates to the field of communications, and in particular, to a recording control method and apparatus.
  • the main technical problem to be solved by the embodiments of the present invention is to provide a recording control method and device that address the existing limitation that a recording can only capture all participants, after which a professional must screen the recording to extract a specific person's voice.
  • an embodiment of the present invention provides a recording control method, including: determining a recording target object; detecting a current voice state of the recording target object; and performing recording control on the recording target object according to the detection result.
  • the detecting of the current voice state of the recording target object includes: detecting current voice action information of the target object, and determining the current voice state of the target object according to the voice action information.
  • the voice action information includes the state of the mouth and/or the state of the larynx (Adam's apple).
  • the state of the mouth is either open or closed
  • the detecting of the current voice action information of the target object includes: capturing images of the target object's mouth via an image capture device, either in real time or at a preset interval, and comparing the current mouth image with the previous one: if the lips are parted, the mouth state is open; if the lips are together, the mouth state is closed. The state of the larynx is either moving or still, and detecting it likewise means capturing images of the target object's larynx, in real time or at a preset interval, and comparing the current larynx image with the previous one: if the larynx has moved, its state is moving; if not, it is still.
  • the voice state includes a voice-in-progress state or a voice-stopped state; determining the current voice state of the target object according to the voice action information includes: if the target user's mouth is open and/or the larynx is moving, the state is the voice-in-progress state; if the target user's mouth is closed and/or the larynx is still, the state is the voice-stopped state.
  • the detecting of the current voice state of the recording target object includes: detecting the current voice utterance state of the target object, and determining the current voice state of the target object according to the voice utterance state.
  • the voice utterance state is either voiced or silent
  • detecting the current voice utterance state of the target object includes: detecting the target object's sound via an audio detection device, in real time or at a preset interval; if the target object's sound is detected, the utterance state is voiced; if the target object's sound is not detected, the utterance state is silent.
  • the voice state includes a voice-in-progress state or a voice-stopped state; determining the current voice state of the target object according to the voice utterance state includes: if the target user is voiced, the state is the voice-in-progress state; if the target user is silent, the state is the voice-stopped state.
  • the voice state includes a voice-in-progress state or a voice-stopped state
  • performing recording control on the recording target object according to the detection result includes: if the voice state is the voice-in-progress state, recording is performed; if the voice state is the voice-stopped state, recording is stopped.
  • an embodiment of the present invention further provides a recording control apparatus, including a determining module, a detecting module, and a control module: the determining module is configured to determine a recording target object; the detecting module is configured to detect the current voice state of the recording target object; the control module is configured to perform recording control on the recording target object according to the detection result.
  • the detecting module includes a motion detection submodule: the motion detection submodule is configured to detect current voice action information of the target object and determine the current voice state of the target object according to the voice action information.
  • the detecting module further includes a sound detection submodule configured to detect the current voice utterance state of the target object and determine the current voice state of the target object according to the voice utterance state.
  • the voice state includes a voice-in-progress state or a voice-stopped state; the control module is further configured to: perform recording if the voice state is the voice-in-progress state; stop recording if the voice state is the voice-stopped state.
  • the recording control method and device provided by the embodiment of the invention first determines the recording target object, then detects the current voice state of the recording target object, and finally performs recording control on the recording target object according to the detection result.
  • recording is not performed for everyone; only when the target person is determined to be speaking does recording take place, so the user can easily obtain the target user's voice without any extraction. This avoids both the complex process of recording everyone and then extracting the target person's voice, and the difficulty that such extraction can only be performed by professionals, thereby improving the user experience.
  • FIG. 1 is a schematic flowchart of a recording control method according to Embodiment 1 of the present invention.
  • FIG. 2 is a schematic flowchart of a recording control method according to Embodiment 2 of the present invention.
  • FIG. 3 is a first schematic structural diagram of a recording control apparatus according to Embodiment 3 of the present invention.
  • FIG. 4 is a second schematic structural diagram of a recording control apparatus according to Embodiment 3 of the present invention.
  • FIG. 5 is a third schematic structural diagram of a recording control apparatus according to Embodiment 3 of the present invention.
  • the recording control method of this embodiment includes the following steps:
  • Step S101: determine a recording target object;
  • the target object here is the person whose voice the user wants to capture. For example, when a project meeting is being discussed and the project supervisor's remarks are to be recorded, the supervisor is the target object; in a meeting with a client, where the focus is on recording the client's requirements, the client is the target object; in a company meeting focused on the company leader's speech, the leader is the target object; and when recording a teacher's lecture while listening and studying, the teacher is the target object.
  • Step S102: detect the current voice state of the recording target object;
  • optionally, the voice state includes: a voice-in-progress state and a voice-stopped state.
  • the voice-in-progress state means the target object is speaking and producing sound;
  • the voice-stopped state means the target object is not speaking and producing no sound; other sounds may exist in this state, for example a student speaking during a teacher's lecture, or other attendees speaking during the supervisor's meeting discussion.
  • Step S103: perform recording control on the recording target object according to the detection result.
  • optionally, recording takes place while the target object is speaking and is paused while the target object is not speaking;
  • other people's speech while the target object is silent is not recorded. That is, if the voice state is the voice-in-progress state, recording is performed; if it is the voice-stopped state, recording is stopped. This ensures that only the target object's voice is recorded, so the final result is the target object's voice; the desired voice is obtained without any extraction step, and operation is simple.
  • the 'stop recording' in this embodiment may be a final stop after recording completes, or a pause during the recording process.
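The control rule above (record during the voice-in-progress state, pause during the voice-stopped state) can be sketched as a minimal state machine. The `Recorder` class and its string "segments" are illustrative stand-ins for a real audio pipeline, not part of the patent.

```python
from enum import Enum, auto

class VoiceState(Enum):
    IN_PROGRESS = auto()  # target object is speaking
    STOPPED = auto()      # target object is silent (others may be talking)

class Recorder:
    """Minimal recorder obeying the patent's control rule."""
    def __init__(self):
        self.recording = False
        self.segments = []  # stand-in for captured audio

    def control(self, state: VoiceState) -> None:
        # Record while the voice state is "in progress"; pause otherwise.
        if state is VoiceState.IN_PROGRESS:
            self.recording = True
            self.segments.append("speech")
        else:
            self.recording = False

rec = Recorder()
for s in [VoiceState.IN_PROGRESS, VoiceState.STOPPED, VoiceState.IN_PROGRESS]:
    rec.control(s)
print(rec.segments)  # only the target's speech frames are kept
```

Because pausing is driven purely by the detected state, the resulting file contains only the target object's speech, which is the claimed benefit of skipping post-hoc extraction.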
  • acquiring the voice state of the target object may mean acquiring the target object's voice action information and determining the voice state of the target object from that information. This should be understood as not limited to voice action information: any other way of determining whether the target object is speaking is also included.
  • the voice action information includes the state of the mouth and/or the state of the larynx. It should be understood that any other action information that reflects whether the target object is speaking can also be used.
  • the state of the mouth is either open or closed;
  • detecting the current voice action information of the target object may mean capturing the target object's mouth image via the image capture device, either in real time or at a preset interval, and comparing the current mouth image with the previous one: if the lips are parted, the mouth state is open; if the lips are together, the mouth state is closed;
  • the preset interval here can be set according to the specific situation.
  • the state of the larynx is either moving or still;
  • detecting the current voice action information of the target object may mean capturing the target object's larynx image in real time via the image capture device and comparing the current larynx image with the previous one: if the larynx has moved, its state is moving; if it has not moved, its state is still. Since natural speech generally contains pauses, the larynx image may preferably be captured at a preset interval instead, with the same comparison applied; the preset interval here can be set according to the specific situation.
  • the voice state includes a voice-in-progress state or a voice-stopped state; determining the current voice state of the target object according to the voice action information includes: if the target user's mouth is open and/or the larynx is moving, the state is the voice-in-progress state; if the target user's mouth is closed and/or the larynx is still, the state is the voice-stopped state.
  • that is, if the state of the target user's mouth is open, the voice state is the voice-in-progress state, and if the state of the target user's larynx is moving, the voice state is the voice-in-progress state;
  • a single piece of voice action information can be used to decide whether the target object is speaking; to improve accuracy, several pieces of voice action information may be combined;
  • for example, with the mouth alone, an open mouth means the target object is speaking and a closed mouth means it is not; with both the mouth and the larynx, the target object is judged to be speaking only when the mouth is open and the larynx is moving, so an open mouth with a still larynx counts as not speaking. It should be understood that the specific criterion can be set according to the specific circumstances.
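A sketch of how the single-cue and multi-cue rules above might be combined follows. The thresholds, pixel measurements, and function names are illustrative assumptions; a real system would derive the lip gap and larynx position from facial landmarks in the captured images.

```python
def mouth_state(lip_gap_px: float, open_threshold: float = 5.0) -> str:
    # Lips separated by more than the threshold -> mouth is open.
    return "open" if lip_gap_px > open_threshold else "closed"

def throat_state(prev_y: float, cur_y: float, move_threshold: float = 1.0) -> str:
    # Compare the larynx position in the current frame with the previous
    # frame; displacement beyond the threshold counts as "moving".
    return "moving" if abs(cur_y - prev_y) > move_threshold else "still"

def voice_state(mouth: str, throat: str, require_both: bool = True) -> str:
    # require_both=True is the stricter multi-cue rule: speaking only
    # when the mouth is open AND the larynx is moving.
    if require_both:
        speaking = (mouth == "open") and (throat == "moving")
    else:
        speaking = (mouth == "open") or (throat == "moving")
    return "in_progress" if speaking else "stopped"

print(voice_state(mouth_state(8.0), throat_state(100.0, 103.0)))  # in_progress
print(voice_state(mouth_state(8.0), throat_state(100.0, 100.2)))  # stopped
```

Note how the second call shows the multi-cue benefit: an open mouth with a still larynx (e.g. a yawn or a held pose) is not treated as speech.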
  • detecting the current voice state of the recording target object may mean detecting the target object's current voice utterance state in real time and determining the current voice state of the target object from it;
  • alternatively, the utterance state may be detected at a preset interval, with the current voice state of the target object determined from the detected utterance state;
  • the preset interval here can be set according to the specific situation. It should be understood that detection is not limited to the target object's current voice utterance state: any means of determining whether the target object is speaking is included.
  • the voice utterance state is either voiced or silent;
  • detecting the current voice utterance state of the target object can be done by detecting the target object's sound via the audio detection device: if the target object's sound is detected, the utterance state is voiced; if the target object's sound is not detected, the utterance state is silent.
  • the voice state includes a voice-in-progress state or a voice-stopped state; determining the current voice state of the target object according to the voice utterance state includes: if the target user is voiced, the state is the voice-in-progress state; if the target user is silent, the state is the voice-stopped state.
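A simple way to realize the voiced/silent decision is an energy threshold on short audio frames. The RMS threshold below is an illustrative assumption; a deployed detector would also need to distinguish the target speaker's voice from other sound sources (e.g. via microphone directionality or speaker identification), which this sketch does not attempt.

```python
import math

def utterance_state(samples, rms_threshold=0.02):
    """Classify a short audio frame (normalized samples in [-1, 1])
    as voiced or silent by its RMS energy."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return "voiced" if rms > rms_threshold else "silent"

print(utterance_state([0.10, -0.12, 0.09, -0.11]))   # voiced
print(utterance_state([0.001, -0.002, 0.001, 0.0]))  # silent
```

Mapping onto the rule above: a "voiced" frame yields the voice-in-progress state (record), a "silent" frame yields the voice-stopped state (stop or pause).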
  • the recording control method of this embodiment is illustrated mainly using the state of the mouth. As shown in FIG. 2, it includes the following steps:
  • Step S201: start the camera module; optionally this is the camera of a smart terminal, which may be a mobile phone, a tablet, or another terminal with a camera function;
  • Step S202: determine whether the target object's mouth is open; if so, proceed to step S203; if not, keep checking;
  • Step S203: start recording;
  • Step S204: determine whether the target object's mouth is closed; if so, proceed to step S205; if not, proceed to step S208;
  • Step S205: pause recording;
  • Step S206: determine whether the camera module has been closed; if so, proceed to step S210; if not, proceed to step S207;
  • Step S207: determine whether the target object's mouth is open; if so, proceed to step S208; if not, keep checking;
  • Step S208: continue recording;
  • Step S209: determine whether the camera module has been closed; if so, proceed to step S210; if not, proceed to step S204;
  • Step S210: end and save the recording.
  • this embodiment applies to scenarios such as: recording the project supervisor's remarks during a project meeting discussion; recording the client's requirements when communicating with a client; and recording the company leader's speech in a company meeting; it can also be used in any other scene where a particular person needs to be recorded.
  • during recording, the device is turned toward the target object in a specific direction. The camera mode is turned on, the capture range is adjusted so the target object is centered on the screen, and capture begins. From the captured content, a picture is taken every 5 ms and compared with the previous one to determine whether the mouth (lips) is open; if not, checking continues; if the mouth is open, the recording module starts recording.
  • during recording, a picture is likewise taken every 5 ms and compared with the previous one to determine whether the target object has closed its mouth; if so, the recording module pauses, otherwise recording continues. After recording is paused, pictures are compared every 5 ms to determine whether the mouth has opened again, in which case recording resumes; if the mouth stays closed, checking continues.
  • while recording continues, if the user ends the capture, the recording module stops recording, saves the recorded content, and deletes the captured video; otherwise recording continues, the closed-mouth check is repeated, and the loop continues.
  • with the recording control method of this embodiment, when only a target object in a specific direction needs to be recorded, it suffices to turn the device toward the target object, start the device's camera mode, and determine from the captured images whether the target object's mouth is open in speech. If the target object is judged to be speaking, the recorder starts recording; once the target object is judged to have closed its mouth, recording is paused.
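The S201-S210 flow above can be sketched as a polling loop over captured frames. The frame dictionaries below abstract away the camera and the 5 ms sampling into a pre-captured list, and the field names `mouth_open` and `camera_on` are illustrative assumptions, not part of the patent.

```python
def run_session(frames):
    """Drive the S201-S210 flow over a pre-captured frame sequence.
    Each frame is a dict with 'mouth_open' and 'camera_on' booleans."""
    recording = False       # recorder state (S203/S205/S208)
    recorded_frames = 0
    for frame in frames:
        if not frame["camera_on"]:
            break           # S206/S209: camera closed -> S210
        if frame["mouth_open"]:
            recording = True    # S202/S207 satisfied -> S203/S208
        else:
            recording = False   # S204 satisfied -> S205: pause
        if recording:
            recorded_frames += 1
    return recorded_frames      # S210: end and save

frames = [
    {"mouth_open": False, "camera_on": True},  # S202: keep polling
    {"mouth_open": True,  "camera_on": True},  # S203: start recording
    {"mouth_open": True,  "camera_on": True},  # S208: continue
    {"mouth_open": False, "camera_on": True},  # S204 -> S205: pause
    {"mouth_open": True,  "camera_on": True},  # S207 -> S208: resume
    {"mouth_open": True,  "camera_on": False}, # S209 -> S210: end
]
print(run_session(frames))  # prints 3
```

Only the three open-mouth frames captured while the camera was on are recorded; the closed-mouth frame and the frame after the camera closes are skipped, mirroring the pause and end branches of the flowchart.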
  • the recording control apparatus 300 includes a determining module 301, a detecting module 302, and a control module 303: the determining module 301 is configured to determine a recording target object; the detecting module 302 is configured to detect the current voice state of the recording target object; the control module 303 is configured to perform recording control on the recording target object according to the detection result.
  • the detection module 302 includes a motion detection sub-module 3021.
  • the motion detection sub-module 3021 is configured to detect current voice action information of the target object and determine the current voice state of the target object according to the voice action information.
  • the detection module 302 further includes a sound detection sub-module 3022: the sound detection sub-module 3022 is configured to detect the current voice utterance state of the target object and determine the current voice state of the target object according to the voice utterance state.
  • the voice state includes a voice-in-progress state or a voice-stopped state; the control module 303 is further configured to: perform recording if the voice state is the voice-in-progress state; stop recording if the voice state is the voice-stopped state.
  • recording is not performed for everyone; only when the target person is determined to be speaking does recording take place, so the user can easily obtain the target user's voice without any extraction. This avoids both the complex process of recording everyone and then extracting the target person's voice, and the difficulty that such extraction can only be performed by professionals, thereby improving the user experience.

Abstract

The recording control method and apparatus provided by the present invention belong to the field of communications. The method first determines a recording target object, then detects the current voice state of the recording target object, and finally performs recording control on the recording target object according to the detection result. Compared with the prior art, recording is not performed for everyone; it takes place only when the target person is determined to be speaking, so no extraction is needed and the user can easily obtain the target user's voice. This avoids both the complex process of recording everyone and then extracting the target person's voice, and the difficulty that such extraction can only be performed by a professional, improving the user experience.

Description

Recording Control Method and Apparatus
Technical Field
The present invention relates to the field of communications, and in particular to a recording control method and apparatus.
Background
As a tool that makes communication convenient, the mobile phone has become an indispensable necessity of daily life and is ever more widely used. Basic handsets all have a recording function, but it is only a simple capture function. Consider the following everyday scenarios: in a multi-person project discussion, one is interested only in the project leader's remarks; in a company-level meeting, only in a particular leader's speech; when communicating with a client, one mainly wants to understand the client's needs and extract the client's speech; and so on. With an ordinary recording function, a recording can only capture everyone, screening the recording requires a professional, and an ordinary user can hardly do it.
Summary of the Invention
The main technical problem to be solved by the embodiments of the present invention is to provide a recording control method and apparatus that address the existing limitation that a recording can only capture all participants and that extracting a designated person's voice from it requires a professional.
To solve the above problem, an embodiment of the present invention provides a recording control method, including: determining a recording target object; detecting the current voice state of the recording target object; and performing recording control on the recording target object according to the detection result.
In an embodiment of the present invention, detecting the current voice state of the recording target object includes: detecting current voice action information of the target object, and determining the current voice state of the target object according to the voice action information.
In an embodiment of the present invention, the voice action information includes the state of the mouth and/or the state of the larynx (Adam's apple).
In an embodiment of the present invention, the state of the mouth includes open or closed, and detecting the current voice action information of the target object includes: capturing images of the target object's mouth via an image capture device, in real time or at a preset interval, and comparing the current mouth image with the previous one; if the lips are parted, the mouth state is open; if the lips are together, the mouth state is closed. The state of the larynx includes moving and still, and detecting the current voice action information of the target object includes: capturing images of the target object's larynx via an image capture device, in real time or at a preset interval, and comparing the current larynx image with the previous one; if the larynx has moved, its state is moving; if not, it is still.
In an embodiment of the present invention, the voice state includes a voice-in-progress state or a voice-stopped state; determining the current voice state of the target object according to the voice action information includes: if the target user's mouth is open and/or the larynx is moving, the state is the voice-in-progress state; if the mouth is closed and/or the larynx is still, the state is the voice-stopped state.
In an embodiment of the present invention, detecting the current voice state of the recording target object includes: detecting the current voice utterance state of the target object, and determining the current voice state of the target object according to the voice utterance state.
In an embodiment of the present invention, the voice utterance state includes voiced or silent, and detecting the current voice utterance state of the target object includes: detecting the target object's sound via an audio detection device, in real time or at a preset interval; if the target object's sound is detected, the voice utterance state is voiced; if not, it is silent.
In an embodiment of the present invention, the voice state includes a voice-in-progress state or a voice-stopped state; determining the current voice state of the target object according to the voice utterance state includes: if the target user is voiced, the state is the voice-in-progress state; if the target user is silent, the state is the voice-stopped state.
In an embodiment of the present invention, the voice state includes a voice-in-progress state or a voice-stopped state; performing recording control on the recording target object according to the detection result includes: if the voice state is the voice-in-progress state, recording is performed; if the voice state is the voice-stopped state, recording is stopped.
To solve the above problem, an embodiment of the present invention further provides a recording control apparatus, including a determining module, a detecting module, and a control module: the determining module is configured to determine a recording target object; the detecting module is configured to detect the current voice state of the recording target object; and the control module is configured to perform recording control on the recording target object according to the detection result.
In an embodiment of the present invention, the detecting module includes an action detection submodule: the action detection submodule is configured to detect current voice action information of the target object and determine the current voice state of the target object according to the voice action information.
In an embodiment of the present invention, the detecting module further includes a sound detection submodule: the sound detection submodule is configured to detect the current voice utterance state of the target object and determine the current voice state of the target object according to the voice utterance state.
In an embodiment of the present invention, the voice state includes a voice-in-progress state or a voice-stopped state; the control module is further configured to: perform recording if the voice state is the voice-in-progress state; stop recording if the voice state is the voice-stopped state.
The beneficial effects of the embodiments of the present invention are as follows:
The recording control method and apparatus provided by the embodiments of the present invention first determine the recording target object, then detect its current voice state, and finally perform recording control on it according to the detection result. Compared with the prior art, recording is not performed for everyone; it takes place only when the target person is determined to be speaking, so no extraction is needed and the user can easily obtain the target user's voice. This avoids both the complex process of recording everyone and then extracting the target person's voice, and the difficulty that such extraction can only be performed by a professional, thereby improving the user experience.
Brief Description of the Drawings
FIG. 1 is a schematic flowchart of the recording control method provided by Embodiment 1 of the present invention;
FIG. 2 is a schematic flowchart of the recording control method provided by Embodiment 2 of the present invention;
FIG. 3 is a first schematic structural diagram of the recording control apparatus provided by Embodiment 3 of the present invention;
FIG. 4 is a second schematic structural diagram of the recording control apparatus provided by Embodiment 3 of the present invention;
FIG. 5 is a third schematic structural diagram of the recording control apparatus provided by Embodiment 3 of the present invention.
Detailed Description of the Embodiments
To help those skilled in the art better understand the technical solutions of the embodiments of the present invention, the invention is described in further detail below with reference to the drawings and specific embodiments.
Embodiment 1
The recording control method of this embodiment, as shown in FIG. 1, includes the following steps:
Step S101: determine a recording target object.
In this step, the target object is the person whose voice the user wants to obtain. For example, when a project meeting is being discussed and the project supervisor's remarks are to be recorded, the supervisor is the target object; in a meeting with a client, where the focus is on recording the client's requirements, the client is the target object; in a company meeting focused on the company leader's speech, the leader is the target object; and when recording a teacher's lecture while listening and studying, the teacher is the target object.
Step S102: detect the current voice state of the recording target object.
In this step, optionally, the voice state includes a voice-in-progress state and a voice-stopped state. The voice-in-progress state means the target object is speaking and producing sound; the voice-stopped state means the target object is not speaking and producing no sound, although other sounds may exist in this state, for example a student speaking during a teacher's lecture, or other attendees speaking during the supervisor's meeting discussion.
Step S103: perform recording control on the recording target object according to the detection result.
In this step, recording control is performed on the recording target object according to the detection result. Optionally, recording takes place while the target object is speaking and is paused while the target object is not; other people's speech while the target object is silent is not recorded. That is, if the voice state is the voice-in-progress state, recording is performed; if it is the voice-stopped state, recording is stopped. This ensures that only the target object's voice is recorded, so the final result is the target object's voice; the desired voice is obtained without extraction, and operation is simple. Note that 'stop recording' in this embodiment may mean a final stop after recording completes, or a pause during the recording process.
Optionally, in the above step S102, acquiring the voice state of the target object may mean acquiring the target object's voice action information and determining the voice state of the target object from it. This should be understood as not limited to voice action information: any other means of determining whether the target object is speaking is also included. Optionally, the voice action information includes the state of the mouth and/or the state of the larynx; any other action information that reflects whether the target object is speaking can also be used. The state of the mouth is either open or closed. Detecting the target object's current voice action information may mean capturing the target object's mouth image in real time via an image capture device and comparing the current mouth image with the previous one: if the lips are parted, the mouth state is open; if the lips are together, it is closed. Because natural speech generally contains pauses, the mouth image may preferably instead be captured at a preset interval, with the same comparison applied; the preset interval can be set according to the specific situation. The state of the larynx is either moving or still. Detecting the target object's current voice action information may mean capturing the target object's larynx image in real time via the image capture device and comparing the current larynx image with the previous one: if the larynx has moved, its state is moving; if not, it is still. Again because natural speech contains pauses, the larynx image may preferably be captured at a preset interval instead, with the interval set according to the specific situation. Optionally, the voice state includes a voice-in-progress state or a voice-stopped state; determining the current voice state of the target object from the voice action information includes: if the target user's mouth is open and/or the larynx is moving, the state is voice-in-progress; if the mouth is closed and/or the larynx is still, it is voice-stopped. That is, an open mouth indicates the voice-in-progress state, and a moving larynx indicates the voice-in-progress state. Of course, a single piece of action information can be used to decide whether the target object is speaking; to improve accuracy, several pieces may be combined. For example, with the mouth alone, an open mouth means speaking and a closed mouth means not speaking; with both the mouth and the larynx, the target is judged to be speaking only when the mouth is open and the larynx is moving, so an open mouth with a still larynx counts as not speaking. It should be understood that the specific criterion can be set according to the specific circumstances.
Optionally, in the above step S102, detecting the current voice state of the recording target object may mean detecting the target object's current voice utterance state in real time and determining the current voice state from it. Because natural speech generally contains pauses, the utterance state may preferably instead be detected at a preset interval, with the current voice state determined from the detected utterance state; the preset interval can be set according to the specific situation. This should be understood as not limited to the target object's current voice utterance state: any means of determining whether the target object is speaking is included. The voice utterance state is either voiced or silent; the target object's current utterance state can be detected via an audio detection device: if the target object's sound is detected, the utterance state is voiced; if not, it is silent. Optionally, the voice state includes a voice-in-progress state or a voice-stopped state; determining the current voice state of the target object includes: if the target user is voiced, the state is voice-in-progress; if silent, voice-stopped.
Embodiment 2
The recording control method of this embodiment is illustrated mainly using the state of the mouth. As shown in FIG. 2, it includes the following steps:
Step S201: start the camera module; optionally this is the camera of a smart terminal, which may be a mobile phone, a tablet, or another terminal with a camera function;
Step S202: determine whether the target object's mouth is open; if so, proceed to step S203; if not, keep checking;
Step S203: start recording;
Step S204: determine whether the target object's mouth is closed; if so, proceed to step S205; if not, proceed to step S208;
Step S205: pause recording;
Step S206: determine whether the camera module has been closed; if so, proceed to step S210; if not, proceed to step S207;
Step S207: determine whether the target object's mouth is open; if so, proceed to step S208; if not, keep checking;
Step S208: continue recording;
Step S209: determine whether the camera module has been closed; if so, proceed to step S210; if not, proceed to step S204;
Step S210: end and save the recording.
It is worth noting that this embodiment covers scenarios such as: recording the project supervisor's remarks during a project meeting discussion; recording the client's requirements when communicating with a client; and recording the company leader's speech in a company meeting; it can of course also be used in any other scene where a particular person needs to be recorded. During recording, the device is turned toward the target object in a specific direction. Camera mode is turned on, the capture range is adjusted so the target object is centered on the screen, and capture begins. From the captured content, a picture is taken every 5 ms and compared with the previous one to determine whether the mouth (lips) is open; if not, checking continues; if the mouth is open, the recording module starts recording. During recording, a picture is likewise taken every 5 ms and compared with the previous one to determine whether the target object has closed its mouth; if so, the recording module pauses, otherwise recording continues. After recording is paused, pictures are compared every 5 ms to determine whether the mouth has opened again, in which case recording resumes; if the mouth stays closed, checking continues. While recording continues, if the user ends the capture, the recording module stops recording, saves the recorded content, and deletes the captured video; otherwise recording continues, the closed-mouth check is repeated, and the loop continues.
With the recording control method of this embodiment, when only a target object in a specific direction needs to be recorded, it suffices to turn the device toward the target object, start the device's camera mode, and determine from the captured images whether the target object's mouth is open in speech. If the target object is judged to be speaking, the recorder starts recording; once the target object is judged to have closed its mouth, recording is paused.
Embodiment 3
The recording control apparatus 300 provided by this embodiment, as shown in FIG. 3, includes a determining module 301, a detecting module 302, and a control module 303: the determining module 301 is configured to determine the recording target object; the detecting module 302 is configured to detect the current voice state of the recording target object; the control module 303 is configured to perform recording control on the recording target object according to the detection result.
In another recording control apparatus 300 provided by this embodiment, as shown in FIG. 4, the detecting module 302 includes an action detection submodule 3021: the action detection submodule 3021 is configured to detect the target object's current voice action information and determine the target object's current voice state according to the voice action information.
In a further recording control apparatus 300 provided by this embodiment, as shown in FIG. 5, the detecting module 302 further includes a sound detection submodule 3022: the sound detection submodule 3022 is configured to detect the target object's current voice utterance state and determine the target object's current voice state according to the voice utterance state.
Optionally, the voice state includes a voice-in-progress state or a voice-stopped state; the control module 303 is further configured to: perform recording if the voice state is the voice-in-progress state; stop recording if the voice state is the voice-stopped state.
Those of ordinary skill in the art will understand that all or some of the steps of the above method can be completed by a program instructing the relevant hardware, and the program can be stored in a computer-readable storage medium, such as a read-only memory, a magnetic disk, or an optical disc. Optionally, all or some of the steps of the above embodiments may also be implemented using one or more integrated circuits. Correspondingly, each module/unit in the above embodiments may be implemented in the form of hardware or in the form of a software functional module. The embodiments of the present invention are not limited to any specific combination of hardware and software.
The above embodiments are intended only to illustrate, not to limit, the technical solutions of the embodiments of the present invention, and the invention has been described in detail with reference only to preferred embodiments. Those of ordinary skill in the art should understand that modifications or equivalent substitutions of the technical solution of the present invention that do not depart from its spirit and scope shall all be covered by the scope of the claims of the present invention.
Industrial Applicability
As described above, through the above embodiments and preferred implementations, recording is not performed for everyone; it takes place only when the target person is determined to be speaking, so no extraction is needed and the user can easily obtain the target user's voice. This avoids both the complex process of recording everyone and then extracting the target person's voice, and the difficulty that such extraction can only be performed by a professional, improving the user experience.

Claims (13)

  1. A recording control method, comprising:
    determining a recording target object;
    detecting a current voice state of the recording target object; and
    performing recording control on the recording target object according to a detection result.
  2. The recording control method of claim 1, wherein detecting the current voice state of the recording target object comprises: detecting current voice action information of the target object, and determining the current voice state of the target object according to the voice action information.
  3. The recording control method of claim 2, wherein the voice action information comprises a state of the mouth and/or a state of the larynx.
  4. The recording control method of claim 3, wherein:
    the state of the mouth comprises open or closed, and detecting the current voice action information of the target object comprises: capturing images of the target object's mouth via an image capture device, in real time or at a preset interval, and comparing the current mouth image with the previous one; if the lips are parted, the mouth state is open; if the lips are together, the mouth state is closed;
    the state of the larynx comprises moving and still, and detecting the current voice action information of the target object comprises: capturing images of the target object's larynx via an image capture device, in real time or at a preset interval, and comparing the current larynx image with the previous one; if the larynx has moved, its state is moving; if the larynx has not moved, its state is still.
  5. The recording control method of claim 4, wherein the voice state comprises a voice-in-progress state or a voice-stopped state, and determining the current voice state of the target object according to the voice action information comprises:
    if the target user's mouth is open and/or the larynx is moving, the state is the voice-in-progress state;
    if the target user's mouth is closed and/or the larynx is still, the state is the voice-stopped state.
  6. The recording control method of claim 1, wherein detecting the current voice state of the recording target object comprises: detecting a current voice utterance state of the target object, and determining the current voice state of the target object according to the voice utterance state.
  7. The recording control method of claim 5, wherein the voice utterance state comprises voiced or silent, and detecting the current voice utterance state of the target object comprises: detecting the target object's sound via an audio detection device, in real time or at a preset interval; if the target object's sound is detected, the voice utterance state is voiced; if the target object's sound is not detected, the voice utterance state is silent.
  8. The recording control method of claim 7, wherein the voice state comprises a voice-in-progress state or a voice-stopped state, and determining the current voice state of the target object according to the voice action information comprises:
    if the target user is voiced, the state is the voice-in-progress state;
    if the target user is silent, the state is the voice-stopped state.
  9. The recording control method of any one of claims 1-8, wherein the voice state comprises a voice-in-progress state or a voice-stopped state, and performing recording control on the recording target object according to the detection result comprises:
    if the voice state is the voice-in-progress state, performing recording;
    if the voice state is the voice-stopped state, stopping recording.
  10. A recording control apparatus, comprising a determining module, a detecting module, and a control module, wherein:
    the determining module is configured to determine a recording target object;
    the detecting module is configured to detect a current voice state of the recording target object; and
    the control module is configured to perform recording control on the recording target object according to a detection result.
  11. The recording control apparatus of claim 10, wherein the detecting module comprises an action detection submodule configured to detect current voice action information of the target object and determine the current voice state of the target object according to the voice action information.
  12. The recording control apparatus of claim 10, wherein the detecting module further comprises a sound detection submodule configured to detect a current voice utterance state of the target object and determine the current voice state of the target object according to the voice utterance state.
  13. The recording control apparatus of any one of claims 10-12, wherein the voice state comprises a voice-in-progress state or a voice-stopped state, and the control module is further configured to:
    perform recording if the voice state is the voice-in-progress state; and
    stop recording if the voice state is the voice-stopped state.
PCT/CN2015/084954 2015-06-16 2015-07-23 Recording control method and apparatus WO2016201765A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201510334365.2A CN106326804B (zh) 2015-06-16 2015-06-16 Recording control method and apparatus
CN201510334365.2 2015-06-16

Publications (1)

Publication Number Publication Date
WO2016201765A1 true WO2016201765A1 (zh) 2016-12-22

Family

ID=57544754

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2015/084954 WO2016201765A1 (zh) 2015-06-16 2015-07-23 一种录音控制方法和装置

Country Status (2)

Country Link
CN (1) CN106326804B (zh)
WO (1) WO2016201765A1 (zh)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
TWI699120B (zh) * 2019-04-30 2020-07-11 陳筱涵 Conference recording system and conference recording method
FR3094599A1 (fr) * 2019-04-01 2020-10-02 Orange Method, terminal and system for capturing an audio signal and a video signal of a participant in a meeting, and system for processing data streams from the meeting

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113422865A (zh) * 2021-06-01 2021-09-21 维沃移动通信有限公司 Directional recording method and device
CN113660537A (zh) * 2021-09-28 2021-11-16 北京七维视觉科技有限公司 Subtitle generation method and device

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101211615A (zh) * 2006-12-31 2008-07-02 于柏泉 Method, system and device for automatically recording a specific person's voice
CN201532766U (zh) * 2009-09-29 2010-07-21 北京爱国者存储科技有限责任公司 Sound-controlled recording electronic device
CN103258550A (zh) * 2012-02-21 2013-08-21 爱国者电子科技有限公司 Sound-controlled recording device
CN103391347A (zh) * 2012-05-10 2013-11-13 中兴通讯股份有限公司 Automatic recording method and device
CN204362016U (zh) * 2014-12-09 2015-05-27 重庆西胜电子科技有限公司 Sound-controlled recording circuit

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101458943B (zh) * 2008-12-31 2013-01-30 无锡中星微电子有限公司 Recording control method and recording device
CN102655009B (zh) * 2008-12-31 2014-12-17 无锡中星微电子有限公司 Recording control method and recording device
CN102403007A (zh) * 2011-11-22 2012-04-04 深圳市万兴软件有限公司 Method and system for automatically recording songs
US8655152B2 (en) * 2012-01-31 2014-02-18 Golden Monkey Entertainment Method and system of presenting foreign films in a native language
CN104301564A (zh) * 2014-09-30 2015-01-21 成都英博联宇科技有限公司 Intelligent conference telephone with mouth-shape recognition


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
FR3094599A1 (fr) * 2019-04-01 2020-10-02 Orange Method, terminal and system for capturing an audio signal and a video signal of a participant in a meeting, and system for processing data streams from the meeting
TWI699120B (zh) * 2019-04-30 2020-07-11 陳筱涵 Conference recording system and conference recording method
CN111866421A (zh) * 2019-04-30 2020-10-30 陈筱涵 Conference recording system and conference recording method

Also Published As

Publication number Publication date
CN106326804A (zh) 2017-01-11
CN106326804B (zh) 2022-03-08

Similar Documents

Publication Publication Date Title
US10743107B1 (en) Synchronization of audio signals from distributed devices
US11023690B2 (en) Customized output to optimize for user preference in a distributed system
EP3963576B1 (en) Speaker attributed transcript generation
US11875796B2 (en) Audio-visual diarization to identify meeting attendees
CN105282345B (zh) Method and device for adjusting call volume
JP2007147762A (ja) Speaker prediction device and speaker prediction method
CN105828101B (zh) Method and device for generating subtitle files
US11138980B2 (en) Processing overlapping speech from distributed devices
US20130211826A1 (en) Audio Signals as Buffered Streams of Audio Signals and Metadata
CN105278380B (zh) Control method and device for smart devices
WO2016201765A1 (zh) Recording control method and device
CN110121083A (zh) Method and device for generating bullet-screen comments
CN108648754B (zh) Voice control method and device
US11468895B2 (en) Distributed device meeting initiation
CN107277368A (zh) Shooting method and shooting device for smart devices
WO2022143040A1 (zh) Volume adjustment method, electronic device, terminal, and storage medium
TWI687917B (zh) Speech system and sound detection method
CN109472225A (zh) Conference control method and device
CN116129931B (zh) Audio-visual speech separation model construction method and speech separation method
CN109361959A (zh) Bullet-screen comment control method and device
CN114333810A (zh) Control method, device, and recording equipment
CN113707130A (zh) Speech recognition method and device, and device for speech recognition
Kato et al. Preliminary Study of Mobile Device-Based Speech Enhancement System Using Lip-Reading
KR20210049601A (ko) Method and device for providing voice service
JP2010062749A (ja) Reception device, reception method, and reception program

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 15895338

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 15895338

Country of ref document: EP

Kind code of ref document: A1