WO2016197755A1 - Audio data processing method and terminal - Google Patents

Audio data processing method and terminal

Info

Publication number
WO2016197755A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
audio data
speaker
tag information
tag
Prior art date
Application number
PCT/CN2016/081022
Other languages
French (fr)
Chinese (zh)
Inventor
奚黎明 (Xi Liming)
Original Assignee
中兴通讯股份有限公司 (ZTE Corporation)
Priority date
Filing date
Publication date
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Publication of WO2016197755A1 publication Critical patent/WO2016197755A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M1/00Substation equipment, e.g. for use by subscribers
    • H04M1/72Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72403User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • H04M1/7243User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages
    • H04M1/72433User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages for voice messaging, e.g. dictaphones
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M1/00Substation equipment, e.g. for use by subscribers
    • H04M1/64Automatic arrangements for answering calls; Automatic arrangements for recording messages for absent subscribers; Arrangements for recording conversations
    • H04M1/65Recording arrangements for recording a message from the calling party
    • H04M1/6505Recording arrangements for recording a message from the calling party storing speech in digital form

Definitions

  • the present application relates to, but is not limited to, the field of communications, and in particular to an audio data processing method and terminal.
  • the recording function is a basic function of the communication terminal, and recording functions are required in many occasions, such as various conferences, trainings, and calls.
  • organizing recorded content after recording is cumbersome; it is often necessary to listen to the recording again to distinguish the speech of different speakers.
  • it is often impossible to tell which speaker the recorded content, or part of it, belongs to; in addition, the same speaker in a meeting often does not speak continuously but at different time periods, which makes organizing the recorded content difficult.
  • An embodiment of the present invention provides a method and a terminal for processing audio data, which can mark and save audio data acquired in a corresponding time according to the identified tag information.
  • the embodiment of the invention provides an audio data processing method, including:
  • marking the audio data being acquired with tag information, where the tag information includes content information and the moment at which the tag information is detected;
  • saving the audio data marked with the tag information.
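The two steps above can be sketched as a minimal data structure. This is a hypothetical illustration, not the patent's implementation: the `TagInfo`/`Segment` names and the in-memory sample list are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime

# A tag carries content information and the moment it was detected.
@dataclass
class TagInfo:
    content: str          # e.g. recognized characters such as "ZM"
    detected_at: datetime

# Audio acquired after a tag is detected is associated with that tag.
@dataclass
class Segment:
    tag: TagInfo
    samples: list = field(default_factory=list)

def start_segment(content: str) -> Segment:
    """Begin acquiring audio from the moment the tag is detected."""
    return Segment(tag=TagInfo(content, datetime.now()))

seg = start_segment("ZM")
print(seg.tag.content)  # ZM
```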
  • the application further provides a computer readable storage medium storing computer executable instructions that, when executed, implement the above method.
  • the invention also provides a terminal, comprising an identification module and a processing module, wherein
  • the identification module is configured to mark the audio data being acquired by using tag information, where the tag information includes content information and a time when the tag information is detected;
  • the processing module is configured to save audio data marked with the tag information.
  • the audio data processing method and terminal provided by the application mark the recorded content and save the marked audio data, thereby improving the efficiency of organizing the recorded content.
  • FIG. 1 is a flowchart of an audio data processing method in an embodiment of the present invention
  • FIG. 2 is a schematic diagram of a recording mark information input interface in another embodiment of the present invention.
  • FIG. 3 is a schematic diagram of a terminal in an embodiment of the present invention.
  • the audio data processing method in the embodiment of the present invention includes:
  • the audio data is acquired from the moment when the tag information is detected.
  • the moment at which a gesture whose trajectory approximates a character is detected is the moment at which the user is detected to begin inputting that trajectory, not the moment at which the trajectory is determined to approximate a character.
  • optionally, before the step of marking the audio data being acquired with the tag information, the audio data processing method further includes:
  • the trajectory of the gesture is recognized and saved as a character approximating that trajectory, and the moment at which the gesture is detected is recorded.
  • the step of identifying the trajectory of the gesture, saving a character that approximates it, and recording the moment at which the tag information is detected includes:
  • identifying the edge of the shape of the gesture's trajectory, saving the trajectory of the gesture as a character that approximates it, and displaying the saved character.
  • when the screen of the terminal is not lit while the terminal is recording (that is, when the terminal records through its recording application according to a user instruction while keeping the screen unlit), the screen is still powered, so the terminal can detect whether the screen receives a gesture whose input trajectory approximates a character; if such a gesture is detected, the terminal extracts key points from the edge of the shape of the trajectory to recognize its shape, and the part of the screen that received the trajectory displays an image of the gesture trajectory, without lighting the entire screen.
  • for example, the terminal detects that the trajectory input by the user on the screen approximates the letters "Z" and "M";
  • the two gestures are input by the user sequentially in chronological order, with a short interval between them;
  • the terminal can then use "ZM", the combination of the recognized letters "Z" and "M", as the content information of the tag information.
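Combining sequentially input gesture characters into one tag content string might look like the following sketch. The two-second threshold and the function name are assumptions for illustration; the patent only says the interval between gestures is short.

```python
MAX_GAP_S = 2.0  # assumed threshold; the patent only says "short interval"

def combine_gesture_chars(events):
    """events: time-ordered list of (timestamp_seconds, recognized_char).
    Characters separated by at most MAX_GAP_S are merged into one string."""
    groups, current = [], []
    last_t = None
    for t, ch in events:
        if last_t is not None and t - last_t > MAX_GAP_S:
            groups.append("".join(current))  # gap too long: start a new tag
            current = []
        current.append(ch)
        last_t = t
    if current:
        groups.append("".join(current))
    return groups

print(combine_gesture_chars([(0.0, "Z"), (0.8, "M")]))  # ['ZM']
print(combine_gesture_chars([(0.0, "Z"), (5.0, "L")]))  # ['Z', 'L']
```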
  • in another embodiment, before the step of marking the audio data being acquired with the tag information, the audio data processing method further includes:
  • the content information of the tag information and the time at which the tag information is detected are saved.
  • the step of receiving the input tag information includes: when the screen of the terminal is lit and the terminal is recording, displaying a tag information input interface on the screen of the terminal and receiving the tag information input by the user;
  • that is, when the terminal starts recording through its recording application according to a user instruction and keeps its screen lit, the tag information input interface may be displayed at the user's request, and the tag information input by the user is received;
  • the step of identifying the tag information and saving its content information and the moment at which it was detected includes: identifying and saving the content information of the input tag information as characters;
  • the tag information input interface is displayed in the standby interface; the content information of the input tag information may include characters approximating a gesture trajectory input by the user (not limited to text; numbers, letters, and symbols are also possible), as well as pinyin, strokes, and the like entered through the terminal's input method;
  • for example, when recording in a conference, if "Zhang Ming" is speaking, the Chinese characters "Zhang Ming" or the letters "ZM" input by the user in chronological order are received, as shown in FIG. 2.
  • the method further includes:
  • the audio data is marked with identity information of a speaker that matches the tag information.
  • the step of generating the matching relationship includes: matching the content information in the tag information with the speaker identity information in the pre-stored speaker information database; if speaker identity information pre-stored in the database matches the content information in the tag information, a matching relationship is generated.
  • for example, the content information "ZM" in the tag information "12:10; ZM" is extracted, and a matching relationship is generated with the identity information of the speaker Zhang Ming in the pre-stored speaker information database.
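The matching of tag content against the pre-stored speaker information database can be sketched as below. The database layout and the initials-based matching rule are assumptions for illustration; the patent does not specify how "ZM" is matched to "Zhang Ming".

```python
# Hypothetical pre-stored speaker information database.
SPEAKER_DB = [
    {"name": "Zhang Ming", "initials": "ZM", "avatar": "zm.png"},
    {"name": "Li Hua",     "initials": "LH", "avatar": "lh.png"},
]

def match_speaker(tag_content):
    """Return the speaker record matching the tag's content information,
    or None if no pre-stored speaker matches (no relationship generated)."""
    for speaker in SPEAKER_DB:
        if tag_content in (speaker["initials"], speaker["name"]):
            return speaker  # a matching relationship is generated
    return None

print(match_speaker("ZM")["name"])  # Zhang Ming
print(match_speaker("XX"))          # None
```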
  • the step of saving the audio data marked by using the tag information includes:
  • when the first tag information is detected, the audio data being acquired is marked with the identity information of the first speaker that matches the first tag information;
  • the audio data acquired from the time when the first tag information is detected to the time when the second tag information is detected is saved;
  • the name of the saved audio data includes part or all of character information in the speaker identity information that matches the first tag information;
  • the identity information of the speaker includes: a name and an avatar of the speaker;
  • similarly, the name of the next saved audio data includes part or all of the character information of the identity information of the second speaker that matches the second tag information;
  • the moment the second tag information is detected is both the moment at which acquisition of the audio data named with part or all of the first speaker's character information ends, and the moment at which acquisition of the audio data named with the character information matching the second tag information begins.
  • after the second tag information is identified, according to the moments of the first and second tag information and in chronological order, the audio data acquired from the moment the first tag information was detected until the moment the second tag information was detected is saved as an audio file named with part or all of the character information of the first speaker's identity information that matches the first tag information, which may be referred to here as the first file;
  • the first file is marked with the time at which the audio data was acquired, and its name may include the first speaker's name, avatar, job number, or a combination of multiple identity items;
  • the moment of the second tag information is then used as the starting point of the audio data to be collected next.
  • for example, the first tag information is "12:00; ZM" and the second tag information is "12:10; LH";
  • the audio data acquired from 12:00 to 12:10 is saved as an audio file named after "Zhang Ming", the speaker whose name matches the content of the first tag information.
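Splitting the recording at consecutive tag moments and naming each segment after the matching speaker can be sketched as follows. The function and field names are hypothetical; the patent only defines the behavior, not an API.

```python
def segment_names(tags, speaker_of):
    """tags: time-ordered list of (time_str, content_info).
    speaker_of: maps content info to a speaker name (e.g. dict.get).
    Each segment runs from one tag to the next and is named after the
    speaker matching the earlier tag."""
    files = []
    for (t0, content), (t1, _next) in zip(tags, tags[1:]):
        files.append({"speaker": speaker_of(content), "start": t0, "end": t1})
    return files

speakers = {"ZM": "Zhang Ming", "LH": "Li Hua"}
tags = [("12:00", "ZM"), ("12:10", "LH")]
print(segment_names(tags, speakers.get))
# [{'speaker': 'Zhang Ming', 'start': '12:00', 'end': '12:10'}]
```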
  • the above naming method may produce multiple audio files marked with tag information having the same content information;
  • when the content information of two or more audio files matches the identity information of the same speaker, each such audio file may be named with a combination of part or all of the character information of that speaker's identity information and a sequence number indicating audio data of the same speaker, for example "Zhang Ming-1", "Zhang Ming-2", and so on.
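Appending a sequence number when the same speaker has several files might be implemented as below; this is an illustrative sketch, and the hyphenated naming pattern follows the "Zhang Ming-1" example above.

```python
from collections import Counter

def numbered_names(speakers_in_order):
    """Give each file a name combining the speaker's name with a
    per-speaker sequence number, in chronological order."""
    counts = Counter()
    names = []
    for s in speakers_in_order:
        counts[s] += 1
        names.append(f"{s}-{counts[s]}")
    return names

print(numbered_names(["Zhang Ming", "Li Hua", "Zhang Ming"]))
# ['Zhang Ming-1', 'Li Hua-1', 'Zhang Ming-2']
```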
  • when the last tag information (for example, the eighth) is detected, the audio data acquired from the moment of that tag information until the moment acquisition of the audio data is terminated is saved as an audio data file.
  • optionally, the step of saving the audio data marked with the tag information includes:
  • the moment the fourth tag information is detected is the fourth moment;
  • the moment the fifth tag information is detected is the fifth moment;
  • the moment the sixth tag information is detected is the sixth moment;
  • the moment the seventh tag information is detected is the seventh moment;
  • it is determined whether the content information of the respective tag information is the same; if the content information of the fourth tag information and that of the sixth tag information are the same, then, following the chronological order of the moments, the audio data acquired from the fourth moment to the fifth moment and the audio data acquired from the sixth moment to the seventh moment are combined and saved as one audio file; the name of the saved audio file includes part or all of the character information in the identity information of the speaker that matches the fourth or sixth tag information;
  • the audio data marked with the tag information is divided in time by each pair of adjacent tags; that is, the audio data marked with the fourth, fifth, sixth, and seventh tag information is stored as the periods from the fourth to the fifth moment, from the fifth to the sixth moment, and from the sixth to the seventh moment;
  • the name of each stored audio file includes part or all of the character information of the speaker's identity information that matches the tag information detected at the earlier boundary of that file's period; for example, the audio data acquired from the fourth moment to the fifth moment is stored as an audio file whose name includes part or all of the character information in the identity information of the speaker that matches the fourth tag information.
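Merging non-adjacent segments whose tags share the same content information (the fourth and sixth tags above) can be sketched as a grouping step; the function name and tuple layout are assumptions for illustration.

```python
def merge_segments(segments):
    """segments: time-ordered list of (content_info, start, end).
    Segments with the same content information are collected together,
    preserving chronological order, so they can be saved as one file."""
    merged = {}
    for content, start, end in segments:
        merged.setdefault(content, []).append((start, end))
    return merged

segs = [("ZM", "t4", "t5"), ("LH", "t5", "t6"), ("ZM", "t6", "t7")]
print(merge_segments(segs)["ZM"])  # [('t4', 't5'), ('t6', 't7')]
```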
  • the step of saving the audio data marked by using the tag information includes:
  • the audio track storing the voice information converted from the tag information is merged with the audio track storing the acquired audio data, and the result is saved as a new audio file;
  • the track storing the voice information converted from the tag information is track 1, and the track storing the acquired audio data is track 2; track 1 stores only the voice information of the tag information, not the collected audio data.
  • after detecting a preceding tag, the terminal records its content information and the moment it was detected, converts the content information into voice information, and stores it on track 1; taking the moment of the preceding tag as the mark start time and the moment the next tag is detected as the mark end time, the acquired audio data is marked with the tag information and stored on track 2; the audio saved on track 1 and track 2 is then combined, according to the order in which the tags were detected, into one new audio file;
  • when the new audio file is played, it can be separated into track 1 and track 2, such that track 1 corresponds to the left channel and track 2 to the right channel;
  • the left channel of a multi-channel device then plays the voice information converted from the content information of the tags stored on track 1, while the right channel plays the recording.
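The two-track merge can be sketched with Python's standard `wave` module by interleaving the tracks as the left and right channels of one stereo file. This is a simplified illustration under assumed parameters (16-bit samples, equal-length tracks, an 8 kHz rate), not the patent's track format.

```python
import struct
import wave

def merge_tracks(track1, track2, path, rate=8000):
    """track1: mono samples of the tag voice information (left channel).
    track2: mono samples of the recording (right channel).
    Both are equal-length lists of 16-bit signed integers."""
    with wave.open(path, "wb") as out:
        out.setnchannels(2)   # stereo: left = track 1, right = track 2
        out.setsampwidth(2)   # 16-bit samples
        out.setframerate(rate)
        frames = b"".join(struct.pack("<hh", l, r)
                          for l, r in zip(track1, track2))
        out.writeframes(frames)

merge_tracks([0, 1000, -1000], [0, 500, -500], "merged.wav")
with wave.open("merged.wav", "rb") as f:
    print(f.getnchannels(), f.getnframes())  # 2 3
```

A player that supports per-channel routing can then send the left channel (tag announcements such as "Zhang Ming") and the right channel (the speech itself) to different outputs, as the embodiment describes.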
  • before the step of merging the audio track storing the voice information with the audio track storing the acquired audio data into a new audio file, the method further includes:
  • the speaker information database is a database in which speaker identity information is pre-stored; according to the matching relationship, the speaker identity information matching the tag information is marked in the collected audio data, and the collected audio data is extracted and saved on the second track, that is, track 2.
  • the process includes: recording the identified tag information and converting it into a voice file; recording the time point of the identified tag information and marking that time point on track 1 of the collected audio; generating a correspondence list between the time point of the tag information, the content information of the tag information, and the saved location information, while converting the content information of the tag information into a corresponding voice file; matching the content information of the tag information with the speaker identity information pre-stored in the speaker information database and generating a matching relationship; and, according to the correspondence list and the matching relationship, adding the converted voice information to the matching relationship so that a new mapping relationship is formed between the voice information and the speaker identity information;
  • track 1 and track 2 are then combined and saved as a new audio file, which is named with the identity information of the speaker matching the identified tag information; during playback, the new audio file can separate track 1 and track 2 and be played on a multi-channel device.
  • the content information of the tag information is converted and saved as audio data, with the converted audio data and the acquired audio data saved on different audio tracks;
  • the step further includes recording and saving on the second track according to the related art; for example, the identified tag information is recorded and converted into a voice file, the time point of the identified tag information is recorded and marked on track 1 of the acquired audio, and a correspondence between the moment the tag information was detected and its content information is generated;
  • track 1 and track 2 are separated so that track 1 corresponds to the left channel and track 2 to the right channel; when a multi-channel device plays the recording, the left channel plays the audio of track 1 and the right channel plays the audio of track 2.
  • for example, if the content information of the tag information is converted into the audio "Zhang Ming", then at the marked time point the left channel of the earphone plays the voice content "Zhang Ming" while the right channel plays the speech of the speaker Zhang Ming.
  • Embodiments of the present invention further provide a computer readable storage medium storing computer executable instructions that, when executed, implement the above method.
  • the terminal in this embodiment includes an identification module 1 and a processing module 2;
  • the identification module 1 is configured to mark the audio data being acquired by using tag information, where the tag information includes content information and a time when the tag information is detected;
  • the processing module 2 is configured to save audio data marked with the tag information.
  • the audio data is acquired from the moment when the tag information is detected.
  • the moment at which a gesture whose trajectory approximates a character is detected is the moment at which the user is detected to begin inputting that trajectory, not the moment at which the trajectory is determined to approximate a character.
  • the identification module includes: a receiving unit and an identifying unit; wherein
  • the receiving unit is configured to receive a gesture input by the user when the screen of the terminal is not lit;
  • the recognition unit is configured to save the trajectory of the gesture as a character that approximates a trajectory of the gesture, and record a time at which a gesture in which the trajectory approximates a character is detected.
  • the identifying unit is further configured to, before the tag information is used to mark the audio data being acquired, receive the input tag information, and save the content information of the tag information and the moment at which the tag information is detected.
  • the identifying unit is configured to display a tag information input interface on the screen of the terminal when the screen of the terminal is lit and the terminal is recording, and to receive the tag information input by the user;
  • the step of identifying the tag information and saving its content information and the moment at which it was detected includes: identifying and saving the content information of the input tag information as characters;
  • the input tag information includes characters that approximate the trajectory of a gesture input by the user, as well as pinyin, strokes, and the like entered by the user through the terminal's input method.
  • the terminal further includes: a storage module 3 and a matching module 4, where
  • the storage module is configured to match the tag information with the identity information of speakers pre-stored in the speaker information database, and to generate a matching relationship if speaker identity information in the database matches the tag information;
  • the matching module is configured to mark the audio data with the identity information of the speaker that matches the tag information, according to the matching relationship.
  • optionally, the storage module is configured to match the content information in the tag information with the speaker identity information in the pre-stored speaker information database; if pre-stored speaker identity information matches the content information in the tag information, a matching relationship is generated.
  • the processing module of the terminal includes: a marking unit, an extracting unit, and a saving unit; wherein, when the first marking information and the second marking information have been detected in time sequence,
  • the marking unit is configured to, when detecting the first marking information, mark the audio data being acquired by using a first speaker identity that matches the first marking information;
  • the saving unit is configured to, when the second tag information is detected, save the audio data acquired from the moment the first tag information was detected to the moment the second tag information was detected; the name of the saved audio data includes part or all of the character information in the first speaker identity information that matches the first tag information; the identity information of the speaker includes the first speaker's name and avatar.
  • optionally, the processing module includes a marking unit, a determining unit, and a saving unit; when the fourth, fifth, sixth, and seventh tag information have been detected in chronological order:
  • the marking unit is configured to store the audio data marked with the fourth, fifth, sixth, and seventh tag information respectively; the moment the fourth tag information is detected is the fourth moment, the moment the fifth is detected is the fifth moment, the moment the sixth is detected is the sixth moment, and the moment the seventh is detected is the seventh moment;
  • the determining unit is configured to determine whether the content information of each of the tag information is the same;
  • the saving unit is configured to, if the content information of the fourth tag information and that of the sixth tag information are the same, combine the audio data acquired from the fourth moment to the fifth moment with the audio data acquired from the sixth moment to the seventh moment, following the chronological order of the moments, and save them as one audio file; the name of the saved audio file includes part or all of the character information of the identity information of the speaker that matches the fourth or sixth tag information.
  • the processing module is further configured to convert and save the content information of the tag information as audio data, with the converted audio data and the acquired audio data saved on different audio tracks;
  • the audio track storing the voice information converted from the tag information is merged with the second audio track storing the acquired audio data, and saved as a new audio file.
  • the embodiments of the invention provide an audio data processing method and terminal that can generate a matching relationship according to the identified tag information, and mark and extract the collected audio according to that relationship; this solves the problem that speakers cannot be distinguished during recording, removes the cumbersome work of organizing the recorded content, and improves the efficiency of organizing it.
  • each module in the foregoing embodiment may be implemented in the form of hardware, or may be implemented in the form of a software function module. This application is not limited to any specific combination of hardware and software.
  • the audio data processing method and terminal provided by the present application mark the recorded content and save the marked audio data, thereby improving the efficiency of organizing the recorded content.

Abstract

An audio data processing method and a terminal. The audio data processing method comprises: using tag information to tag currently obtained audio data, the tag information comprising content information and the time when the tag information was detected; storing the audio data tagged using the tag information. The present invention improves the efficiency of arranging recording content.

Description

一种音频数据处理方法和终端Audio data processing method and terminal 技术领域Technical field
本申请涉及但不限于通讯领域,尤其是一种音频数据处理方法和终端。The present application relates to, but is not limited to, the field of communications, and in particular to an audio data processing method and terminal.
背景技术Background technique
录音功能是通信终端的一项基本的功能,在很多场合中都需要录音功能,例如各类会议、培训以及通话中。但是,目前,在录音后整理录音内容比较繁琐,往往需要再次听取或辨别录音内容以区分不同发言者的言语。甚至,经常无法辨别录音内容或者一部分的录音内容所属的发言者;此外,会议中同一发言者经常不会连续地发言,而是在不同的时段发言,如此就会对整理录音内容造成困难。The recording function is a basic function of the communication terminal, and recording functions are required in many occasions, such as various conferences, trainings, and calls. However, at present, it is cumbersome to organize the recorded content after recording, and it is often necessary to listen to or distinguish the recorded content to distinguish the speech of different speakers. Even, it is often impossible to distinguish between the recorded content or a part of the recorded content to which the recorded content belongs; in addition, the same speaker in the meeting often does not speak continuously, but speaks at different time periods, which makes it difficult to organize the recorded content.
发明内容Summary of the invention
以下是对本文详细描述的主题的概述。本概述并非是为了限制权利要求的保护范围。The following is an overview of the topics detailed in this document. This Summary is not intended to limit the scope of the claims.
本发明的实施例提供了一种音频数据的处理方法和终端,能够根据识别的标记信息,对相应时间内获取的音频数据进行标记和保存。An embodiment of the present invention provides a method and a terminal for processing audio data, which can mark and save audio data acquired in a corresponding time according to the identified tag information.
本发明实施例提供了一种音频数据处理方法,包括:The embodiment of the invention provides an audio data processing method, including:
采用标记信息对正在获取的音频数据进行标记,其中,所述标记信息包括内容信息和检测到所述标记信息的时刻;Marking the audio data being acquired with the tag information, wherein the tag information includes content information and a time when the tag information is detected;
保存采用所述标记信息进行标记的音频数据。The audio data marked with the tag information is saved.
本申请另外提供一种计算机可读存储介质,存储有计算机可执行指令,所述计算机可执行指令被执行时实现上述方法。The application further provides a computer readable storage medium storing computer executable instructions that are implemented when the computer executable instructions are executed.
本发明还提供了一种终端,包括识别模块和处理模块,其中,The invention also provides a terminal, comprising an identification module and a processing module, wherein
所述识别模块,设置成采用标记信息对正在获取的音频数据进行标记,其中,所述标记信息包括内容信息和检测到所述标记信息的时刻; The identification module is configured to mark the audio data being acquired by using tag information, where the tag information includes content information and a time when the tag information is detected;
所述处理模块,设置成保存采用所述标记信息进行标记的音频数据。The processing module is configured to save audio data marked with the tag information.
本申请所提供的音频数据处理方法和终端,将录音内容进行标记并保存被标记的音频数据,提高整理录音内容的效率。The audio data processing method and terminal provided by the application mark the recorded content and save the marked audio data, thereby improving the efficiency of organizing the recorded content.
在阅读并理解了附图和详细描述后,可以明白其他方面。Other aspects will be apparent upon reading and understanding the drawings and detailed description.
附图概述BRIEF abstract
附图用来提供对本申请技术方案的进一步理解,并且构成说明书的一部分,与本申请的实施例共同用于解释本申请的技术方案,并不构成对本申请技术方案的限制。The drawings are used to provide a further understanding of the technical solutions of the present application, and constitute a part of the specification, which is used together with the embodiments of the present application to explain the technical solutions of the present application, and does not constitute a limitation of the technical solutions of the present application.
FIG. 1 is a flowchart of an audio data processing method in an embodiment of the present invention;
FIG. 2 is a schematic diagram of a tag information input interface for recording in another embodiment of the present invention; and
FIG. 3 is a schematic diagram of a terminal in an embodiment of the present invention.
PREFERRED EMBODIMENTS OF THE INVENTION
Embodiments of the present invention are described in detail below with reference to the drawings. It should be noted that, provided no conflict arises, the embodiments of the present application and the features therein may be combined with one another in any manner.
FIG. 1 is a flowchart of an audio data processing method in an embodiment of the present invention. As shown in FIG. 1, the method includes:
S10: marking audio data that is being acquired with tag information, where the tag information includes content information and the time at which the tag information was detected; and
S20: saving the audio data marked with the tag information.
The audio data is acquired starting from the time at which the tag information is detected. The time at which a gesture whose trajectory approximates a character is detected refers to the moment at which input of that gesture begins, not the moment at which the trajectory is determined to approximate the character.
Optionally, in a first embodiment, before the step of marking the audio data being acquired with the tag information, the audio data processing method further includes:
receiving a gesture input by a user while the screen of the terminal is not lit; and
recognizing and saving the trajectory of the gesture as a character approximating that trajectory, and recording the time at which the gesture was detected.
The time at which a gesture whose trajectory approximates a character is detected refers to the moment at which input of that gesture begins, not the moment at which the trajectory is determined to approximate the character.
Optionally, the step of recognizing the trajectory of the gesture, saving a character approximating that trajectory, and recording the time at which the tag information was detected includes:
identifying the edge of the shape of the gesture trajectory, saving the trajectory as a character approximating it, and displaying the character into which the trajectory has been saved.
When the screen of the terminal is not lit and the terminal is recording, that is, when the terminal records through a recording application in accordance with a user instruction while keeping its screen unlit, the screen remains powered so that the terminal can detect whether it has received a gesture whose trajectory approximates a character. If such a gesture is detected, the terminal extracts key points from the edge of the shape of the gesture trajectory in order to recognize the shape, and the portion of the display that received the input shows an image of the corresponding trajectory; the screen as a whole does not need to be lit. For example, when the speaker "Zhang Ming" speaks during a recorded conference, the terminal detects gestures on the screen whose trajectories approximate the letters "Z" and "M". These gestures are input by the user in chronological order with a short interval between them, and the terminal may take the combination "ZM" of the recognized letters as the content information of one piece of tag information.
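The grouping of successively input gesture characters into one piece of content information can be sketched as follows. This is an illustrative sketch only; the 2-second grouping threshold and the function name are assumptions, since the patent only says the interval between gestures is "short".

```python
# Sketch: gesture characters recognized in quick succession (e.g. "Z" then
# "M") are merged into a single piece of content information ("ZM").
# The max_gap threshold is an assumed parameter, not stated in the patent.
def group_gesture_characters(events, max_gap=2.0):
    """events: list of (time, char) pairs in chronological order.
    Returns a list of (start_time, combined_content) tag entries; a character
    input within max_gap seconds of the previous one joins its group."""
    groups = []  # each entry: [start_time, content, last_char_time]
    for t, ch in events:
        if groups and t - groups[-1][2] <= max_gap:
            groups[-1][1] += ch
            groups[-1][2] = t
        else:
            groups.append([t, ch, t])
    # keep the start time of the first character, per the patent's rule that
    # the tag time is when input of the gesture begins
    return [(start, content) for start, content, _ in groups]

tags = group_gesture_characters([(10.0, "Z"), (10.8, "M"), (65.0, "L"), (65.5, "H")])
```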
In a second embodiment, before the step of marking the audio data being acquired with the tag information, the audio data processing method further includes:
receiving input tag information; and
saving the content information of the tag information and the time at which the tag information was detected.
Optionally, the step of receiving the input tag information includes: when the screen of the terminal is lit and the terminal is recording, displaying a tag information input interface on the screen of the terminal and receiving tag information input by the user. That is, when the terminal starts recording through a recording application in accordance with a user instruction while keeping its screen lit, the tag information input interface may be displayed at the user's request to receive the tag information input by the user. The step of recognizing the tag information and saving its content information and the time at which it was detected includes: recognizing and saving the content information of the input tag information as characters.
Optionally, at the user's request, the tag information input interface is displayed on the standby interface to receive the tag information input by the user. The content information of the input tag information may include characters input by the user through gestures that approximate the gesture trajectories (not limited to text; numbers, letters, symbols, and the like are also possible), as well as pinyin, strokes, and the like input through an input method application in the terminal. For example, when recording a conference in which "Zhang Ming" speaks, the terminal receives the Chinese characters "张明" or the letters "ZM" input by the user in chronological order, as shown in FIG. 2.
In another embodiment of the present invention, the method further includes:
matching the tag information against the identity information of speakers in a pre-stored speaker information database and, if the identity information of a speaker in the database matches the tag information, generating a matching relationship; and
marking the audio data with the identity information of the speaker matching the tag information, in accordance with the matching relationship.
Optionally, the matching step includes: matching the content information in the tag information against the speaker identity information in the pre-stored speaker information database, and generating a matching relationship if pre-stored identity information of a speaker matches the content information of the tag information.
For example, the content information in the tag information is matched against the speaker identity information stored in a database before recording, and a matching relationship between the content information and the matching speaker's identity information is generated, where the speaker's identity information in the matching relationship includes, but is not limited to, a name, an avatar, a code, and the like. For instance, the content information "ZM" is extracted from the tag information "12:10; ZM" and a matching relationship is generated with the identity information of the speaker Zhang Ming in the pre-stored speaker information database.
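The database lookup just described can be sketched as a simple match on a pre-stored code. The database contents, the field names, and the code-based matching rule are illustrative assumptions; the patent does not prescribe a matching algorithm.

```python
# Sketch of matching a tag's content information against a pre-stored
# speaker information database. Field names and the code-based rule are
# assumptions for illustration only.
SPEAKER_DB = [
    {"name": "Zhang Ming", "code": "ZM", "avatar": "zm.png"},
    {"name": "Li Hua",     "code": "LH", "avatar": "lh.png"},
]

def match_speaker(tag_content, speakers=SPEAKER_DB):
    """Return the identity information of the speaker whose pre-stored code
    matches the tag's content information, or None if there is no match."""
    for speaker in speakers:
        if speaker["code"] == tag_content:
            return speaker
    return None

match = match_speaker("ZM")
```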
Optionally, when first tag information and second tag information have been detected in chronological order, the step of saving the audio data marked with the tag information includes:
when the first tag information is detected, marking the audio data being acquired with the identity information of a first speaker matching the first tag information; and
when the second tag information is detected, saving the audio data acquired between the time at which the first tag information was detected and the time at which the second tag information was detected;
where the name of the saved audio data includes some or all of the character information in the speaker identity information matching the first tag information;
where the speaker's identity information includes the speaker's name and avatar; and
continuing to mark the audio data being acquired with tag information and, when third tag information is detected, saving the audio data acquired between the time at which the second tag information was detected and the time at which the third tag information was detected, with a name that includes some or all of the character information in the identity information of a second speaker matching the second tag information;
where the time at which the second tag information is detected is both the time at which acquisition of the audio data named after some or all of the character information of the first speaker's identity information ends, and the time at which acquisition of the audio data named after some or all of the character information of the second tag information begins.
To describe the above embodiment in more detail: after the second tag information is recognized, based on the respective times of the first and second tag information and in chronological order, the audio data acquired between the time at which the first tag information was detected and the time at which the second tag information was detected is first saved as an audio file named after some or all of the character information in the identity information of the first speaker matching the first tag information. This file may be called the first file; it is marked with the times at which the audio data was acquired, and its name may include the first speaker's name, avatar, employee number, or a combination of several pieces of identity information. The time of the second tag information then serves as the starting mark of the audio data that continues to be acquired. For example, if the first tag information is "12:00; ZM" and the second tag information is "12:10; LH", then upon detecting the second tag information the audio data acquired between 12:00 and 12:10 is saved as an audio file named "张明", the name of the speaker Zhang Ming matching the content of the first tag information.
Furthermore, optionally, if the same speaker speaks during different periods of a recorded conference, the naming method above may be applied to the multiple audio files marked with tag information having the same content information. When the content information of the tag information of two or more audio files matches the identity information of the same speaker, each matching audio file may be named with a combination of some or all of the character information in that speaker's identity information and a sequence number identifying that speaker's audio data, for example "张明-1", "张明-2", and so on.
In an optional embodiment, if no further gesture or tag information is received after eighth tag information is recognized, that is, acquisition of audio data ends without any further tag information being detected, the audio data acquired between the time of the eighth tag information and the time at which acquisition ends may likewise be saved as one audio data file.
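The segmentation-and-naming scheme above (split at each tag time, name each segment after the matching speaker, and add a sequence number when the same speaker appears more than once) can be sketched as follows. The function name and the (time, speaker) input shape are assumptions for illustration.

```python
# Sketch: split a recording at tag times and name each segment after the
# matching speaker, appending "-1", "-2", ... when the same speaker has
# several segments. Illustrative only; names are assumptions.
from collections import Counter

def name_segments(tags, end_time):
    """tags: list of (detection_time, speaker_name) in chronological order.
    end_time: time at which acquisition ends (closes the last segment).
    Returns a list of (file_name, start, end) for each saved audio file."""
    segments = []
    for i, (start, speaker) in enumerate(tags):
        stop = tags[i + 1][0] if i + 1 < len(tags) else end_time
        segments.append((speaker, start, stop))
    counts = Counter(speaker for speaker, _, _ in segments)
    seen = Counter()
    named = []
    for speaker, start, stop in segments:
        if counts[speaker] > 1:  # same speaker spoke in several periods
            seen[speaker] += 1
            named.append((f"{speaker}-{seen[speaker]}", start, stop))
        else:
            named.append((speaker, start, stop))
    return named

files = name_segments([(0, "Zhang Ming"), (600, "Li Hua"), (900, "Zhang Ming")], 1200)
```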
In an optional embodiment, when fourth tag information, fifth tag information, sixth tag information, and seventh tag information have been detected in chronological order, the step of saving the audio data marked with the tag information includes:
saving the audio data marked with the detected fourth, fifth, sixth, and seventh tag information respectively;
where the fourth tag information is detected at a fourth time, the fifth tag information at a fifth time, the sixth tag information at a sixth time, and the seventh tag information at a seventh time;
determining whether the content information of the respective pieces of tag information is the same; if the content information of the fourth tag information is the same as that of the sixth tag information, merging, in the chronological order of the respective times, the audio data acquired between the fourth time and the fifth time with the audio data acquired between the sixth time and the seventh time, and saving the result as one audio file, where the name of the saved audio file includes some or all of the character information in the identity information of the speaker matching the fourth or sixth tag information; and
if the content information of all the pieces of tag information differs, splitting the marked audio data in time at each pair of adjacent tag information, that is, saving the audio data marked with the fourth, fifth, sixth, and seventh tag information as three pieces of audio data acquired between the fourth and fifth times, between the fifth and sixth times, and between the sixth and seventh times respectively, where the name of each saved audio file includes some or all of the character information in the identity information of the speaker matching the tag information detected at the earlier of that file's two times. For example, the audio file containing the audio data acquired between the fourth time and the fifth time is named with some or all of the character information in the identity information of the speaker matching the fourth tag information.
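The fourth-to-seventh-tag rule (adjacent tags delimit segments; segments whose tag content matches are merged into one file) can be sketched as follows. Treating the last tag as the sentinel that closes the final segment is an assumption of the sketch.

```python
# Sketch: each pair of adjacent tags delimits one segment, and segments
# whose content information matches are collected under one file.
# Illustrative only; the final tag acts as a sentinel closing the last segment.
def merge_same_speaker(tags):
    """tags: list of (time, content) in chronological order, where the last
    tag marks the end of the final segment. Returns a mapping from content
    information to the list of (start, end) periods merged into its file."""
    merged = {}
    for (start, content), (end, _) in zip(tags, tags[1:]):
        merged.setdefault(content, []).append((start, end))
    return merged

# fourth..seventh tags, with the fourth and sixth sharing content "ZM":
merged = merge_same_speaker([(100, "ZM"), (200, "LH"), (300, "ZM"), (400, "WW")])
```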
In another embodiment of the present invention, the step of saving the audio data marked with the tag information includes:
converting the content information of the tag information into audio data and saving it, where the converted audio data and the acquired audio data are saved on different audio tracks; and
merging the track on which the converted content information is saved with the track on which the acquired audio data is saved, and saving the result as a new audio file. The track holding the voice information converted from the tag information is track 1; the track holding the acquired audio data is track 2; track 1 holds only the voice information of the tag information and does not hold the captured audio data.
The above embodiment is now described in detail. After detecting a preceding piece of tag information, the terminal records its content information and the time at which it was detected, converts the content information into voice information, and saves it on track 1. Taking the time of the preceding tag information as the start of the mark and the time of the subsequently detected tag information as its end, the acquired audio data marked with the preceding tag information is saved on track 2. The audio data saved on tracks 1 and 2 are then arranged by the times at which the tag information was detected, merged, and saved as a new audio file. During playback, the new audio file can be separated into tracks 1 and 2, with track 1 mapped to the left channel and track 2 to the right channel; on a multi-channel device, the left channel plays the voice converted from the content information of the tag information saved on track 1, and the right channel plays the recording.
In another embodiment of the present invention, before the step of merging the track holding the voice information with the track holding the acquired audio data and saving the result as a new audio file, the method further includes:
matching the tag information against the identity information of speakers in a pre-stored speaker information database and generating a matching relationship, where the speaker information database is a database in which speaker identity information is pre-stored; marking, in accordance with the matching relationship, the captured audio data with the speaker identity information matching the tag information; and extracting and saving the captured audio data on the second track, i.e. track 2.
The process includes: recording the recognized tag information and converting it into a voice file; recording the time at which the tag information was recognized, marking that time on track 1 of the captured audio, and generating a correspondence list among the times of the tag information, the content information of the tag information, and the storage locations, while converting the content information of the tag information into corresponding voice files; matching the content information of the tag information against the identity information of speakers in the pre-stored speaker information database and generating a matching relationship; adding, in accordance with the correspondence list and the matching relationship, the converted voice information to the matching relationship to generate a new mapping relationship between the voice information and the matching speaker identity information; marking, in accordance with the mapping relationship, the captured audio data with the matching speaker identity information, and extracting and saving it on the second track, i.e. track 2; and then merging tracks 1 and 2 into a new audio file according to the marked times on the two tracks, where the new audio file is named after the identity information of the speaker matching the recognized tag information, and where tracks 1 and 2 can be separated during playback and played on a multi-channel device.
Furthermore, in another embodiment of the present invention, the step of converting the content information of the tag information into audio data and saving it on a track different from that of the acquired audio data further includes: recording normally on the second track according to the related art. For example, the recognized tag information is recorded and converted into a voice file; the time at which the tag information was recognized is recorded and marked on track 1 of the acquired audio; a correspondence list among the detection times of the tag information, its content information, and its storage locations on the track is generated while the content information is converted into corresponding voice files; the content information of the tag information is matched against the identity information of speakers in the pre-stored speaker information database and a matching relationship is generated; and, in accordance with the correspondence list and the matching relationship, the converted voice information is added to the matching relationship to generate a new mapping relationship between the voice information and the matching speaker identity information, while the acquired audio data remains saved on track 2. Track 1 holds only the recorded voice files matching the tag information and does not record the speakers' audio; track 2 holds the acquired audio data. When recording ends, the audio on tracks 1 and 2 is saved as one new audio file. During playback the new audio file is separated so that track 1 maps to the left channel and track 2 to the right channel; on a multi-channel device, the left channel plays the audio file of track 1 and the right channel plays the audio file of track 2. For example, if the content information of the tag information is converted into the audio file "张明", then when the user plays the recording through earphones, the left channel plays the voice content "Zhang Ming" at the marked time while the right channel plays the speech of the speaker Zhang Ming.
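The two-track scheme can be sketched at the sample level. Real text-to-speech and audio I/O are stubbed out; the sample rate, the stub synthesizer, and all names are assumptions for illustration only.

```python
# Sketch of the two-track scheme: track 1 carries synthesized voice for each
# tag at its detection time, track 2 carries the recording unchanged, and
# merging yields (left, right) stereo frames with track 1 on the left.
# TTS is stubbed; all names and parameters are illustrative assumptions.
SAMPLE_RATE = 4  # samples per second, kept tiny for illustration

def synth_tag_voice(content):
    """Stub for text-to-speech: one sample per character of the tag content."""
    return [ord(c) for c in content]

def merge_tracks(recording, tags):
    """Place each tag's synthesized voice on track 1 at its detection time,
    then zip the two tracks into (left_channel, right_channel) frames."""
    track1 = [0] * len(recording)          # voice of the tag information only
    for detected_at, content in tags:
        pos = int(detected_at * SAMPLE_RATE)
        for i, sample in enumerate(synth_tag_voice(content)):
            if pos + i < len(track1):
                track1[pos + i] = sample
    track2 = list(recording)               # the acquired audio data, unchanged
    return list(zip(track1, track2))       # left channel, right channel

stereo = merge_tracks(recording=[7] * 12, tags=[(1.0, "ZM")])
```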
An embodiment of the present invention further provides a computer-readable storage medium storing computer-executable instructions which, when executed, implement the above method.
FIG. 3 is a schematic diagram of a terminal in an embodiment of the present invention. As shown in FIG. 3, the terminal of this embodiment includes an identification module 1 and a processing module 2, where:
the identification module 1 is configured to mark audio data that is being acquired with tag information, where the tag information includes content information and the time at which the tag information was detected; and
the processing module 2 is configured to save the audio data marked with the tag information.
The audio data is acquired starting from the time at which the tag information is detected. The time at which a gesture whose trajectory approximates a character is detected refers to the moment at which input of that gesture begins, not the moment at which the trajectory is determined to approximate the character.
Optionally, the identification module includes a receiving unit and a recognition unit, where:
the receiving unit is configured to receive a gesture input by a user while the screen of the terminal is not lit; and
the recognition unit is configured to recognize and save the trajectory of the gesture as a character approximating that trajectory, and to record the time at which the gesture was detected.
Optionally, the recognition unit is further configured to receive input tag information before the audio data being acquired is marked with the tag information, and to save the content information of the tag information and the time at which the tag information was detected.
Optionally, the recognition unit is configured to display a tag information input interface on the screen of the terminal and receive tag information input by the user when the screen of the terminal is lit and the terminal is recording;
the step of recognizing the tag information and saving its content information and the time at which it was detected includes: recognizing and saving the content information of the input tag information as characters;
where the input tag information includes characters input by the user through gestures that approximate the gesture trajectories, as well as pinyin, strokes, and the like input through an input method application in the terminal.
In an optional embodiment, the terminal further includes a storage module 3 and a matching module 4, where:
the storage module is configured to match the tag information against the identity information of speakers in a pre-stored speaker information database and, if the identity information of a speaker in the database matches the tag information, to generate a matching relationship; and
the matching module is configured to mark the audio data with the identity information of the speaker matching the tag information, in accordance with the matching relationship.
Optionally, the storage module is configured to match the content information in the tag information against the speaker identity information in the pre-stored speaker information database, and to generate a matching relationship if pre-stored identity information of a speaker matches the content information of the tag information.
Optionally, the processing module of the terminal includes a marking unit, an extraction unit, and a saving unit, where, when first tag information and second tag information have been detected in chronological order:
the marking unit is configured to mark, when the first tag information is detected, the audio data being acquired with the identity of a first speaker matching the first tag information; and
the saving unit is configured to save, when the second tag information is detected, the audio data acquired between the time at which the first tag information was detected and the time at which the second tag information was detected, where the name of the saved audio data includes some or all of the character information in the first speaker's identity information matching the first tag information, and where the speaker's identity information includes the first speaker's name and avatar.
在另一实施例中，所述处理模块包括标记单元、判断单元和保存单元；其中，当已按照时间顺序检测到第四标记信息、第五标记信息、第六标记信息和第七标记信息时，In another embodiment, the processing module includes a marking unit, a determining unit, and a saving unit; wherein, when fourth tag information, fifth tag information, sixth tag information, and seventh tag information have been detected in chronological order,
所述标记单元，设置成保存采用第四标记信息、第五标记信息、第六标记信息和第七标记信息分别进行标记的音频数据；其中，检测到所述第四标记信息的时刻为第四时刻，检测到所述第五标记信息的时刻为第五时刻，检测到所述第六标记信息的时刻为第六时刻，检测到所述第七标记信息的时刻为第七时刻；The marking unit is configured to save the audio data marked with the fourth, fifth, sixth, and seventh tag information respectively, wherein the moments at which the fourth, fifth, sixth, and seventh tag information are detected are the fourth, fifth, sixth, and seventh moments respectively;
所述判断单元，设置成判断所述各个标记信息的内容信息是否相同；The determining unit is configured to determine whether the content information of the respective tag information is the same;
所述保存单元，设置成如果所述第四标记信息的内容信息和所述第六标记信息的内容信息相同，则按照所述各时刻的时间顺序，将从所述第四时刻至第五时刻的期间获取的音频数据和从所述第六时刻至所述第七时刻的期间获取的音频数据进行合并，并保存为一个音频文件；其中，所保存的音频文件的名称包括与所述第四标记信息或所述第六标记信息相匹配的发言者的身份信息中部分或全部的字符信息。The saving unit is configured to, if the content information of the fourth tag information and that of the sixth tag information are the same, merge, in the chronological order of the moments, the audio data acquired from the fourth moment to the fifth moment with the audio data acquired from the sixth moment to the seventh moment, and save the result as one audio file, wherein the name of the saved audio file includes some or all of the characters of the identity information of the speaker that matches the fourth or sixth tag information.
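Merging chronologically ordered segments whose tag content matches might look like the following sketch. The (tag, samples) tuple representation is an assumption made for the example.

```python
# Sketch of merging chronologically ordered segments that carry the
# same tag content into one stream, preserving acquisition order both
# within and across speakers. The (tag, samples) format is assumed.

def merge_same_speaker(segments):
    """segments: list of (tag_content, samples) in chronological
    order. Returns one (tag_content, samples) entry per distinct tag,
    with that tag's segments concatenated in order of acquisition."""
    merged = {}
    order = []
    for tag, samples in segments:
        if tag not in merged:
            merged[tag] = []
            order.append(tag)
        merged[tag].extend(samples)
    return [(tag, merged[tag]) for tag in order]
```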
在另一实施例中，所述处理模块还设置成将所述标记信息的内容信息转换并保存为音频数据；其中，转换成的音频数据与所述获取的音频数据被保存在不同音轨；In another embodiment, the processing module is further configured to convert the content information of the tag information into audio data and save it; the converted audio data and the acquired audio data are saved on different audio tracks;
将所述标记信息的内容信息转换并保存成的所述音频数据所在的音轨和保存所获取的音频数据所在的第二音轨合并，并保存为一个新的音频文件。the audio track holding the audio data converted from the content information of the tag information and the second audio track holding the acquired audio data are merged and saved as one new audio file.
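Mixing the synthesized tag-content track with the recorded track into one file reduces, at its core, to summing two sample streams. The sketch below uses plain lists of samples and deliberately ignores real-world concerns such as clipping, sample formats, and resampling.

```python
from itertools import zip_longest

# Minimal sketch of merging two audio tracks by summing their samples,
# padding the shorter track with silence (zeros). Real code would also
# clamp the sums to the sample format's valid range.

def mix_tracks(track_a, track_b):
    """Return a single track whose samples are the element-wise sum
    of the two input tracks."""
    return [a + b for a, b in zip_longest(track_a, track_b, fillvalue=0)]
```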
本发明实施例提供了一种音频数据处理方法和终端，能够在录音过程中，根据识别的标记信息匹配预存的发言者身份信息生成匹配关系，根据匹配关系将采集的音频文件进行标记并提取保存，解决了录音过程中不能区分发言者的身份，也解决了录音内容整理工作的繁琐，提高了录音内容整理工作的效率。Embodiments of the present invention provide an audio data processing method and terminal that, during recording, match recognized tag information against pre-stored speaker identity information to generate a matching relationship, and mark, extract, and save the captured audio according to that relationship. This makes speakers distinguishable during recording, removes the tedium of organizing recorded content, and improves the efficiency of that work.
本领域普通技术人员可以理解上述方法中的全部或部分步骤，可通过程序来指令相关硬件完成，所述程序可以存储于计算机可读存储介质中，如只读存储器、磁盘或光盘等。相应地，上述实施例中的各模块可以采用硬件的形式实现，也可以采用软件功能模块的形式实现。本申请不限制于任何特定形式的硬件和软件的结合。One of ordinary skill in the art will understand that all or some of the steps of the above methods may be performed by a program instructing the relevant hardware, the program being stored in a computer-readable storage medium such as a read-only memory, magnetic disk, or optical disc. Correspondingly, each module in the above embodiments may be implemented in hardware or as a software function module. The present application is not limited to any specific combination of hardware and software.
以上仅为本发明的可选实施例，当然，本申请还可有其他多种实施例，在不背离本申请精神及其实质的情况下，熟悉本领域的技术人员当可根据本申请作出各种相应的改变和变形，但这些相应的改变和变形都应属于本申请所附的权利要求的保护范围。The above are merely optional embodiments of the present invention; the present application may of course have various other embodiments, and those skilled in the art may make corresponding changes and variations without departing from the spirit and essence of the present application, all of which shall fall within the protection scope of the appended claims.
本领域普通技术人员可以理解上述方法中的全部或部分步骤可通过程序来指令相关硬件（例如处理器）完成，所述程序可以存储于计算机可读存储介质中，如只读存储器、磁盘或光盘等。可选地，上述实施例的全部或部分步骤也可以使用一个或多个集成电路来实现。相应地，上述实施例中的各模块/单元可以采用硬件的形式实现，例如通过集成电路来实现其相应功能，也可以采用软件功能模块的形式实现，例如通过处理器执行存储于存储器中的程序/指令来实现其相应功能。本发明实施例不限制于任何特定形式的硬件和软件的结合。One of ordinary skill in the art will understand that all or some of the steps of the above methods may be performed by a program instructing the relevant hardware (for example, a processor), the program being stored in a computer-readable storage medium such as a read-only memory, magnetic disk, or optical disc. Optionally, all or some of the steps of the above embodiments may also be implemented using one or more integrated circuits. Correspondingly, each module/unit in the above embodiments may be implemented in hardware, for example by an integrated circuit realizing its function, or as a software function module, for example by a processor executing a program/instructions stored in a memory. Embodiments of the present invention are not limited to any specific combination of hardware and software.
工业实用性Industrial applicability
本申请所提供的音频数据处理方法和终端，将录音内容进行标记并保存被标记的音频数据，能够提高整理录音内容的效率。The audio data processing method and terminal provided by the present application tag recorded content and save the tagged audio data, improving the efficiency of organizing recordings.

Claims (20)

  1. 一种音频数据处理方法,包括:An audio data processing method includes:
    采用标记信息对正在获取的音频数据进行标记,其中,所述标记信息包括内容信息和检测到所述标记信息的时刻;Marking the audio data being acquired with the tag information, wherein the tag information includes content information and a time when the tag information is detected;
    保存采用所述标记信息进行标记的音频数据。The audio data marked with the tag information is saved.
  2. 根据权利要求1所述方法,在所述采用标记信息对正在获取的音频数据进行标记的步骤之前,所述音频数据处理方法还包括:The method of claim 1, before the step of marking the audio data being acquired with the tag information, the audio data processing method further comprising:
    在终端的屏幕不点亮时,接收用户输入的手势;Receiving a gesture input by the user when the screen of the terminal is not lit;
    将所述手势的轨迹识别并保存为与所述手势的轨迹近似的字符,并记录检测到轨迹近似于字符的手势的时刻。The trajectory of the gesture is recognized and saved as a character that approximates the trajectory of the gesture, and the time at which the gesture of the trajectory approximates the character is detected is recorded.
  3. 根据权利要求2所述方法，所述将所述手势的轨迹识别并保存为与所述手势的轨迹近似的字符，并记录检测到轨迹近似于字符的手势的时刻的步骤包括：The method according to claim 2, wherein the step of recognizing and saving the trajectory of the gesture as a character approximating the trajectory, and recording the moment at which a gesture whose trajectory approximates a character is detected, comprises:
    识别所述手势的轨迹的形状的边缘，并将手势的轨迹保存为与手势的轨迹近似的字符，并显示将所述手势的轨迹保存成的字符。identifying the edge of the shape of the gesture's trajectory, saving the trajectory as a character approximating it, and displaying the character as which the trajectory was saved.
  4. 根据权利要求1所述方法,在所述采用标记信息对正在获取的音频数据进行标记的步骤之前,所述音频数据处理的方法还包括:The method of claim 1, before the step of marking the audio data being acquired with the tag information, the method of audio data processing further comprises:
    接收输入的标记信息;Receiving input tag information;
    保存所述标记信息的内容信息和检测到所述标记信息的时刻。The content information of the tag information and the time at which the tag information is detected are saved.
  5. 根据权利要求4所述方法,其中,所述接收输入的标记信息的步骤包括:在终端的屏幕点亮且终端进行录音时,在终端的屏幕上显示标记信息输入界面,接收用户输入的标记信息;The method according to claim 4, wherein said step of receiving the input tag information comprises: displaying a tag information input interface on a screen of the terminal when the screen of the terminal is lit and the terminal performs recording, and receiving tag information input by the user ;
    所述识别所述标记信息,并保存所述标记信息的内容信息和检测到所述标记信息的时刻的步骤包括:将所输入的标记信息的内容信息识别并保存为字符;The step of identifying the mark information and saving the content information of the mark information and the time at which the mark information is detected includes: identifying and saving the content information of the input mark information as a character;
    其中，所述输入的标记信息包括用户对终端通过手势输入的、与手势的轨迹近似的字符，以及用户通过终端中的输入法应用输入的拼音、笔画等。wherein the input tag information includes characters input by the user on the terminal through gestures and approximating the gesture trajectory, as well as pinyin, strokes, etc. input by the user through an input method application in the terminal.
  6. 根据权利要求1-5任一项所述方法,所述方法还包括:A method according to any one of claims 1 to 5, the method further comprising:
    将所述标记信息与预存在发言者信息库中的发言者的身份信息进行匹配,若发言者信息库中有发言者的身份信息与所述标记信息匹配,则生成匹配关系;Matching the tag information with the identity information of the speaker in the pre-existing speaker information database, and if the identity information of the speaker in the speaker information database matches the tag information, generating a matching relationship;
    根据所述匹配关系,采用与所述标记信息匹配的发言者的身份信息对所述音频数据进行标记。And according to the matching relationship, the audio data is marked with identity information of a speaker that matches the tag information.
  7. 根据权利要求6所述方法，其中，所述将所述标记信息与预存在发言者信息库中的发言者的身份信息进行匹配，若发言者信息库中有发言者的身份信息与所述标记信息匹配，则生成匹配关系的步骤包括：The method according to claim 6, wherein the step of matching the tag information with the identity information of speakers pre-stored in the speaker information database and generating a matching relationship if a speaker's identity information in the database matches the tag information comprises:
    将所述标记信息中的内容信息与预存在发言者信息库中的发言者身份信息进行匹配,若发言者信息库中有预存的发言者的身份信息与所述标记信息中的内容信息匹配,则生成匹配关系。Matching the content information in the tag information with the speaker identity information in the pre-existing speaker information database, if the pre-existing speaker identity information in the speaker information database matches the content information in the tag information, Then generate a matching relationship.
  8. 根据权利要求6所述方法,其中,当按照时间顺序已检测到第一标记信息和第二标记信息时,所述保存采用所述标记信息进行标记的音频数据的步骤包括:The method according to claim 6, wherein when the first tag information and the second tag information have been detected in chronological order, the step of saving the audio data marked with the tag information comprises:
    在检测到第一标记信息时,采用与所述第一标记信息匹配的第一发言者的身份信息对正在获取的音频数据进行标记;When the first tag information is detected, the audio data that is being acquired is marked by using the identity information of the first speaker that matches the first tag information;
    在检测到第二标记信息时,保存从检测到所述第一标记信息的时刻至检测到所述第二标记信息的时刻期间获取到的音频数据;When the second tag information is detected, the audio data acquired from the time when the first tag information is detected to the time when the second tag information is detected is saved;
    其中所述保存的音频数据的名称包括与所述第一标记信息匹配的第一发言者身份信息中部分或全部的字符信息;The name of the saved audio data includes part or all of character information in the first speaker identity information that matches the first tag information;
    其中所述发言者的身份信息包括:所述第一发言者的姓名、和头像。The identity information of the speaker includes: a name and an avatar of the first speaker.
  9. 根据权利要求6所述方法，其中，当已按照时间顺序检测到第四标记信息、第五标记信息、第六标记信息和第七标记信息时，所述保存采用所述标记信息进行标记的音频数据的步骤包括：The method according to claim 6, wherein, when fourth tag information, fifth tag information, sixth tag information, and seventh tag information have been detected in chronological order, the step of saving the audio data marked with the tag information comprises:
    保存采用第四标记信息、第五标记信息、第六标记信息和第七标记信息分别进行标记的音频数据；其中，检测到第四标记信息的时刻为第四时刻，检测到所述第五标记信息的时刻为第五时刻，检测到所述第六标记信息的时刻为第六时刻，检测到所述第七标记信息的时刻为第七时刻；saving the audio data marked with the fourth, fifth, sixth, and seventh tag information respectively, wherein the moments at which the fourth, fifth, sixth, and seventh tag information are detected are the fourth, fifth, sixth, and seventh moments respectively;
    判断所述各个标记信息的内容信息是否相同；determining whether the content information of the respective tag information is the same;
    如果所述第四标记信息的内容信息和所述第六标记信息的内容信息相同，则按照所述各时刻的时间顺序，将从所述第四时刻至第五时刻的期间获取的音频数据和从所述第六时刻至所述第七时刻的期间获取的音频数据进行合并，并保存为一个音频文件；其中，所保存的音频文件的名称包括与所述第四标记信息或所述第六标记信息相匹配的发言者的身份信息中部分或全部的字符信息。if the content information of the fourth tag information and that of the sixth tag information are the same, merging, in the chronological order of the moments, the audio data acquired from the fourth moment to the fifth moment with the audio data acquired from the sixth moment to the seventh moment, and saving the result as one audio file, wherein the name of the saved audio file includes some or all of the characters of the identity information of the speaker that matches the fourth or sixth tag information.
  10. 根据权利要求1-5任一项所述方法,其中,所述保存采用所述标记信息进行标记的音频数据的步骤包括:A method according to any one of claims 1 to 5, wherein said step of saving audio data marked with said tag information comprises:
    将所述标记信息的内容信息转换并保存为音频数据;其中,转换成的音频数据与所述获取的音频数据被保存在不同音轨;Converting and saving the content information of the tag information into audio data; wherein the converted audio data and the acquired audio data are saved in different audio tracks;
    将所述标记信息的内容信息转换并保存成的所述音频数据所在的音轨和保存所获取的音频数据所在的音轨合并，并保存为一个新的音频文件。merging the audio track holding the audio data converted from the content information of the tag information with the audio track holding the acquired audio data, and saving the result as one new audio file.
  11. 根据权利要求6所述方法，当两个或两个以上的音频文件的标记信息的内容信息与同一个发言者的身份信息匹配时，将与同一个发言者的身份信息匹配的各个音频文件分别命名为其名称中包括所述同一发言者的身份信息中部分或全部的字符信息和表示所述同一发言者的音频数据的序列号的组合。The method according to claim 6, wherein, when the content information of the tag information of two or more audio files matches the identity information of the same speaker, each such audio file is named with a combination of some or all of the characters of that speaker's identity information and a sequence number identifying that speaker's audio data.
  12. 一种终端,包括识别模块和处理模块;其中,A terminal includes an identification module and a processing module; wherein
    所述识别模块,设置成采用标记信息对正在获取的音频数据进行标记,其中,所述标记信息包括内容信息和检测到所述标记信息的时刻;The identification module is configured to mark the audio data being acquired by using tag information, where the tag information includes content information and a time when the tag information is detected;
    所述处理模块,设置成保存采用所述标记信息进行标记的音频数据。The processing module is configured to save audio data marked with the tag information.
  13. 根据权利要求12所述终端,其中,所述识别模块包括:接收单元和识别单元;其中,The terminal according to claim 12, wherein the identification module comprises: a receiving unit and an identifying unit; wherein
    所述接收单元,设置成在终端的屏幕不点亮时,接收用户输入的手势;The receiving unit is configured to receive a gesture input by the user when the screen of the terminal is not lit;
    所述识别单元，设置成将所述手势的轨迹识别并保存为与所述手势的轨迹近似的字符，并记录检测到轨迹近似于字符的手势的时刻。The recognition unit is configured to recognize and save the trajectory of the gesture as a character approximating the trajectory, and to record the moment at which a gesture whose trajectory approximates a character is detected.
  14. 根据权利要求12所述终端，所述识别单元还设置成在所述采用标记信息对正在获取的音频数据进行标记之前，接收输入的标记信息；保存所述标记信息的内容信息和检测到所述标记信息的时刻。The terminal according to claim 12, wherein the identification unit is further configured to receive input tag information before the audio data being acquired is marked with the tag information, and to save the content information of the tag information and the moment at which the tag information is detected.
  15. 根据权利要求14所述终端,其中,所述识别单元是设置成在终端的屏幕点亮且终端进行录音时,在终端的屏幕上显示标记信息输入界面,接收用户输入的标记信息;The terminal according to claim 14, wherein the identification unit is configured to display a mark information input interface on a screen of the terminal when the screen of the terminal is lit and the terminal performs recording, and receive the mark information input by the user;
    所述识别所述标记信息,并保存所述标记信息的内容信息和检测到所述标记信息的时刻的步骤包括:将所输入的标记信息的内容信息识别并保存为字符;The step of identifying the mark information and saving the content information of the mark information and the time at which the mark information is detected includes: identifying and saving the content information of the input mark information as a character;
    其中，所述输入的标记信息包括用户对终端通过手势输入的、与手势的轨迹近似的字符，以及用户通过终端中的输入法应用输入的拼音、笔画等。wherein the input tag information includes characters input by the user on the terminal through gestures and approximating the gesture trajectory, as well as pinyin, strokes, etc. input by the user through an input method application in the terminal.
  16. 根据权利要求12-15任一项所述终端,所述终端还包括:存储模块和匹配模块,其中,The terminal according to any one of claims 12-15, further comprising: a storage module and a matching module, wherein
    所述存储模块，设置成将所述标记信息与预存在发言者信息库中的发言者的身份信息进行匹配，若发言者信息库中有发言者的身份信息与所述标记信息匹配，则生成匹配关系；The storage module is configured to match the tag information against the identity information of speakers pre-stored in a speaker information database and, if a speaker's identity information in the speaker information database matches the tag information, to generate a matching relationship;
    所述匹配模块，设置成根据所述匹配关系，采用与所述标记信息匹配的发言者的身份信息对所述音频数据进行标记。The matching module is configured to mark the audio data, according to the matching relationship, with the identity information of the speaker that matches the tag information.
  17. 根据权利要求16所述终端,其中,所述存储模块是设置成将所述标记信息中的内容信息与预存在发言者信息库中的发言者身份信息进行匹配,若发言者信息库中有预存的发言者的身份信息与所述标记信息中的内容信息匹配,则生成匹配关系。The terminal according to claim 16, wherein the storage module is configured to match the content information in the tag information with the speaker identity information in the pre-existing speaker information database, if the speaker information database is pre-stored The identity information of the speaker matches the content information in the tag information, and a matching relationship is generated.
  18. 根据权利要求16所述终端,其中,所述处理模块包括:标记单元和保存单元;其中,当按照时间顺序已检测到第一标记信息和第二标记信息时,The terminal according to claim 16, wherein the processing module comprises: a marking unit and a saving unit; wherein, when the first marking information and the second marking information have been detected in chronological order,
    所述标记单元,设置成在检测到第一标记信息时,采用与所述第一标记信息匹配的第一发言者的身份信息对正在获取的音频数据进行标记;The marking unit is configured to, when detecting the first marking information, mark the audio data being acquired by using the identity information of the first speaker that matches the first marking information;
    所述保存单元，设置成在检测到第二标记信息时，保存从检测到所述第一标记信息的时刻至检测到所述第二标记信息的时刻期间获取到的音频数据；其中所述保存的音频数据的名称包括与所述第一标记信息匹配的第一发言者身份信息中部分或全部的字符信息；其中所述发言者的身份信息包括：所述第一发言者的姓名和头像。The saving unit is configured to, upon detecting the second tag information, save the audio data acquired between the moment the first tag information was detected and the moment the second tag information was detected, wherein the name of the saved audio data includes some or all of the characters of the first speaker identity information that matches the first tag information, and the speaker's identity information includes the first speaker's name and avatar.
  19. 根据权利要求16所述终端，其中，所述处理模块包括标记单元、判断单元和保存单元；其中，当已按照时间顺序检测到第四标记信息、第五标记信息、第六标记信息和第七标记信息时，The terminal according to claim 16, wherein the processing module includes a marking unit, a determining unit, and a saving unit; wherein, when fourth tag information, fifth tag information, sixth tag information, and seventh tag information have been detected in chronological order,
    所述标记单元，设置成保存采用第四标记信息、第五标记信息、第六标记信息和第七标记信息分别进行标记的音频数据；其中，检测到所述第四标记信息的时刻为第四时刻，检测到所述第五标记信息的时刻为第五时刻，检测到所述第六标记信息的时刻为第六时刻，检测到所述第七标记信息的时刻为第七时刻；The marking unit is configured to save the audio data marked with the fourth, fifth, sixth, and seventh tag information respectively, wherein the moments at which the fourth, fifth, sixth, and seventh tag information are detected are the fourth, fifth, sixth, and seventh moments respectively;
    所述判断单元，设置成判断所述各个标记信息的内容信息是否相同；The determining unit is configured to determine whether the content information of the respective tag information is the same;
    所述保存单元，设置成如果所述第四标记信息的内容信息和所述第六标记信息的内容信息相同，则按照所述各时刻的时间顺序，将从所述第四时刻至第五时刻的期间获取的音频数据和从所述第六时刻至所述第七时刻的期间获取的音频数据进行合并，并保存为一个音频文件；其中，所保存的音频文件的名称包括与所述第四标记信息或所述第六标记信息相匹配的发言者的身份信息中部分或全部的字符信息。The saving unit is configured to, if the content information of the fourth tag information and that of the sixth tag information are the same, merge, in the chronological order of the moments, the audio data acquired from the fourth moment to the fifth moment with the audio data acquired from the sixth moment to the seventh moment, and save the result as one audio file, wherein the name of the saved audio file includes some or all of the characters of the identity information of the speaker that matches the fourth or sixth tag information.
  20. 根据权利要求12-15任一项所述终端，其中，所述处理模块还设置成将所述标记信息的内容信息转换并保存为音频数据；其中，转换成的音频数据与所述获取的音频数据被保存在不同音轨；The terminal according to any one of claims 12-15, wherein the processing module is further configured to convert the content information of the tag information into audio data and save it, the converted audio data and the acquired audio data being saved on different audio tracks;
    将所述标记信息的内容信息转换并保存成的所述音频数据所在的音轨和保存所获取的音频数据所在的音轨合并，并保存为一个新的音频文件。the audio track holding the audio data converted from the content information of the tag information and the audio track holding the acquired audio data are merged and saved as one new audio file.
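Claims 2, 3, and 13 describe recognizing a gesture trajectory as an approximate character. A toy version of that idea could look like the sketch below; the direction encoding and the two made-up templates are invented for this example, and a real recognizer would be far more elaborate.

```python
from difflib import SequenceMatcher

# Toy sketch of mapping a gesture trajectory to its nearest character
# template. The direction encoding and the two templates are invented
# for this example.

TEMPLATES = {"L": "DR", "V": "DU"}  # character -> direction codes

def directions(points):
    """Encode a trajectory (list of (x, y) points, y growing downward)
    as a string of coarse direction codes R/L/U/D."""
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        if abs(dx) >= abs(dy):
            codes.append("R" if dx >= 0 else "L")
        else:
            codes.append("D" if dy >= 0 else "U")
    return "".join(codes)

def recognize(points):
    """Return the template character whose direction codes best match
    the trajectory's codes."""
    code = directions(points)
    return max(TEMPLATES,
               key=lambda ch: SequenceMatcher(None, code, TEMPLATES[ch]).ratio())
```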
PCT/CN2016/081022 2016-02-02 2016-05-04 Audio data processing method and terminal WO2016197755A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610074943.8A CN107026931A (en) 2016-02-02 2016-02-02 A kind of audio data processing method and terminal
CN201610074943.8 2016-02-02

Publications (1)

Publication Number Publication Date
WO2016197755A1

Family

ID=57502786

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/081022 WO2016197755A1 (en) 2016-02-02 2016-05-04 Audio data processing method and terminal

Country Status (2)

Country Link
CN (1) CN107026931A (en)
WO (1) WO2016197755A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113380247A (en) * 2021-06-08 2021-09-10 阿波罗智联(北京)科技有限公司 Multi-tone-zone voice awakening and recognizing method and device, equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020133513A1 (en) * 2001-03-16 2002-09-19 Ftr Pty Ltd. Log note system for digitally recorded audio
CN102262890A (en) * 2010-05-31 2011-11-30 鸿富锦精密工业(深圳)有限公司 Electronic device and marking method thereof
CN103020306A (en) * 2013-01-04 2013-04-03 深圳市中兴移动通信有限公司 Lookup method and system for character indexes based on gesture recognition
CN103400592A (en) * 2013-07-30 2013-11-20 北京小米科技有限责任公司 Recording method, playing method, device, terminal and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104978145A (en) * 2015-01-27 2015-10-14 中兴通讯股份有限公司 Recording realization method and apparatus and mobile terminal

Also Published As

Publication number Publication date
CN107026931A (en) 2017-08-08


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 16806653; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 16806653; Country of ref document: EP; Kind code of ref document: A1)