WO2016197755A1 - Audio data processing method and terminal - Google Patents

Audio data processing method and terminal

Info

Publication number
WO2016197755A1
Authority
WO
WIPO (PCT)
Prior art keywords
information
audio data
speaker
tag information
tag
Prior art date
Application number
PCT/CN2016/081022
Other languages
French (fr)
Chinese (zh)
Inventor
奚黎明 (Xi Liming)
Original Assignee
中兴通讯股份有限公司 (ZTE Corporation)
Priority date
Filing date
Publication date
Application filed by 中兴通讯股份有限公司 (ZTE Corporation)
Publication of WO2016197755A1 publication Critical patent/WO2016197755A1/en

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M1/00Substation equipment, e.g. for use by subscribers
    • H04M1/72Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72403User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • H04M1/7243User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages
    • H04M1/72433User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages for voice messaging, e.g. dictaphones
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F17/00Digital computing or data processing equipment or methods, specially adapted for specific functions
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M1/00Substation equipment, e.g. for use by subscribers
    • H04M1/64Automatic arrangements for answering calls; Automatic arrangements for recording messages for absent subscribers; Arrangements for recording conversations
    • H04M1/65Recording arrangements for recording a message from the calling party
    • H04M1/6505Recording arrangements for recording a message from the calling party storing speech in digital form

Definitions

  • the present application relates to, but is not limited to, the field of communications, and in particular to an audio data processing method and terminal.
  • the recording function is a basic function of the communication terminal, and recording functions are required in many occasions, such as various conferences, trainings, and calls.
  • organizing recorded content after recording is cumbersome; it is often necessary to listen to the recording again to distinguish the speech of different speakers.
  • it is often impossible to tell which speaker the recorded content, or part of it, belongs to; in addition, the same speaker in a meeting often does not speak continuously but at different time periods, which makes organizing the recorded content difficult.
  • An embodiment of the present invention provides a method and a terminal for processing audio data, which can mark and save audio data acquired in a corresponding time according to the identified tag information.
  • the embodiment of the invention provides an audio data processing method, including:
  • marking the audio data being acquired with tag information, where the tag information includes content information and the moment at which the tag information is detected;
  • saving the audio data marked with the tag information.
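The two steps above can be sketched as a minimal data structure. This is a hypothetical illustration, not the patent's implementation: the `TagInfo`/`Segment` names and the in-memory sample list are assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime

# A tag carries content information and the moment it was detected.
@dataclass
class TagInfo:
    content: str          # e.g. recognized characters such as "ZM"
    detected_at: datetime

# Audio acquired after a tag is detected is associated with that tag.
@dataclass
class Segment:
    tag: TagInfo
    samples: list = field(default_factory=list)

def start_segment(content: str) -> Segment:
    """Begin acquiring audio from the moment the tag is detected."""
    return Segment(tag=TagInfo(content, datetime.now()))

seg = start_segment("ZM")
print(seg.tag.content)  # ZM
```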
  • the application further provides a computer readable storage medium storing computer executable instructions that, when executed, implement the above method.
  • the invention also provides a terminal, comprising an identification module and a processing module, wherein
  • the identification module is configured to mark the audio data being acquired by using tag information, where the tag information includes content information and a time when the tag information is detected;
  • the processing module is configured to save audio data marked with the tag information.
  • the audio data processing method and terminal provided by the application mark the recorded content and save the marked audio data, thereby improving the efficiency of organizing the recorded content.
  • FIG. 1 is a flowchart of an audio data processing method in an embodiment of the present invention
  • FIG. 2 is a schematic diagram of a recording mark information input interface in another embodiment of the present invention.
  • FIG. 3 is a schematic diagram of a terminal in an embodiment of the present invention.
  • the audio data processing method in the embodiment of the present invention includes:
  • the audio data is acquired from the moment when the tag information is detected.
  • the moment at which a gesture whose trajectory approximates a character is detected is the moment at which the user is detected to begin inputting that trajectory, not the moment at which the trajectory is determined to approximate a character.
  • optionally, before the step of marking the audio data being acquired with the tag information, the audio data processing method further includes:
  • the trajectory of the gesture is recognized and saved as a character approximating that trajectory, and the moment at which the gesture is detected is recorded.
  • the step of identifying the trajectory of the gesture, saving a character that approximates it, and recording the moment at which the tag information is detected includes:
  • identifying the edge of the shape of the gesture's trajectory, saving the trajectory of the gesture as a character that approximates it, and displaying the saved character.
  • when the screen of the terminal is not lit while the terminal is recording (that is, when the terminal records through its recording application according to a user instruction while keeping the screen unlit), the screen is still powered, so the terminal can detect whether the screen receives a gesture whose input trajectory approximates a character; if such a gesture is detected, the terminal extracts key points from the edge of the shape of the trajectory to recognize its shape, and the part of the screen that received the trajectory displays an image of the gesture trajectory, without lighting the entire screen.
  • for example, the terminal detects that the trajectory input by the user on the screen approximates the letters "Z" and "M";
  • the two gestures are input by the user sequentially in chronological order, with a short interval between them;
  • the terminal can then use "ZM", the combination of the recognized letters "Z" and "M", as the content information of the tag information.
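Combining sequentially input gesture characters into one tag content string might look like the following sketch. The two-second threshold and the function name are assumptions for illustration; the patent only says the interval between gestures is short.

```python
MAX_GAP_S = 2.0  # assumed threshold; the patent only says "short interval"

def combine_gesture_chars(events):
    """events: time-ordered list of (timestamp_seconds, recognized_char).
    Characters separated by at most MAX_GAP_S are merged into one string."""
    groups, current = [], []
    last_t = None
    for t, ch in events:
        if last_t is not None and t - last_t > MAX_GAP_S:
            groups.append("".join(current))  # gap too long: start a new tag
            current = []
        current.append(ch)
        last_t = t
    if current:
        groups.append("".join(current))
    return groups

print(combine_gesture_chars([(0.0, "Z"), (0.8, "M")]))  # ['ZM']
print(combine_gesture_chars([(0.0, "Z"), (5.0, "L")]))  # ['Z', 'L']
```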
  • in another embodiment, before the step of marking the audio data being acquired with the tag information, the audio data processing method further includes:
  • the content information of the tag information and the time at which the tag information is detected are saved.
  • the step of receiving the input tag information includes: when the screen of the terminal is lit and the terminal is recording, displaying a tag information input interface on the screen of the terminal and receiving the tag information input by the user;
  • that is, when the terminal starts recording through its recording application according to a user instruction and keeps its screen lit, the tag information input interface may be displayed at the user's request, and the tag information input by the user is received;
  • the step of identifying the tag information and saving its content information and the moment at which it was detected includes: identifying and saving the content information of the input tag information as characters;
  • the tag information input interface is displayed in the standby interface; the content information of the input tag information may include characters approximating a gesture trajectory input by the user (not limited to text; numbers, letters, and symbols are also possible), as well as pinyin, strokes, and the like entered through the terminal's input method;
  • for example, when recording in a conference, if "Zhang Ming" is speaking, the Chinese characters "Zhang Ming" or the letters "ZM" input by the user in chronological order are received, as shown in FIG. 2.
  • the method further includes:
  • the audio data is marked with identity information of a speaker that matches the tag information.
  • the step of generating the matching relationship includes: matching the content information in the tag information with the speaker identity information in the pre-stored speaker information database; if speaker identity information pre-stored in the database matches the content information in the tag information, a matching relationship is generated.
  • for example, the content information "ZM" in the tag information "12:10; ZM" is extracted, and a matching relationship is generated with the identity information of the speaker Zhang Ming in the pre-stored speaker information database.
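The matching of tag content against the pre-stored speaker information database can be sketched as below. The database layout and the initials-based matching rule are assumptions for illustration; the patent does not specify how "ZM" is matched to "Zhang Ming".

```python
# Hypothetical pre-stored speaker information database.
SPEAKER_DB = [
    {"name": "Zhang Ming", "initials": "ZM", "avatar": "zm.png"},
    {"name": "Li Hua",     "initials": "LH", "avatar": "lh.png"},
]

def match_speaker(tag_content):
    """Return the speaker record matching the tag's content information,
    or None if no pre-stored speaker matches (no relationship generated)."""
    for speaker in SPEAKER_DB:
        if tag_content in (speaker["initials"], speaker["name"]):
            return speaker  # a matching relationship is generated
    return None

print(match_speaker("ZM")["name"])  # Zhang Ming
print(match_speaker("XX"))          # None
```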
  • the step of saving the audio data marked by using the tag information includes:
  • when the first tag information is detected, the audio data being acquired is marked with the identity information of the first speaker that matches the first tag information;
  • the audio data acquired from the time when the first tag information is detected to the time when the second tag information is detected is saved;
  • the name of the saved audio data includes part or all of character information in the speaker identity information that matches the first tag information;
  • the identity information of the speaker includes: a name and an avatar of the speaker;
  • similarly, the name of the next saved audio data includes part or all of the character information of the identity information of the second speaker that matches the second tag information;
  • the moment the second tag information is detected is both the moment at which acquisition of the audio data named with part or all of the first speaker's character information ends, and the moment at which acquisition of the audio data named with the character information matching the second tag information begins.
  • after the second tag information is identified, according to the moments of the first and second tag information and in chronological order, the audio data acquired from the moment the first tag information was detected until the moment the second tag information was detected is saved as an audio file named with part or all of the character information of the first speaker's identity information that matches the first tag information, which may be referred to here as the first file;
  • the first file is marked with the time at which the audio data was acquired, and its name may include the first speaker's name, avatar, job number, or a combination of multiple identity items;
  • the moment of the second tag information is then used as the starting point of the audio data to be collected next.
  • for example, the first tag information is "12:00; ZM" and the second tag information is "12:10; LH";
  • the audio data acquired from 12:00 to 12:10 is saved as an audio file named after "Zhang Ming", the speaker whose name matches the content of the first tag information.
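Splitting the recording at consecutive tag moments and naming each segment after the matching speaker can be sketched as follows. The function and field names are hypothetical; the patent only defines the behavior, not an API.

```python
def segment_names(tags, speaker_of):
    """tags: time-ordered list of (time_str, content_info).
    speaker_of: maps content info to a speaker name (e.g. dict.get).
    Each segment runs from one tag to the next and is named after the
    speaker matching the earlier tag."""
    files = []
    for (t0, content), (t1, _next) in zip(tags, tags[1:]):
        files.append({"speaker": speaker_of(content), "start": t0, "end": t1})
    return files

speakers = {"ZM": "Zhang Ming", "LH": "Li Hua"}
tags = [("12:00", "ZM"), ("12:10", "LH")]
print(segment_names(tags, speakers.get))
# [{'speaker': 'Zhang Ming', 'start': '12:00', 'end': '12:10'}]
```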
  • the above naming method may produce multiple audio files marked with tag information having the same content information;
  • when the content information of two or more audio files matches the identity information of the same speaker, each such audio file may be named with a combination of part or all of the character information of that speaker's identity information and a sequence number indicating audio data of the same speaker, for example "Zhang Ming-1", "Zhang Ming-2", and so on.
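Appending a sequence number when the same speaker has several files might be implemented as below; this is an illustrative sketch, and the hyphenated naming pattern follows the "Zhang Ming-1" example above.

```python
from collections import Counter

def numbered_names(speakers_in_order):
    """Give each file a name combining the speaker's name with a
    per-speaker sequence number, in chronological order."""
    counts = Counter()
    names = []
    for s in speakers_in_order:
        counts[s] += 1
        names.append(f"{s}-{counts[s]}")
    return names

print(numbered_names(["Zhang Ming", "Li Hua", "Zhang Ming"]))
# ['Zhang Ming-1', 'Li Hua-1', 'Zhang Ming-2']
```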
  • when the last tag information (for example, the eighth) is detected, the audio data acquired from the moment of that tag information until the moment acquisition of the audio data is terminated is saved as an audio data file.
  • optionally, the step of saving the audio data marked with the tag information includes:
  • the moment the fourth tag information is detected is the fourth moment;
  • the moment the fifth tag information is detected is the fifth moment;
  • the moment the sixth tag information is detected is the sixth moment;
  • the moment the seventh tag information is detected is the seventh moment;
  • it is determined whether the content information of the respective tag information is the same; if the content information of the fourth tag information and that of the sixth tag information are the same, then, following the chronological order of the moments, the audio data acquired from the fourth moment to the fifth moment and the audio data acquired from the sixth moment to the seventh moment are combined and saved as one audio file; the name of the saved audio file includes part or all of the character information in the identity information of the speaker that matches the fourth or sixth tag information;
  • the audio data marked with the tag information is divided in time by each pair of adjacent tags; that is, the audio data marked with the fourth, fifth, sixth, and seventh tag information is stored as the periods from the fourth to the fifth moment, from the fifth to the sixth moment, and from the sixth to the seventh moment;
  • the name of each stored audio file includes part or all of the character information of the speaker's identity information that matches the tag information detected at the earlier boundary of that file's period; for example, the audio data acquired from the fourth moment to the fifth moment is stored as an audio file whose name includes part or all of the character information in the identity information of the speaker that matches the fourth tag information.
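Merging non-adjacent segments whose tags share the same content information (the fourth and sixth tags above) can be sketched as a grouping step; the function name and tuple layout are assumptions for illustration.

```python
def merge_segments(segments):
    """segments: time-ordered list of (content_info, start, end).
    Segments with the same content information are collected together,
    preserving chronological order, so they can be saved as one file."""
    merged = {}
    for content, start, end in segments:
        merged.setdefault(content, []).append((start, end))
    return merged

segs = [("ZM", "t4", "t5"), ("LH", "t5", "t6"), ("ZM", "t6", "t7")]
print(merge_segments(segs)["ZM"])  # [('t4', 't5'), ('t6', 't7')]
```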
  • the step of saving the audio data marked by using the tag information includes:
  • the audio track storing the voice information converted from the tag information is merged with the audio track storing the acquired audio data, and the result is saved as a new audio file;
  • the track storing the voice information converted from the tag information is track 1, and the track storing the acquired audio data is track 2; track 1 stores only the voice information of the tag information, not the collected audio data.
  • after detecting a preceding tag, the terminal records its content information and the moment it was detected, converts the content information into voice information, and stores it on track 1; taking the moment of the preceding tag as the mark start time and the moment the next tag is detected as the mark end time, the acquired audio data is marked with the tag information and stored on track 2; the audio saved on track 1 and track 2 is then combined, according to the order in which the tags were detected, into one new audio file;
  • when the new audio file is played, it can be separated into track 1 and track 2, such that track 1 corresponds to the left channel and track 2 to the right channel;
  • the left channel of a multi-channel device then plays the voice information converted from the content information of the tags stored on track 1, while the right channel plays the recording.
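The two-track merge can be sketched with Python's standard `wave` module by interleaving the tracks as the left and right channels of one stereo file. This is a simplified illustration under assumed parameters (16-bit samples, equal-length tracks, an 8 kHz rate), not the patent's track format.

```python
import struct
import wave

def merge_tracks(track1, track2, path, rate=8000):
    """track1: mono samples of the tag voice information (left channel).
    track2: mono samples of the recording (right channel).
    Both are equal-length lists of 16-bit signed integers."""
    with wave.open(path, "wb") as out:
        out.setnchannels(2)   # stereo: left = track 1, right = track 2
        out.setsampwidth(2)   # 16-bit samples
        out.setframerate(rate)
        frames = b"".join(struct.pack("<hh", l, r)
                          for l, r in zip(track1, track2))
        out.writeframes(frames)

merge_tracks([0, 1000, -1000], [0, 500, -500], "merged.wav")
with wave.open("merged.wav", "rb") as f:
    print(f.getnchannels(), f.getnframes())  # 2 3
```

A player that supports per-channel routing can then send the left channel (tag announcements such as "Zhang Ming") and the right channel (the speech itself) to different outputs, as the embodiment describes.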
  • before the step of merging the audio track storing the voice information with the audio track storing the acquired audio data into a new audio file, the method further includes:
  • the speaker information database is a database in which speaker identity information is pre-stored; according to the matching relationship, the speaker identity information matching the tag information is marked in the collected audio data, and the collected audio data is extracted and saved on the second track, that is, track 2.
  • the process includes: recording the identified tag information and converting it into a voice file; recording the time point of the identified tag information and marking that time point on track 1 of the collected audio; generating a correspondence list between the time point of the tag information, the content information of the tag information, and the saved location information, while converting the content information of the tag information into a corresponding voice file; matching the content information of the tag information with the speaker identity information pre-stored in the speaker information database and generating a matching relationship; and, according to the correspondence list and the matching relationship, adding the converted voice information to the matching relationship so that a new mapping relationship is formed between the voice information and the speaker identity information;
  • track 1 and track 2 are then combined and saved as a new audio file, which is named with the identity information of the speaker matching the identified tag information; during playback, the new audio file can separate track 1 and track 2 and be played on a multi-channel device.
  • the content information of the tag information is converted and saved as audio data, with the converted audio data and the acquired audio data saved on different audio tracks;
  • the step further includes recording and saving on the second track according to the related art; for example, the identified tag information is recorded and converted into a voice file, the time point of the identified tag information is recorded and marked on track 1 of the acquired audio, and a correspondence between the moment the tag information was detected and its content information is generated;
  • track 1 and track 2 are separated so that track 1 corresponds to the left channel and track 2 to the right channel; when a multi-channel device plays the recording, the left channel plays the audio of track 1 and the right channel plays the audio of track 2.
  • for example, if the content information of the tag information is converted into the audio "Zhang Ming", then at the marked time point the left channel of the earphone plays the voice content "Zhang Ming" while the right channel plays the speech of the speaker Zhang Ming.
  • Embodiments of the present invention further provide a computer readable storage medium storing computer executable instructions that, when executed, implement the above method.
  • the terminal in this embodiment includes an identification module 1 and a processing module 2;
  • the identification module 1 is configured to mark the audio data being acquired by using tag information, where the tag information includes content information and a time when the tag information is detected;
  • the processing module 2 is configured to save audio data marked with the tag information.
  • the audio data is acquired from the moment when the tag information is detected.
  • the moment at which a gesture whose trajectory approximates a character is detected is the moment at which the user is detected to begin inputting that trajectory, not the moment at which the trajectory is determined to approximate a character.
  • the identification module includes: a receiving unit and an identifying unit; wherein
  • the receiving unit is configured to receive a gesture input by the user when the screen of the terminal is not lit;
  • the recognition unit is configured to save the trajectory of the gesture as a character that approximates a trajectory of the gesture, and record a time at which a gesture in which the trajectory approximates a character is detected.
  • the identifying unit is further configured to, before the tag information is used to mark the audio data being acquired, receive the input tag information, and save the content information of the tag information and the moment at which the tag information is detected.
  • the identifying unit is configured to display a tag information input interface on the screen of the terminal when the screen of the terminal is lit and the terminal is recording, and to receive the tag information input by the user;
  • the step of identifying the tag information and saving its content information and the moment at which it was detected includes: identifying and saving the content information of the input tag information as characters;
  • the input tag information includes characters that approximate the trajectory of a gesture input by the user, as well as pinyin, strokes, and the like entered by the user through the terminal's input method.
  • the terminal further includes: a storage module 3 and a matching module 4, where
  • the storage module is configured to match the tag information with the identity information of speakers pre-stored in the speaker information database, and to generate a matching relationship if speaker identity information in the database matches the tag information;
  • the matching module is configured to mark the audio data with the identity information of the speaker that matches the tag information, according to the matching relationship.
  • optionally, the storage module is configured to match the content information in the tag information with the speaker identity information in the pre-stored speaker information database; if pre-stored speaker identity information matches the content information in the tag information, a matching relationship is generated.
  • the processing module of the terminal includes: a marking unit, an extracting unit, and a saving unit; wherein, when the first marking information and the second marking information have been detected in time sequence,
  • the marking unit is configured to, when detecting the first marking information, mark the audio data being acquired by using a first speaker identity that matches the first marking information;
  • the saving unit is configured to, when the second tag information is detected, save the audio data acquired from the moment the first tag information was detected to the moment the second tag information was detected; the name of the saved audio data includes part or all of the character information in the first speaker identity information that matches the first tag information; the identity information of the speaker includes the first speaker's name and avatar.
  • optionally, the processing module includes a marking unit, a determining unit, and a saving unit; when the fourth, fifth, sixth, and seventh tag information have been detected in chronological order:
  • the marking unit is configured to store the audio data marked with the fourth, fifth, sixth, and seventh tag information respectively; the moment the fourth tag information is detected is the fourth moment, the moment the fifth is detected is the fifth moment, the moment the sixth is detected is the sixth moment, and the moment the seventh is detected is the seventh moment;
  • the determining unit is configured to determine whether the content information of each of the tag information is the same;
  • the saving unit is configured to, if the content information of the fourth tag information and that of the sixth tag information are the same, combine the audio data acquired from the fourth moment to the fifth moment with the audio data acquired from the sixth moment to the seventh moment, following the chronological order of the moments, and save them as one audio file; the name of the saved audio file includes part or all of the character information of the identity information of the speaker that matches the fourth or sixth tag information.
  • the processing module is further configured to convert and save the content information of the tag information as audio data, with the converted audio data and the acquired audio data saved on different audio tracks;
  • the audio track storing the voice information converted from the tag information is merged with the second audio track storing the acquired audio data, and saved as a new audio file.
  • the embodiments of the invention provide an audio data processing method and terminal that can generate a matching relationship according to the identified tag information, and mark and extract the collected audio according to that relationship; this solves the problem that speakers cannot be distinguished during recording, removes the cumbersome work of organizing the recorded content, and improves the efficiency of organizing it.
  • each module in the foregoing embodiment may be implemented in the form of hardware, or may be implemented in the form of a software function module. This application is not limited to any specific combination of hardware and software.
  • the audio data processing method and terminal provided by the present application mark the recorded content and save the marked audio data, thereby improving the efficiency of organizing the recorded content.

Abstract

An audio data processing method and a terminal. The audio data processing method comprises: using tag information to tag currently obtained audio data, the tag information comprising content information and the time when the tag information was detected; storing the audio data tagged using the tag information. The present invention improves the efficiency of arranging recording content.

Description

一种音频数据处理方法和终端Audio data processing method and terminal 技术领域Technical field
本申请涉及但不限于通讯领域,尤其是一种音频数据处理方法和终端。The present application relates to, but is not limited to, the field of communications, and in particular to an audio data processing method and terminal.
背景技术Background technique
录音功能是通信终端的一项基本的功能,在很多场合中都需要录音功能,例如各类会议、培训以及通话中。但是,目前,在录音后整理录音内容比较繁琐,往往需要再次听取或辨别录音内容以区分不同发言者的言语。甚至,经常无法辨别录音内容或者一部分的录音内容所属的发言者;此外,会议中同一发言者经常不会连续地发言,而是在不同的时段发言,如此就会对整理录音内容造成困难。The recording function is a basic function of the communication terminal, and recording functions are required in many occasions, such as various conferences, trainings, and calls. However, at present, it is cumbersome to organize the recorded content after recording, and it is often necessary to listen to or distinguish the recorded content to distinguish the speech of different speakers. Even, it is often impossible to distinguish between the recorded content or a part of the recorded content to which the recorded content belongs; in addition, the same speaker in the meeting often does not speak continuously, but speaks at different time periods, which makes it difficult to organize the recorded content.
发明内容Summary of the invention
以下是对本文详细描述的主题的概述。本概述并非是为了限制权利要求的保护范围。The following is an overview of the topics detailed in this document. This Summary is not intended to limit the scope of the claims.
本发明的实施例提供了一种音频数据的处理方法和终端,能够根据识别的标记信息,对相应时间内获取的音频数据进行标记和保存。An embodiment of the present invention provides a method and a terminal for processing audio data, which can mark and save audio data acquired in a corresponding time according to the identified tag information.
本发明实施例提供了一种音频数据处理方法,包括:The embodiment of the invention provides an audio data processing method, including:
采用标记信息对正在获取的音频数据进行标记,其中,所述标记信息包括内容信息和检测到所述标记信息的时刻;Marking the audio data being acquired with the tag information, wherein the tag information includes content information and a time when the tag information is detected;
保存采用所述标记信息进行标记的音频数据。The audio data marked with the tag information is saved.
本申请另外提供一种计算机可读存储介质,存储有计算机可执行指令,所述计算机可执行指令被执行时实现上述方法。The application further provides a computer readable storage medium storing computer executable instructions that are implemented when the computer executable instructions are executed.
本发明还提供了一种终端,包括识别模块和处理模块,其中,The invention also provides a terminal, comprising an identification module and a processing module, wherein
所述识别模块,设置成采用标记信息对正在获取的音频数据进行标记,其中,所述标记信息包括内容信息和检测到所述标记信息的时刻; The identification module is configured to mark the audio data being acquired by using tag information, where the tag information includes content information and a time when the tag information is detected;
所述处理模块,设置成保存采用所述标记信息进行标记的音频数据。The processing module is configured to save audio data marked with the tag information.
本申请所提供的音频数据处理方法和终端,将录音内容进行标记并保存被标记的音频数据,提高整理录音内容的效率。The audio data processing method and terminal provided by the application mark the recorded content and save the marked audio data, thereby improving the efficiency of organizing the recorded content.
在阅读并理解了附图和详细描述后,可以明白其他方面。Other aspects will be apparent upon reading and understanding the drawings and detailed description.
附图概述BRIEF abstract
附图用来提供对本申请技术方案的进一步理解,并且构成说明书的一部分,与本申请的实施例共同用于解释本申请的技术方案,并不构成对本申请技术方案的限制。The drawings are used to provide a further understanding of the technical solutions of the present application, and constitute a part of the specification, which is used together with the embodiments of the present application to explain the technical solutions of the present application, and does not constitute a limitation of the technical solutions of the present application.
FIG. 1 is a flowchart of an audio data processing method in an embodiment of the present invention;
FIG. 2 is a schematic diagram of a tag information input interface for recording in another embodiment of the present invention; and
FIG. 3 is a schematic diagram of a terminal in an embodiment of the present invention.
PREFERRED EMBODIMENTS OF THE INVENTION
Embodiments of the present invention are described in detail below with reference to the drawings. It should be noted that, provided no conflict arises, the embodiments of the present application and the features therein may be combined with one another in any manner.
FIG. 1 is a flowchart of an audio data processing method in an embodiment of the present invention. As shown in FIG. 1, the method includes:
S10: marking audio data that is being acquired with tag information, where the tag information includes content information and the time at which the tag information was detected; and
S20: saving the audio data marked with the tag information.
The audio data is acquired starting from the time at which the tag information is detected. The time at which a gesture whose trajectory approximates a character is detected refers to the moment at which input of that gesture begins, not the moment at which the trajectory is determined to approximate the character.
Optionally, in a first embodiment, before the step of marking the audio data being acquired with the tag information, the audio data processing method further includes:
receiving a gesture input by a user while the screen of the terminal is not lit; and
recognizing and saving the trajectory of the gesture as a character approximating that trajectory, and recording the time at which the gesture was detected.
The time at which a gesture whose trajectory approximates a character is detected refers to the moment at which input of that gesture begins, not the moment at which the trajectory is determined to approximate the character.
Optionally, the step of recognizing the trajectory of the gesture, saving a character approximating that trajectory, and recording the time at which the tag information was detected includes:
identifying the edge of the shape of the gesture trajectory, saving the trajectory as a character approximating it, and displaying the character into which the trajectory has been saved.
When the screen of the terminal is not lit and the terminal is recording, that is, when the terminal records through a recording application in accordance with a user instruction while keeping its screen unlit, the screen remains powered so that the terminal can detect whether it has received a gesture whose trajectory approximates a character. If such a gesture is detected, the terminal extracts key points from the edge of the shape of the gesture trajectory in order to recognize the shape, and the portion of the display that received the input shows an image of the corresponding trajectory; the screen as a whole does not need to be lit. For example, when the speaker "Zhang Ming" speaks during a recorded conference, the terminal detects gestures on the screen whose trajectories approximate the letters "Z" and "M". These gestures are input by the user in chronological order with a short interval between them, and the terminal may take the combination "ZM" of the recognized letters as the content information of one piece of tag information.
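The grouping of successively input gesture characters into one piece of content information can be sketched as follows. This is an illustrative sketch only; the 2-second grouping threshold and the function name are assumptions, since the patent only says the interval between gestures is "short".

```python
# Sketch: gesture characters recognized in quick succession (e.g. "Z" then
# "M") are merged into a single piece of content information ("ZM").
# The max_gap threshold is an assumed parameter, not stated in the patent.
def group_gesture_characters(events, max_gap=2.0):
    """events: list of (time, char) pairs in chronological order.
    Returns a list of (start_time, combined_content) tag entries; a character
    input within max_gap seconds of the previous one joins its group."""
    groups = []  # each entry: [start_time, content, last_char_time]
    for t, ch in events:
        if groups and t - groups[-1][2] <= max_gap:
            groups[-1][1] += ch
            groups[-1][2] = t
        else:
            groups.append([t, ch, t])
    # keep the start time of the first character, per the patent's rule that
    # the tag time is when input of the gesture begins
    return [(start, content) for start, content, _ in groups]

tags = group_gesture_characters([(10.0, "Z"), (10.8, "M"), (65.0, "L"), (65.5, "H")])
```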
In a second embodiment, before the step of marking the audio data being acquired with the tag information, the audio data processing method further includes:
receiving input tag information; and
saving the content information of the tag information and the time at which the tag information was detected.
Optionally, the step of receiving the input tag information includes: when the screen of the terminal is lit and the terminal is recording, displaying a tag information input interface on the screen of the terminal and receiving tag information input by the user. That is, when the terminal starts recording through a recording application in accordance with a user instruction while keeping its screen lit, the tag information input interface may be displayed at the user's request to receive the tag information input by the user. The step of recognizing the tag information and saving its content information and the time at which it was detected includes: recognizing and saving the content information of the input tag information as characters.
Optionally, at the user's request, the tag information input interface is displayed on the standby interface to receive the tag information input by the user. The content information of the input tag information may include characters input by the user through gestures that approximate the gesture trajectories (not limited to text; numbers, letters, symbols, and the like are also possible), as well as pinyin, strokes, and the like input through an input method application in the terminal. For example, when recording a conference in which "Zhang Ming" speaks, the terminal receives the Chinese characters "张明" or the letters "ZM" input by the user in chronological order, as shown in FIG. 2.
In another embodiment of the present invention, the method further includes:
matching the tag information against the identity information of speakers in a pre-stored speaker information database and, if the identity information of a speaker in the database matches the tag information, generating a matching relationship; and
marking the audio data with the identity information of the speaker matching the tag information, in accordance with the matching relationship.
Optionally, the matching step includes: matching the content information in the tag information against the speaker identity information in the pre-stored speaker information database, and generating a matching relationship if pre-stored identity information of a speaker matches the content information of the tag information.
For example, the content information in the tag information is matched against the speaker identity information stored in a database before recording, and a matching relationship between the content information and the matching speaker's identity information is generated, where the speaker's identity information in the matching relationship includes, but is not limited to, a name, an avatar, a code, and the like. For instance, the content information "ZM" is extracted from the tag information "12:10; ZM" and a matching relationship is generated with the identity information of the speaker Zhang Ming in the pre-stored speaker information database.
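The database lookup just described can be sketched as a simple match on a pre-stored code. The database contents, the field names, and the code-based matching rule are illustrative assumptions; the patent does not prescribe a matching algorithm.

```python
# Sketch of matching a tag's content information against a pre-stored
# speaker information database. Field names and the code-based rule are
# assumptions for illustration only.
SPEAKER_DB = [
    {"name": "Zhang Ming", "code": "ZM", "avatar": "zm.png"},
    {"name": "Li Hua",     "code": "LH", "avatar": "lh.png"},
]

def match_speaker(tag_content, speakers=SPEAKER_DB):
    """Return the identity information of the speaker whose pre-stored code
    matches the tag's content information, or None if there is no match."""
    for speaker in speakers:
        if speaker["code"] == tag_content:
            return speaker
    return None

match = match_speaker("ZM")
```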
Optionally, when first tag information and second tag information have been detected in chronological order, the step of saving the audio data marked with the tag information includes:
when the first tag information is detected, marking the audio data being acquired with the identity information of a first speaker matching the first tag information; and
when the second tag information is detected, saving the audio data acquired between the time at which the first tag information was detected and the time at which the second tag information was detected;
where the name of the saved audio data includes some or all of the character information in the speaker identity information matching the first tag information;
where the speaker's identity information includes the speaker's name and avatar; and
continuing to mark the audio data being acquired with tag information and, when third tag information is detected, saving the audio data acquired between the time at which the second tag information was detected and the time at which the third tag information was detected, with a name that includes some or all of the character information in the identity information of a second speaker matching the second tag information;
where the time at which the second tag information is detected is both the time at which acquisition of the audio data named after some or all of the character information of the first speaker's identity information ends, and the time at which acquisition of the audio data named after some or all of the character information of the second tag information begins.
To describe the above embodiment in more detail: after the second tag information is recognized, based on the respective times of the first and second tag information and in chronological order, the audio data acquired between the time at which the first tag information was detected and the time at which the second tag information was detected is first saved as an audio file named after some or all of the character information in the identity information of the first speaker matching the first tag information. This file may be called the first file; it is marked with the times at which the audio data was acquired, and its name may include the first speaker's name, avatar, employee number, or a combination of several pieces of identity information. The time of the second tag information then serves as the starting mark of the audio data that continues to be acquired. For example, if the first tag information is "12:00; ZM" and the second tag information is "12:10; LH", then upon detecting the second tag information the audio data acquired between 12:00 and 12:10 is saved as an audio file named "张明", the name of the speaker Zhang Ming matching the content of the first tag information.
Furthermore, optionally, if the same speaker speaks during different periods of a recorded conference, the naming method above may be applied to the multiple audio files marked with tag information having the same content information. When the content information of the tag information of two or more audio files matches the identity information of the same speaker, each matching audio file may be named with a combination of some or all of the character information in that speaker's identity information and a sequence number identifying that speaker's audio data, for example "张明-1", "张明-2", and so on.
In an optional embodiment, if no further gesture or tag information is received after eighth tag information is recognized, that is, acquisition of audio data ends without any further tag information being detected, the audio data acquired between the time of the eighth tag information and the time at which acquisition ends may likewise be saved as one audio data file.
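The segmentation-and-naming scheme above (split at each tag time, name each segment after the matching speaker, and add a sequence number when the same speaker appears more than once) can be sketched as follows. The function name and the (time, speaker) input shape are assumptions for illustration.

```python
# Sketch: split a recording at tag times and name each segment after the
# matching speaker, appending "-1", "-2", ... when the same speaker has
# several segments. Illustrative only; names are assumptions.
from collections import Counter

def name_segments(tags, end_time):
    """tags: list of (detection_time, speaker_name) in chronological order.
    end_time: time at which acquisition ends (closes the last segment).
    Returns a list of (file_name, start, end) for each saved audio file."""
    segments = []
    for i, (start, speaker) in enumerate(tags):
        stop = tags[i + 1][0] if i + 1 < len(tags) else end_time
        segments.append((speaker, start, stop))
    counts = Counter(speaker for speaker, _, _ in segments)
    seen = Counter()
    named = []
    for speaker, start, stop in segments:
        if counts[speaker] > 1:  # same speaker spoke in several periods
            seen[speaker] += 1
            named.append((f"{speaker}-{seen[speaker]}", start, stop))
        else:
            named.append((speaker, start, stop))
    return named

files = name_segments([(0, "Zhang Ming"), (600, "Li Hua"), (900, "Zhang Ming")], 1200)
```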
In an optional embodiment, when fourth tag information, fifth tag information, sixth tag information, and seventh tag information have been detected in chronological order, the step of saving the audio data marked with the tag information includes:
saving the audio data marked with the detected fourth, fifth, sixth, and seventh tag information respectively;
where the fourth tag information is detected at a fourth time, the fifth tag information at a fifth time, the sixth tag information at a sixth time, and the seventh tag information at a seventh time;
determining whether the content information of the respective pieces of tag information is the same; if the content information of the fourth tag information is the same as that of the sixth tag information, merging, in the chronological order of the respective times, the audio data acquired between the fourth time and the fifth time with the audio data acquired between the sixth time and the seventh time, and saving the result as one audio file, where the name of the saved audio file includes some or all of the character information in the identity information of the speaker matching the fourth or sixth tag information; and
if the content information of all the pieces of tag information differs, splitting the marked audio data in time at each pair of adjacent tag information, that is, saving the audio data marked with the fourth, fifth, sixth, and seventh tag information as three pieces of audio data acquired between the fourth and fifth times, between the fifth and sixth times, and between the sixth and seventh times respectively, where the name of each saved audio file includes some or all of the character information in the identity information of the speaker matching the tag information detected at the earlier of that file's two times. For example, the audio file containing the audio data acquired between the fourth time and the fifth time is named with some or all of the character information in the identity information of the speaker matching the fourth tag information.
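The fourth-to-seventh-tag rule (adjacent tags delimit segments; segments whose tag content matches are merged into one file) can be sketched as follows. Treating the last tag as the sentinel that closes the final segment is an assumption of the sketch.

```python
# Sketch: each pair of adjacent tags delimits one segment, and segments
# whose content information matches are collected under one file.
# Illustrative only; the final tag acts as a sentinel closing the last segment.
def merge_same_speaker(tags):
    """tags: list of (time, content) in chronological order, where the last
    tag marks the end of the final segment. Returns a mapping from content
    information to the list of (start, end) periods merged into its file."""
    merged = {}
    for (start, content), (end, _) in zip(tags, tags[1:]):
        merged.setdefault(content, []).append((start, end))
    return merged

# fourth..seventh tags, with the fourth and sixth sharing content "ZM":
merged = merge_same_speaker([(100, "ZM"), (200, "LH"), (300, "ZM"), (400, "WW")])
```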
In another embodiment of the present invention, the step of saving the audio data marked with the tag information includes:
converting the content information of the tag information into audio data and saving it, where the converted audio data and the acquired audio data are saved on different audio tracks; and
merging the track on which the converted content information is saved with the track on which the acquired audio data is saved, and saving the result as a new audio file. The track holding the voice information converted from the tag information is track 1; the track holding the acquired audio data is track 2; track 1 holds only the voice information of the tag information and does not hold the captured audio data.
The above embodiment is now described in detail. After detecting a preceding piece of tag information, the terminal records its content information and the time at which it was detected, converts the content information into voice information, and saves it on track 1. Taking the time of the preceding tag information as the start of the mark and the time of the subsequently detected tag information as its end, the acquired audio data marked with the preceding tag information is saved on track 2. The audio data saved on tracks 1 and 2 are then arranged by the times at which the tag information was detected, merged, and saved as a new audio file. During playback, the new audio file can be separated into tracks 1 and 2, with track 1 mapped to the left channel and track 2 to the right channel; on a multi-channel device, the left channel plays the voice converted from the content information of the tag information saved on track 1, and the right channel plays the recording.
In another embodiment of the present invention, before the step of merging the track holding the voice information with the track holding the acquired audio data and saving the result as a new audio file, the method further includes:
matching the tag information against the identity information of speakers in a pre-stored speaker information database and generating a matching relationship, where the speaker information database is a database in which speaker identity information is pre-stored; marking, in accordance with the matching relationship, the captured audio data with the speaker identity information matching the tag information; and extracting and saving the captured audio data on the second track, i.e. track 2.
The process includes: recording the recognized tag information and converting it into a voice file; recording the time at which the tag information was recognized, marking that time on track 1 of the captured audio, and generating a correspondence list among the times of the tag information, the content information of the tag information, and the storage locations, while converting the content information of the tag information into corresponding voice files; matching the content information of the tag information against the identity information of speakers in the pre-stored speaker information database and generating a matching relationship; adding, in accordance with the correspondence list and the matching relationship, the converted voice information to the matching relationship to generate a new mapping relationship between the voice information and the matching speaker identity information; marking, in accordance with the mapping relationship, the captured audio data with the matching speaker identity information, and extracting and saving it on the second track, i.e. track 2; and then merging tracks 1 and 2 into a new audio file according to the marked times on the two tracks, where the new audio file is named after the identity information of the speaker matching the recognized tag information, and where tracks 1 and 2 can be separated during playback and played on a multi-channel device.
Furthermore, in another embodiment of the present invention, the step of converting the content information of the tag information into audio data and saving it on a track different from that of the acquired audio data further includes: recording normally on the second track according to the related art. For example, the recognized tag information is recorded and converted into a voice file; the time at which the tag information was recognized is recorded and marked on track 1 of the acquired audio; a correspondence list among the detection times of the tag information, its content information, and its storage locations on the track is generated while the content information is converted into corresponding voice files; the content information of the tag information is matched against the identity information of speakers in the pre-stored speaker information database and a matching relationship is generated; and, in accordance with the correspondence list and the matching relationship, the converted voice information is added to the matching relationship to generate a new mapping relationship between the voice information and the matching speaker identity information, while the acquired audio data remains saved on track 2. Track 1 holds only the recorded voice files matching the tag information and does not record the speakers' audio; track 2 holds the acquired audio data. When recording ends, the audio on tracks 1 and 2 is saved as one new audio file. During playback the new audio file is separated so that track 1 maps to the left channel and track 2 to the right channel; on a multi-channel device, the left channel plays the audio file of track 1 and the right channel plays the audio file of track 2. For example, if the content information of the tag information is converted into the audio file "张明", then when the user plays the recording through earphones, the left channel plays the voice content "Zhang Ming" at the marked time while the right channel plays the speech of the speaker Zhang Ming.
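The two-track scheme can be sketched at the sample level. Real text-to-speech and audio I/O are stubbed out; the sample rate, the stub synthesizer, and all names are assumptions for illustration only.

```python
# Sketch of the two-track scheme: track 1 carries synthesized voice for each
# tag at its detection time, track 2 carries the recording unchanged, and
# merging yields (left, right) stereo frames with track 1 on the left.
# TTS is stubbed; all names and parameters are illustrative assumptions.
SAMPLE_RATE = 4  # samples per second, kept tiny for illustration

def synth_tag_voice(content):
    """Stub for text-to-speech: one sample per character of the tag content."""
    return [ord(c) for c in content]

def merge_tracks(recording, tags):
    """Place each tag's synthesized voice on track 1 at its detection time,
    then zip the two tracks into (left_channel, right_channel) frames."""
    track1 = [0] * len(recording)          # voice of the tag information only
    for detected_at, content in tags:
        pos = int(detected_at * SAMPLE_RATE)
        for i, sample in enumerate(synth_tag_voice(content)):
            if pos + i < len(track1):
                track1[pos + i] = sample
    track2 = list(recording)               # the acquired audio data, unchanged
    return list(zip(track1, track2))       # left channel, right channel

stereo = merge_tracks(recording=[7] * 12, tags=[(1.0, "ZM")])
```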
An embodiment of the present invention further provides a computer-readable storage medium storing computer-executable instructions which, when executed, implement the above method.
FIG. 3 is a schematic diagram of a terminal in an embodiment of the present invention. As shown in FIG. 3, the terminal of this embodiment includes an identification module 1 and a processing module 2, where:
the identification module 1 is configured to mark audio data that is being acquired with tag information, where the tag information includes content information and the time at which the tag information was detected; and
the processing module 2 is configured to save the audio data marked with the tag information.
The audio data is acquired starting from the time at which the tag information is detected. The time at which a gesture whose trajectory approximates a character is detected refers to the moment at which input of that gesture begins, not the moment at which the trajectory is determined to approximate the character.
Optionally, the identification module includes a receiving unit and a recognition unit, where:
the receiving unit is configured to receive a gesture input by a user while the screen of the terminal is not lit; and
the recognition unit is configured to recognize and save the trajectory of the gesture as a character approximating that trajectory, and to record the time at which the gesture was detected.
Optionally, the recognition unit is further configured to receive input tag information before the audio data being acquired is marked with the tag information, and to save the content information of the tag information and the time at which the tag information was detected.
Optionally, the recognition unit is configured to display a tag information input interface on the screen of the terminal and receive tag information input by the user when the screen of the terminal is lit and the terminal is recording;
the step of recognizing the tag information and saving its content information and the time at which it was detected includes: recognizing and saving the content information of the input tag information as characters;
where the input tag information includes characters input by the user through gestures that approximate the gesture trajectories, as well as pinyin, strokes, and the like input through an input method application in the terminal.
In an optional embodiment, the terminal further includes a storage module 3 and a matching module 4, where:
the storage module is configured to match the tag information against the identity information of speakers in a pre-stored speaker information database and, if the identity information of a speaker in the database matches the tag information, to generate a matching relationship; and
the matching module is configured to mark the audio data with the identity information of the speaker matching the tag information, in accordance with the matching relationship.
Optionally, the storage module is configured to match the content information in the tag information against the speaker identity information in the pre-stored speaker information database, and to generate a matching relationship if pre-stored identity information of a speaker matches the content information of the tag information.
Optionally, the processing module of the terminal includes a marking unit, an extraction unit, and a saving unit, where, when first tag information and second tag information have been detected in chronological order:
the marking unit is configured to mark, when the first tag information is detected, the audio data being acquired with the identity of a first speaker matching the first tag information; and
the saving unit is configured to save, when the second tag information is detected, the audio data acquired between the time at which the first tag information was detected and the time at which the second tag information was detected, where the name of the saved audio data includes some or all of the character information in the first speaker's identity information matching the first tag information, and where the speaker's identity information includes the first speaker's name and avatar.
在另一实施例中，所述处理模块包括标记单元、判断单元和保存单元；其中，当已按照时间顺序检测到第四标记信息、第五标记信息、第六标记信息和第七标记信息时，In another embodiment, the processing module includes a marking unit, a determining unit, and a saving unit; wherein, when fourth tag information, fifth tag information, sixth tag information, and seventh tag information have been detected in chronological order,
所述标记单元，设置成保存采用第四标记信息、第五标记信息、第六标记信息和第七标记信息分别进行标记的音频数据；其中，检测到所述第四标记信息的时刻为第四时刻，检测到所述第五标记信息的时刻为第五时刻，检测到所述第六标记信息的时刻为第六时刻，检测到所述第七标记信息的时刻为第七时刻；The marking unit is configured to save the audio data marked with the fourth, fifth, sixth, and seventh tag information respectively, wherein the moments at which the fourth, fifth, sixth, and seventh tag information are detected are the fourth, fifth, sixth, and seventh moments respectively;
所述判断单元，设置成判断所述各个标记信息的内容信息是否相同；The determining unit is configured to determine whether the content information of the respective tag information is the same;
所述保存单元，设置成如果所述第四标记信息的内容信息和所述第六标记信息的内容信息相同，则按照所述各时刻的时间顺序，将从所述第四时刻至第五时刻的期间获取的音频数据和从所述第六时刻至所述第七时刻的期间获取的音频数据进行合并，并保存为一个音频文件；其中，所保存的音频文件的名称包括与所述第四标记信息或所述第六标记信息相匹配的发言者的身份信息中部分或全部的字符信息。The saving unit is configured to, if the content information of the fourth tag information and that of the sixth tag information are the same, merge, in the chronological order of the moments, the audio data acquired from the fourth moment to the fifth moment with the audio data acquired from the sixth moment to the seventh moment, and save the result as one audio file, wherein the name of the saved audio file includes some or all of the characters of the identity information of the speaker that matches the fourth or sixth tag information.
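Merging chronologically ordered segments whose tag content matches might look like the following sketch. The (tag, samples) tuple representation is an assumption made for the example.

```python
# Sketch of merging chronologically ordered segments that carry the
# same tag content into one stream, preserving acquisition order both
# within and across speakers. The (tag, samples) format is assumed.

def merge_same_speaker(segments):
    """segments: list of (tag_content, samples) in chronological
    order. Returns one (tag_content, samples) entry per distinct tag,
    with that tag's segments concatenated in order of acquisition."""
    merged = {}
    order = []
    for tag, samples in segments:
        if tag not in merged:
            merged[tag] = []
            order.append(tag)
        merged[tag].extend(samples)
    return [(tag, merged[tag]) for tag in order]
```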
在另一实施例中，所述处理模块还设置成将所述标记信息的内容信息转换并保存为音频数据；其中，转换成的音频数据与所述获取的音频数据被保存在不同音轨；In another embodiment, the processing module is further configured to convert the content information of the tag information into audio data and save it; the converted audio data and the acquired audio data are saved on different audio tracks;
将所述标记信息的内容信息转换并保存成的所述音频数据所在的音轨和保存所获取的音频数据所在的第二音轨合并，并保存为一个新的音频文件。the audio track holding the audio data converted from the content information of the tag information and the second audio track holding the acquired audio data are merged and saved as one new audio file.
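Mixing the synthesized tag-content track with the recorded track into one file reduces, at its core, to summing two sample streams. The sketch below uses plain lists of samples and deliberately ignores real-world concerns such as clipping, sample formats, and resampling.

```python
from itertools import zip_longest

# Minimal sketch of merging two audio tracks by summing their samples,
# padding the shorter track with silence (zeros). Real code would also
# clamp the sums to the sample format's valid range.

def mix_tracks(track_a, track_b):
    """Return a single track whose samples are the element-wise sum
    of the two input tracks."""
    return [a + b for a, b in zip_longest(track_a, track_b, fillvalue=0)]
```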
本发明实施例提供了一种音频数据处理方法和终端，能够在录音过程中，根据识别的标记信息匹配预存的发言者身份信息生成匹配关系，根据匹配关系将采集的音频文件进行标记并提取保存，解决了录音过程中不能区分发言者的身份，也解决了录音内容整理工作的繁琐，提高了录音内容整理工作的效率。Embodiments of the present invention provide an audio data processing method and terminal that, during recording, match recognized tag information against pre-stored speaker identity information to generate a matching relationship, and mark, extract, and save the captured audio according to that relationship. This makes speakers distinguishable during recording, removes the tedium of organizing recorded content, and improves the efficiency of that work.
本领域普通技术人员可以理解上述方法中的全部或部分步骤，可通过程序来指令相关硬件完成，所述程序可以存储于计算机可读存储介质中，如只读存储器、磁盘或光盘等。相应地，上述实施例中的各模块可以采用硬件的形式实现，也可以采用软件功能模块的形式实现。本申请不限制于任何特定形式的硬件和软件的结合。One of ordinary skill in the art will understand that all or some of the steps of the above methods may be performed by a program instructing the relevant hardware, the program being stored in a computer-readable storage medium such as a read-only memory, magnetic disk, or optical disc. Correspondingly, each module in the above embodiments may be implemented in hardware or as a software function module. The present application is not limited to any specific combination of hardware and software.
以上仅为本发明的可选实施例，当然，本申请还可有其他多种实施例，在不背离本申请精神及其实质的情况下，熟悉本领域的技术人员当可根据本申请作出各种相应的改变和变形，但这些相应的改变和变形都应属于本申请所附的权利要求的保护范围。The above are merely optional embodiments of the present invention; the present application may of course have various other embodiments, and those skilled in the art may make corresponding changes and variations without departing from the spirit and essence of the present application, all of which shall fall within the protection scope of the appended claims.
本领域普通技术人员可以理解上述方法中的全部或部分步骤可通过程序来指令相关硬件（例如处理器）完成，所述程序可以存储于计算机可读存储介质中，如只读存储器、磁盘或光盘等。可选地，上述实施例的全部或部分步骤也可以使用一个或多个集成电路来实现。相应地，上述实施例中的各模块/单元可以采用硬件的形式实现，例如通过集成电路来实现其相应功能，也可以采用软件功能模块的形式实现，例如通过处理器执行存储于存储器中的程序/指令来实现其相应功能。本发明实施例不限制于任何特定形式的硬件和软件的结合。One of ordinary skill in the art will understand that all or some of the steps of the above methods may be performed by a program instructing the relevant hardware (for example, a processor), the program being stored in a computer-readable storage medium such as a read-only memory, magnetic disk, or optical disc. Optionally, all or some of the steps of the above embodiments may also be implemented using one or more integrated circuits. Correspondingly, each module/unit in the above embodiments may be implemented in hardware, for example by an integrated circuit realizing its function, or as a software function module, for example by a processor executing a program/instructions stored in a memory. Embodiments of the present invention are not limited to any specific combination of hardware and software.
工业实用性Industrial applicability
本申请所提供的音频数据处理方法和终端，将录音内容进行标记并保存被标记的音频数据，能够提高整理录音内容的效率。The audio data processing method and terminal provided by the present application tag recorded content and save the tagged audio data, improving the efficiency of organizing recordings.

Claims (20)

  1. 一种音频数据处理方法,包括:An audio data processing method includes:
    采用标记信息对正在获取的音频数据进行标记,其中,所述标记信息包括内容信息和检测到所述标记信息的时刻;Marking the audio data being acquired with the tag information, wherein the tag information includes content information and a time when the tag information is detected;
    保存采用所述标记信息进行标记的音频数据。The audio data marked with the tag information is saved.
  2. 根据权利要求1所述方法,在所述采用标记信息对正在获取的音频数据进行标记的步骤之前,所述音频数据处理方法还包括:The method of claim 1, before the step of marking the audio data being acquired with the tag information, the audio data processing method further comprising:
    在终端的屏幕不点亮时,接收用户输入的手势;Receiving a gesture input by the user when the screen of the terminal is not lit;
    将所述手势的轨迹识别并保存为与所述手势的轨迹近似的字符,并记录检测到轨迹近似于字符的手势的时刻。The trajectory of the gesture is recognized and saved as a character that approximates the trajectory of the gesture, and the time at which the gesture of the trajectory approximates the character is detected is recorded.
  3. 根据权利要求2所述方法，所述将所述手势的轨迹识别并保存为与所述手势的轨迹近似的字符，并记录检测到轨迹近似于字符的手势的时刻的步骤包括：The method according to claim 2, wherein the step of recognizing and saving the trajectory of the gesture as a character approximating the trajectory, and recording the moment at which a gesture whose trajectory approximates a character is detected, comprises:
    识别所述手势的轨迹的形状的边缘，并将手势的轨迹保存为与手势的轨迹近似的字符，并显示将所述手势的轨迹保存成的字符。identifying the edge of the shape of the gesture's trajectory, saving the trajectory as a character approximating it, and displaying the character as which the trajectory was saved.
  4. 根据权利要求1所述方法,在所述采用标记信息对正在获取的音频数据进行标记的步骤之前,所述音频数据处理的方法还包括:The method of claim 1, before the step of marking the audio data being acquired with the tag information, the method of audio data processing further comprises:
    接收输入的标记信息;Receiving input tag information;
    保存所述标记信息的内容信息和检测到所述标记信息的时刻。The content information of the tag information and the time at which the tag information is detected are saved.
  5. 根据权利要求4所述方法,其中,所述接收输入的标记信息的步骤包括:在终端的屏幕点亮且终端进行录音时,在终端的屏幕上显示标记信息输入界面,接收用户输入的标记信息;The method according to claim 4, wherein said step of receiving the input tag information comprises: displaying a tag information input interface on a screen of the terminal when the screen of the terminal is lit and the terminal performs recording, and receiving tag information input by the user ;
    所述识别所述标记信息,并保存所述标记信息的内容信息和检测到所述标记信息的时刻的步骤包括:将所输入的标记信息的内容信息识别并保存为字符;The step of identifying the mark information and saving the content information of the mark information and the time at which the mark information is detected includes: identifying and saving the content information of the input mark information as a character;
    其中，所述输入的标记信息包括用户对终端通过手势输入的、与手势的轨迹近似的字符，以及用户通过终端中的输入法应用输入的拼音、笔画等。wherein the input tag information includes characters input by the user on the terminal through gestures and approximating the gesture trajectory, as well as pinyin, strokes, etc. input by the user through an input method application in the terminal.
  6. 根据权利要求1-5任一项所述方法,所述方法还包括:A method according to any one of claims 1 to 5, the method further comprising:
    将所述标记信息与预存在发言者信息库中的发言者的身份信息进行匹配,若发言者信息库中有发言者的身份信息与所述标记信息匹配,则生成匹配关系;Matching the tag information with the identity information of the speaker in the pre-existing speaker information database, and if the identity information of the speaker in the speaker information database matches the tag information, generating a matching relationship;
    根据所述匹配关系,采用与所述标记信息匹配的发言者的身份信息对所述音频数据进行标记。And according to the matching relationship, the audio data is marked with identity information of a speaker that matches the tag information.
  7. 根据权利要求6所述方法，其中，所述将所述标记信息与预存在发言者信息库中的发言者的身份信息进行匹配，若发言者信息库中有发言者的身份信息与所述标记信息匹配，则生成匹配关系的步骤包括：The method according to claim 6, wherein the step of matching the tag information with the identity information of speakers pre-stored in the speaker information database and generating a matching relationship if a speaker's identity information in the database matches the tag information comprises:
    将所述标记信息中的内容信息与预存在发言者信息库中的发言者身份信息进行匹配,若发言者信息库中有预存的发言者的身份信息与所述标记信息中的内容信息匹配,则生成匹配关系。Matching the content information in the tag information with the speaker identity information in the pre-existing speaker information database, if the pre-existing speaker identity information in the speaker information database matches the content information in the tag information, Then generate a matching relationship.
  8. 根据权利要求6所述方法,其中,当按照时间顺序已检测到第一标记信息和第二标记信息时,所述保存采用所述标记信息进行标记的音频数据的步骤包括:The method according to claim 6, wherein when the first tag information and the second tag information have been detected in chronological order, the step of saving the audio data marked with the tag information comprises:
    在检测到第一标记信息时,采用与所述第一标记信息匹配的第一发言者的身份信息对正在获取的音频数据进行标记;When the first tag information is detected, the audio data that is being acquired is marked by using the identity information of the first speaker that matches the first tag information;
    在检测到第二标记信息时,保存从检测到所述第一标记信息的时刻至检测到所述第二标记信息的时刻期间获取到的音频数据;When the second tag information is detected, the audio data acquired from the time when the first tag information is detected to the time when the second tag information is detected is saved;
    其中所述保存的音频数据的名称包括与所述第一标记信息匹配的第一发言者身份信息中部分或全部的字符信息;The name of the saved audio data includes part or all of character information in the first speaker identity information that matches the first tag information;
    其中所述发言者的身份信息包括:所述第一发言者的姓名、和头像。The identity information of the speaker includes: a name and an avatar of the first speaker.
  9. 根据权利要求6所述方法，其中，当已按照时间顺序检测到第四标记信息、第五标记信息、第六标记信息和第七标记信息时，所述保存采用所述标记信息进行标记的音频数据的步骤包括：The method according to claim 6, wherein, when fourth tag information, fifth tag information, sixth tag information, and seventh tag information have been detected in chronological order, the step of saving the audio data marked with the tag information comprises:
    保存采用第四标记信息、第五标记信息、第六标记信息和第七标记信息分别进行标记的音频数据；其中，检测到第四标记信息的时刻为第四时刻，检测到所述第五标记信息的时刻为第五时刻，检测到所述第六标记信息的时刻为第六时刻，检测到所述第七标记信息的时刻为第七时刻；saving the audio data marked with the fourth, fifth, sixth, and seventh tag information respectively, wherein the moments at which the fourth, fifth, sixth, and seventh tag information are detected are the fourth, fifth, sixth, and seventh moments respectively;
    判断所述各个标记信息的内容信息是否相同；determining whether the content information of the respective tag information is the same;
    如果所述第四标记信息的内容信息和所述第六标记信息的内容信息相同，则按照所述各时刻的时间顺序，将从所述第四时刻至第五时刻的期间获取的音频数据和从所述第六时刻至所述第七时刻的期间获取的音频数据进行合并，并保存为一个音频文件；其中，所保存的音频文件的名称包括与所述第四标记信息或所述第六标记信息相匹配的发言者的身份信息中部分或全部的字符信息。if the content information of the fourth tag information and that of the sixth tag information are the same, merging, in the chronological order of the moments, the audio data acquired from the fourth moment to the fifth moment with the audio data acquired from the sixth moment to the seventh moment, and saving the result as one audio file, wherein the name of the saved audio file includes some or all of the characters of the identity information of the speaker that matches the fourth or sixth tag information.
  10. 根据权利要求1-5任一项所述方法,其中,所述保存采用所述标记信息进行标记的音频数据的步骤包括:A method according to any one of claims 1 to 5, wherein said step of saving audio data marked with said tag information comprises:
    将所述标记信息的内容信息转换并保存为音频数据;其中,转换成的音频数据与所述获取的音频数据被保存在不同音轨;Converting and saving the content information of the tag information into audio data; wherein the converted audio data and the acquired audio data are saved in different audio tracks;
    将所述标记信息的内容信息转换并保存成的所述音频数据所在的音轨和保存所获取的音频数据所在的音轨合并，并保存为一个新的音频文件。merging the audio track holding the audio data converted from the content information of the tag information with the audio track holding the acquired audio data, and saving the result as one new audio file.
  11. 根据权利要求6所述方法，当两个或两个以上的音频文件的标记信息的内容信息与同一个发言者的身份信息匹配时，将与同一个发言者的身份信息匹配的各个音频文件分别命名为其名称中包括所述同一发言者的身份信息中部分或全部的字符信息和表示所述同一发言者的音频数据的序列号的组合。The method according to claim 6, wherein, when the content information of the tag information of two or more audio files matches the identity information of the same speaker, each such audio file is named with a combination of some or all of the characters of that speaker's identity information and a sequence number identifying that speaker's audio data.
  12. 一种终端,包括识别模块和处理模块;其中,A terminal includes an identification module and a processing module; wherein
    所述识别模块,设置成采用标记信息对正在获取的音频数据进行标记,其中,所述标记信息包括内容信息和检测到所述标记信息的时刻;The identification module is configured to mark the audio data being acquired by using tag information, where the tag information includes content information and a time when the tag information is detected;
    所述处理模块,设置成保存采用所述标记信息进行标记的音频数据。The processing module is configured to save audio data marked with the tag information.
  13. 根据权利要求12所述终端,其中,所述识别模块包括:接收单元和识别单元;其中,The terminal according to claim 12, wherein the identification module comprises: a receiving unit and an identifying unit; wherein
    所述接收单元,设置成在终端的屏幕不点亮时,接收用户输入的手势;The receiving unit is configured to receive a gesture input by the user when the screen of the terminal is not lit;
    所述识别单元，设置成将所述手势的轨迹识别并保存为与所述手势的轨迹近似的字符，并记录检测到轨迹近似于字符的手势的时刻。The recognition unit is configured to recognize and save the trajectory of the gesture as a character approximating the trajectory, and to record the moment at which a gesture whose trajectory approximates a character is detected.
  14. 根据权利要求12所述终端，所述识别单元还设置成在所述采用标记信息对正在获取的音频数据进行标记之前，接收输入的标记信息；保存所述标记信息的内容信息和检测到所述标记信息的时刻。The terminal according to claim 12, wherein the identification unit is further configured to receive input tag information before the audio data being acquired is marked with the tag information, and to save the content information of the tag information and the moment at which the tag information is detected.
  15. 根据权利要求14所述终端,其中,所述识别单元是设置成在终端的屏幕点亮且终端进行录音时,在终端的屏幕上显示标记信息输入界面,接收用户输入的标记信息;The terminal according to claim 14, wherein the identification unit is configured to display a mark information input interface on a screen of the terminal when the screen of the terminal is lit and the terminal performs recording, and receive the mark information input by the user;
    所述识别所述标记信息,并保存所述标记信息的内容信息和检测到所述标记信息的时刻的步骤包括:将所输入的标记信息的内容信息识别并保存为字符;The step of identifying the mark information and saving the content information of the mark information and the time at which the mark information is detected includes: identifying and saving the content information of the input mark information as a character;
    其中，所述输入的标记信息包括用户对终端通过手势输入的、与手势的轨迹近似的字符，以及用户通过终端中的输入法应用输入的拼音、笔画等。wherein the input tag information includes characters input by the user on the terminal through gestures and approximating the gesture trajectory, as well as pinyin, strokes, etc. input by the user through an input method application in the terminal.
  16. 根据权利要求12-15任一项所述终端,所述终端还包括:存储模块和匹配模块,其中,The terminal according to any one of claims 12-15, further comprising: a storage module and a matching module, wherein
    所述存储模块，设置成将所述标记信息与预存在发言者信息库中的发言者的身份信息进行匹配，若发言者信息库中有发言者的身份信息与所述标记信息匹配，则生成匹配关系；The storage module is configured to match the tag information against the identity information of speakers pre-stored in a speaker information database and, if a speaker's identity information in the speaker information database matches the tag information, to generate a matching relationship;
    所述匹配模块，设置成根据所述匹配关系，采用与所述标记信息匹配的发言者的身份信息对所述音频数据进行标记。The matching module is configured to mark the audio data, according to the matching relationship, with the identity information of the speaker that matches the tag information.
  17. 根据权利要求16所述终端,其中,所述存储模块是设置成将所述标记信息中的内容信息与预存在发言者信息库中的发言者身份信息进行匹配,若发言者信息库中有预存的发言者的身份信息与所述标记信息中的内容信息匹配,则生成匹配关系。The terminal according to claim 16, wherein the storage module is configured to match the content information in the tag information with the speaker identity information in the pre-existing speaker information database, if the speaker information database is pre-stored The identity information of the speaker matches the content information in the tag information, and a matching relationship is generated.
  18. 根据权利要求16所述终端,其中,所述处理模块包括:标记单元和保存单元;其中,当按照时间顺序已检测到第一标记信息和第二标记信息时,The terminal according to claim 16, wherein the processing module comprises: a marking unit and a saving unit; wherein, when the first marking information and the second marking information have been detected in chronological order,
    所述标记单元,设置成在检测到第一标记信息时,采用与所述第一标记信息匹配的第一发言者的身份信息对正在获取的音频数据进行标记;The marking unit is configured to, when detecting the first marking information, mark the audio data being acquired by using the identity information of the first speaker that matches the first marking information;
    所述保存单元，设置成在检测到第二标记信息时，保存从检测到所述第一标记信息的时刻至检测到所述第二标记信息的时刻期间获取到的音频数据；其中所述保存的音频数据的名称包括与所述第一标记信息匹配的第一发言者身份信息中部分或全部的字符信息；其中所述发言者的身份信息包括：所述第一发言者的姓名和头像。The saving unit is configured to, upon detecting the second tag information, save the audio data acquired between the moment the first tag information was detected and the moment the second tag information was detected, wherein the name of the saved audio data includes some or all of the characters of the first speaker identity information that matches the first tag information, and the speaker's identity information includes the first speaker's name and avatar.
  19. 根据权利要求16所述终端，其中，所述处理模块包括标记单元、判断单元和保存单元；其中，当已按照时间顺序检测到第四标记信息、第五标记信息、第六标记信息和第七标记信息时，The terminal according to claim 16, wherein the processing module includes a marking unit, a determining unit, and a saving unit; wherein, when fourth tag information, fifth tag information, sixth tag information, and seventh tag information have been detected in chronological order,
    所述标记单元，设置成保存采用第四标记信息、第五标记信息、第六标记信息和第七标记信息分别进行标记的音频数据；其中，检测到所述第四标记信息的时刻为第四时刻，检测到所述第五标记信息的时刻为第五时刻，检测到所述第六标记信息的时刻为第六时刻，检测到所述第七标记信息的时刻为第七时刻；The marking unit is configured to save the audio data marked with the fourth, fifth, sixth, and seventh tag information respectively, wherein the moments at which the fourth, fifth, sixth, and seventh tag information are detected are the fourth, fifth, sixth, and seventh moments respectively;
    所述判断单元，设置成判断所述各个标记信息的内容信息是否相同；The determining unit is configured to determine whether the content information of the respective tag information is the same;
    所述保存单元，设置成如果所述第四标记信息的内容信息和所述第六标记信息的内容信息相同，则按照所述各时刻的时间顺序，将从所述第四时刻至第五时刻的期间获取的音频数据和从所述第六时刻至所述第七时刻的期间获取的音频数据进行合并，并保存为一个音频文件；其中，所保存的音频文件的名称包括与所述第四标记信息或所述第六标记信息相匹配的发言者的身份信息中部分或全部的字符信息。The saving unit is configured to, if the content information of the fourth tag information and that of the sixth tag information are the same, merge, in the chronological order of the moments, the audio data acquired from the fourth moment to the fifth moment with the audio data acquired from the sixth moment to the seventh moment, and save the result as one audio file, wherein the name of the saved audio file includes some or all of the characters of the identity information of the speaker that matches the fourth or sixth tag information.
  20. 根据权利要求12-15任一项所述终端，其中，所述处理模块还设置成将所述标记信息的内容信息转换并保存为音频数据；其中，转换成的音频数据与所述获取的音频数据被保存在不同音轨；The terminal according to any one of claims 12-15, wherein the processing module is further configured to convert the content information of the tag information into audio data and save it, the converted audio data and the acquired audio data being saved on different audio tracks;
    将所述标记信息的内容信息转换并保存成的所述音频数据所在的音轨和保存所获取的音频数据所在的音轨合并，并保存为一个新的音频文件。the audio track holding the audio data converted from the content information of the tag information and the audio track holding the acquired audio data are merged and saved as one new audio file.
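Claims 2, 3, and 13 describe recognizing a gesture trajectory as an approximate character. A toy version of that idea could look like the sketch below; the direction encoding and the two made-up templates are invented for this example, and a real recognizer would be far more elaborate.

```python
from difflib import SequenceMatcher

# Toy sketch of mapping a gesture trajectory to its nearest character
# template. The direction encoding and the two templates are invented
# for this example.

TEMPLATES = {"L": "DR", "V": "DU"}  # character -> direction codes

def directions(points):
    """Encode a trajectory (list of (x, y) points, y growing downward)
    as a string of coarse direction codes R/L/U/D."""
    codes = []
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        dx, dy = x1 - x0, y1 - y0
        if abs(dx) >= abs(dy):
            codes.append("R" if dx >= 0 else "L")
        else:
            codes.append("D" if dy >= 0 else "U")
    return "".join(codes)

def recognize(points):
    """Return the template character whose direction codes best match
    the trajectory's codes."""
    code = directions(points)
    return max(TEMPLATES,
               key=lambda ch: SequenceMatcher(None, code, TEMPLATES[ch]).ratio())
```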
PCT/CN2016/081022 2016-02-02 2016-05-04 Audio data processing method and terminal WO2016197755A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201610074943.8A CN107026931A (en) 2016-02-02 2016-02-02 A kind of audio data processing method and terminal
CN201610074943.8 2016-02-02

Publications (1)

Publication Number Publication Date
WO2016197755A1

Family

ID=57502786

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/081022 WO2016197755A1 (en) 2016-02-02 2016-05-04 Audio data processing method and terminal

Country Status (2)

Country Link
CN (1) CN107026931A (en)
WO (1) WO2016197755A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113380247A (en) * 2021-06-08 2021-09-10 阿波罗智联(北京)科技有限公司 Multi-tone-zone voice awakening and recognizing method and device, equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020133513A1 (en) * 2001-03-16 2002-09-19 Ftr Pty Ltd. Log note system for digitally recorded audio
CN102262890A (en) * 2010-05-31 2011-11-30 鸿富锦精密工业(深圳)有限公司 Electronic device and marking method thereof
CN103020306A (en) * 2013-01-04 2013-04-03 深圳市中兴移动通信有限公司 Lookup method and system for character indexes based on gesture recognition
CN103400592A (en) * 2013-07-30 2013-11-20 北京小米科技有限责任公司 Recording method, playing method, device, terminal and system

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104978145A (en) * 2015-01-27 2015-10-14 中兴通讯股份有限公司 Recording realization method and apparatus and mobile terminal

Also Published As

Publication number Publication date
CN107026931A (en) 2017-08-08


Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 16806653; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 16806653; Country of ref document: EP; Kind code of ref document: A1)