Voice recording method and device
Technical Field
The embodiment of the invention relates to the field of intelligent terminals, in particular to a voice recording method and device.
Background
With the rapid development of wireless mobile communication technology, voice communication over wireless mobile communication devices has become an indispensable means of communication for users.
When answering calls through a mobile phone, tablet, or other call terminal, a user often encounters scenarios in which party A conveys important information to party B during the call. For example, in a teleconference, a leader may send important instructions to a subordinate through the call terminal, and the subordinate needs to record the content and arrange it into a draft. Because human memory is limited, the user easily forgets some important content, which causes trouble in subsequent work.
In the prior art, to solve such problems, a user may use the recording function of the call terminal itself to record the call content and then arrange the recorded content into a draft. However, this method has the following disadvantage: even after recording the call, the user still needs to play back the recording afterwards and take notes with pen and paper, which remains inconvenient.
Disclosure of Invention
The embodiment of the invention provides a voice recording method and device, which make voice recording more convenient and faster.
In a first aspect, an embodiment of the present invention provides a method for voice recording, where the method includes:
acquiring first audio information and first audio information generation time when a first acquisition unit is in a working state, and forming text data corresponding to the first audio information according to the first audio information when the first acquisition unit is in an idle state;
acquiring second audio information and second audio information generation time when a second acquisition unit is in a working state, and forming text data corresponding to the second audio information according to the second audio information when the second acquisition unit is in an idle state;
and when the first acquisition unit and/or the second acquisition unit are in an idle state, sequencing text data corresponding to the first audio information and text data corresponding to the second audio information according to the first audio information generation time and the second audio information generation time to form a text file.
Further, the method also comprises the following steps:
establishing a corresponding relation between text data corresponding to the first audio information and the first audio information generation time;
and establishing a corresponding relation between the text data corresponding to the second audio information and the second audio information generation time.
Further, when the first acquisition unit and/or the second acquisition unit is in an idle state, the sorting of the text data corresponding to the first audio information and the text data corresponding to the second audio information according to the first audio information generation time and the second audio information generation time to form a text file includes:
and after the first acquisition unit and/or the second acquisition unit are/is in an idle state and preset time passes, sequencing text data corresponding to the first audio information and text data corresponding to the second audio information according to the first audio information generation time and the second audio information generation time to form a text file.
Further, the method also comprises the following steps:
marking error character data in text data, establishing a mapping relation between the marked error character data and audio information corresponding to the error character data, and performing secondary identification conversion on first audio information corresponding to the error character data, wherein the text data comprises correct character data and error character data.
In a second aspect, an embodiment of the present invention further provides an apparatus for voice recording, where the apparatus includes:
the first text data forming module is used for acquiring first audio information and first audio information generation time when the first acquisition unit is in a working state, and forming text data corresponding to the first audio information according to the first audio information when the first acquisition unit is in an idle state;
the second text data forming module is used for acquiring second audio information and second audio information generation time when the second acquisition unit is in a working state, and forming text data corresponding to the second audio information according to the second audio information when the second acquisition unit is in an idle state;
and the text file forming module is used for sequencing text data corresponding to the first audio information and text data corresponding to the second audio information according to the first audio information generation time and the second audio information generation time when the first acquisition unit and/or the second acquisition unit are in an idle state to form a text file.
Further, the apparatus also comprises:
the first corresponding relation establishing module is used for establishing the corresponding relation between the text data corresponding to the first audio information and the first audio information generation time;
and the second corresponding relation establishing module is used for establishing the corresponding relation between the text data corresponding to the second audio information and the second audio information generation time.
Further, the text file forming module is specifically configured to:
and after the first acquisition unit and/or the second acquisition unit are/is in an idle state and preset time passes, sequencing text data corresponding to the first audio information and text data corresponding to the second audio information according to the first audio information generation time and the second audio information generation time to form a text file.
Further, the apparatus also comprises:
the mapping relation establishing module is used for marking error character data in text data, establishing a mapping relation between the marked error character data and audio information corresponding to the error character data, and performing secondary identification conversion on first audio information corresponding to the error character data, wherein the text data comprises correct character data and error character data.
The embodiment of the invention acquires the first audio information and the first audio information generation time when the first acquisition unit is in a working state, and forms text data corresponding to the first audio information according to the first audio information when the first acquisition unit is in an idle state; acquires second audio information and second audio information generation time when the second acquisition unit is in a working state, and forms text data corresponding to the second audio information according to the second audio information when the second acquisition unit is in an idle state; and when the first acquisition unit and/or the second acquisition unit is in an idle state, sorts the text data corresponding to the first audio information and the text data corresponding to the second audio information according to the first audio information generation time and the second audio information generation time to form a text file. This avoids the situation in which important content of a voice call is forgotten due to limited memory, and also avoids the cumbersome process of recording the call and then, afterwards, listening to the recording while taking notes with pen and paper and arranging the notes into a draft. Voice recording thus becomes more convenient, and user experience is improved.
Drawings
Fig. 1 is a flowchart of a voice recording method according to a first embodiment of the present invention;
Fig. 2 is a flowchart of a voice recording method according to a second embodiment of the present invention;
Fig. 3 is a schematic structural diagram of a voice recording apparatus according to a third embodiment of the present invention.
Detailed Description
The present invention will be described in further detail with reference to the accompanying drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting of the invention. It should be further noted that, for the convenience of description, only some of the structures related to the present invention are shown in the drawings, not all of the structures.
Example one
Fig. 1 is a flowchart of a voice recording method according to an embodiment of the present invention. This embodiment is applicable to voice recording scenarios, and the method may be executed by the voice recording apparatus provided in an embodiment of the present invention, which may be implemented in software and/or hardware. As shown in Fig. 1, the method specifically includes the following steps:
s110, acquiring first audio information and first audio information generation time when a first acquisition unit is in a working state, and forming text data corresponding to the first audio information according to the first audio information when the first acquisition unit is in an idle state.
The first acquisition unit may be in a working state while two users are on a phone call, or while two users are in a voice call through a communication application.
The first audio information is audio information sent by a user who carries out voice communication. The first audio information generation time may be a time when the first acquisition unit acquires that the user starts to send the audio information, or may be a time when the first acquisition unit acquires that the user finishes sending the audio information, which is not limited in this embodiment.
The first acquisition unit being in an idle state means that the user is not sending audio information, so the first acquisition unit does not need to acquire audio.
And S120, acquiring second audio information and the second audio information generation time when a second acquisition unit is in a working state, and forming text data corresponding to the second audio information according to the second audio information when the second acquisition unit is in an idle state.
The second acquisition unit may be in a working state while two users are on a phone call, or while two users are in a voice call through a communication application.
The second audio information is audio information sent by other users who have voice communication with the user. The second audio information generation time may be a time when the second acquisition unit acquires that another user performing a voice call with the user has just started to send audio information, or may be a time when the second acquisition unit acquires that another user performing a voice call with the user has finished sending audio information, which is not limited in this embodiment.
The second acquisition unit being in an idle state means that the other user in the voice call with the user is not sending audio information, so the second acquisition unit does not need to acquire audio.
In a specific example, during a conversation between user A and user B, a first acquisition unit is arranged on user A's side and a second acquisition unit on user B's side. When user A speaks, the first acquisition unit is in a working state: it acquires user A's utterance and records the acquisition time. After user A finishes speaking, user B speaks and the second acquisition unit enters a working state, acquiring user B's utterance and recording its acquisition time; because the first acquisition unit is idle while user B speaks, user A's utterance is translated into text during this idle period. After user B finishes speaking, user A speaks again, and user B's utterance is likewise translated into text during the idle period while user A speaks.
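The alternating pattern above — capture while working, transcribe while idle — can be sketched as follows. This is a minimal illustrative model, not the patented implementation: the class, its fields, and the toy transcriber are all assumptions made for demonstration.

```python
from dataclasses import dataclass, field

@dataclass
class AcquisitionUnit:
    """Hypothetical acquisition unit: buffers audio with its generation
    time while in a working state, and transcribes the buffer once idle."""
    speaker: str
    segments: list = field(default_factory=list)  # (generation_time, audio)
    texts: list = field(default_factory=list)     # (generation_time, text)

    def capture(self, generation_time, audio):
        # Working state: record the audio and its generation time.
        self.segments.append((generation_time, audio))

    def on_idle(self, transcribe):
        # Idle state: convert the buffered audio into text data.
        while self.segments:
            ts, audio = self.segments.pop(0)
            self.texts.append((ts, transcribe(audio)))

# Toy transcriber standing in for a real speech-recognition engine.
fake_transcribe = lambda audio: audio.upper()

unit_a = AcquisitionUnit("User A")
unit_a.capture(0, "hello")
unit_a.on_idle(fake_transcribe)  # user A stops talking; transcription runs
```

After `on_idle`, `unit_a.texts` holds `(0, "HELLO")` and the audio buffer is empty, mirroring the idle-time translation described above.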
S130, when the first acquisition unit and/or the second acquisition unit are/is in an idle state, the text data corresponding to the first audio information and the text data corresponding to the second audio information are sequenced according to the first audio information generation time and the second audio information generation time to form a text file.
The text file records each user's speaking content in the chronological order in which it was spoken.
Specifically, if the first acquisition unit is in an idle state, the text data are sequenced to form a text file; if the second acquisition unit is in an idle state, sequencing the text data to form a text file; and if the first acquisition unit and the second acquisition unit are in an idle state, sequencing the text data to form a text file.
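The sorting step can be sketched as a merge of both units' timestamped text data, ordered by generation time. The data values are illustrative only:

```python
# (generation_time, text) pairs produced by each acquisition unit.
first_texts = [(0, "hello"), (7, "see you")]   # from the first unit
second_texts = [(3, "hi there")]               # from the second unit

# Sort all entries by generation time to form the text file.
merged = sorted(first_texts + second_texts, key=lambda entry: entry[0])
text_file = "\n".join(text for _, text in merged)
```

The result interleaves the two speakers' utterances in the order they occurred, regardless of which unit produced each one.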
Optionally, the method further includes:
establishing a corresponding relation between text data corresponding to the first audio information and the first audio information generation time;
and establishing a corresponding relation between the text data corresponding to the second audio information and the second audio information generation time.
Specifically, a correspondence between text data corresponding to first audio information and the first audio information generation time and a correspondence between text data corresponding to second audio information and the second audio information generation time may be established in advance, and then a text file may be formed according to the correspondence.
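One simple way to hold such a correspondence relation is a mapping from generation time to transcribed text; the structure and names below are assumptions, since the text does not prescribe a data structure:

```python
# Correspondence between generation time and the text data it produced.
first_correspondence = {}

def record_text(correspondence, generation_time, text):
    """Register one text-data entry under its generation time."""
    correspondence[generation_time] = text

record_text(first_correspondence, 12.5, "see the attached report")
```

A second mapping of the same shape would hold the second audio information's entries; forming the text file then reduces to sorting the keys of both mappings.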
Optionally, when the first acquisition unit and/or the second acquisition unit is in an idle state, sorting the text data corresponding to the first audio information and the text data corresponding to the second audio information according to the first audio information generation time and the second audio information generation time to form a text file includes:
and after the first acquisition unit and/or the second acquisition unit are/is in an idle state and preset time passes, sequencing text data corresponding to the first audio information and text data corresponding to the second audio information according to the first audio information generation time and the second audio information generation time to form a text file.
The preset time is the time from the start of audio acquisition to the sorting of the text data to form the text file. The preset time may be a value set empirically or chosen as desired; this embodiment does not limit it.
Specifically, if the first acquisition unit is in an idle state and a preset time elapses, the text data are sorted to form a text file; if the second acquisition unit is in an idle state and preset time passes, sequencing the text data to form a text file; and if the first acquisition unit and the second acquisition unit are in an idle state and preset time passes, sequencing the text data to form a text file.
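The preset-time condition can be sketched as a simple idle-duration check; the threshold value is illustrative (the text says it may be set empirically):

```python
import time

# Illustrative preset time before sorting begins (seconds).
PRESET_IDLE_SECONDS = 2.0

def ready_to_sort(idle_since, now=None):
    """True once an acquisition unit has been idle for the preset time."""
    now = time.monotonic() if now is None else now
    return (now - idle_since) >= PRESET_IDLE_SECONDS
```

The caller would invoke `ready_to_sort` with the moment the unit went idle, and trigger the sorting step only when it returns `True`.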
In a specific example, a "record" mode is added to the call process: recorded audio information is translated into text through a speech recognition function, and the specific call time is marked. During a voice call, after the record mode is selected, the system works in two parts. One part is the ordinary recording function, which directly records the sound that is heard. The other part performs noise processing on the recorded audio file in the background, i.e., removes sounds that do not match a human voice, and then translates the collected speech into text through speech recognition technology. For example, while users A and B are in a voice call, each utterance has a call time, and the conversation is recorded in chronological order. If user A starts speaking first, the first acquisition unit acquires user A's utterance, records its speaking time, and records the utterance in text form; when user A finishes and user B speaks, the second acquisition unit acquires user B's utterance, records its speaking time, and records it in text form. The text data corresponding to the audio information is then sorted in chronological order to form a text file, for example: User A: …, User B: …, User A: …, User B: …, User A: ….
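The "User A: … / User B: …" rendering can be sketched by tagging each entry with its speaker before sorting; the timestamps, labels, and sentences below are illustrative:

```python
# (call_time, speaker, text) entries gathered from both acquisition units.
entries = [
    (0, "User A", "Shall we meet at nine?"),
    (4, "User B", "Nine works for me."),
    (9, "User A", "Great, see you then."),
]

# Sort by call time and render one "Speaker: text" line per utterance.
transcript = "\n".join(
    f"{speaker}: {text}" for _, speaker, text in sorted(entries)
)
```

Because the entries are keyed by call time, the transcript reads as a dialogue in the order the words were actually spoken.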
According to the technical scheme of the embodiment, the first audio information and the first audio information generation time are acquired when the first acquisition unit is in a working state, and text data corresponding to the first audio information is formed according to the first audio information when the first acquisition unit is in an idle state; second audio information and second audio information generation time are acquired when the second acquisition unit is in a working state, and text data corresponding to the second audio information is formed according to the second audio information when the second acquisition unit is in an idle state; and when the first acquisition unit and/or the second acquisition unit is in an idle state, the text data corresponding to the first audio information and the text data corresponding to the second audio information are sorted according to the first audio information generation time and the second audio information generation time to form a text file. This avoids the situation in which important content of a voice call is forgotten due to limited memory, and also avoids the cumbersome process of recording the call and then, afterwards, listening to the recording while taking notes with pen and paper and arranging the notes into a draft. Voice recording thus becomes more convenient, and user experience is improved.
Example two
Fig. 2 is a flowchart of a voice recording method according to a second embodiment of the present invention, which is optimized based on the first embodiment and provides a preferred voice recording method, specifically, the method further includes: marking error character data in text data, establishing a mapping relation between the marked error character data and audio information corresponding to the error character data, and performing secondary identification conversion on first audio information corresponding to the error character data, wherein the text data comprises correct character data and error character data.
Correspondingly, the method of the embodiment specifically includes the following steps:
s210, acquiring first audio information and first audio information generation time when a first acquisition unit is in a working state, and forming text data corresponding to the first audio information according to the first audio information when the first acquisition unit is in an idle state.
S220, second audio information and second audio information generation time are obtained when a second acquisition unit is in a working state, and text data corresponding to the second audio information is formed according to the second audio information when the second acquisition unit is in an idle state.
And S230, when the first acquisition unit and/or the second acquisition unit are/is in an idle state, sorting text data corresponding to the first audio information and text data corresponding to the second audio information according to the first audio information generation time and the second audio information generation time to form a text file.
S240, marking error character data in the text data, and establishing a mapping relation between the marked error character data and the audio information corresponding to the error character data, wherein the text data comprises correct character data and error character data.
Specifically, when the error character data is clicked, secondary recognition is performed by speech conversion software on the audio information linked to the error character data, and the secondarily recognized character information is displayed in editable form in the document.
Specifically, when the text data is sorted, it contains both correct character data and error character data. The error character data is therefore marked, for example with a red underline, a changed font color, or an annotation. At the same time, the marked error character data is linked to the audio information corresponding to it; when the error character data is clicked, the corresponding audio information is recognized a second time, and the secondarily recognized character data is displayed in editable form in the document. The error character data can then be corrected in this editable display, and the corrected character data replaces the error character data.
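The marking and audio-mapping step can be sketched as below. The patent does not specify how error words are detected, so the per-word confidence score and its threshold are purely an assumption used to make the sketch runnable; the field names are likewise illustrative:

```python
# Recognized words, each carrying a reference to its source audio segment.
# Confidence scores are a hypothetical detection heuristic, not from the text.
words = [
    {"text": "you", "confidence": 0.97, "audio_ref": "seg-1"},
    {"text": "right", "confidence": 0.41, "audio_ref": "seg-2"},
]

# Mapping relation: index of a marked error word -> its audio segment,
# so a click on the word can trigger secondary recognition of that audio.
error_map = {}
for i, word in enumerate(words):
    if word["confidence"] < 0.6:  # low confidence => mark as error data
        word["marked"] = True
        error_map[i] = word["audio_ref"]
```

Here the low-confidence word "right" is marked and mapped to segment "seg-2", which is the audio a click would send for secondary recognition.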
In a specific example, the content spoken in the first audio information is "you are really good, let's go shopping at the park together", but after recognition and conversion the obtained text is "you are right, let's go to the official board together". It can be seen that "right" and "official board" are error character data, so when the text file is formed these words are marked. The error character data can be corrected manually by clicking the marked words in the text file: because the error character data is linked to the first audio information, the first audio information is recognized and converted a second time, and a list of candidate words produced by the secondary recognition (for example, homophones of the misrecognized word) is displayed in editable form in the text file. If the correct word appears among the candidates, the user can click it directly, and it replaces the error character data. If the correct word does not appear, the user can click the closest candidate, after which further words matching it are displayed for selection, completing the correction of the error character information. Alternatively, the audio information corresponding to the error character data can be inserted into the text file, and the user can click the audio to play it and change the error character data manually.
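The candidate-selection flow above can be sketched as a small replacement helper; the sentence, candidate list, and function name are all illustrative, not part of the patented design:

```python
def correct(words, index, candidates, choice):
    """Replace the error word at `index` with the user's chosen candidate.

    If the choice is not among the secondary-recognition candidates,
    the sentence is left unchanged (the user would request more matches).
    """
    if choice in candidates:
        words[index] = choice
    return words

# "right" was misrecognized; secondary recognition offers homophone-like
# candidates, and the user clicks the correct one.
sentence = ["you", "are", "right"]
fixed = correct(sentence, 2, ["right", "write", "rite"], "write")
```

A real implementation would drive this from the click on the marked word and re-render the editable display, but the replace-by-selection step is the core of the correction.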
According to the technical scheme of the embodiment, error character data in text data are marked, and a mapping relation is established between the marked error character data and audio information corresponding to the error character data. The voice recording is more convenient and accurate, and the user experience is improved.
Example three
Fig. 3 is a schematic structural diagram of a voice recording apparatus according to a third embodiment of the present invention. The present embodiment may be applicable to the case of voice recording, and the apparatus may be implemented in a software and/or hardware manner, as shown in fig. 3, and includes a first text data forming module 310, a second text data forming module 320, and a text file forming module 330.
The first text data forming module 310 is configured to obtain first audio information and first audio information generation time when a first acquisition unit is in a working state, and form text data corresponding to the first audio information according to the first audio information when the first acquisition unit is in an idle state;
a second text data forming module 320, configured to obtain second audio information and second audio information generation time when a second acquisition unit is in a working state, and form text data corresponding to the second audio information according to the second audio information when the second acquisition unit is in an idle state;
the text file forming module 330 is configured to sort, when the first capturing unit and/or the second capturing unit is in an idle state, text data corresponding to the first audio information and text data corresponding to the second audio information according to the first audio information generation time and the second audio information generation time to form a text file.
Optionally, the apparatus further includes:
the first corresponding relation establishing module is used for establishing the corresponding relation between the text data corresponding to the first audio information and the first audio information generation time;
and the second corresponding relation establishing module is used for establishing the corresponding relation between the text data corresponding to the second audio information and the second audio information generation time.
Optionally, the text file forming module 330 is specifically configured to:
and after the first acquisition unit and/or the second acquisition unit are/is in an idle state and preset time passes, sequencing text data corresponding to the first audio information and text data corresponding to the second audio information according to the first audio information generation time and the second audio information generation time to form a text file.
Optionally, the apparatus further includes:
the mapping relation establishing module is used for marking error character data in text data, establishing a mapping relation between the marked error character data and audio information corresponding to the error character data, and performing secondary identification conversion on first audio information corresponding to the error character data, wherein the text data comprises correct character data and error character data.
According to the technical scheme of the embodiment, the first audio information and the first audio information generation time are acquired when the first acquisition unit is in a working state, and text data corresponding to the first audio information is formed according to the first audio information when the first acquisition unit is in an idle state; second audio information and second audio information generation time are acquired when the second acquisition unit is in a working state, and text data corresponding to the second audio information is formed according to the second audio information when the second acquisition unit is in an idle state; and when the first acquisition unit and/or the second acquisition unit is in an idle state, the text data corresponding to the first audio information and the text data corresponding to the second audio information are sorted according to the first audio information generation time and the second audio information generation time to form a text file. This avoids the situation in which important content of a voice call is forgotten due to limited memory, and also avoids the cumbersome process of recording the call and then, afterwards, listening to the recording while taking notes with pen and paper and arranging the notes into a draft. Voice recording thus becomes more convenient, and user experience is improved.
The above apparatus can execute the method provided by any embodiment of the present invention, and has the functional modules and beneficial effects corresponding to the executed method.
It is to be noted that the foregoing is only illustrative of the preferred embodiments of the present invention and the technical principles employed. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, although the present invention has been described in greater detail by the above embodiments, the present invention is not limited to the above embodiments, and may include other equivalent embodiments without departing from the spirit of the present invention, and the scope of the present invention is determined by the scope of the appended claims.