CN112017655A - Intelligent voice recording playback method and system

Info

Publication number: CN112017655A
Application number: CN202010726611.XA
Authority: CN
Inventors: 董林辉
Original assignee: Yunkai Intelligent Shenzhen Co ltd
Current assignee: Yunkai Intelligent Shenzhen Co ltd
Priority date: 2020-07-25
Filing date: 2020-07-25
Publication date: 2020-12-01

Abstract

The application relates to the technical field of sound recording, and in particular to an intelligent voice recording playback method and system. The method comprises the following steps. S1: acquiring first sound information, controlling recording of indoor sound according to the first sound information, and generating a recording file. S2: identifying the voiceprint features of the indoor sound, generating voiceprint mark information according to the voiceprint features, and separating the recording file by the different voiceprint mark information. S3: separating and recognizing the recording file according to the voiceprint mark information to generate text information segments, and integrating the text information segments of the same recording file in time order to form a text information record table. S4: acquiring second sound information, and stopping the recording of the indoor sound according to the second sound information. The method and system make it convenient to organize recorded content.

Description

Intelligent voice recording playback method and system

Technical Field

The present application relates to the field of sound recording technologies, and in particular, to an intelligent voice recording playback method and system.

Background

In daily work and life, recording is often used, for example to record conversations and meetings. The stored recordings frequently need to be organized into notes for later review, and the speech is generally recognized by voice recognition software and transcribed into text.

Existing voice recognition software generally recognizes all speech in a recording and transcribes it directly into one continuous text record. A recorded conversation or conference usually involves two or more speakers, so a worker has to listen to the recording again afterwards and rearrange the text record, which makes organizing the recording tedious.

With respect to the related art described above, the inventor considers that it has the drawback that the text records of recordings are inconvenient to organize.

Disclosure of Invention

In order to facilitate the arrangement of recording contents, the application provides an intelligent voice recording playback method and system.

In a first aspect, the following technical solutions are adopted in an intelligent voice recording playback method provided by the present application:

An intelligent voice recording playback method comprises the following steps:

S1: acquiring first sound information, controlling recording of indoor sound according to the first sound information, and generating a recording file;

S2: identifying the voiceprint features of the indoor sound, generating voiceprint mark information according to the voiceprint features, and separating the recording file by the different voiceprint mark information;

S3: separating and recognizing the recording file according to the voiceprint mark information to generate text information segments, and integrating the text information segments of the same recording file in time order to form a text information record table;

S4: acquiring second sound information, and stopping the recording of the indoor sound according to the second sound information.

By adopting this technical solution, the recording file is divided according to the voiceprint mark information and converted into text information segments, so that the text converted from each utterance of each person is kept separate and carries the corresponding voiceprint mark information, which makes the recorded content convenient to organize.
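As a non-limiting illustration, the integration of text information segments into a record table may be sketched in Python as follows. The `Segment` type, its field names, and `build_record_table` are hypothetical and do not appear in the application; the sketch assumes that voiceprint recognition and speech recognition have already produced time-stamped segments.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: float      # seconds from the start of the recording
    end: float
    voiceprint: str   # voiceprint mark, e.g. "A" (hypothetical label)
    text: str         # recognized text for this span


def build_record_table(segments):
    """Order segments by time and merge adjacent segments that carry the
    same voiceprint mark, so each row is one uninterrupted turn of one person."""
    table = []
    for seg in sorted(segments, key=lambda s: s.start):
        if table and table[-1].voiceprint == seg.voiceprint:
            last = table[-1]
            table[-1] = Segment(last.start, max(last.end, seg.end),
                                last.voiceprint, f"{last.text} {seg.text}")
        else:
            table.append(Segment(seg.start, seg.end, seg.voiceprint, seg.text))
    return table
```

Merging adjacent segments with the same mark is one possible reading of "one utterance of one person"; a different segmentation rule could be substituted without changing the overall flow.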

Preferably, step S3 further includes S31: acquiring third sound information, acquiring the recording duration of the current recording file according to the third sound information, and generating recording node information corresponding to the current recording duration.

By adopting this technical solution, when the recording is played back, the corresponding recording duration can be obtained from the recording node information, so that playback can jump to the content that needs to be played, which makes it easy to locate the corresponding recorded content during playback.
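As a non-limiting illustration, step S31 may be sketched as a small recorder that notes the elapsed recording duration each time the third sound information is detected. The class name `NodeRecorder` and its methods are hypothetical; the injectable `clock` argument is an assumption made only so that the sketch can be exercised deterministically.

```python
import time


class NodeRecorder:
    """Tracks elapsed recording time and stores one node per marker command."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._started_at = None
        self.nodes = []  # elapsed seconds at each marker, in chronological order

    def start(self):
        # Called when recording begins (first control information).
        self._started_at = self._clock()
        self.nodes = []

    def mark_node(self):
        # Called on the third control information; returns the recording duration.
        if self._started_at is None:
            raise RuntimeError("recording has not started")
        elapsed = self._clock() - self._started_at
        self.nodes.append(elapsed)
        return elapsed
```

Because nodes are appended as they occur, the list is already in time order, which matches the chronological ordering the application requires.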

Preferably, step S3 further includes S32: writing a node label corresponding to the current recording node information into the text information record table according to the third sound information, wherein the node labels within the same recording file are arranged in time order.

By adopting this technical solution, node labels are written into the text information record table, so that its content is labeled according to the third sound information. When a person retrieves the text information record table, its content is segmented by conference topic, which makes the required content easy to find.
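As a non-limiting illustration, step S32 may be sketched as interleaving numbered node labels with the rows of the record table. The function name `insert_node_labels` and the tuple layouts are hypothetical; the sketch assumes each row carries its start time in seconds and that node times are recording durations.

```python
def insert_node_labels(rows, node_times):
    """rows: list of (start_seconds, speaker, text), sorted by start time.
    node_times: recording durations at which nodes were marked.
    Returns a new list in which ("node", n) precedes the first row that
    starts at or after the n-th node time."""
    result = []
    pending = sorted(node_times)
    label = 0
    for row in rows:
        while label < len(pending) and pending[label] <= row[0]:
            label += 1
            result.append(("node", label))
        result.append(row)
    # Nodes marked after the last utterance are appended at the end.
    while label < len(pending):
        label += 1
        result.append(("node", label))
    return result
```

For example, with rows starting at 0, 40, and 90 seconds and a single node at 60 seconds, the label `("node", 1)` is placed between the second and third rows.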

Preferably, step S4 further includes S41: acquiring fourth sound information, and controlling the storing of the recording file and the text information record table according to the fourth sound information.

By adopting this technical solution, the fourth sound information controls the storing of the recording file and the text information record table, so that they can be saved as required.

Preferably, the method further comprises step S5: acquiring a fifth sound signal carrying recording node information, and playing the recording file according to the recording node information carried by the fifth sound signal.

By adopting this technical solution, the fifth sound signal is tied to a text phrase, and different fifth sound signals correspond to different recording node information, so that the recording file can be played from a chosen position according to the fifth sound signal.

In a second aspect, the present application provides an intelligent voice recording and playback system, which adopts the following technical solutions:

An intelligent voice recording playback system, comprising:

the voice recognition and control module is used for recognizing indoor sound, and if the indoor sound is first sound information, first control information for controlling the recording of the indoor sound is generated; if the indoor sound is the second sound information, generating second control information for controlling the recording of the indoor sound to be finished; if the indoor sound is the third sound information, generating third control information for storing the current recording duration; if the indoor sound is the fourth sound information, generating fourth control information for controlling and storing the recording file and the character information recording list;

the sound acquisition module is used for acquiring first control information and recording indoor sound according to the first control information, and for acquiring second control information and ending the recording of indoor sound according to the second control information;

the storage module is used for storing the indoor sound recorded by the sound acquisition module, generating a recording file, acquiring third control information, acquiring the recording duration of the current recording file according to the third control information and generating recording node information corresponding to the current recording duration; acquiring fourth control information, and storing the sound recording file according to the fourth control information;

the voiceprint recognition module is used for recognizing the voiceprint characteristics of indoor sound and distributing different voiceprint mark information according to the voiceprint characteristics;

the voice conversion module is used for acquiring the recording file, acquiring voiceprint mark information, separating the recording file according to different voiceprint mark information, identifying the recording file to generate a text information segment carrying the voiceprint mark information, and integrating the text information segment according to a time sequence to generate a text information recording list; acquiring fourth control information, and sending the character information record table to the storage module according to the fourth control information; and acquiring the recording node information, and writing the node labels in the character information recording list in sequence according to the recording duration corresponding to the recording node information.

By adopting this technical solution, the sound acquisition module acquires indoor sound; the voice recognition and control module recognizes whether the indoor sound carries sound information and generates control information accordingly; the storage module stores the recording file; the voiceprint recognition module recognizes the voiceprint features in the indoor sound and generates voiceprint mark information from them; and the voice conversion module converts the audio information of the recording file into text information. The voiceprint mark information divides the recording file into sections, the voice conversion module transcribes each section into a text information segment, and the segments, each carrying its voiceprint mark information, form the text information record table, which makes the recorded content convenient to organize.

Preferably, the storage module comprises a temporary storage submodule, a storage submodule and a timing deletion submodule;

the temporary storage sub-module is used for acquiring the indoor sound recorded by the sound acquisition module and storing the indoor sound as a temporary storage file; acquiring third control information, generating recording node information from the current recording duration, and storing the recording node information in a corresponding temporary storage file;

the storage submodule is used for acquiring fourth control information and storing the temporary storage file as a recording file according to the fourth control information;

the timing deletion submodule is used for acquiring fourth control information and deleting the temporary storage file after a first preset time interval according to the fourth control information; and, if the fourth control information is not acquired within a second preset time, deleting the temporary storage file after the second preset time.

By adopting this technical solution, the sound information acquired by the sound acquisition module is stored in the temporary storage submodule; the storage submodule transfers the temporary storage file into a recording file according to the fourth control information; and the timing deletion submodule, according to the acquired fourth control information, controls the temporary storage submodule to delete the temporary storage file, releasing its space for the next recording. The solution therefore provides good transfer and buffering.

Preferably, the system also comprises an audio playing module;

the audio playing module is used for receiving the fifth control information, determining the recording node information corresponding to the fifth control information, and starting to play the recording file from the position indicated by the recording node information.

By adopting this technical solution, the audio playing module acquires the fifth control information and starts playing the recording file from the recording duration corresponding to the indicated recording node information, so that the recording file can be played from a located position.

Preferably, the audio playing module comprises a judging submodule and a playing submodule;

the judgment submodule is used for receiving the fifth control information, acquiring the recording node information and judging the recording duration corresponding to the recording node information carried by the fifth control information;

the playing submodule acquires the recording duration determined by the judgment submodule, acquires the recording file or the temporary storage file, and jumps to the time corresponding to the recording duration to play the recording file or the temporary storage file from there.

By adopting the technical scheme, the judgment submodule judges the information content of the fifth control information in advance, and the playing submodule plays the sound recording file or the temporary storage file, so that the effect of orderly controlling the sound recording file playing is achieved.

In summary, the present application includes at least one of the following beneficial technical effects:

1. the recording file is divided according to the voiceprint mark information and converted into text information segments, so that the text converted from each utterance of each person is kept separate and carries the corresponding voiceprint mark information, which makes the recorded content convenient to organize;

2. when the recording is played back, the corresponding recording duration can be obtained from the recording node information, so that playback can jump to the content that needs to be played, which makes it easy to locate the corresponding recorded content during playback;

3. the sound acquisition module acquires indoor sound; the voice recognition and control module recognizes whether the indoor sound carries sound information and generates control information accordingly; the storage module stores the recording file; the voiceprint recognition module recognizes the voiceprint features in the indoor sound and generates voiceprint mark information from them; and the voice conversion module converts the audio information of the recording file into text information. The voiceprint mark information divides the recording file into sections, the voice conversion module transcribes each section into a text information segment, and the segments, each carrying its voiceprint mark information, form the text information record table, which makes the recorded content convenient to organize.

Drawings

FIG. 1 is a schematic block diagram of an intelligent voice recording and playback system according to an embodiment of the present application;

FIG. 2 is a block diagram of the storage module of an intelligent voice recording and playback system according to an embodiment of the present application;

FIG. 3 is a block diagram of the audio playing module in an intelligent voice recording and playback system according to an embodiment of the present application;

FIG. 4 is a flowchart of an intelligent voice recording playback method according to an embodiment of the present application.

Detailed Description

The present application is described in further detail below with reference to figures 1-4.

An embodiment of the present application discloses an intelligent voice recording playback system, which includes:

Referring to FIG. 1, the voice recognition and control module is configured to recognize indoor sound: if the indoor sound is the first sound information, it generates first control information for starting the recording of the indoor sound; if the indoor sound is the second sound information, it generates second control information for ending the recording of the indoor sound; if the indoor sound is the third sound information, it generates third control information for storing the current recording duration; and if the indoor sound is the fourth sound information, it generates fourth control information for storing the recording file and the text information record table.

Specifically, the voice recognition and control module may consist of a processor and a voice recognition program stored in a computer. In use, it works with a voice collection terminal to collect indoor sound; the voice recognition program recognizes the speech content, and the processor responds according to what is recognized. Each item of sound information is a phrase set in advance as text. The first sound information may be the Mandarin phrase "start recording"; when the voice recognition program recognizes "start recording", the processor generates the first control information. The second sound information may be the Mandarin phrase "recording end"; when the voice recognition program recognizes "recording end", the processor generates the second control information. The third sound information may be the Mandarin phrase "next theme"; when the voice recognition program recognizes "next theme", the processor generates the third control information. The fourth sound information may be the Mandarin phrase "save record"; when the voice recognition program recognizes "save record", the processor generates the fourth control information.
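As a non-limiting illustration, the mapping from a recognized phrase to control information may be sketched in Python as a lookup table. The enum `Control`, the dictionary `COMMANDS`, and the function `to_control` are hypothetical names; the English phrases are renderings of the Mandarin commands in the embodiment, and their exact wording here is an assumption.

```python
from enum import Enum


class Control(Enum):
    START = 1   # first control information: start recording
    STOP = 2    # second control information: end recording
    NODE = 3    # third control information: store current recording duration
    SAVE = 4    # fourth control information: store file and record table


# English renderings of the Mandarin command phrases (wording assumed).
COMMANDS = {
    "start recording": Control.START,
    "recording end": Control.STOP,
    "next theme": Control.NODE,
    "save record": Control.SAVE,
}


def to_control(recognized_text):
    """Return the control information for a recognized phrase,
    or None when the text is ordinary speech rather than a command."""
    normalized = " ".join(recognized_text.lower().split())
    return COMMANDS.get(normalized)
```

An exact-match lookup keeps ordinary conversation from triggering commands by accident; a real recognizer would likely need a tolerance for recognition errors, which this sketch does not attempt.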

The sound acquisition module is used for acquiring indoor sound, for acquiring the first control information and recording the indoor sound according to the first control information, and for acquiring the second control information and ending the recording of the indoor sound according to the second control information.

Specifically, the sound acquisition module may be a voice acquisition terminal with a recording function, such as a recording device, a microphone, and the like, and the sound acquisition module receives the first control information and starts recording; and the sound acquisition module receives the second control information and then ends the recording.

The storage module is used for storing the indoor sound recorded by the sound acquisition module, generating a recording file, acquiring third control information, acquiring the recording duration of the current recording file according to the third control information and generating recording node information corresponding to the current recording duration; and acquiring fourth control information, and storing the sound recording file according to the fourth control information.

Further, referring to FIG. 2, the storage module includes a temporary storage submodule, a storage submodule, and a timing deletion submodule.

The temporary storage sub-module is used for acquiring the indoor sound recorded by the sound acquisition module and storing the indoor sound as a temporary storage file; and acquiring third control information, generating recording node information from the current recording time length, and storing the recording node information in a corresponding temporary storage file.

And the storage submodule is used for acquiring the fourth control information and storing the temporary storage file as a recording file according to the fourth control information.

Specifically, the sound acquisition module acquires indoor sound and, on receiving the first control information, stores the indoor sound in the temporary storage submodule, which serves as a staging area for temporary storage files. When the storage submodule acquires the fourth control information, the temporary storage file is transferred to and stored in the storage submodule. When the timing deletion submodule acquires the fourth control information, it deletes the temporary storage file from the temporary storage submodule after the first preset time interval, releasing the space the file occupied. In this embodiment, the first preset time is longer than the time the storage submodule needs to transfer the temporary storage file.

If the timing deletion submodule has not acquired the fourth control information within the second preset time, it controls the temporary storage submodule to delete the current temporary storage file once the second preset time has elapsed, so as to release the space of the temporary storage submodule. In this embodiment, the second preset time is one hour, counted from when the sound acquisition module receives the second control information; in practice it can be set as required. Unneeded temporary storage files are thus deleted on a timer, and each temporary storage file is retained for the second preset time before deletion, so that a recording is not lost because someone forgot to save it. In actual use, ten minutes before the temporary storage file is deleted, the intelligent voice recording playback system can emit light and sound signals as a reminder that the file is about to be deleted. The light signal can be produced by a flashing LED lamp, and the sound signal by a buzzer, a loudspeaker, or the like.

The temporary storage file is stored sequentially. When the temporary storage submodule acquires the third control information, it takes the recording duration of the current temporary storage file and stores, in the recording node information, the first storage address of the temporary storage file that follows that recording duration. If the third control information is received several times during the recording of the same temporary storage file, the recording node information is ordered chronologically.
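As a non-limiting illustration, the timing rules of the timing deletion submodule may be sketched as a pure function that computes when the temporary storage file should be deleted and when the reminder should fire. The function name `deletion_plan` and its parameters are hypothetical. The 60-second default for the first preset interval is an assumed value, since the embodiment only requires it to exceed the transfer time; the one-hour and ten-minute defaults follow the embodiment. Issuing no reminder after a save is also an assumption of this sketch.

```python
def deletion_plan(stop_time, save_time=None, first_interval=60.0,
                  second_interval=3600.0, warn_before=600.0):
    """Return (delete_at, warn_at), in seconds on the same clock as the inputs.

    stop_time: when the second control information (stop) was received.
    save_time: when the fourth control information (save) was received, or None.
    first_interval: delay after a save before deletion (assumed value).
    second_interval: retention after stop when nothing was saved.
    warn_before: lead time of the light and sound reminder.
    """
    if save_time is not None:
        # Saved: the temporary file only needs to outlive the transfer.
        return save_time + first_interval, None
    delete_at = stop_time + second_interval
    return delete_at, delete_at - warn_before
```

Keeping the schedule as a pure function separates the policy from the timer mechanism, so the same rules could be driven by any scheduler.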

Referring to FIG. 1, the voiceprint recognition module is configured to recognize the voiceprint features of indoor sound and to allocate different voiceprint mark information according to the voiceprint features.

Specifically, the voiceprint recognition module may be a voiceprint recognition program stored in the processor. It distinguishes the indoor sound acquired by the sound acquisition module according to voiceprint features such as speaking frequency, accent, and habitual word choice. In this embodiment, before the system is used, a voiceprint library of the people who may take part in the conference is established in advance, and a background model is established.
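As a non-limiting illustration, allocating a voiceprint mark by comparison against a pre-built voiceprint library may be sketched as follows. The application does not specify the matching technique; cosine similarity over fixed-length feature vectors is used here purely as a stand-in, and the function names, the `threshold` value, and the "unknown" mark are all assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def assign_voiceprint_mark(features, library, threshold=0.8):
    """Return the mark of the most similar enrolled speaker, or "unknown"
    if no enrolled voiceprint reaches the threshold.
    `library` maps voiceprint mark -> enrolled feature vector."""
    best_mark, best_score = "unknown", threshold
    for mark, enrolled in library.items():
        score = cosine(features, enrolled)
        if score >= best_score:
            best_mark, best_score = mark, score
    return best_mark
```

The fallback mark lets speech from a person who was not enrolled in advance still be separated from the enrolled speakers rather than being misattributed.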

Specifically, the voice conversion module synchronously acquires the temporary storage file and the corresponding voiceprint mark information, recognizes the audio information in the temporary storage file, and converts it into text information. It divides the text information into text information segments, using the different voiceprint mark information as the dividing criterion, and pairs each segment with its voiceprint mark information in time order to form the text information record table. By checking the table, personnel can obtain the conference content, the number of conference topics, the number of participants, and other information. The voice conversion module also acquires the recording node information and writes a node label into the text information record table at the recording duration of the temporary storage file that corresponds to each item of recording node information. Node labels within the same text information record table are numbered sequentially with Arabic numerals in time order, so that conference topics can be told apart by node label when the table is checked.

Furthermore, the system also comprises an audio playing module.

Referring to FIG. 3, the audio playing module is configured to receive the fifth control information, determine the recording node information corresponding to the fifth control information, and start playing the recording file according to the recording node information.

Further, the audio playing module comprises a judgment submodule and a playing submodule;

The judgment submodule is used for receiving the fifth control information, acquiring the recording node information, and determining the recording duration corresponding to the recording node information carried by the fifth control information.

Specifically, the fifth control information is generated in the voice recognition and control module when that module obtains a fifth sound signal. The fifth sound signal may be the Mandarin phrase "play nth theme", where n may be 1, 2, 3, 4, and so on. In this embodiment, taking the fifth sound signal "play 1st theme" as an example, the voice recognition program recognizes "play 1st theme" and the processor generates fifth control information with rank 1; if the voice recognition program recognizes "play 2nd theme", the processor generates fifth control information with rank 2, and so on. The fifth control information with rank 1 corresponds to the recording node information ranked 1 in time order and to the node label ranked 1 in time order. In this embodiment, the fifth control information, the recording node information, and the node labels are all arranged in time order and correspond one to one. The judgment submodule receives the fifth control information, obtains its rank, looks up the recording node information with the same rank, and obtains the recording duration corresponding to that recording node information. The playing submodule obtains the recording duration, jumps to the corresponding position in the recording file or the temporary storage file, and plays the file from there.
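As a non-limiting illustration, the judgment submodule's lookup from a "play nth theme" command to a recording duration may be sketched in Python. The function name `playback_offset`, the regular expression, and the English phrasing of the command are assumptions; the one-to-one mapping from rank n to the n-th recording node follows the embodiment.

```python
import re

# Accepts e.g. "play 1st theme", "play the 2nd theme", "play 3 theme".
_PLAY = re.compile(r"play (?:the )?(\d+)\s*(?:st|nd|rd|th)? theme", re.IGNORECASE)


def playback_offset(command, node_durations):
    """Return the recording duration (seconds) to jump to, or None if the
    command is not a playback request or names a node that does not exist.
    node_durations[i] holds the duration stored for the node ranked i + 1."""
    match = _PLAY.fullmatch(command.strip())
    if not match:
        return None
    rank = int(match.group(1))
    if 1 <= rank <= len(node_durations):
        return node_durations[rank - 1]
    return None
```

Returning `None` for an out-of-range rank leaves it to the caller to decide how to respond, for example by ignoring the command or prompting the user.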

The implementation principle of the intelligent voice recording playback system of this embodiment is as follows: the sound acquisition module acquires indoor sound; the voice recognition and control module identifies the sound information carried by the indoor sound and generates control information accordingly; the storage module stores the recording file; the voiceprint recognition module identifies the voiceprint features in the indoor sound and generates voiceprint mark information from them; and the voice conversion module converts the audio information of the recording file into text information. The voiceprint mark information divides the recording file into a plurality of sections, the voice conversion module transcribes the sections into text information segments, and the segments, each carrying its voiceprint mark information, form the text information record table, making the recorded content convenient to organize.

An embodiment of the present application further discloses an intelligent voice recording playback method, applicable to the intelligent voice recording playback system described above. Referring to FIG. 4, the method includes the following steps:

S1: acquiring first sound information, controlling recording of indoor sound according to the first sound information, and generating a recording file.

Specifically, the sound acquisition module acquires indoor sound, and the voice recognition and control module recognizes whether the indoor sound contains the first sound information. If so, first control information is generated according to the first sound information, and the sound acquisition module is controlled to record the indoor sound and store it in the storage module.

S2: identifying the voiceprint features of the indoor sound, generating voiceprint mark information according to the voiceprint features, and separating the recording file by the different voiceprint mark information.

Specifically, the voiceprint recognition module performs voiceprint recognition on the indoor sound acquired by the sound acquisition module, generates different voiceprint mark information for different voiceprints, and separates the recording file according to the different voiceprint mark information; that is, the recording file is separated into the speech of the different speakers and stored in the storage module.

S3: separating and recognizing the recording file according to the voiceprint mark information to generate text information segments, and integrating the text information segments of the same recording file in time order to form a text information record table.

Specifically, the voice conversion module converts the separated recording file in the storage module into text information segments, each corresponding to one turn of speech by one person. Each segment, carrying its corresponding voiceprint mark information, is then stored in the text information record table in time order, forming a table that is segmented by single turns of speech and ordered chronologically. After obtaining the table, personnel can see at a glance the number of participants and who spoke in each turn.

S31: acquiring third sound information, acquiring the recording duration of the current recording file according to the third sound information, and generating recording node information corresponding to the current recording duration.

Specifically, the recording duration is 0 minutes 0 seconds when recording starts and increases as time passes, so the starting position of any section of recorded content within the recording file can be identified by its recording duration. The third sound information is used to separate conference topics. After the storage module acquires the third control information, it immediately obtains the recording duration of the current recording file and marks the first recording storage address that begins after that duration, forming recording node information corresponding to that storage address. When the recording node information is called, the corresponding recording storage address can be retrieved and the recording file played from that position.
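As a non-limiting illustration, the relation between a recording duration and the first storage address after it may be sketched for uncompressed PCM audio. The application does not state the audio format; the function name `pcm_byte_offset`, the 16 kHz mono 16-bit defaults, and the 44-byte header (the size of a canonical WAV header) are all assumptions of this sketch.

```python
import math


def pcm_byte_offset(duration_s, sample_rate=16000, channels=1,
                    sample_width=2, header_bytes=44):
    """Byte address of the first audio frame at or after `duration_s`.

    A frame holds one sample per channel, so addresses are always aligned
    to a whole frame and playback never starts in the middle of a sample."""
    frame_size = channels * sample_width
    frame_index = math.ceil(duration_s * sample_rate)
    return header_bytes + frame_index * frame_size
```

With these assumed defaults, a node at 1.5 seconds maps to frame 24000, which lies 48000 bytes past the header. Compressed formats would need a different mapping, such as an index of packet offsets.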

S32: writing a node label corresponding to the current recording node information into the text information record table according to the third sound information, wherein the node labels within the same recording file are arranged in time order.

Specifically, the node labels are written into the text information record table so that changes of conference topic can be identified from the node labels when the table is checked.

Further, step S4 also includes S41.

S41: acquiring fourth sound information, and controlling the storing of the recording file and the text information record table according to the fourth sound information.

S5: acquiring a fifth sound signal carrying recording node information, and playing the recording file according to the recording node information carried by the fifth sound signal.

The above embodiments are preferred embodiments of the present application, and the protection scope of the present application is not limited by the above embodiments, so: all equivalent changes made according to the structure, shape and principle of the present application shall be covered by the protection scope of the present application.

Claims

1. An intelligent voice recording playback method is characterized by comprising the following steps,

2. The method of claim 1, wherein step S3 further comprises S31,

3. The method of claim 2, wherein step S3 further comprises step S32,

4. The method of claim 1, wherein step S4 further comprises S41,

5. The intelligent voice recording playback method according to claim 1, further comprising step S5,

6. An intelligent voice recording playback system, comprising:

the sound acquisition module is used for acquiring indoor sound, acquiring first control information and recording the indoor sound according to the first control information, and for acquiring second control information and ending the recording of the indoor sound according to the second control information;

7. The system according to claim 6, wherein the storage module comprises a temporary storage sub-module, a storage sub-module and a timing deletion sub-module;

8. The system of claim 7, further comprising an audio playing module,

and the audio playing module is used for receiving the fifth control information, judging the recording node information corresponding to the fifth control information, and starting to play the recording file or the temporary storage file according to the recording node information.

9. The system according to claim 8, wherein the audio playing module comprises a determining sub-module and a playing sub-module;