CN113676591A - Recording method and device - Google Patents

Recording method and device

Info

Publication number
CN113676591A
Authority
CN
China
Prior art keywords
recording
voiceprint
audio
information
target
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110919680.7A
Other languages
Chinese (zh)
Inventor
边涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Vivo Mobile Communication Hangzhou Co Ltd
Original Assignee
Vivo Mobile Communication Hangzhou Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Vivo Mobile Communication Hangzhou Co Ltd filed Critical Vivo Mobile Communication Hangzhou Co Ltd
Priority to CN202110919680.7A
Publication of CN113676591A
Legal status: Pending

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M1/00Substation equipment, e.g. for use by subscribers
    • H04M1/72Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72403User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality
    • H04M1/7243User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages
    • H04M1/72433User interfaces specially adapted for cordless or mobile telephones with means for local support of applications that increase the functionality with interactive means for internal management of messages for voice messaging, e.g. dictaphones
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L17/00Speaker identification or verification techniques
    • G10L17/02Preprocessing operations, e.g. segment selection; Pattern representation or modelling, e.g. based on linear discriminant analysis [LDA] or principal components; Feature selection or extraction
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04MTELEPHONIC COMMUNICATION
    • H04M1/00Substation equipment, e.g. for use by subscribers
    • H04M1/72Mobile telephones; Cordless telephones, i.e. devices for establishing wireless links to base stations without route selection
    • H04M1/724User interfaces specially adapted for cordless or mobile telephones
    • H04M1/72448User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions
    • H04M1/72454User interfaces specially adapted for cordless or mobile telephones with means for adapting the functionality of the device according to specific conditions according to context-related or environment-related conditions

Landscapes

  • Engineering & Computer Science (AREA)
  • Human Computer Interaction (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Business, Economics & Management (AREA)
  • Business, Economics & Management (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Environmental & Geological Engineering (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The application discloses a recording method and a recording device, belonging to the technical field of audio processing. The recording method includes: acquiring an audio signal; performing voiceprint feature detection on the audio signal to obtain voiceprint feature information; and starting to record the audio signal when the voiceprint feature information matches target voiceprint feature information, wherein the target voiceprint feature information is obtained by performing voiceprint detection on pre-recorded audio.

Description

Recording method and device
Technical Field
The application belongs to the technical field of audio processing, and particularly relates to a recording method and device.
Background
Recording is the process of capturing sound signals. People record sound with media such as electronic devices so that information can be conveniently preserved and the recorded content can be replayed and organized later.
In the prior art, a recording device starts recording when it receives a recording instruction and stops only when it receives a stop instruction. In studying this recording process, the inventor found at least the following problem: large stretches of blank or noisy content often appear before and after the specific sound to be recorded, and this content occupies recording storage space and increases the storage burden of the recording device during recording.
Disclosure of Invention
Embodiments of the present application aim to provide a recording method and a recording apparatus that can solve the problem of the heavy storage burden placed on a recording device during recording.
In a first aspect, an embodiment of the present application provides a recording method, where the method includes:
acquiring an audio signal;
carrying out voiceprint characteristic detection on the audio signal to obtain voiceprint characteristic information;
and under the condition that the voiceprint characteristic information is matched with target voiceprint characteristic information, starting to record the audio signal, wherein the target voiceprint characteristic information is obtained by carrying out voiceprint detection on pre-recorded audio.
In a second aspect, an embodiment of the present application provides a sound recording apparatus, including:
an acquisition module, configured to acquire an audio signal;
a detection module, configured to perform voiceprint feature detection on the audio signal to obtain voiceprint feature information;
and a recording module, configured to start recording the audio signal when the voiceprint feature information matches target voiceprint feature information.
Optionally, the recording module is specifically configured to:
filter a first audio without the target voiceprint feature information in the audio signal, and record a second audio with the target voiceprint feature information in the audio signal.
Optionally, the recording apparatus provided in this embodiment of the present application further includes:
a display module, configured to display, before recording of the audio signal starts, user identity information corresponding to each of at least two pieces of candidate voiceprint feature information, wherein the candidate voiceprint feature information is obtained by performing voiceprint detection on pre-recorded audio, and the user identity information is obtained by marking the corresponding candidate voiceprint feature information in advance;
a first receiving module, configured to receive a first input of a user;
and a determining module, configured to, in response to the first input, acquire specified user identity information and determine the candidate voiceprint feature information corresponding to the specified user identity information as the target voiceprint feature information.
Optionally, in a case where voiceprint feature detection is performed on the audio signal and at least two pieces of voiceprint feature information are detected, the candidate voiceprint feature information is the voiceprint feature information that matches the detected pieces of voiceprint feature information.
Optionally, the display module is specifically configured to:
display, before voiceprint feature detection is performed on the audio signal and upon receiving and responding to a second input of the user, the user identity information corresponding to each of at least two pieces of candidate voiceprint feature information.
Optionally, the recording module is specifically configured to:
start, in a case where the voiceprint feature information matches target voiceprint feature information, to perform voice recognition on a second audio with the target voiceprint feature information in the audio signal, and output text information;
and start to record the audio signal in a case where the voiceprint feature information matches the target voiceprint feature information and the text information contains the target keyword.
Optionally, the recording module is further specifically configured to:
and in the recording process, stopping recording under the condition that the voiceprint characteristic information matched with the target voiceprint characteristic information is not detected in a specified time period.
Optionally, the recording apparatus provided in this embodiment of the present application further includes:
the second receiving module is used for receiving a third input of the user in the recording process;
and the inserting module is used for responding to the third input and inserting the label into the sound recording file.
Optionally, the third input is in the form of at least one of:
audio input having voiceprint feature information that matches registered voiceprint feature information, wherein the registered voiceprint feature information is obtained by performing voiceprint detection on audio pre-recorded by a registered user of the recording device;
input of audio of a target type;
and input of an action signal matching a target action, wherein the target action is preset.
In a third aspect, an embodiment of the present application provides an electronic device, which includes a processor, a memory, and a program or instructions stored on the memory and executable on the processor, and when executed by the processor, the program or instructions implement the steps of the method according to the first aspect.
In a fourth aspect, embodiments of the present application provide a readable storage medium, on which a program or instructions are stored, which when executed by a processor implement the steps of the method according to the first aspect.
In a fifth aspect, an embodiment of the present application provides a chip, where the chip includes a processor and a communication interface, where the communication interface is coupled to the processor, and the processor is configured to execute a program or instructions to implement the method according to the first aspect.
In a sixth aspect, the present application provides a computer program product comprising a computer program which, when executed by a processor, implements the steps of the method according to the first aspect.
In the embodiments of the application, an audio signal is acquired, voiceprint feature detection is performed on the audio signal to obtain voiceprint feature information, and recording of the audio signal is started when the detected voiceprint feature information matches the target voiceprint feature information. This prevents invalid content such as long blanks or noise from being recorded before the audio content carrying the target voiceprint feature information, thereby saving the internal storage space of the recording device and improving recording efficiency.
Drawings
FIG. 1 is a flowchart of a recording method according to an embodiment of the present disclosure;
FIG. 2 is a second flowchart of a recording method according to an embodiment of the present application;
FIG. 3 is a third flowchart of a recording method according to an embodiment of the present application;
FIG. 4 is a fourth flowchart of a recording method according to an embodiment of the present application;
FIG. 5 is a fifth flowchart of a recording method according to an embodiment of the present application;
FIG. 6 is a sixth flowchart of a recording method according to an embodiment of the present application;
FIG. 7 is a schematic diagram illustrating a recording dotting principle of a recording apparatus according to an embodiment of the present application;
fig. 8 is a second schematic view illustrating a recording dotting principle of a recording apparatus according to an embodiment of the present application;
fig. 9 is a third schematic view illustrating a recording dotting principle of a recording apparatus according to an embodiment of the present application;
FIG. 10 is a fourth schematic view illustrating a recording dotting principle of a recording apparatus according to an embodiment of the present application;
FIG. 11 is a schematic structural diagram of a recording apparatus according to an embodiment of the present disclosure;
fig. 12 is a second schematic structural diagram of a recording apparatus according to an embodiment of the present application;
fig. 13 is a third schematic structural diagram of a recording apparatus according to an embodiment of the present application;
fig. 14 is a schematic structural diagram of an electronic device provided in an embodiment of the present application;
fig. 15 is a schematic hardware configuration diagram of an electronic device implementing the embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are some, but not all, embodiments of the present application. All other embodiments that can be derived by one of ordinary skill in the art from the embodiments given herein are intended to be within the scope of the present disclosure.
The terms "first", "second" and the like in the description and claims of the present application are used to distinguish between similar objects and are not necessarily used to describe a particular sequential or chronological order. It should be understood that the data so used may be interchanged where appropriate, so that the embodiments of the application can be practiced in sequences other than those illustrated or described herein; moreover, the terms "first", "second" and the like usually denote a class of objects and do not limit their number, e.g., the first object may be one or more than one. In addition, "and/or" in the specification and claims denotes at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the associated objects before and after it.
The recording method, the recording apparatus, the electronic device, and the storage medium provided in the embodiments of the present application are described in detail below with reference to the accompanying drawings through specific embodiments and application scenarios thereof.
Fig. 1 is a flowchart of a recording method according to an embodiment of the present application. The method may be performed by a recording apparatus, or by a control module of the recording apparatus for performing the recording method. Specifically, the recording apparatus may include, but is not limited to, a communication device with a recording function such as a mobile phone or a tablet computer. The method may also be performed by a server corresponding to the recording device, which is not specifically limited here.
Referring to fig. 1, a recording method provided in an embodiment of the present application includes the following steps:
step 110: acquiring an audio signal;
step 120: carrying out voiceprint characteristic detection on the audio signal to obtain voiceprint characteristic information;
step 130: and under the condition that the voiceprint characteristic information is matched with target voiceprint characteristic information, starting to record the audio signal, wherein the target voiceprint characteristic information is obtained by carrying out voiceprint detection on pre-recorded audio.
The audio signal is a representation of sound, and the recording apparatus may be configured with a microphone to collect the audio signal, thereby implementing acquisition of the audio signal.
Voiceprint feature information, also called an acoustic fingerprint, is embodied as a sound wave spectrum that carries speech information; performing voiceprint feature detection on the audio signal therefore means extracting this sound wave spectrum from the audio signal.
Optionally, the voiceprint feature information is extracted from the audio signal through a specific algorithm to realize voiceprint feature detection.
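As a purely illustrative sketch of how such a detection step could be implemented (not the specific algorithm of this application, which is left unspecified), the following Python fragment derives a fixed-length voiceprint vector from an audio signal using librosa's MFCC front end; the 20-coefficient setting and the mean pooling are assumptions chosen only for the example.

import numpy as np
import librosa

def extract_voiceprint(signal: np.ndarray, sample_rate: int) -> np.ndarray:
    # Frame-level MFCCs of the signal: shape (n_mfcc, n_frames).
    mfcc = librosa.feature.mfcc(y=signal, sr=sample_rate, n_mfcc=20)
    # Average over time so that signals of different lengths compare directly.
    return mfcc.mean(axis=1)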
In this case, when step 120 is performed and voiceprint feature information is extracted from the audio signal, the extracted voiceprint feature information is matched against the pre-recorded candidate voiceprint feature information. When the matching result shows that the pre-recorded candidate voiceprint feature information contains target voiceprint feature information matching the extracted voiceprint feature information, it is determined on the basis of the matching result that the extracted voiceprint feature information matches the target voiceprint feature information, and recording of the audio signal is started.
Conversely, when the matching result shows that no target voiceprint feature information matching the extracted voiceprint feature information is found, recording is not started; it is started only once matching target voiceprint feature information is found.
In this embodiment, the detected voiceprint feature information is used for matching with the pre-recorded candidate voiceprint feature information, so that the candidate voiceprint feature information matched with the detected voiceprint feature information can be obtained, and the target voiceprint feature information can be quickly located.
Alternatively, the recording device may locally store pre-recorded candidate voiceprint feature information. In this case, the recording apparatus, upon acquiring the audio signal, performs voiceprint feature information matching of the voiceprint feature information detected from the audio signal with candidate voiceprint feature information stored locally by the recording apparatus to obtain a matching result, and starts recording in a case where the target voiceprint feature information is matched.
Alternatively, the pre-recorded candidate voiceprint feature information may be stored in the corresponding server. In this case, when the recording device acquires the audio signal, the recording device may send the audio signal or the voiceprint feature information detected from the audio signal to the server, and the server matches the detected voiceprint feature information with the pre-recorded candidate voiceprint feature information stored in the server, and returns the matching result to the recording device when the target voiceprint feature information is matched.
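Purely as a hedged illustration of the matching just described, the sketch below compares a detected voiceprint vector with the stored candidate voiceprints using cosine similarity and returns the identity of the best candidate that clears a threshold; the similarity metric, the 0.75 threshold and the recorder.start() call are assumptions, not details fixed by this application.

import numpy as np

def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b) + 1e-9))

def match_target_voiceprint(detected, candidates, threshold=0.75):
    # candidates: mapping from user identity to a stored candidate voiceprint vector.
    best_identity, best_score = None, threshold
    for identity, candidate in candidates.items():
        score = cosine_similarity(detected, candidate)
        if score > best_score:
            best_identity, best_score = identity, score
    return best_identity  # None means no target voiceprint was matched

# Recording is started only once a match is found, for example:
#   if match_target_voiceprint(detected_voiceprint, candidate_voiceprints) is not None:
#       recorder.start()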
Optionally, starting to record the audio signal specifically includes:
filtering a first audio without the target voiceprint feature information in the audio signal, and recording a second audio with the target voiceprint feature information in the audio signal.
This alternative embodiment enables selective recording of different voices in the audio signal. When multiple voice audios are identified in the audio signal at the same time, interfering sounds such as noise that do not carry the target voiceprint feature information can be filtered out of the audio signal on the basis of the target voiceprint feature information, and only the specific audio carrying the target voiceprint feature information is recorded, which improves recording quality and thus recording efficiency.
For example, in a classroom environment, the teacher's voiceprint feature information may be set as the target voiceprint feature information; when the teacher's voiceprint feature information is detected, the teacher's voice is recorded while the voices of classmates and environmental noise are filtered out.
Optionally, when the audio signal is recorded, the first audio without the target voiceprint feature information in the audio signal may not be filtered, so that both the first audio without the target voiceprint feature information and the second audio with the target voiceprint feature information are recorded.
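A minimal sketch of the selective-recording behaviour described above, assuming the signal is processed in fixed windows and that a caller-supplied predicate decides whether a window carries the target voiceprint; the one-second window and the helper names are illustrative assumptions.

import numpy as np

def filter_and_record(signal, sample_rate, has_target_voiceprint, window_s=1.0):
    window = int(window_s * sample_rate)
    kept = []
    for start in range(0, len(signal), window):
        chunk = signal[start:start + window]
        if has_target_voiceprint(chunk):   # e.g. detect + match as sketched earlier
            kept.append(chunk)             # the "second audio" is recorded
        # the "first audio" without the target voiceprint is dropped (filtered)
    return np.concatenate(kept) if kept else np.zeros(0, dtype=signal.dtype)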
Fig. 2 is a flowchart of a recording method according to an embodiment of the present application, where before starting recording an audio signal, target voiceprint feature information is determined by the following steps:
step 210: displaying user identity information corresponding to each of at least two pieces of candidate voiceprint feature information, wherein the candidate voiceprint feature information is obtained by performing voiceprint detection on pre-recorded audio, and the user identity information is obtained by marking the corresponding candidate voiceprint feature information in advance;
step 220: receiving a first input of a user;
step 230: and in response to the first input, acquiring specified user identity information, and determining the candidate voiceprint feature information corresponding to the specified user identity information as the target voiceprint feature information.
The embodiment shown in fig. 2 provides a user customization scheme for the target voiceprint feature information, and by displaying the user identity information, the user specifies the user voice to be recorded through the first input, so that the user requirements are met, and the recording device can quickly locate the voice audio to be recorded according to the specified target voiceprint feature information, thereby improving the recording efficiency.
Alternatively, the sound recording device may provide a touch screen to receive the first input, and the first input is a press, a slide, a single click, a double click, or the like on the touch screen, which is not limited herein.
Optionally, the sound recording apparatus may further provide a physical button to receive the first input, and the first input is a pressing operation of the physical button.
Optionally, the first input is a voice instruction.
In this embodiment, the user identity information may include, but is not limited to, name, nickname, occupation, and the like.
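For illustration only, the flow of steps 210 to 230 could be sketched as below, with a console prompt standing in for the touch, physical-key or voice inputs enumerated above; the identity labels and the data layout are assumptions.

def choose_target_voiceprint(candidates):
    # candidates: mapping from user identity (e.g. a name) to a stored candidate voiceprint.
    identities = list(candidates)
    for index, identity in enumerate(identities):
        print(f"[{index}] {identity}")                         # step 210: display identities
    selected = identities[int(input("Record whose voice? "))]  # step 220: first input
    return candidates[selected]                                # step 230: target voiceprint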
Alternatively, in a case where voiceprint feature detection is performed on the audio signal and at least two pieces of voiceprint feature information are detected, the candidate voiceprint feature information may be the voiceprint feature information that matches the detected pieces of voiceprint feature information.
In this embodiment, when voiceprint feature detection is performed on the original audio signal, multiple pieces of voiceprint feature information may be detected, some or all of which match the candidate voiceprint feature information; in that case, the user may select the audio of the one or more pieces of target voiceprint feature information to be recorded.
In another embodiment, when the original audio signal is subjected to voiceprint feature detection, a single piece of voiceprint feature information matching the target voiceprint feature information may also be detected, in which case recording may be started.
Optionally, before the voiceprint feature detection is performed on the audio signal, upon receiving and responding to a second input of the user, the user identity information corresponding to each of at least two pieces of candidate voiceprint feature information is displayed.
In this scenario, in the process from the start-up of the recording apparatus to the voiceprint feature detection of the audio signal, the designation of the target voiceprint feature information by the user through the second input may be received, so as to obtain the target voiceprint feature information in advance. Therefore, when the voiceprint characteristic information of the audio signal is detected, the voiceprint characteristic detection and matching can be carried out on the audio signal according to the target voiceprint characteristic information obtained in advance, the efficiency of detecting and matching the voiceprint characteristic can be improved, and the audio to be recorded can be quickly positioned.
In this embodiment, when the detected voiceprint feature information matches the target voiceprint feature information, the audio signal is recorded, so that a large blank or noise before the formal recording content can be avoided, and the recording efficiency is improved and the storage space of the recording device is saved.
Optionally, referring to an optional embodiment shown in fig. 3, the recording method provided in this embodiment specifically includes the following steps:
step 310: acquiring an audio signal, specifically referring to the content of step 110 above, which is not described herein again;
step 320: performing voiceprint feature detection on the audio signal to obtain voiceprint feature information, which specifically refers to the content of step 120 above and is not described herein again;
step 330: under the condition that the voiceprint feature information is matched with the target voiceprint feature information, starting to perform voice recognition on a second audio with the target voiceprint feature information in the audio signal, and outputting text information;
step 340: and starting to record the audio signal under the condition that the voiceprint feature information is matched with the target voiceprint feature information and the text information contains the target keyword.
This embodiment provides a timing for starting recording: recording is started only when target voiceprint feature information matching and target keyword detection are both satisfied. In practical applications, when voiceprint feature information matching the target voiceprint feature information is detected, recording is not started immediately; it is started only once the target keyword is detected in the text information. In this way only the specific audio of interest is recorded, the amount of recorded data is further reduced, and the storage space of the recording device is saved.
In a specific application scenario, the target keyword may be set in advance and used as the basis for recognizing the text information. Specifically, the target keywords may be stored in a dictionary, so that the text information can be matched against the target keywords extracted from the dictionary to determine whether the text information contains a target keyword.
Specifically, the target keyword may be "please note", "focus below", "content below should be remembered as necessary", and the like, and is not limited herein.
Alternatively, the target keyword may be different corresponding to different stored candidate voiceprint feature information. In this case, the target keyword may be determined based on the voiceprint feature information with which the target voiceprint feature information is associated.
In a specific scene, if different speaking users have their own characteristic ways of speaking or phrasing, target keywords can be extracted from each of them to form a dictionary, with each target keyword associated with the corresponding target voiceprint feature information; then, once the target voiceprint feature information is determined, the associated target keywords can be retrieved from the dictionary in a targeted manner.
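The keyword gate of Fig. 3 might be sketched as follows, assuming a per-speaker keyword dictionary of the kind described above and an external speech-recognition function supplied by the caller; the example phrases and the recognise() placeholder are assumptions rather than content of this application.

# Hypothetical per-speaker target-keyword dictionary; keys and phrases are examples only.
TARGET_KEYWORDS = {
    "teacher": ["please note", "focus below"],
    "default": ["please note"],
}

def should_start_recording(speaker_id, audio_chunk, recognise):
    # Recording starts only if the recognised text contains a target keyword.
    text = recognise(audio_chunk).lower()   # speech recognition -> text information
    keywords = TARGET_KEYWORDS.get(speaker_id, TARGET_KEYWORDS["default"])
    return any(keyword in text for keyword in keywords)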
Optionally, referring to fig. 4, the recording method provided in this embodiment includes the following steps:
step 410: acquiring an audio signal;
step 420: carrying out voiceprint characteristic detection on the audio signal to obtain voiceprint characteristic information;
step 430: starting to record the audio signal under the condition that the voiceprint characteristic information is matched with target voiceprint characteristic information, wherein the target voiceprint characteristic information is obtained by carrying out voiceprint detection on pre-recorded audio;
step 440: and in the recording process, stopping recording under the condition that the voiceprint characteristic information matched with the target voiceprint characteristic information is not detected in a specified time period.
Here, the steps 410, 420 and 430 may refer to the contents of the above steps 110, 120 and 130, respectively, and are not described herein again.
With this embodiment, long stretches of blank or invalid audio after the recorded specific audio containing the target voiceprint feature information can be prevented from being recorded, which further saves storage space of the recording device and improves recording efficiency.
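One way the stop condition of step 440 could look, as a sketch only: recording continues as long as matching voiceprints keep arriving, and stops once none has been detected for a whole timeout window. The 30-second value and the next_chunk/recorder interfaces are assumptions, not part of this application.

import time

def record_with_auto_stop(next_chunk, chunk_matches_target, recorder, timeout_s=30.0):
    last_match = time.monotonic()
    while True:
        chunk = next_chunk()                 # pull the next block of the audio signal
        if chunk_matches_target(chunk):
            recorder.write(chunk)
            last_match = time.monotonic()
        elif time.monotonic() - last_match > timeout_s:
            recorder.stop()                  # step 440: stop recording
            break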
Fig. 5 is a flowchart of a recording method according to an alternative embodiment of the present application, where the method includes the following steps:
step 510: acquiring an audio signal;
step 520: performing voiceprint feature detection on the audio signal to obtain voiceprint feature information;
step 530: starting to record the audio signal under the condition that the voiceprint feature information is matched with target voiceprint feature information, wherein the target voiceprint feature information is obtained by carrying out voiceprint detection on pre-recorded audio;
step 540: receiving a third input of the user in the recording process;
step 550: inserting a tag in the sound recording file in response to the third input.
The steps 510, 520, and 530 may refer to the contents of the above steps 110, 120, and 130, respectively, and are not described herein again.
In the embodiment, the inserted tag plays a role in positioning the position of the specific audio, so that the requirement of a user on the fixed-point marking of the recorded audio is met, and the recording function options of the recording equipment are expanded.
Optionally, the third input is in the form of at least one of:
audio input having voiceprint feature information that matches registered voiceprint feature information, wherein the registered voiceprint feature information is obtained by performing voiceprint detection on audio pre-recorded by a registered user of the recording device;
input of audio of a target type;
and input of an action signal matching a target action, wherein the target action is preset.
The registered voiceprint feature information and the above target voiceprint feature information may be the same or different. For example, in one application scenario, the registered voiceprint feature information may come from a non-speaking user, such as the questions or remarks of the user who owns the recording device or of other listening users, which is not limited here.
The target-type audio refers to audio of a preset type; a specific example is the tapping sound made on a physical object such as a desk or the recording device itself.
The action signal may be a gesture received by the recording device through the touch screen.
In this embodiment, the form of the tag inserted into the sound recording file is not limited; it may be a numeral, a character, or the like.
Optionally, the tag is a third audio with the registered voiceprint feature information, wherein the third audio is derived from the audio signal.
In this case, the third audio is the tag itself. In a specific application scenario, the user may voice his or her own views on the speech audio while listening to it, and these remarks may be input into the recording device as the third audio.
Alternatively, the insertion of the tag into the sound recording file may also be triggered when voiceprint feature information matching the registered voiceprint feature information is detected and a specified keyword is identified in the audio having the registered voiceprint feature information. The specified keyword may be, for example, "please mark", and is not specifically limited here.
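A hedged sketch of tag insertion (dotting): whatever form the third input takes, it simply appends the current offset into the recording, optionally with a label such as the recognised marker phrase. Storing the tags as a JSON sidecar next to the audio file is an assumption, since this application does not fix a tag format.

import json
import time

class DottingRecorder:
    # Collects time-offset tags while a recording is in progress.
    def __init__(self):
        self.started_at = time.monotonic()
        self.tags = []                        # each entry: {"offset_s": ..., "label": ...}

    def insert_tag(self, label=""):
        offset = time.monotonic() - self.started_at
        self.tags.append({"offset_s": round(offset, 2), "label": label})

    def save_sidecar(self, path):
        with open(path, "w", encoding="utf-8") as handle:
            json.dump(self.tags, handle, ensure_ascii=False, indent=2)

# Example: any trigger (registered voiceprint + "please mark", a tap sound, a shake)
# would call recorder.insert_tag("please mark").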
Referring to fig. 6, a recording method according to an embodiment of the present application is described below with reference to a specific application scenario, where the method specifically includes the following steps.
Step 610: audio recording configuration.
The audio recording configuration includes, but is not limited to, an audio filtering configuration, an audio dotting configuration, and a recording option configuration.
In the audio filtering configuration, a voiceprint file uploaded by a user is received, identified and recorded as target voiceprint characteristic information, so that a specific audio with the target voiceprint characteristic information can be recorded later.
In the audio dotting configuration, dotting may be set to be triggered by clicking an on-screen button, pressing a physical key, a sensor signal, or a specific sound. Automatic dotting can also be configured: by presetting keywords, or keywords combined with voiceprints, the audio is identified automatically, and a dotting record is generated automatically whenever the user's preset scheme is recognized.
In the recording option configuration, the recording device provides a plurality of recording options: recording an internal sound source; recording an external sound source; automatically skipping blank audio; and recording voice. The internal sound source is the voice-call sound inside the device, such as a mobile phone call or a social voice call.
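Only as an illustrative sketch, the configuration items of step 610 could be collected in a small structure like the one below; the field names are assumptions chosen to mirror the options listed above.

from dataclasses import dataclass, field

@dataclass
class RecordingConfig:
    record_internal_source: bool = False   # device-internal voice calls
    record_external_source: bool = True    # microphone input
    skip_blank_audio: bool = True
    target_voiceprint_file: str = ""       # uploaded voiceprint used for audio filtering
    dotting_triggers: list = field(default_factory=lambda: ["screen_button", "shake", "tap_sound"])
    auto_dotting_keywords: list = field(default_factory=list)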
Step 620: recording is automatically started using voiceprint recognition. Specifically, recording is automatically performed when a specific voiceprint is recognized, and recording can be automatically finished when the specific voiceprint is not recognized within a certain time.
Step 630: tags are inserted into the recorded audio by dotting.
Optionally, referring to the embodiment shown in fig. 7, a button 7A floats on the screen of the device for the user to trigger; each time a user click is received, the device locally generates a record of the time point, which is finally associated with the recorded audio file. When a user needs to mark important content, there is often no time to stop, think it over and take notes, and the method and device here provide a simple way to help the user mark the important content.
Optionally, sensor-based dotting may also be provided. Specifically, referring to fig. 8, the user shakes the recording apparatus 80, and the double-headed arrow shown in fig. 8 indicates the direction of shaking.
Alternatively, with reference to the embodiment shown in fig. 9, a physical key 9A may be provided to receive a user press, triggering a dotting to insert a tag in the sound recording file.
Alternatively, with reference to the embodiment shown in fig. 10, dotting may be performed by receiving a specific type of sound 10A produced by striking a table.
Optionally, an automatic dotting function is provided, realized by identifying keyword matches or voiceprint-plus-keyword matches. For example, in a classroom scene, the important teaching content can be identified and dotted more easily, so that students can put their attention on learning without having to dot and mark manually.
Step 640: extracting audio clips and keywords according to the recording file and the dotting information.
Specifically, the recording device may preset functions locally, such as: the audio time range identified around each mark point; whether to ignore blank audio; and whether to upload to the cloud for analysis.
In this embodiment, the audio around the mark points set by the user may be converted to text, and keywords in the text may be extracted.
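Step 640 might be sketched as below: for each mark point, a window of audio around it is cut out, converted to text, and scanned for keywords. The plus/minus 15-second range, the transcribe() placeholder and the keyword list are assumptions made only for the example.

def extract_marked_clips(signal, sample_rate, mark_offsets_s, transcribe, keywords,
                         range_s=15.0):
    half = int(range_s * sample_rate)
    results = []
    for offset in mark_offsets_s:
        centre = int(offset * sample_rate)
        clip = signal[max(0, centre - half): centre + half]
        text = transcribe(clip)                              # audio-to-text conversion
        hits = [keyword for keyword in keywords if keyword in text]
        results.append({"offset_s": offset, "clip": clip, "keywords": hits})
    return results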
Step 650: displaying the audio and keywords within the range of the mark points, together with all audio-related information.
After the audio processing is finished, the user can locally view the audio analysis data on a display page, including the overall audio information, the user's mark-point information, audio-to-text information, keyword information and the like, so that the audio can be browsed quickly as a whole. If the user has uploaded the file to the cloud, it can also be browsed and operated on in a browser.
Step 660: in response to the user's input, the recording file is cut, extracted, or converted into text.
The user can browse the display page, and if the automatically identified key audio range shown there is unsatisfactory, the range can be adjusted by input such as dragging. When the recognized keywords and text are unsatisfactory, they can be set manually. The user's input may also be operations such as cropping, synthesizing, or extracting the tagged audio. The keywords not only give the user an intuitive sense of the audio content, but can also be used later for screening when there are too many audio files to look through.
Step 670: uploading the recorded audio to the cloud.
After processing is completed, the user can choose whether to upload to the cloud. Since the user may want to use the audio data on other devices, uploading it to the cloud facilitates multi-device interworking; at the same time, cloud processing avoids audio processing being too slow on devices with insufficient performance and can greatly improve the user experience.
Step 680: in response to the user's input, inserting the text of the recorded audio into a target application program, and setting hyperlinks between the text and the audio file.
In this way, to meet the user's note-taking needs, a function of dragging the audio into the target application is provided, so that the audio file can be inserted directly into a note application. If direct insertion is not supported, a text link can be generated in the note application, and clicking the link opens a webpage in which the corresponding audio is played. Optionally, the audio may also be copied directly, formatted key-information text may be generated automatically, and the user may insert it directly into a text editor.
It should be noted that the recording method provided in the embodiments of the present application may be performed by a recording apparatus, or by a control module in the recording apparatus for performing the recording method. In the embodiments of the present application, the case where the recording apparatus performs the recording method is taken as an example to describe the recording apparatus provided in the embodiments of the present application.
Referring to fig. 11, a recording apparatus provided in an embodiment of the present application may include:
an obtaining module 1110 for obtaining an audio signal;
the detection module 1120 is used for carrying out voiceprint characteristic detection on the audio signal to obtain voiceprint characteristic information;
the recording module 1130 starts recording the audio signal when the voiceprint feature information matches target voiceprint feature information, where the target voiceprint feature information is obtained by performing voiceprint detection on a pre-recorded audio.
Optionally, the recording module 1130 is specifically configured to:
filter a first audio without the target voiceprint feature information in the audio signal, and record a second audio with the target voiceprint feature information in the audio signal.
Optionally, the recording module 1130 is specifically configured to:
start, in a case where the voiceprint feature information matches target voiceprint feature information, to perform voice recognition on a second audio with the target voiceprint feature information in the audio signal, and output text information;
and start to record the audio signal in a case where the voiceprint feature information matches the target voiceprint feature information and the text information contains the target keyword.
Optionally, the recording module 1130 is further specifically configured to:
and in the recording process, stopping recording under the condition that the voiceprint characteristic information matched with the target voiceprint characteristic information is not detected in a specified time period.
Alternatively, referring to fig. 12, the difference from fig. 11 is that the sound recording apparatus of the present embodiment further includes:
a display module 1210 that displays user identity information corresponding to each of at least two pieces of candidate voiceprint feature information before the recording module 1240 starts recording the audio signal, wherein the candidate voiceprint feature information is obtained by performing voiceprint detection on pre-recorded audio, and the user identity information is obtained by marking the corresponding candidate voiceprint feature information in advance;
a first receiving module 1220 for receiving a first input from a user;
the determining module 1230, in response to the first input, acquires the specified user identity information, and determines the candidate voiceprint feature information corresponding to the specified user identity information as the target voiceprint feature information.
Optionally, in a case where voiceprint feature detection is performed on the audio signal and at least two pieces of voiceprint feature information are detected, the candidate voiceprint feature information is the voiceprint feature information that matches the detected pieces of voiceprint feature information.
Optionally, before the voiceprint feature detection is performed on the audio signal, upon receiving and responding to a second input of the user, the user identity information corresponding to each of at least two pieces of candidate voiceprint feature information is displayed.
Alternatively, referring to fig. 13, unlike fig. 11, the sound recording apparatus of the present embodiment further includes:
a second receiving module 1310 for receiving a third input from the user during the recording process;
an inserting module 1320, responsive to the third input, inserts a tag in the sound recording file.
Optionally, the third input is in the form of at least one of:
audio input having voiceprint feature information that matches registered voiceprint feature information, wherein the registered voiceprint feature information is obtained by performing voiceprint detection on audio pre-recorded by a registered user of the recording device;
input of audio of a target type;
and input of an action signal matching a target action, wherein the target action is preset.
When the recording device provided by the embodiments of the application acquires an audio signal, it performs voiceprint feature detection on the audio signal to obtain voiceprint feature information, and records the audio signal when the detected voiceprint feature information matches the target voiceprint feature information. This prevents long blanks, noise and other invalid content from being recorded before the audio content carrying the target voiceprint feature information, thereby saving the internal storage space of the recording device and improving recording efficiency.
The recording device in the embodiment of the present application may be a device, or may be a component, an integrated circuit, or a chip in a terminal. The device can be mobile electronic equipment or non-mobile electronic equipment. By way of example, the mobile electronic device may be a mobile phone, a tablet computer, a notebook computer, a palm top computer, a vehicle-mounted electronic device, a wearable device, an ultra-mobile personal computer (UMPC), a netbook or a Personal Digital Assistant (PDA), and the like, and the non-mobile electronic device may be a server, a Network Attached Storage (NAS), a Personal Computer (PC), a Television (TV), a teller machine or a self-service machine, and the like, and the embodiments of the present application are not particularly limited.
The recording device in the embodiments of the present application may be a device having an operating system. The operating system may be an Android operating system, an iOS operating system, or another possible operating system, which is not specifically limited in the embodiments of the present application.
The recording device provided in the embodiment of the present application can implement each process implemented by the method embodiments of fig. 1 to 6, and is not described here again to avoid repetition.
Optionally, as shown in fig. 14, an electronic device 1400 is further provided in the embodiment of the present application, and includes a processor 1401, a memory 1402, and a program or an instruction stored in the memory 1402 and executable on the processor 1401, where the program or the instruction is executed by the processor 1401 to implement each process of the foregoing recording method embodiment, and can achieve the same technical effect, and no further description is provided here to avoid repetition.
It should be noted that the electronic device in the embodiment of the present application includes the mobile electronic device and the non-mobile electronic device described above.
Fig. 15 is a schematic hardware configuration diagram of an electronic device implementing the embodiment of the present application.
The electronic device 1500 includes, but is not limited to: a radio frequency unit 1501, a network module 1502, an audio output unit 1503, an input unit 1504, a sensor 1505, a display unit 1506, a user input unit 1507, an interface unit 1508, a memory 1509, and a processor 1510.
Those skilled in the art will appreciate that the electronic device 1500 may also include a power supply (e.g., a battery) for powering the various components, which may be logically coupled to the processor 1510 via a power management system so as to manage charging, discharging, power consumption and other functions through the power management system. The electronic device structure shown in fig. 15 does not constitute a limitation of the electronic device; the electronic device may include more or fewer components than those shown, combine some components, or arrange the components differently, and details are not repeated here.
The input unit 1504 is used for acquiring an audio signal.
A processor 1510, configured to perform voiceprint feature detection on the audio signal to obtain voiceprint feature information;
and under the condition that the voiceprint characteristic information is matched with target voiceprint characteristic information, starting to record the audio signal, wherein the target voiceprint characteristic information is obtained by carrying out voiceprint detection on pre-recorded audio.
With the electronic device of this embodiment, when an audio signal is acquired, voiceprint feature detection is performed on the audio signal, and recording of the audio signal is started when the voiceprint feature information matches target voiceprint feature information. This prevents long blanks, noise and other invalid content from being recorded before the audio content carrying the target voiceprint feature information, thereby saving the internal storage space of the recording device and improving recording efficiency.
Optionally, the processor 1510 is further configured to filter a first audio in the audio signal without the target voiceprint feature information, and record a second audio in the audio signal with the target voiceprint feature information.
Optionally, the display unit 1506 is configured to, before starting to record the audio signal, display user identity information corresponding to at least two pieces of candidate voiceprint feature information, where the candidate voiceprint feature information is obtained by performing voiceprint detection on a pre-recorded audio, and the user identity information is obtained by pre-marking the corresponding candidate voiceprint feature information;
an input unit 1504, further configured to receive a first input of a user;
the processor 1510 is further configured to, in response to the first input, obtain specified user identity information, and determine candidate voiceprint feature information corresponding to the specified user identity information as the target voiceprint feature information.
Optionally, the processor 1510 is further configured to, before performing voiceprint feature detection on the audio signal, present user identity information corresponding to each of the at least two candidate voiceprint feature information in a case of receiving and responding to a second input of the user.
Optionally, the processor 1510 is further configured to, in a case that the detected voiceprint feature information matches target voiceprint feature information, start speech recognition on a second audio with the target voiceprint feature information in the audio signal, and output text information;
and starting recording the audio signal under the condition that the detected voiceprint characteristic information is matched with the target voiceprint characteristic information and the text information contains the target keyword.
Optionally, the processor 1510 is further configured to stop recording if the voiceprint feature information matching the target voiceprint feature information is not detected within a specified time period during recording.
Optionally, the input unit 1504 is further configured to receive a third input from the user during the recording;
a processor 1510 is further configured to insert a tag in the sound recording file in response to the third input.
It should be understood that, in the embodiment of the present application, the input Unit 1504 may include a Graphics Processing Unit (GPU) 1541 and a microphone 1542, and the Graphics Processing Unit 1541 processes image data of a still picture or a video obtained by an image capturing device (such as a camera) in a video capturing mode or an image capturing mode. The display unit 1506 may include a display panel 1561, and the display panel 1561 may be configured in the form of a liquid crystal display, an organic light emitting diode, or the like. The user input unit 1507 includes a touch panel 1571 and other input devices 1572. Touch panel 1571, also referred to as a touch screen. Touch panel 1571 may include two portions, a touch detection device and a touch controller. Other input devices 1572 may include, but are not limited to, a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, and a joystick, which are not described in detail herein. The memory 1509 may be used to store software programs as well as various data including, but not limited to, application programs and an operating system. The processor 1510 may integrate an application processor, which primarily handles operating systems, user interfaces, applications, etc., and a modem processor, which primarily handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 1510.
The embodiment of the present application further provides a readable storage medium, where a program or an instruction is stored on the readable storage medium, and when the program or the instruction is executed by a processor, the program or the instruction implements each process of the recording method embodiment, and can achieve the same technical effect, and in order to avoid repetition, details are not repeated here.
The processor is the processor in the electronic device described in the above embodiment. The readable storage medium includes a computer readable storage medium, such as a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and so on.
The embodiment of the present application further provides a chip, where the chip includes a processor and a communication interface, the communication interface is coupled to the processor, and the processor is configured to execute a program or an instruction to implement each process of the above recording method embodiment, and can achieve the same technical effect, and the details are not repeated here to avoid repetition.
It should be understood that the chips mentioned in the embodiments of the present application may also be referred to as system-on-chip, system-on-chip or system-on-chip, etc.
The embodiment of the present application further provides a computer program product, which includes a computer program, and when the computer program is executed by a processor, the processes of the recording method embodiment are implemented, and the same technical effects can be achieved, and in order to avoid repetition, details are not repeated here.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element. Further, it should be noted that the scope of the methods and apparatus of the embodiments of the present application is not limited to performing the functions in the order illustrated or discussed, but may include performing the functions in a substantially simultaneous manner or in a reverse order based on the functions involved; for example, the methods described may be performed in an order different from that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a computer software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal (such as a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present application.
While the embodiments of the present application have been described with reference to the accompanying drawings, the invention is not limited to the specific embodiments described above, which are intended to be illustrative rather than restrictive. Those skilled in the art may make various changes without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (10)

1. A recording method, comprising:
acquiring an audio signal;
performing voiceprint feature detection on the audio signal to obtain voiceprint feature information; and
starting to record the audio signal in a case that the voiceprint feature information matches target voiceprint feature information, wherein the target voiceprint feature information is obtained by performing voiceprint detection on pre-recorded audio.
2. The recording method according to claim 1, wherein the starting to record the audio signal comprises:
filtering out first audio in the audio signal that does not carry the target voiceprint feature information, and recording second audio in the audio signal that carries the target voiceprint feature information.
3. The recording method according to claim 2, wherein, before the recording of the audio signal is started, the target voiceprint feature information is determined by:
displaying user identity information corresponding to each of at least two pieces of candidate voiceprint feature information, wherein the candidate voiceprint feature information is obtained by performing voiceprint detection on pre-recorded audio, and the user identity information is obtained by labeling the corresponding candidate voiceprint feature information in advance;
receiving a first input from a user; and
in response to the first input, obtaining specified user identity information, and determining the candidate voiceprint feature information corresponding to the specified user identity information as the target voiceprint feature information.
4. The recording method according to claim 3, wherein, in a case that at least two pieces of voiceprint feature information are detected when the voiceprint feature detection is performed on the audio signal, the candidate voiceprint feature information is voiceprint feature information that matches the detected at least two pieces of voiceprint feature information.
5. The recording method according to claim 3, wherein, before the voiceprint feature detection is performed on the audio signal, the user identity information corresponding to each of the at least two pieces of candidate voiceprint feature information is displayed in response to receiving a second input from the user.
6. The recording method according to claim 1, wherein the starting to record the audio signal in the case that the voiceprint feature information matches the target voiceprint feature information comprises:
in a case that the voiceprint feature information matches the target voiceprint feature information, starting to perform voice recognition on second audio in the audio signal that carries the target voiceprint feature information, and outputting text information; and
starting to record the audio signal in a case that the voiceprint feature information matches the target voiceprint feature information and the text information contains a target keyword.
7. The recording method according to claim 1, further comprising:
during recording, stopping the recording in a case that no voiceprint feature information matching the target voiceprint feature information is detected within a specified time period.
8. The recording method according to claim 1, further comprising:
receiving a third input from the user during recording; and
inserting a tag into the recording file in response to the third input.
9. The recording method according to claim 8, wherein the third input is at least one of:
an input of audio whose voiceprint feature information matches registered voiceprint feature information, wherein the registered voiceprint feature information is obtained by performing voiceprint detection on audio pre-recorded by a registered user of the recording device;
an input of audio of a target type; and
an input of a motion signal matching a target motion, wherein the target motion is preset.
10. A recording apparatus, comprising:
an acquisition module configured to acquire an audio signal;
a detection module configured to perform voiceprint feature detection on the audio signal to obtain voiceprint feature information; and
a recording module configured to start recording the audio signal in a case that the voiceprint feature information matches target voiceprint feature information.
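
As an illustrative, non-limiting sketch (not part of the claims), the recording flow outlined in claims 1, 2, 7, and 8 might be implemented along the following lines in Python: voiceprint feature detection gates the start of recording, audio without the target voiceprint is filtered out, recording stops when no matching voiceprint is detected within a specified time period, and a third input inserts a tag. The helpers extract_voiceprint and is_match below are hypothetical placeholders, not the voiceprint detection or matching algorithm of this application.

import time
from typing import List, Optional, Tuple


def extract_voiceprint(frame: bytes) -> bytes:
    # Placeholder for voiceprint feature detection; a real system would
    # compute spectral or embedding features from the audio frame.
    return bytes(frame[:16])


def is_match(voiceprint: bytes, target: bytes) -> bool:
    # Placeholder for "voiceprint feature information matches the target
    # voiceprint feature information"; a real system would use a
    # similarity score and a threshold.
    return voiceprint == target


class VoiceprintGatedRecorder:
    def __init__(self, target_voiceprint: bytes, timeout_s: float = 10.0):
        self.target = target_voiceprint        # from voiceprint detection on pre-recorded audio
        self.timeout_s = timeout_s             # the "specified time period" of claim 7
        self.frames: List[bytes] = []          # second audio carrying the target voiceprint
        self.tags: List[Tuple[int, str]] = []  # (frame index, label) tags inserted by a third input
        self.last_match: Optional[float] = None

    def on_audio_frame(self, frame: bytes) -> None:
        # Voiceprint feature detection on the acquired audio signal.
        if is_match(extract_voiceprint(frame), self.target):
            self.last_match = time.monotonic()
            self.frames.append(frame)          # record only audio with the target voiceprint (claim 2)
        elif (self.last_match is not None
              and time.monotonic() - self.last_match > self.timeout_s):
            self.stop()                        # no matching voiceprint within the timeout (claim 7)
        # Non-matching frames are otherwise filtered out and not recorded.

    def on_third_input(self, label: str) -> None:
        # Insert a tag at the current position of the recording (claim 8).
        if self.last_match is not None:
            self.tags.append((len(self.frames), label))

    def stop(self) -> List[bytes]:
        # End the recording and return the recorded frames.
        recorded, self.frames, self.last_match = self.frames, [], None
        return recorded

In a real implementation the matching step would compare voiceprint embeddings against a similarity threshold rather than test byte equality; the sketch only shows how voiceprint matching can control the start, content, and end of a recording.
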
CN202110919680.7A 2021-08-11 2021-08-11 Recording method and device Pending CN113676591A (en)

Priority Applications (1)

Application Number: CN202110919680.7A
Publication: CN113676591A (en)
Priority Date: 2021-08-11
Filing Date: 2021-08-11
Title: Recording method and device

Publications (1)

Publication Number: CN113676591A (en)
Publication Date: 2021-11-19

Family

ID=78542453

Family Applications (1)

Application Number: CN202110919680.7A
Title: Recording method and device
Priority Date: 2021-08-11
Filing Date: 2021-08-11
Status: Pending (CN113676591A (en))

Country Status (1)

Country: CN; Publication: CN113676591A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107464557A (en) * 2017-09-11 2017-12-12 广东欧珀移动通信有限公司 Call recording method, device, mobile terminal and storage medium
CN112997144A (en) * 2018-12-12 2021-06-18 深圳市欢太科技有限公司 Recording method, recording device, electronic equipment and computer readable storage medium

Legal Events

Code  Title
PB01  Publication
SE01  Entry into force of request for substantive examination
RJ01  Rejection of invention patent application after publication (application publication date: 20211119)