CN114067852A - Recording method, intelligent terminal and storage medium - Google Patents


Info

Publication number
CN114067852A
Authority
CN
China
Prior art keywords
time point
target
voice
current time
determining
Prior art date
Legal status
Pending
Application number
CN202111425887.5A
Other languages
Chinese (zh)
Inventor
冯宇傲
王驭风
Current Assignee
Shenzhen Transsion Holdings Co Ltd
Original Assignee
Shenzhen Transsion Holdings Co Ltd
Priority date
Filing date
Publication date
Application filed by Shenzhen Transsion Holdings Co Ltd filed Critical Shenzhen Transsion Holdings Co Ltd
Priority to CN202111425887.5A priority Critical patent/CN114067852A/en
Publication of CN114067852A publication Critical patent/CN114067852A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B 20/00 Signal processing not specific to the method of recording or reproducing; Circuits therefor
    • G11B 20/10 Digital recording or reproducing
    • G11B 20/10527 Audio or video recording; Data buffering arrangements
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G11 INFORMATION STORAGE
    • G11B INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B 20/00 Signal processing not specific to the method of recording or reproducing; Circuits therefor
    • G11B 20/10 Digital recording or reproducing
    • G11B 20/10527 Audio or video recording; Data buffering arrangements
    • G11B 2020/10537 Audio or video recording
    • G11B 2020/10546 Audio or video recording specifically adapted for audio data

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Signal Processing (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Telephone Function (AREA)

Abstract

The application provides a recording method, an intelligent terminal and a storage medium, wherein the recording method comprises the following steps: when recognizing that the recorded voice contains a target keyword, acquiring a current time point; and determining or generating a target voice segment according to the current time point, and/or performing preset processing on the target voice segment. With the method and the device, only the voice data containing the target keyword is processed during recording, which solves the problem of excessive useless information in the recording and improves recording efficiency.

Description

Recording method, intelligent terminal and storage medium
Technical Field
The application relates to the technical field of multimedia processing, in particular to a recording method, an intelligent terminal and a storage medium.
Background
With the development of computer technology, a recording application installed on a mobile terminal can collect and record sound data of a user or an environment, and store the collected sound data in a file system.
In the course of conceiving and implementing the present application, the inventors found at least the following problems: in some conference scenarios, the conference duration is long, and the collected voice data may contain a large amount of useless information. Moreover, the collected sound data is also large, occupying more storage space of the mobile terminal and affecting system consumption.
The foregoing description is provided for general background information and is not admitted to be prior art.
Disclosure of Invention
In view of the above technical problems, the present application provides a recording method, an intelligent terminal and a storage medium to solve the technical problem that there is much useless information in the recording.
In order to solve the technical problem, the application provides a recording method, which can be applied to an intelligent terminal and comprises the following steps:
s10, when recognizing that the recorded voice contains the target keyword, acquiring the current time point;
s20, determining or generating a target voice segment according to the current time point, and/or performing preset processing on the target voice segment.
Optionally, the step of determining or generating a target speech segment according to the current time point includes the following steps:
s21, determining or generating a backtracking time point and a delay time point according to the current time point and the target recording duration;
and S22, determining the voice between the backtracking time point and the delayed time point as the target voice segment.
Optionally, the step of S21 includes:
determining a backtracking time length and a delay time length according to the target recording time length;
and determining or generating the backtracking time point according to the current time point and the backtracking time length, and/or determining or generating the delay time point according to the current time point and the delay time length.
Optionally, there are at least two target keywords.
Optionally, the step of S21 includes:
and determining or generating the backtracking time point by taking the time point corresponding to the starting target keyword as the current time point, and/or determining or generating the delayed time point by taking the time point corresponding to the terminating target keyword as the current time point.
Optionally, the step of S22 includes:
configuring the voice recorded from the backtracking time point to the current time point as a first voice segment, and/or configuring the voice recorded from the current time point to the delayed time point as a second voice segment;
determining or generating the target speech segment from the first speech segment and/or the second speech segment.
Optionally, the step S22 further includes:
after the current time point, acquiring a new current time point when the target keyword is identified again;
determining or generating a new delay time point according to the new current time point and the delay time length;
configuring the voice recorded from the new current time point to the new delayed time point into a third voice segment;
determining or generating the target speech segment from the first speech segment and/or the third speech segment.
Optionally, the method further comprises:
and recording the voice and storing the voice in the preset cache area.
Optionally, the preset processing includes at least one of:
storing the target voice segment into a preset storage area;
playing the target voice segment;
outputting the character information corresponding to the target speech segment.
optionally, before the step S10, the method further includes:
acquiring an input text signal;
and determining the target keyword according to the character signal.
Optionally, the text signal is input in a manner including at least one of:
sending the collected voice signal through a voice collecting device;
sending the collected picture signal through the camera;
and sending the received text signal through the text input terminal.
Optionally, the step of S10 includes the steps of:
s11, converting the voice into character information, and associating the time information corresponding to the voice with the character information;
and S12, identifying target text information matched with the target keyword in the text information of the voice, and configuring the time information associated with the target text information as the current time point.
Optionally, the step of S11 includes:
mapping the time information corresponding to the voice to the text information;
and generating an index table according to the text information and the time information, wherein the index table comprises a time index associated with the time information corresponding to the text information.
Optionally, the step of S12 includes:
when recognizing that the recorded voice contains a target keyword, determining target character information matched with the target keyword;
determining target time information associated with the target character information according to the time index;
and determining the target time information as the current time point.
Optionally, when it is recognized that the recorded voice includes a target keyword, the step of determining target text information matched with the target keyword includes:
acquiring the similarity between each phrase in the text information and the target keyword;
and when the similarity is greater than a similarity threshold value, determining the phrase as the target character information.
The application also provides an intelligent terminal, including: a memory and a processor, wherein the memory stores a recording program, and the recording program, when executed by the processor, implements the steps of any one of the methods above.
The present application also provides a storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the method as set forth in any of the above.
As described above, the recording method of the present application can acquire the current time point when it is recognized that the recorded voice contains the target keyword, and determine or generate a target voice segment according to the current time point, and/or perform preset processing on the target voice segment. By identifying whether the recorded voice includes the target keyword and then performing preset processing on the target voice segment containing the target keyword, only the voice data containing the target keyword is processed during recording, which solves the problem of excessive useless information in the recording and improves user experience.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application. In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings needed to be used in the description of the embodiments will be briefly described below, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise.
Fig. 1 is a schematic diagram of a hardware structure of an intelligent terminal implementing various embodiments of the present application;
fig. 2 is a communication network system architecture diagram according to an embodiment of the present application;
FIG. 3 is a flowchart illustrating a recording method according to a first embodiment;
FIG. 4 is a flowchart illustrating a recording method according to a first embodiment;
fig. 5 is a recording interface diagram showing a recording method according to the first embodiment;
fig. 6 is a setting interface diagram of the sound recording method according to the first embodiment;
FIG. 7 is a flowchart illustrating a recording method S20 according to a second embodiment;
FIG. 8 is a flowchart illustrating a recording method S21 according to a second embodiment;
FIG. 9 is a flowchart illustrating a recording method S20 according to a second embodiment;
FIG. 10 is a flowchart illustrating a recording method S22 according to a third embodiment;
fig. 11 is a recording interface diagram showing a recording method according to the third embodiment;
fig. 12 is a detailed flowchart of the sound recording method S22 according to the third embodiment.
The implementation, functional features and advantages of the objectives of the present application will be further explained with reference to the accompanying drawings. With the above figures, there are shown specific embodiments of the present application, which will be described in more detail below. These drawings and written description are not intended to limit the scope of the inventive concepts in any manner, but rather to illustrate the inventive concepts to those skilled in the art by reference to specific embodiments.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, the recitation of an element by the phrase "comprising an … …" does not exclude the presence of additional like elements in the process, method, article, or apparatus that comprises the element. Optionally, identically named components, features, and elements in different embodiments of the present application may have different meanings, as determined by their interpretation in the embodiment or by their further context within that embodiment.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope herein. The word "if" as used herein may be interpreted as "at … …" or "when … …" or "in response to a determination", depending on the context. Also, as used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless the context indicates otherwise. It will be further understood that the terms "comprises," "comprising," "includes" and/or "including," when used in this specification, specify the presence of stated features, steps, operations, elements, components, items, species, and/or groups, but do not preclude the presence or addition of one or more other features, steps, operations, elements, components, species, and/or groups thereof. The terms "or," "and/or," "including at least one of the following," and the like, as used herein, are to be construed as inclusive, or mean any one or any combination. For example, "includes at least one of: A, B, C" means "any of the following: A; B; C; A and B; A and C; B and C; A and B and C"; again for example, "A, B or C" or "A, B and/or C" means "any of the following: A; B; C; A and B; A and C; B and C; A and B and C". An exception to this definition will occur only when a combination of elements, functions, steps or operations is inherently mutually exclusive in some way.
It should be understood that, although the steps in the flowcharts in the embodiments of the present application are shown in order as indicated by the arrows, the steps are not necessarily performed in order as indicated by the arrows. The steps are not performed in the exact order shown and may be performed in other orders unless explicitly stated herein. Moreover, at least some of the steps in the figures may include multiple sub-steps or multiple stages that are not necessarily performed at the same time, but may be performed at different times, in different orders, and may be performed alternately or at least partially with respect to other steps or sub-steps of other steps.
The words "if", as used herein, may be interpreted as "at … …" or "at … …" or "in response to a determination" or "in response to a detection", depending on the context. Similarly, the phrases "if determined" or "if detected (a stated condition or event)" may be interpreted as "when determined" or "in response to a determination" or "when detected (a stated condition or event)" or "in response to a detection (a stated condition or event)", depending on the context.
It should be noted that step numbers such as S10 and S20 are used herein for the purpose of more clearly and briefly describing the corresponding content, and do not constitute a substantial limitation on the sequence, and those skilled in the art may perform S20 first and then S10 in specific implementation, which should be within the scope of the present application.
It should be understood that the specific embodiments described herein are merely illustrative of the present application and are not intended to limit the present application.
In the following description, suffixes such as "module", "component", or "unit" used to denote elements are used only for the convenience of description of the present application, and have no specific meaning in themselves. Thus, "module", "component" or "unit" may be used mixedly.
The smart terminal may be implemented in various forms. For example, the smart terminal described in the present application may include smart terminals such as a mobile phone, a tablet computer, a notebook computer, a palmtop computer, a Personal Digital Assistant (PDA), a Portable Media Player (PMP), a navigation device, a wearable device, a smart band, a pedometer, and the like, and fixed terminals such as a Digital TV, a desktop computer, and the like.
The following description will be given taking a mobile terminal as an example, and it will be understood by those skilled in the art that the configuration according to the embodiment of the present application can be applied to a fixed type terminal in addition to elements particularly used for mobile purposes.
Referring to fig. 1, which is a schematic diagram of a hardware structure of a mobile terminal for implementing various embodiments of the present application, the mobile terminal 100 may include: RF (Radio Frequency) unit 101, WiFi module 102, audio output unit 103, a/V (audio/video) input unit 104, sensor 105, display unit 106, user input unit 107, interface unit 108, memory 109, processor 110, and power supply 111. Those skilled in the art will appreciate that the mobile terminal architecture shown in fig. 1 is not intended to be limiting of mobile terminals, which may include more or fewer components than those shown, or some components may be combined, or a different arrangement of components.
The following describes each component of the mobile terminal in detail with reference to fig. 1:
the radio frequency unit 101 may be configured to receive and transmit signals during information transmission and reception or during a call, and specifically, receive downlink information of a base station and then process the downlink information to the processor 110; in addition, the uplink data is transmitted to the base station. Typically, radio frequency unit 101 includes, but is not limited to, an antenna, at least one amplifier, a transceiver, a coupler, a low noise amplifier, a duplexer, and the like. In addition, the radio frequency unit 101 can also communicate with a network and other devices through wireless communication. The wireless communication may use any communication standard or protocol, including but not limited to GSM (Global System for Mobile communications), GPRS (General Packet Radio Service), CDMA2000(Code Division Multiple Access 2000), WCDMA (Wideband Code Division Multiple Access), TD-SCDMA (Time Division-Synchronous Code Division Multiple Access), FDD-LTE (Frequency Division duplex-Long Term Evolution), TDD-LTE (Time Division duplex-Long Term Evolution, Time Division Long Term Evolution), 5G, and so on.
WiFi belongs to short-distance wireless transmission technology, and the mobile terminal can help a user to receive and send e-mails, browse webpages, access streaming media and the like through the WiFi module 102, and provides wireless broadband internet access for the user. Although fig. 1 shows the WiFi module 102, it is understood that it does not belong to the essential constitution of the mobile terminal, and may be omitted entirely as needed within the scope not changing the essence of the invention.
The audio output unit 103 may convert audio data received by the radio frequency unit 101 or the WiFi module 102 or stored in the memory 109 into an audio signal and output as sound when the mobile terminal 100 is in a call signal reception mode, a call mode, a recording mode, a voice recognition mode, a broadcast reception mode, or the like. Also, the audio output unit 103 may also provide audio output related to a specific function performed by the mobile terminal 100 (e.g., a call signal reception sound, a message reception sound, etc.). The audio output unit 103 may include a speaker, a buzzer, and the like.
The a/V input unit 104 is used to receive audio or video signals. The a/V input unit 104 may include a Graphics Processing Unit (GPU) 1041 and a microphone 1042, the graphics processor 1041 processing image data of still pictures or video obtained by an image capturing device (e.g., a camera) in a video capturing mode or an image capturing mode. The processed image frames may be displayed on the display unit 106. The image frames processed by the graphics processor 1041 may be stored in the memory 109 (or other storage medium) or transmitted via the radio frequency unit 101 or the WiFi module 102. The microphone 1042 may receive sounds (audio data) in a phone call mode, a recording mode, a voice recognition mode, or the like, and can process such sounds into audio data. In the case of a phone call mode, the processed audio (voice) data may be converted into a format transmittable to a mobile communication base station and output via the radio frequency unit 101. The microphone 1042 may implement various types of noise cancellation (or suppression) algorithms to cancel (or suppress) noise or interference generated in the course of receiving and transmitting audio signals.
The mobile terminal 100 also includes at least one sensor 105, such as a light sensor, a motion sensor, and other sensors. Optionally, the light sensor includes an ambient light sensor that may adjust the brightness of the display panel 1061 according to the brightness of ambient light, and a proximity sensor that may turn off the display panel 1061 and/or the backlight when the mobile terminal 100 is moved to the ear. As one of the motion sensors, the accelerometer sensor can detect the magnitude of acceleration in each direction (generally, three axes), can detect the magnitude and direction of gravity when stationary, and can be used for applications of recognizing the posture of a mobile phone (such as horizontal and vertical screen switching, related games, magnetometer posture calibration), vibration recognition related functions (such as pedometer and tapping), and the like; as for other sensors such as a fingerprint sensor, a pressure sensor, an iris sensor, a molecular sensor, a gyroscope, a barometer, a hygrometer, a thermometer, and an infrared sensor, which can be configured on the mobile phone, further description is omitted here.
The display unit 106 is used to display information input by a user or information provided to the user. The Display unit 106 may include a Display panel 1061, and the Display panel 1061 may be configured in the form of a Liquid Crystal Display (LCD), an Organic Light-Emitting Diode (OLED), or the like.
The user input unit 107 may be used to receive input numeric or character information and generate key signal inputs related to user settings and function control of the mobile terminal. Alternatively, the user input unit 107 may include a touch panel 1071 and other input devices 1072. The touch panel 1071, also referred to as a touch screen, may collect a touch operation performed by a user on or near the touch panel 1071 (e.g., an operation performed by the user on or near the touch panel 1071 using a finger, a stylus, or any other suitable object or accessory), and drive a corresponding connection device according to a predetermined program. The touch panel 1071 may include two parts of a touch detection device and a touch controller. Optionally, the touch detection device detects a touch orientation of a user, detects a signal caused by a touch operation, and transmits the signal to the touch controller; the touch controller receives touch information from the touch sensing device, converts the touch information into touch point coordinates, sends the touch point coordinates to the processor 110, and can receive and execute commands sent by the processor 110. In addition, the touch panel 1071 may be implemented in various types, such as a resistive type, a capacitive type, an infrared ray, and a surface acoustic wave. In addition to the touch panel 1071, the user input unit 107 may include other input devices 1072. Optionally, other input devices 1072 may include, but are not limited to, one or more of a physical keyboard, function keys (e.g., volume control keys, switch keys, etc.), a trackball, a mouse, a joystick, and the like, and are not limited thereto.
Alternatively, the touch panel 1071 may cover the display panel 1061, and when the touch panel 1071 detects a touch operation thereon or nearby, the touch panel 1071 transmits the touch operation to the processor 110 to determine the type of the touch event, and then the processor 110 provides a corresponding visual output on the display panel 1061 according to the type of the touch event. Although the touch panel 1071 and the display panel 1061 are shown in fig. 1 as two separate components to implement the input and output functions of the mobile terminal, in some embodiments, the touch panel 1071 and the display panel 1061 may be integrated to implement the input and output functions of the mobile terminal, and is not limited herein.
The interface unit 108 serves as an interface through which at least one external device is connected to the mobile terminal 100. For example, the external device may include a wired or wireless headset port, an external power supply (or battery charger) port, a wired or wireless data port, a memory card port, a port for connecting a device having an identification module, an audio input/output (I/O) port, a video I/O port, an earphone port, and the like. The interface unit 108 may be used to receive input (e.g., data information, power, etc.) from external devices and transmit the received input to one or more elements within the mobile terminal 100 or may be used to transmit data between the mobile terminal 100 and external devices.
The memory 109 may be used to store software programs as well as various data. The memory 109 may mainly include a program storage area and a data storage area, and optionally, the program storage area may store an operating system, an application program (such as a sound playing function, an image playing function, and the like) required by at least one function, and the like; the storage data area may store data (such as audio data, a phonebook, etc.) created according to the use of the cellular phone, and the like. Further, the memory 109 may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid state storage device.
The processor 110 is a control center of the mobile terminal, connects various parts of the entire mobile terminal using various interfaces and lines, and performs various functions of the mobile terminal and processes data by operating or executing software programs and/or modules stored in the memory 109 and calling data stored in the memory 109, thereby performing overall monitoring of the mobile terminal. Processor 110 may include one or more processing units; preferably, the processor 110 may integrate an application processor and a modem processor, optionally, the application processor mainly handles operating systems, user interfaces, application programs, etc., and the modem processor mainly handles wireless communications. It will be appreciated that the modem processor described above may not be integrated into the processor 110.
The mobile terminal 100 may further include a power supply 111 (e.g., a battery) for supplying power to various components, and preferably, the power supply 111 may be logically connected to the processor 110 via a power management system, so as to manage charging, discharging, and power consumption management functions via the power management system.
Although not shown in fig. 1, the mobile terminal 100 may further include a bluetooth module or the like, which is not described in detail herein.
In order to facilitate understanding of the embodiments of the present application, a communication network system on which the mobile terminal of the present application is based is described below.
Referring to fig. 2, fig. 2 is an architecture diagram of a communication Network system according to an embodiment of the present disclosure, where the communication Network system is an LTE system of a universal mobile telecommunications technology, and the LTE system includes a UE (User Equipment) 201, an E-UTRAN (Evolved UMTS Terrestrial Radio Access Network) 202, an EPC (Evolved Packet Core) 203, and an IP service 204 of an operator, which are in communication connection in sequence.
Optionally, the UE201 may be the terminal 100 described above, and is not described herein again.
The E-UTRAN202 includes eNodeB2021 and other eNodeBs 2022, among others. Alternatively, the eNodeB2021 may be connected with other enodebs 2022 through a backhaul (e.g., X2 interface), the eNodeB2021 is connected to the EPC203, and the eNodeB2021 may provide the UE201 access to the EPC 203.
The EPC203 may include an MME (Mobility Management Entity) 2031, an HSS (Home Subscriber Server) 2032, other MMEs 2033, an SGW (Serving gateway) 2034, a PGW (PDN gateway) 2035, and a PCRF (Policy and Charging Rules Function) 2036, and the like. Optionally, the MME2031 is a control node that handles signaling between the UE201 and the EPC203, providing bearer and connection management. HSS2032 is used to provide registers to manage functions such as home location register (not shown) and holds subscriber specific information about service characteristics, data rates, etc. All user data may be sent through SGW2034, PGW2035 may provide IP address assignment for UE201 and other functions, and PCRF2036 is a policy and charging control policy decision point for traffic data flow and IP bearer resources, which selects and provides available policy and charging control decisions for a policy and charging enforcement function (not shown).
The IP services 204 may include the internet, intranets, IMS (IP Multimedia Subsystem), or other IP services, among others.
Although the LTE system is described as an example, it should be understood by those skilled in the art that the present application is not limited to the LTE system, but may also be applied to other wireless communication systems, such as GSM, CDMA2000, WCDMA, TD-SCDMA, and future new network systems (e.g. 5G), and the like.
Based on the above mobile terminal hardware structure and communication network system, various embodiments of the present application are provided.
First embodiment
Referring to fig. 3, a first embodiment of the present application provides a recording method, including:
s10, when recognizing that the recorded voice contains the target keyword, acquiring the current time point;
s20, determining or generating a target voice segment according to the current time point, and/or performing preset processing on the target voice segment.
In the embodiment of the application, the terminal in the embodiment is an intelligent terminal with a recording function, and the intelligent terminal can be a mobile terminal disclosed in any one of the embodiments.
In daily life, mobile terminals such as smart phones, tablet computers, and wearable devices are usually provided with a microphone, through which the user can record sound. In some embodiments, the intelligent terminal and the recording application installed thereon generally receive a recording trigger instruction sent by a user, start to collect and record sound data, and then store the collected sound data in a file system. In some conference scenes, the conference duration is long and the collected voice data may contain a large amount of useless information; the collected voice data may also be large, occupying more storage space of the mobile terminal and affecting system consumption. Moreover, because the collected voice data may contain much useless information, when the user needs to find useful information, the user has to search the voice data repeatedly to obtain it.
In the embodiment of the application, in order to solve the above problem, the intelligent terminal identifies whether the recorded voice contains the target keyword during the voice recording process or after the voice recording is completed. It can be understood that the voice containing the target keyword is useful information meeting the user requirements, and when the recorded voice is detected to contain the target keyword, a target voice segment containing the target keyword is obtained. It can be understood that the data volume of the target voice segment is smaller than that of the recorded voice, so that the target voice segment is subjected to preset processing without processing other voice segments except the target voice segment, and the data processing amount of the intelligent terminal is reduced. And only the target voice segment is stored, and other voice segments except the target voice segment are not stored, so that the storage space occupied by the recorded voice can be reduced, and the system consumption is reduced.
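As a rough illustration of this overall flow, the following sketch (in Python) shows how a recording loop might detect a target keyword and hand the surrounding voice data off for preset processing. All interfaces used here (mic.frames(), recognizer.transcribe(), cache.slice_around(), process()) are assumptions made for illustration; they are not part of the patent.

# Illustrative sketch only; the interfaces used below are assumed, not defined by the patent.
def record_loop(mic, recognizer, target_keywords, cache, process):
    """Continuously record voice; when a target keyword is recognized,
    determine a target voice segment around the current time point (S10)
    and perform preset processing on it (S20)."""
    for timestamp, pcm in mic.frames():             # assumed: yields (timestamp, audio frame) pairs
        cache.append(timestamp, pcm)                # keep all recorded voice in the cache first
        text = recognizer.transcribe(pcm)           # assumed speech-to-text call
        for keyword in target_keywords:
            if keyword in text:
                current_time_point = timestamp      # S10: current time point of the keyword
                segment = cache.slice_around(current_time_point)  # S20: target voice segment
                process(segment)                    # S20: preset processing (store/play/output text)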
Optionally, before recording starts, useful information required by the user, namely, a target keyword needs to be acquired. Based on this, referring to fig. 4, before S10, the method further includes:
s30, acquiring the input character signal;
and S40, determining the target keyword according to the character signal.
Optionally, the manner of inputting the text signal includes at least one of:
sending the collected voice signal through a voice collecting device;
sending the collected picture signal through the camera;
and sending the received text signal through the text input terminal.
Optionally, there are multiple implementation manners for inputting the text signal to the intelligent terminal, and a voice signal may be collected by a voice collecting device associated with the intelligent terminal, where the voice collecting device may be a microphone; the method can also acquire picture signals through a camera associated with the intelligent terminal, and the picture can be any picture with characters; the received text signal can be forwarded to the intelligent terminal through a text input terminal associated with the intelligent terminal.
Optionally, a voice signal containing the target keyword is collected through a voice collecting device, which reduces the difficulty of text input. After receiving the voice signal, the intelligent terminal can convert it into a corresponding text signal using a voice-to-text technology and then output the text signal for the user to check, so that the user can confirm whether the text signal includes the target keyword the user intended to input; if the user confirms that it does, the text signal is determined as the target keyword.
Optionally, when the voice signal is collected by the voice collecting device, the voice signal may be directly used as the target keyword.
Optionally, the camera associated with the intelligent terminal is either integrated with the intelligent terminal or connected to it over a short range. When a picture signal is collected through the camera for character input, only the text signal in a preset area is recognized using an OCR (optical character recognition) technology, so that when inputting the target keyword, the user can conveniently capture a picture signal anywhere. Optionally, when recognizing the characters in the picture, the picture is rotated according to the inclination angle of the characters in the picture signal, ensuring the accuracy of reading the target keyword.
Optionally, the received text signal is forwarded to the intelligent terminal through a text input terminal associated with the intelligent terminal, and the user can place the text input terminal at any position, making it convenient to input the text signal.
In this embodiment of the application, by providing multiple different ways of inputting the text signal, the user can adopt a convenient and fast text input form, which reduces the difficulty of inputting the text signal and improves user experience.
Optionally, the manner of obtaining the target keyword includes, but is not limited to, the above manners. The target keyword may also be obtained as follows: the intelligent terminal stores the historical target keywords input by the user each time a recording operation is performed; when a recording request instruction input by the user is detected, the historical target keywords are retrieved and output on the display interface of the intelligent terminal, so that the user can trigger a selection operation on a historical target keyword according to the user's own needs after viewing them, and the intelligent terminal, after receiving the selection operation, determines the target keyword according to it.
Alternatively, the target keyword may be obtained by providing a keyword input interface for the user before starting the recording. The user inputs the corresponding target keywords through the keyword input interface in a voice input mode or a text input mode, and the intelligent terminal triggers the recording operation after receiving the target keywords input by the user.
Optionally, after the recording operation is triggered, the intelligent terminal continuously collects voice, stores the voice in the preset cache area, and then executes keyword recognition operation on the recorded voice. And when the recorded voice comprises the target keyword, acquiring the current time point of the target keyword, and determining or generating the target voice segment according to the current time point.
Optionally, the preset cache region may include an annular cache region; referring to fig. 5, fig. 5 is a recording interface diagram of the recording method according to the embodiment of the present application. The recording interface includes an example diagram of the annular cache area and a pointer that rotates clockwise around the annular cache area. Recorded voice is stored in the annular cache area in a cyclic-overwrite mode: once the duration of the recorded voice exceeds a preset duration, the newest recorded voice overwrites the voice stored earliest in the annular cache area.
Optionally, the annular cache region may store voice of a preset duration, where the preset duration may be customized by the user or may be a fixed duration written into the system in advance. For example, if the preset duration is 3 minutes, then once 3 minutes of voice has been stored in the annular buffer area, the newest recorded voice overwrites the voice stored earliest in the annular buffer area; that is, the voice stored in the annular buffer area is the voice recorded within the latest 3 minutes.
Optionally, in the recording process, the continuously recorded voice is stored in the annular cache region, so that a corresponding target voice segment is subsequently extracted from the annular cache region, and then only the target voice segment is stored in a preset storage region without completely storing all recorded voices, thereby saving the storage space of the intelligent terminal.
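A minimal sketch of such an annular cache area, assuming fixed-rate audio frames and a configurable preset duration (the class and parameter names are illustrative, not taken from the patent):

from collections import deque

class RingVoiceCache:
    """Keeps only the most recent `preset_seconds` of recorded voice;
    once full, the newest frame overwrites the earliest one, as described above."""

    def __init__(self, preset_seconds=180, frames_per_second=50):
        self.frames = deque(maxlen=preset_seconds * frames_per_second)

    def append(self, timestamp, pcm):
        # deque with maxlen silently drops the oldest frame when full (cyclic overwrite)
        self.frames.append((timestamp, pcm))

    def slice(self, start_time, end_time):
        """Return the frames recorded between start_time and end_time (inclusive)."""
        return [pcm for ts, pcm in self.frames if start_time <= ts <= end_time]

# Example: a 3-minute ring cache as in the text above
cache = RingVoiceCache(preset_seconds=180)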
Optionally, after the recorded voice is continuously stored in the annular cache area, in order to identify whether the recorded voice includes the target keyword, a keyword identification operation needs to be performed on the voice stored in the annular cache area.
Optionally, in an embodiment, when the target keyword is a voice signal, the keyword recognition operation may be performed by obtaining a target voice feature corresponding to the voice signal, matching the target voice feature against the recorded voice, and determining that the voice includes the target keyword when a target voice matching the target voice feature appears in it. It can be understood that, considering the recorded speech may have accent deviations, the matching may be done as follows: if the similarity between a segment of the recorded speech and the target voice feature is greater than or equal to a preset threshold, that segment is determined to be the target voice, and the voice is then determined to include the target keyword.
Optionally, in an embodiment, when the target keyword is a text signal, a specific manner of performing the keyword recognition operation may also be to convert a recorded voice into corresponding text information according to a voice-to-text technology, and further match the text information with the target keyword, and when the target keyword appears in the text information, it is determined that the voice includes the target keyword.
Optionally, when it is determined that the voice includes a target keyword, the current time point at which the target keyword appears in the recorded voice is determined. Alternatively, the current time point may be the system time at which the target keyword appears in the voice. Optionally, when the recorded voice is recognized to contain the target keyword at a certain moment, the system time of the intelligent terminal at that moment is obtained and determined as the current time point. Alternatively, a timer may be started during recording to record each time point in the recording process; each recorded time point corresponds one-to-one with the recorded voice, generating a correspondence between time points and voice, so that the current time point at which the target keyword appears can subsequently be obtained from this correspondence.
Optionally, the S10 includes:
s11, converting the voice into character information, and associating the time information corresponding to the voice with the character information;
and S12, identifying target text information matched with the target keyword in the text information of the voice, and configuring the time information associated with the target text information as the current time point.
Optionally, the step of associating the time information corresponding to the voice with the text information includes: mapping the time information corresponding to the voice to the text information; and generating an index table according to the text information and the time information, wherein the index table comprises a time index associated with the time information corresponding to the text information.
Optionally, the recorded voice is converted into corresponding text information using a voice-to-text technology, the time information corresponding to the voice is mapped into the text information, and the text information and its corresponding time information are then used to generate an index table, where the index table includes time indexes associated with the time information corresponding to the text information. Optionally, the index table includes time indexes associated with the voice information, the text information, and the time information corresponding to the text information.
Optionally, after the index table is obtained, based on that the index table includes a time index associated with time information corresponding to the text information, the current time point of the target keyword can be quickly determined according to the index table.
Optionally, identifying target text information matched with the target keyword in the text information of the voice, and configuring time information associated with the target text information as the current time point includes: when recognizing that the recorded voice contains a target keyword, determining target character information matched with the target keyword; determining target time information associated with the target character information according to the time index; and determining the target time information as the current time point.
Optionally, after the text information is acquired, matching the text information with the target keyword; and if the text information comprises target text information matched with the target keyword, inquiring based on the index table according to the target text information, inquiring target time information corresponding to the target text information based on the time index, and determining the target time information as the current time point.
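A small sketch of how such an index table could be built and queried, assuming the speech-to-text engine reports a start time for each recognized phrase (the data layout and names below are illustrative assumptions):

def build_index_table(recognized_phrases):
    """recognized_phrases: iterable of (phrase_text, start_time) pairs from a
    speech-to-text engine (assumed interface). Each entry associates the text
    information with its time index."""
    return [{"text": text, "time": start_time} for text, start_time in recognized_phrases]

def lookup_current_time_point(index_table, target_keyword):
    """Find the target text information matching the keyword and return the
    associated time information as the current time point, or None."""
    for entry in index_table:
        if target_keyword in entry["text"]:
            return entry["time"]
    return None

# Example usage
table = build_index_table([("project schedule", 12.4), ("action items", 95.0)])
print(lookup_current_time_point(table, "action items"))  # 95.0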
It will be appreciated that the user may have an accent while speaking, which may result in the converted text information differing from the text the user actually intended to express. Based on this, an embodiment of the present application further provides a way to improve the accuracy of keyword recognition: when it is recognized that the recorded voice contains a target keyword, the step of determining target text information matched with the target keyword includes: acquiring the similarity between each phrase in the text information and the target keyword; and when the text information includes a phrase whose similarity is greater than a preset similarity threshold, determining that phrase as the target text information.
Optionally, in the process of converting the voice into text information, the environmental background sound of the voice may be filtered out. Based on this, the text information only includes the voice actually uttered by the speaker, and the voice may include a whole voice sentence, such as "i want to learn", and may also include a voice phrase, such as "learn", "love country", and the converted text information may include a whole text sentence, and may also include a text phrase.
Optionally, after the text information is obtained, each phrase in the text information may be obtained, and then similarity comparison is performed between each phrase and the target keyword to obtain similarity between each phrase and the target keyword. And when the similarity between any phrase and the target keyword is greater than a preset similarity threshold, determining the phrase as the target keyword, and further determining the phrase as the target character information. Optionally, the preset similarity threshold may be set by a user in a self-adaptive manner, or may be configured before the system leaves a factory, and the preset similarity threshold may be 85%.
Optionally, after the text information is obtained, the whole sentence of the text in the text information may be divided into a plurality of phrases, and then the phrases and the target keyword are compared with each other for similarity.
Optionally, based on consideration of the accent of the speaker, in the embodiment of the application, similarity between each phrase in the text information and the target keyword is obtained, and whether the text information includes the target text information matched with the target keyword is determined according to the similarity, so that accuracy of keyword identification is improved.
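The similarity measure itself is not specified here; as one possible stand-in, the sketch below uses a generic string-similarity ratio together with the 85% threshold mentioned above. This is purely illustrative and not the algorithm required by the patent.

from difflib import SequenceMatcher

def find_target_text(phrases, target_keyword, threshold=0.85):
    """Return the first phrase whose similarity to the target keyword exceeds
    the preset similarity threshold, or None if no phrase is close enough."""
    for phrase in phrases:
        similarity = SequenceMatcher(None, phrase, target_keyword).ratio()
        if similarity > threshold:
            return phrase
    return None

# A phrase transcribed slightly wrong (e.g. because of accent) can still match:
print(find_target_text(["actoin items", "budget"], "action items"))  # actoin items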
Optionally, the recording is continuously recorded until the user triggers the closing of the recording. Based on this, the present application further provides a recording method, where the recording method further includes: and continuously recording new voice, and updating the index table in real time according to the new voice.
Optionally, in the process of continuously recording the voice, the voice is converted into corresponding text information in real time according to the recorded voice, and the index table is updated in real time according to the converted text information until the user manually triggers to close the recording of the voice.
Optionally, after the current time point is obtained, a target speech segment is determined or generated according to the current time point, where the target speech segment is a speech segment containing the target keyword. When the target keyword appears in the recorded speech, the useful information that the user wants to record may lie in the speech recorded before or after the target keyword. Based on this, to ensure that a more comprehensive target speech segment is obtained, a target recording duration needs to be determined; the period covered by the target recording duration may include the current time point, time points before it, and time points after it.
Optionally, the target recording duration may be set by the user on the setting page shown in fig. 6, or may be configured by the system; the target recording duration may be, for example, 3 min or 1 min.
Optionally, after the current time point and the target recording duration are obtained, a target voice segment corresponding to the target recording duration may be determined from the recorded voice, or the target voice segment may be generated.
Optionally, the mode of generating the target voice segment is to extract a voice segment corresponding to the target recording duration from the recorded voice, and use the voice segment corresponding to the target recording duration as the target voice segment.
Optionally, the recorded voice is stored in a preset buffer area. The voice segment corresponding to the target recording duration is extracted from the recorded voice as follows: a recording start time point and a recording end time point are determined according to the target recording duration, the voice segment stored between the recording start time point and the recording end time point is cut out of the preset buffer area according to these two time points, and that voice segment is then used as the target voice segment.
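Reusing the illustrative RingVoiceCache sketch above, cutting the target voice segment out of the preset buffer area could look roughly like this; splitting the target recording duration into a portion before and a portion after the current time point follows the description above, and all names remain illustrative assumptions:

def extract_target_segment(cache, current_time_point, backtrack_seconds, delay_seconds):
    """Determine the recording start and end time points from the current time
    point and the target recording duration, then cut the stored voice segment
    between them out of the cache. Illustrative sketch only."""
    recording_start_point = max(0.0, current_time_point - backtrack_seconds)
    recording_end_point = current_time_point + delay_seconds
    return cache.slice(recording_start_point, recording_end_point)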
Optionally, after the target speech segment is obtained, the preset processing mode for the target speech segment includes at least one of the following:
storing the target voice segment into a preset storage area;
playing the target voice segment;
and outputting the character information corresponding to the target speech segment.
Optionally, after the target speech segment is determined or generated, the target speech segment is stored in a preset storage region, where the preset storage region may be a set region allocated by the intelligent terminal in advance for storing the target speech segment in the system memory, may also be a set region allocated by the intelligent terminal in advance for storing the target speech segment in the storage device, and may also be a set region allocated by the intelligent terminal in real time in the system memory and/or the storage device according to the size of the target speech segment.
Optionally, the target speech segments are stored in the preset storage area in descending order of their time points, and they may also be classified according to the target keywords: for example, if the target keywords include a target keyword A and a target keyword B, the target speech segments containing target keyword A are stored in the preset storage area corresponding to the category of target keyword A, and the target speech segments containing target keyword B are stored in the preset storage area corresponding to the category of target keyword B. Optionally, in this embodiment of the application, because only the target speech segment, which contains the target keyword and is the useful information required by the user, is stored, only useful information is kept and waste of system resources is avoided.
Optionally, after the target speech segment is determined or generated, the target speech segment may be played directly through a speaker, so that the user can check in time whether the useful information has been successfully recorded in the target speech segment.
Optionally, after the target speech segment is determined or generated, the target speech segment may be converted into corresponding text information, and the text information is output, so that a user can view the recorded target speech segment through the text information.
Optionally, after the target voice segment is determined or generated, the target voice segment is converted into corresponding text information, the target keyword in the text information is marked to generate marked text information, and the marked text information is output, so that when the user checks the marked text information, the user can not only review the recorded target voice segment but also quickly locate the position of the target keyword.
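A trivial sketch of this marking step, assuming a plain-text output in which every occurrence of the target keyword is wrapped in a marker character (the marker choice is an illustrative assumption):

def mark_keyword(text_info, target_keyword, marker="*"):
    """Wrap every occurrence of the target keyword so the user can quickly
    locate it in the output text information."""
    return text_info.replace(target_keyword, f"{marker}{target_keyword}{marker}")

print(mark_keyword("please review the action items before Friday", "action items"))
# please review the *action items* before Friday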
Optionally, after the target speech segment is determined or generated, the preset processing includes, but is not limited to, the above several manners.
In this embodiment of the application, the target keyword input by the user is received before recording. During recording, all recorded voice is first placed in a preset cache area rather than stored permanently, and whether the voice includes the target keyword is then identified according to the target keyword. When the voice includes the target keyword, the current time point at which the target keyword appears in the voice is obtained, a target voice segment within the target recording duration (which covers the current time point) is determined according to the current time point, the target voice segment is extracted from the annular cache area, and the target voice segment is then stored. This embodiment achieves intelligent acquisition of the target voice segment containing the target keyword and avoids the problem that important recording content is easily missed when the user records manually. Meanwhile, because only the target voice segment, i.e., the important recording content, is stored, the problem of excessive useless information in the recording is solved and waste of system resources is avoided.
Second embodiment
Referring to fig. 7, based on the first embodiment, a second embodiment of the sound recording method of the present application is proposed, in which the step of determining or generating the target speech segment according to the current time point includes:
S21, determining or generating a backtracking time point and a delayed time point according to the current time point and the target recording duration;
S22, determining the voice between the backtracking time point and the delayed time point as the target voice segment.
In this embodiment of the application, the current time point is the time point at which the target keyword appears in the recorded voice, and the backtracking time point is a time point before the current time point. Optionally, the backtracking time point may be the time point reached after tracing back from the current time point by a backtracking duration; the delayed time point is a time point after the current time point, and may be the time point reached after extending backward from the current time point by a delay duration.
Optionally, in an embodiment, after the recording is completed, that is, after the recording is stopped, and after the current time point and the target recording duration are obtained, referring to fig. 8, the step of determining the backtracking time point and the delayed time point according to the current time point and the target recording duration includes:
S211, determining a backtracking duration and a delay duration according to the target recording duration;
S212, determining or generating the backtracking time point according to the current time point and the backtracking duration, and/or determining or generating the delayed time point according to the current time point and the delay duration.
Optionally, in another embodiment, after the target recording duration is obtained, the target recording duration may be used as the backtracking duration, and optionally, if the target recording duration is 30s, the backtracking duration is 30 s.
Optionally, in another embodiment, after the target recording duration is obtained, the target recording duration may be used as the delay duration, and optionally, if the target recording duration is 30s, the delay duration is 30 s.
Optionally, in another embodiment, after the target recording duration is obtained, the time period corresponding to the target recording duration may be divided equally into two halves, which are used as the backtracking duration and the delay duration respectively. For example, if the target recording duration is 30s, the backtracking duration is 15s and the delay duration is 15s.
Optionally, in another embodiment, after the target recording duration is obtained, the time period corresponding to the target recording duration may be divided into a first time period and a second time period in a preset manner, with the first time period used as the backtracking duration and the second time period used as the delay duration. Optionally, the durations of the first time period and the second time period are not equal. For example, if the target recording duration is 10min, the first time period is 3min and the second time period is 7min, so the backtracking duration is 3min and the delay duration is 7min.
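Purely as an illustration of the duration-splitting schemes above (the split ratio and the millisecond representation are assumptions, not requirements of the application), the backtracking and delayed time points could be derived as follows.

```python
def split_duration(target_ms, backtrack_ratio=0.5):
    """Split the target recording duration into backtracking and delay durations."""
    backtrack_ms = int(target_ms * backtrack_ratio)
    delay_ms = target_ms - backtrack_ms
    return backtrack_ms, delay_ms

def segment_bounds(current_ms, target_ms, backtrack_ratio=0.5):
    backtrack_ms, delay_ms = split_duration(target_ms, backtrack_ratio)
    backtrack_point = current_ms - backtrack_ms  # before the current time point
    delay_point = current_ms + delay_ms          # after the current time point
    return backtrack_point, delay_point

# Example: a 10 min target duration split 3 min / 7 min
# segment_bounds(current_ms, 600_000, backtrack_ratio=0.3)
```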
Optionally, during recording, the current time point may be the time point at which recording starts, or the backtracking time point obtained by tracing back the backtracking duration from the current time point may fall before the recording start time point. Optionally, the step of determining the backtracking duration and the delay duration according to the current time point and the target recording duration includes: after the backtracking duration and the delay duration are determined according to the target recording duration, obtaining the time point reached after tracing back the backtracking duration from the current time point, judging whether this time point is earlier than the recording start time point, and if so, obtaining the duration between the recording start time point and the current time point and using it as the backtracking duration.
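A minimal sketch of this boundary handling, assuming time points expressed in milliseconds since the recording started (the function name is hypothetical):

```python
def clamp_backtrack(current_ms, backtrack_ms, recording_start_ms=0):
    """Shorten the backtracking duration if it would reach before the recording start."""
    earliest = current_ms - backtrack_ms
    if earliest < recording_start_ms:
        # only the audio recorded since the start can be traced back
        backtrack_ms = current_ms - recording_start_ms
    return backtrack_ms
```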
Optionally, after the backtracking duration and the delay duration are obtained, the backtracking time point is determined by tracing back the backtracking duration from the current time point. For example, if the backtracking duration is 30s and the current time point is "11:08:30:100", the backtracking time point is "11:08:00:100".
Optionally, the delayed time point is determined by extending the current time point backward by the delay duration. For example, if the delay duration is 30s and the current time point is "11:08:30:100", the delayed time point is "11:09:00:100".
Optionally, while the recorded voice is being acquired, the target keyword may occur again after the current time point but before the delayed time point, that is, the target keyword appears at least twice. Based on this, referring to fig. 9, S21 includes:
S213, determining or generating the backtracking time point by taking the time point corresponding to the starting target keyword as the current time point, and/or determining or generating the delayed time point by taking the time point corresponding to the terminating target keyword as the current time point.
Optionally, when the target keyword continuously appears in the recorded voice, that is, the time length between the time point corresponding to the next target keyword and the time point corresponding to the current target keyword is less than or equal to the delay time length, the backtracking time point is determined based on the time point corresponding to the starting target keyword, and/or the delay time point is determined based on the time point corresponding to the ending target keyword. Optionally, the starting target keyword is a first-appearing target keyword, and the terminating target keyword is a last-appearing target keyword.
For example, suppose the backtracking duration is 30s, the delay duration is 30s, and the target keyword is "learning". The time point at which "learning" first appears is "11:08:30:100", that is, the time point corresponding to the starting target keyword; the time point at which "learning" last appears is "11:08:40:500", that is, the time point corresponding to the terminating target keyword. The interval between the first occurrence at "11:08:30:100" and the last occurrence at "11:08:40:500" is 10s400ms, which is less than the delay duration. The backtracking time point determined from the starting target keyword's time point "11:08:30:100" is therefore "11:08:00:100", and the delayed time point determined from the terminating target keyword's time point "11:08:40:500" is "11:09:10:500".
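For illustration only, the following sketch merges successive keyword hits that fall within the delay duration of each other, taking the backtracking time point from the first hit and the delayed time point from the last, as in the example above (timestamps are assumed to be milliseconds; the function name is hypothetical).

```python
def merge_hits(hit_times_ms, backtrack_ms, delay_ms):
    """Group keyword hits into segments; hit_times_ms must be sorted and non-empty."""
    segments = []
    start = end = hit_times_ms[0]
    for t in hit_times_ms[1:]:
        if t - end <= delay_ms:
            end = t                      # still the same burst of hits
        else:
            segments.append((start - backtrack_ms, end + delay_ms))
            start = end = t
    segments.append((start - backtrack_ms, end + delay_ms))
    return segments
```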
Optionally, after the backtracking time point and the delay time point are obtained, the target speech segment is determined from the speech between the backtracking time point and the delay time point.
In the embodiment of the application, a backtracking time point and/or a delay time point are determined according to the forward backtracking and/or backward extension of the current time point, and then a target voice segment is determined according to the backtracking time point and/or the delay time point. Based on this, when the voice includes the target keyword, the voice between the current time point and the delayed time point can be acquired, and the voice before the current time point can be acquired, so that omission of important recording content is avoided, and a user can acquire the important recording content completely.
Third embodiment
Referring to fig. 10, based on the second embodiment, step S22 includes:
S221, configuring the voice recorded from the backtracking time point to the current time point as a first voice segment, and/or configuring the voice recorded from the current time point to the delayed time point as a second voice segment;
S222, determining or generating the target speech segment according to the first speech segment and/or the second speech segment.
In an embodiment of the present application, the first voice segment is the voice recorded between the backtracking time point and the current time point, and the second voice segment is the voice recorded between the current time point and the delayed time point. Referring to fig. 11, fig. 11 is a display diagram of the recording interface. The pointer in the recording interface indicates the current time point and rotates clockwise and anticlockwise at the same time; the segment S1 generated by anticlockwise rotation is the first voice segment, and the segment S2 generated by clockwise rotation is the second voice segment.
Optionally, after the first speech segment and the second speech segment are obtained, determining or generating the target speech segment according to the first speech segment and/or the second speech segment may be done by merging the first speech segment and the second speech segment, by using the first speech segment alone, or by using the second speech segment alone.
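A trivial, illustrative sketch of these three options (names are assumptions; the segments are treated as simple Python lists of audio frames):

```python
def build_target(first_segment, second_segment, mode="both"):
    """Compose the target speech segment from the first and/or second segment."""
    if mode == "first":
        return first_segment
    if mode == "second":
        return second_segment
    return first_segment + second_segment  # merge S1 and S2
```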
Optionally, during the continuous recording of the voice, a target keyword may appear in a second voice segment recorded after the current time point. Based on this, referring to fig. 12, after S221, the method further includes:
S223, after the current time point, acquiring a new current time point when the target keyword is identified again;
S224, determining or generating a new delayed time point according to the new current time point and the delay duration;
S225, configuring the voice recorded from the new current time point to the new delayed time point as a third voice segment;
S226, determining or generating the target speech segment according to the first speech segment and/or the third speech segment.
Optionally, the target keyword may appear in the second speech segment recorded after the current time point. When it does, the time point corresponding to the target keyword is obtained from the second speech segment and determined as the new current time point.
Optionally, after the new current time point is obtained, the delay duration is added to the new current time point to obtain the new delayed time point corresponding to the new current time point. For example, if the delay duration is 30s and the new current time point is "11:08:30:100", the new delayed time point is "11:09:00:100".
Optionally, recording continues from the new current time point to the new delayed time point, and the voice recorded between the new current time point and the new delayed time point is taken as the third voice segment.
Optionally, after the third speech segment is obtained, determining or generating the target speech segment according to the first speech segment and/or the third speech segment may be done by merging the first speech segment and the third speech segment, by using the first speech segment alone, or by using the third speech segment alone.
It can be understood that, when the target keyword keeps appearing in the voice recorded after the current time point, recording continues until the target keyword no longer appears in the subsequently recorded voice. The last time point at which the target keyword appears is then obtained from the voice recorded after the current time point, this last time point is extended backward by a preset duration to obtain a final delayed time point, recording continues to the final delayed time point, and the voice recorded between the current time point and the final delayed time point and/or the first voice segment is used to determine or generate the target voice segment.
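The "keep extending while the keyword keeps appearing" behaviour described above could be sketched as follows; keyword_hit and the millisecond time base are assumptions introduced only for illustration.

```python
def final_delay_point(current_ms, delay_ms, keyword_hit, recording_end_ms, step_ms=20):
    """Extend the delayed time point each time the keyword is hit again."""
    end_ms = current_ms + delay_ms
    t = current_ms
    while t <= end_ms and t <= recording_end_ms:
        if keyword_hit(t):
            end_ms = t + delay_ms  # each new hit pushes the delayed time point back
        t += step_ms
    return min(end_ms, recording_end_ms)
```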
In this embodiment of the present application, the first speech segment and/or the second speech segment are determined according to the backtracking time point, the current time point and the delay duration, or the first speech segment and/or the third speech segment are determined according to the backtracking time point, the current time point and the new delayed time point. Optionally, the first speech segment is the speech recorded before the current time point, and the second and third speech segments are the speech recorded after the current time point. Based on this, when the speech contains the target keyword, not only the speech from the backtracking time point to the current time point but also the speech from the current time point to the delayed time point can be acquired. Optionally, when the target keyword appears continuously in the recorded speech, both the speech between the backtracking time point and the current time point and the speech between the current time point and the new delayed time point can be acquired, which ensures that important recording content is not omitted and that the user obtains it completely.
The embodiment of the application further provides an intelligent terminal. The intelligent terminal comprises a memory and a processor, the memory stores a recording program, and the recording program, when executed by the processor, implements the steps of the recording method in any one of the above embodiments.
An embodiment of the present application further provides a readable storage medium, where the storage medium stores a recording program, and the recording program, when executed by a processor, implements the steps of the recording method in any of the above embodiments.
The embodiments of the intelligent terminal and the computer-readable storage medium provided in the present application may include all technical features of any of the foregoing recording method embodiments; their expanded description is substantially the same as that of the foregoing method embodiments and is not repeated here.
Embodiments of the present application also provide a computer program product, which includes computer program code, when the computer program code runs on a computer, the computer is caused to execute the method in the above various possible embodiments.
Embodiments of the present application further provide a chip, which includes a memory and a processor, where the memory is used to store a computer program, and the processor is used to call and run the computer program from the memory, so that a device in which the chip is installed executes the method in the above various possible embodiments.
It is to be understood that the foregoing scenarios are only examples, and do not constitute a limitation on application scenarios of the technical solutions provided in the embodiments of the present application, and the technical solutions of the present application may also be applied to other scenarios. For example, as can be known by those skilled in the art, with the evolution of system architecture and the emergence of new service scenarios, the technical solution provided in the embodiments of the present application is also applicable to similar technical problems.
The above-mentioned serial numbers of the embodiments of the present application are merely for description and do not represent the merits of the embodiments.
The steps in the method of the embodiment of the application can be sequentially adjusted, combined and deleted according to actual needs.
The units in the device in the embodiment of the application can be merged, divided and deleted according to actual needs.
In the present application, the same or similar term concepts, technical solutions and/or application scenario descriptions will be generally described only in detail at the first occurrence, and when the description is repeated later, the detailed description will not be repeated in general for brevity, and when understanding the technical solutions and the like of the present application, reference may be made to the related detailed description before the description for the same or similar term concepts, technical solutions and/or application scenario descriptions and the like which are not described in detail later.
In the present application, each embodiment is described with emphasis, and reference may be made to the description of other embodiments for parts that are not described or illustrated in any embodiment.
The technical features of the technical solution of the present application may be arbitrarily combined, and for brevity of description, all possible combinations of the technical features in the embodiments are not described, however, as long as there is no contradiction between the combinations of the technical features, the scope of the present application should be considered as being described in the present application.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solutions of the present application may be embodied in the form of a software product, which is stored in a storage medium (e.g., ROM/RAM, magnetic disk, optical disk) and includes instructions for enabling a terminal device (e.g., a mobile phone, a computer, a server, a controlled terminal, or a network device) to execute the method of each embodiment of the present application.
In the above embodiments, the implementation may be wholly or partially realized by software, hardware, firmware, or any combination thereof. When implemented in software, may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. The procedures or functions according to the embodiments of the present application are all or partially generated when the computer program instructions are loaded and executed on a computer. The computer may be a general purpose computer, a special purpose computer, a network of computers, or other programmable device. The computer instructions may be stored on a computer readable storage medium or transmitted from one computer readable storage medium to another, for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center by wire (e.g., coaxial cable, fiber optic, digital subscriber line) or wirelessly (e.g., infrared, wireless, microwave, etc.). The computer-readable storage medium can be any available medium that can be accessed by a computer or a data storage device, such as a server, a data center, etc., that incorporates one or more of the available media. The usable medium may be a magnetic medium (e.g., floppy Disk, memory Disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., Solid State Disk (SSD)), among others.
The above description is only a preferred embodiment of the present application, and not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application, or which are directly or indirectly applied to other related technical fields, are included in the scope of the present application.

Claims (10)

1. A sound recording method, comprising the steps of:
S10, when recognizing that the recorded voice contains the target keyword, acquiring the current time point;
S20, determining or generating a target voice segment according to the current time point, and/or performing preset processing on the target voice segment.
2. The method according to claim 1, wherein said determining or generating a target speech segment according to said current point in time comprises the steps of:
S21, determining or generating a backtracking time point and a delay time point according to the current time point and the target recording duration;
and S22, determining the voice between the backtracking time point and the delayed time point as the target voice segment.
3. The method of claim 2, wherein the step of S21 includes at least one of:
determining a backtracking time length and a delayed time length according to the target recording time length, determining or generating the backtracking time point according to the current time point and the backtracking time length, and/or determining or generating the delayed time point according to the current time point and the delayed time length;
and determining or generating the backtracking time point based on the time point corresponding to the starting target keyword as the current time point, and/or determining or generating the delayed time point based on the time point corresponding to the terminating target keyword as the current time point.
4. The method of claim 2, wherein the step of S22 includes:
configuring the voice recorded from the backtracking time point to the current time point as a first voice segment, and/or configuring the voice recorded from the current time point to the delayed time point as a second voice segment;
determining or generating the target speech segment from the first speech segment and/or the second speech segment.
5. The method of claim 4, wherein the step of S22 further comprises:
after the current time point, acquiring a new current time point when the target keyword is identified again;
determining or generating a new delay time point according to the new current time point and the delay time length;
configuring the voice recorded from the new current time point to the new delayed time point into a third voice segment;
determining or generating the target speech segment from the first speech segment and/or the third speech segment.
6. The method of any one of claims 1 to 5, wherein the pre-set processing comprises at least one of:
storing the target voice segment into a preset storage area;
playing the target voice segment;
and outputting the text information corresponding to the target speech segment.
7. The method according to any one of claims 1 to 5, wherein the step of S10 is preceded by the step of:
acquiring an input text signal;
and determining the target keyword according to the text signal.
8. The method according to any one of claims 1 to 5, wherein the step S10 includes:
converting the voice into text information, and associating time information corresponding to the voice with the text information;
and identifying target text information matched with the target keyword in the text information of the voice, and configuring time information associated with the target text information as the current time point.
9. An intelligent terminal, characterized in that, intelligent terminal includes: memory, processor, wherein the memory has stored thereon a recording program which when executed by the processor implements the steps of the recording method of any one of claims 1 to 8.
10. A storage medium, characterized in that the storage medium has stored thereon a computer program which, when being executed by a processor, carries out the steps of the sound recording method as claimed in any one of claims 1 to 8.