WO2023236794A1 - Audio track marking method and electronic device - Google Patents

Audio track marking method and electronic device

Info

Publication number
WO2023236794A1
Authority
WO
WIPO (PCT)
Prior art keywords
display
audio
audio track
content
electronic device
Application number
PCT/CN2023/096664
Other languages
English (en)
French (fr)
Inventor
肖冬
Original Assignee
Huawei Technologies Co., Ltd. (华为技术有限公司)
Application filed by Huawei Technologies Co., Ltd.
Publication of WO2023236794A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/048 Interaction techniques based on graphical user interfaces [GUI]
    • G06F3/0484 Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
    • G06F3/0485 Scrolling or panning
    • G06F3/04842 Selection of displayed objects or displayed text elements
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems

Definitions

  • the present application relates to the field of terminal technology, and in particular to an audio track marking method and electronic equipment.
  • Speech transcription refers to a technology that converts speech content in audio into text. It is widely used in conference scenarios and online learning scenarios. For example, when voice transcription is used in a meeting scenario, the meeting recording can be transcribed and the meeting voice content can be transcribed into text, making it easier to record and view the meeting content.
  • the present application provides an audio track marking method and electronic device to provide a method for quickly marking key content on an audio track.
  • this application provides an audio track marking method.
  • The method may be performed by an electronic device, and includes: displaying the audio track of the audio on a display screen when recording or playing the audio; and, in response to a first operation triggered by a user, displaying the first position on the audio track corresponding to the first operation in a first display style; wherein the first operation is used to indicate marking the first position of the audio track.
  • the electronic device when the electronic device records audio or plays audio, it displays the audio track of the audio on the display screen.
  • The user can trigger the first operation at any time, and the electronic device displays the first position corresponding to the first operation on the audio track in the first display style. This quickly marks the position of key audio in the audio track, making it convenient for users to find key audio at any time and improving the user experience.
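The position-marking flow described above can be sketched as a minimal data model. This is an illustrative sketch only; all names (`AudioTrack`, `Mark`, `mark_position`) are invented here and do not come from the patent.

```python
from dataclasses import dataclass, field

@dataclass
class Mark:
    position: float           # seconds from the start of the track
    style: str = "highlight"  # stands in for the "first display style"

@dataclass
class AudioTrack:
    duration: float
    marks: list = field(default_factory=list)

    def mark_position(self, position: float, style: str = "highlight") -> Mark:
        # The "first operation": record a mark at the position the user tapped.
        if not 0.0 <= position <= self.duration:
            raise ValueError("position lies outside the track")
        mark = Mark(position, style)
        self.marks.append(mark)
        return mark
```

A real implementation would additionally repaint the waveform at the marked position in the chosen display style.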
  • In a possible implementation, the method further includes: in response to a second operation triggered by the user, displaying the first area of the audio track, between the first position and the second position corresponding to the second operation, in the first display style; wherein the second operation is used to indicate ending the mark at the second position of the audio track, the second position being after the first position.
  • In the above method, the electronic device can determine the first area of the audio track according to the first and second operations triggered by the user, and display the first area in the first display style, thereby marking the area corresponding to key audio in the audio track so that users can find it at any time.
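The start-mark/end-mark pairing can be sketched as follows; `MarkedRegion` and `mark_region` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass

@dataclass
class MarkedRegion:
    start: float  # the first position (where the mark begins)
    end: float    # the second position (where the mark ends)
    style: str = "highlight"

def mark_region(first_pos: float, second_pos: float,
                style: str = "highlight") -> MarkedRegion:
    # The second position must come after the first, as the method requires.
    if second_pos <= first_pos:
        raise ValueError("the second position must be after the first")
    return MarkedRegion(first_pos, second_pos, style)
```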
  • In a possible implementation, the first display style includes at least one of the following: a background color different from that of the audio track's default display style; an audio track shape different from that of the default display style; a track color different from that of the default display style.
  • In a possible implementation, the method further includes: performing speech transcription on the audio and displaying the speech transcription content corresponding to the audio; and displaying, in a second display style, the speech transcription content corresponding to the first position within the speech transcription content.
  • The speech transcription content corresponding to the first position may be the speech transcription content of the audio corresponding to an audio track area of preset length that includes the first position; the first position may be the starting position, the end position, or an intermediate position of that audio track area.
  • In a possible implementation, the method further includes: performing speech transcription on the audio and displaying the speech transcription content corresponding to the audio; and displaying, in the second display style, the speech transcription content corresponding to the first area within the speech transcription content.
  • the speech transcription content corresponding to the first area includes the speech transcription content of the audio corresponding to the first area.
  • the electronic device can also mark the speech transcription content corresponding to the first position or the first area marked by the user, so that the key content can be highlighted for the user to view.
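The mapping from a marked position or area to its transcript, as described above, amounts to overlap tests against timed transcript segments. A minimal sketch with toy data; the segment layout and function names are assumptions, not the patent's implementation.

```python
SEGMENTS = [  # (start_s, end_s, text): illustrative timed transcript segments
    (0.0, 5.0, "Welcome to the meeting."),
    (5.0, 12.0, "First item: quarterly results."),
    (12.0, 20.0, "Action items are assigned as follows."),
]

def transcript_for_position(position, window=4.0, segments=SEGMENTS):
    """Transcript for a marked position: all segments overlapping a
    preset-length window; the position is taken here as the window's
    centre, but it could equally be the window's start or end."""
    lo, hi = position - window / 2.0, position + window / 2.0
    return [text for start, end, text in segments if start < hi and end > lo]

def transcript_for_region(region_start, region_end, segments=SEGMENTS):
    """Transcript for a marked area: all segments overlapping the area."""
    return [text for start, end, text in segments
            if start < region_end and end > region_start]
```

The selected texts would then be rendered in the second display style.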
  • the second display style includes at least one of the following: a font that is different from the default display style of the voice-transcribed content; a text color that is different from the default display style of the voice-transcribed content; A background color that is different from the default display style of the speech-transcribed content.
  • In a possible implementation, the method further includes: in response to a third operation triggered by the user, adding an annotation area at the third position corresponding to the third operation; displaying an annotation editing interface on the display screen; receiving the annotation content input by the user on the annotation editing interface; and displaying the annotation content in the annotation area.
  • the electronic device can respond to the third operation triggered by the user and display the annotations input by the user at the third position, thereby marking key content and making it easier for the user to find audio content or voice transcription content.
  • the method further includes: displaying an area corresponding to the third position on the audio track as a third display style.
  • In a possible implementation, displaying the annotation editing interface on the display screen includes: displaying a speech transcription interface and the annotation editing interface in split screen on the display screen; or displaying a floating window over the speech transcription interface and displaying the annotation editing interface in the floating window; wherein the speech transcription interface includes the audio track of the audio and the speech transcription content corresponding to the audio.
  • the electronic device can display the annotation editing interface on a split screen or display the annotation editing interface in a floating window, and flexibly receive user input annotation content.
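The annotation step can be sketched as a small store that creates an empty annotation at the third position and later receives the editor's text. All names here (`Annotation`, `AnnotationStore`) are invented for illustration.

```python
from dataclasses import dataclass

@dataclass
class Annotation:
    position: float    # the "third position" on the audio track
    content: str = ""  # text entered in the annotation editing interface

class AnnotationStore:
    def __init__(self):
        self.annotations = []

    def add_annotation(self, position: float) -> Annotation:
        # The "third operation": create an empty annotation area; the UI would
        # then open the editor, either split-screen or in a floating window.
        note = Annotation(position)
        self.annotations.append(note)
        return note
```

`content` would be filled from the editing interface's input and rendered in the annotation area.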
  • In a possible implementation, the method further includes: displaying a speech transcription interface and a minutes editing interface in split screen on the display screen; wherein the speech transcription interface includes the audio track of the audio and the speech transcription content corresponding to the audio, and the minutes editing interface includes the target speech transcription content and/or a playback control corresponding to the target speech transcription content; the target speech transcription content includes the speech transcription content corresponding to the marked audio track area and/or to the annotated audio track area, and the playback control corresponding to the target speech transcription content is used to play the audio clip corresponding to the target speech transcription content.
  • the electronic device can also display the voice transcription interface and the minutes editing interface in split screens.
  • The minutes editing interface can include the speech transcription content corresponding to the marked audio track area and/or the speech transcription content corresponding to the annotated audio track area. Users can directly use this content to edit minutes, which facilitates user operations and improves the user experience.
  • the minutes editing interface can also include playback controls corresponding to the target speech transcribed content. Users can click the playback control to repeatedly listen to the audio content corresponding to the marked audio track area at any time, improving the user experience.
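Assembling the target speech transcription content for the minutes editing interface can be sketched as follows; the entry layout and `build_minutes` name are illustrative assumptions.

```python
def build_minutes(marked_regions, annotations, segments):
    """Gather the 'target speech transcription content': the transcript of
    every marked region plus the transcript containing each annotated
    position. The (start, end) pair doubles as the audio clip a playback
    control would replay."""
    entries = []
    for start, end in marked_regions:
        texts = [t for s, e, t in segments if s < end and e > start]
        entries.append({"clip": (start, end), "text": " ".join(texts)})
    for position, note in annotations:
        texts = [t for s, e, t in segments if s <= position <= e]
        entries.append({"clip": (position, position),
                        "text": " ".join(texts), "note": note})
    return entries
```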
  • In a possible implementation, when the audio track includes multiple marked audio track areas, the multiple marked audio track areas correspond to at least one display style, and the at least one display style is user-defined.
  • the audio track areas triggered by different users have different display styles.
  • In a possible implementation, the method further includes: in response to a movement operation triggered by the user on a sliding control in the display interface, adjusting the speech transcription content displayed in the display interface according to the distance corresponding to the movement operation.
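Mapping the slider's travel distance to the displayed transcript can be sketched as a proportional scroll-offset calculation. The function and its parameters are hypothetical, not taken from the patent.

```python
def scroll_offset(drag_px: float, slider_track_px: float,
                  content_px: float, viewport_px: float) -> float:
    """Map how far the sliding control moved along its track to a scroll
    offset in the transcript view: the displayed content shifts in
    proportion to the drag distance, clamped to the scrollable range."""
    if slider_track_px <= 0:
        return 0.0
    ratio = max(0.0, min(1.0, drag_px / slider_track_px))
    return ratio * max(0.0, content_px - viewport_px)
```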
  • the present application provides an electronic device, which includes a plurality of functional modules; the plurality of functional modules interact to implement the method shown in the first aspect and its respective embodiments.
  • the multiple functional modules can be implemented based on software, hardware, or a combination of software and hardware, and the multiple functional modules can be arbitrarily combined or divided based on specific implementation.
  • the present application provides an electronic device, including at least one processor and at least one memory.
  • Computer program instructions are stored in the at least one memory; when executed by the at least one processor, the instructions cause the electronic device to perform the method shown in the first aspect and its respective embodiments.
  • the present application also provides a computer program product containing instructions, which when the computer program product is run on a computer, causes the computer to execute the method shown in the above-mentioned first aspect and its respective implementation modes.
  • the present application also provides a computer-readable storage medium.
  • A computer program is stored in the computer-readable storage medium; when the computer program is executed by a computer, the computer is caused to perform the method shown in the first aspect and its respective embodiments.
  • the present application also provides a chip, which is used to read a computer program stored in a memory and execute the method shown in the above-mentioned first aspect and its respective implementation modes.
  • the present application also provides a chip system.
  • the chip system includes a processor and is used to support a computer device to implement the method shown in the above-mentioned first aspect and its respective implementation modes.
  • the chip system further includes a memory, and the memory is used to store necessary programs and data of the computer device.
  • the chip system can be composed of chips or include chips and other discrete devices.
  • Figure 1 is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • Figure 2 is a software structure block diagram of an electronic device provided by an embodiment of the present application.
  • Figure 3 is a schematic diagram of a display interface for displaying an audio track on an electronic device provided by an embodiment of the present application.
  • Figure 4 is a schematic diagram of a user triggering a first operation provided by an embodiment of the present application.
  • Figure 5 is a schematic diagram of displaying a first position mark provided by an embodiment of the present application.
  • Figure 6 is a schematic diagram of a display of an audio track provided by an embodiment of the present application.
  • Figure 7 is a schematic diagram of marking the speech transcription content corresponding to the first position provided by an embodiment of the present application.
  • Figure 8 is a schematic diagram of marking the speech transcription content corresponding to the first area provided by an embodiment of the present application.
  • Figure 9 is a schematic diagram of an interface for voice transcription of audio provided by an embodiment of the present application.
  • Figure 10 is a schematic diagram of a display of an audio track containing multiple marked audio track areas provided by an embodiment of the present application.
  • Figure 11 is a schematic diagram of the first annotation editing interface provided by the embodiment of the present application.
  • Figure 12 is a schematic diagram of another annotation editing interface provided by an embodiment of the present application.
  • Figure 13 is a schematic diagram of displaying annotations on an audio track according to an embodiment of the present application.
  • Figure 14 is a schematic diagram of a mark display style provided by an embodiment of the present application.
  • Figure 15 is a schematic diagram showing annotations on an audio track provided by an embodiment of the present application.
  • Figure 16 is a schematic diagram of an interface for editing speech transcription content provided by an embodiment of the present application.
  • Figure 17 is a schematic diagram showing a minutes editing interface provided by an embodiment of the present application.
  • Figure 18 is a schematic display diagram of yet another minutes editing interface provided by an embodiment of the present application.
  • Figure 19 is a schematic diagram showing a display of speech transcription content provided by an embodiment of the present application.
  • Figure 20 is a schematic diagram of the first audio track display method provided by the embodiment of the present application.
  • Figure 21 is a schematic diagram of yet another audio track display method provided by an embodiment of the present application.
  • Figure 22 is a flow chart of an audio track marking method provided by an embodiment of the present application.
  • "At least one" refers to one or more, and "multiple" refers to two or more.
  • "And/or" describes the relationship between associated objects and indicates that three relationships are possible. For example, "A and/or B" can mean: A exists alone, A and B exist simultaneously, or B exists alone, where A and B can be singular or plural. The character "/" generally indicates an "or" relationship between the associated objects. "At least one (item) of the following" or similar expressions refer to any combination of these items, including any combination of single items or plural items.
  • For example, "at least one of a, b, or c" can mean: a; b; c; a and b; a and c; b and c; or a, b, and c; where a, b, and c can each be single or multiple.
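As a quick check, the seven cases enumerated above are exactly the non-empty combinations of three items (Python used here purely for illustration):

```python
from itertools import combinations

items = ["a", "b", "c"]
# "At least one (item) of a, b or c" covers every non-empty combination:
subsets = [set(combo) for r in range(1, len(items) + 1)
           for combo in combinations(items, r)]
```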
  • this application provides an audio track marking method to provide a method for quickly marking key content on the audio track.
  • the audio track marking method provided by the embodiment of the present application can be applied to electronic devices.
  • Electronic devices and embodiments for using such electronic devices are described below.
  • The electronic device in the embodiments of the present application may be, for example, a tablet computer, a mobile phone, a vehicle-mounted device, an augmented reality (AR)/virtual reality (VR) device, a notebook computer, an ultra-mobile personal computer (UMPC), a netbook, a personal digital assistant (PDA), a wearable device, an Internet of things (IoT) device, a vehicle, etc.
  • the embodiments of this application do not place any restrictions on the specific types of electronic devices.
  • the electronic device may support a stylus.
  • FIG. 1 is a schematic structural diagram of an electronic device 100 provided by an embodiment of the present application.
  • The electronic device 100 may include a processor 110, an external memory interface 120, an internal memory 121, a universal serial bus (USB) interface 130, a charging management module 140, a power management module 141, a battery 142, an antenna 1, an antenna 2, a mobile communication module 150, a wireless communication module 160, an audio module 170, a speaker 170A, a receiver 170B, a microphone 170C, a headphone interface 170D, a sensor module 180, a button 190, a motor 191, an indicator 192, a camera 193, a display screen 194, a subscriber identification module (SIM) card interface 195, etc.
  • the processor 110 may include one or more processing units.
  • The processor 110 may include an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a memory, a video codec, a digital signal processor (DSP), a baseband processor, and/or a neural-network processing unit (NPU), etc.
  • The controller may be the nerve center and command center of the electronic device 100.
  • the controller can generate operation control signals based on the instruction operation code and timing signals to complete the control of fetching and executing instructions.
  • the processor 110 may also be provided with a memory for storing instructions and data.
  • The memory in the processor 110 is a cache memory, which may hold instructions or data that the processor 110 has just used or uses cyclically. If the processor 110 needs the instructions or data again, it can call them directly from this memory, avoiding repeated access and reducing the waiting time of the processor 110, thus improving system efficiency.
  • the USB interface 130 is an interface that complies with the USB standard specification, and may be a Mini USB interface, a Micro USB interface, a USB Type C interface, etc.
  • the USB interface 130 can be used to connect a charger to charge the electronic device 100, and can also be used to transmit data between the electronic device 100 and peripheral devices.
  • the charging management module 140 is used to receive charging input from the charger.
  • the power management module 141 is used to connect the battery 142, the charging management module 140 and the processor 110.
  • the power management module 141 receives input from the battery 142 and/or the charging management module 140, and supplies power to the processor 110, internal memory 121, external memory, display screen 194, camera 193, wireless communication module 160, etc.
  • the wireless communication function of the electronic device 100 can be implemented through the antenna 1, the antenna 2, the mobile communication module 150, the wireless communication module 160, the modem processor and the baseband processor.
  • Antenna 1 and Antenna 2 are used to transmit and receive electromagnetic wave signals.
  • Each antenna in the electronic device 100 may be used to cover a single or multiple communication frequency bands. Different antennas can also be multiplexed to improve antenna utilization; for example, antenna 1 can be multiplexed as a diversity antenna of a wireless local area network. In other embodiments, the antennas may be used in combination with tuning switches.
  • the mobile communication module 150 can provide solutions for wireless communication including 2G/3G/4G/5G applied on the electronic device 100 .
  • the mobile communication module 150 may include at least one filter, switch, power amplifier, low noise amplifier (LNA), etc.
  • the mobile communication module 150 can receive electromagnetic waves through the antenna 1, perform filtering, amplification and other processing on the received electromagnetic waves, and transmit them to the modem processor for demodulation.
  • the mobile communication module 150 can also amplify the signal modulated by the modem processor and convert it into electromagnetic waves through the antenna 1 for radiation.
  • at least part of the functional modules of the mobile communication module 150 may be disposed in the processor 110 .
  • at least part of the functional modules of the mobile communication module 150 and at least part of the modules of the processor 110 may be provided in the same device.
  • The wireless communication module 160 can provide solutions for wireless communication applied to the electronic device 100, including wireless local area network (WLAN) (such as a wireless fidelity (Wi-Fi) network), Bluetooth (BT), global navigation satellite system (GNSS), frequency modulation (FM), near field communication (NFC), infrared (IR), etc.
  • the wireless communication module 160 may be one or more devices integrating at least one communication processing module.
  • the wireless communication module 160 receives electromagnetic waves via the antenna 2 , frequency modulates and filters the electromagnetic wave signals, and sends the processed signals to the processor 110 .
  • the wireless communication module 160 can also receive the signal to be sent from the processor 110, frequency modulate it, amplify it, and convert it into electromagnetic waves through the antenna 2 for radiation.
  • In some embodiments, the antenna 1 of the electronic device 100 is coupled to the mobile communication module 150 and the antenna 2 is coupled to the wireless communication module 160, so that the electronic device 100 can communicate with networks and other devices through wireless communication technologies.
  • The wireless communication technology may include global system for mobile communications (GSM), general packet radio service (GPRS), code division multiple access (CDMA), wideband code division multiple access (WCDMA), time-division synchronous code division multiple access (TD-SCDMA), long term evolution (LTE), BT, GNSS, WLAN, NFC, FM, and/or IR technology, etc.
  • The GNSS may include the global positioning system (GPS), the global navigation satellite system (GLONASS), the Beidou navigation satellite system (BDS), the quasi-zenith satellite system (QZSS), and/or satellite based augmentation systems (SBAS).
  • the display screen 194 is used to display a display interface of an application, such as displaying a display page of an application installed on the electronic device 100 .
  • Display 194 includes a display panel.
  • The display panel may use a liquid crystal display (LCD), an organic light-emitting diode (OLED), an active-matrix organic light-emitting diode (AMOLED), a flexible light-emitting diode (FLED), a MiniLED, a MicroLED, a Micro-OLED, a quantum dot light-emitting diode (QLED), etc.
  • the electronic device 100 may include 1 or N display screens 194, where N is a positive integer greater than 1.
  • Camera 193 is used to capture still images or video.
  • the object passes through the lens to produce an optical image that is projected onto the photosensitive element.
  • The photosensitive element may be a charge coupled device (CCD) or a complementary metal-oxide-semiconductor (CMOS) phototransistor.
  • the photosensitive element converts the optical signal into an electrical signal, and then passes the electrical signal to the ISP to convert it into a digital image signal.
  • ISP outputs digital image signals to DSP for processing.
  • The DSP converts the digital image signal into an image signal in a standard format such as RGB or YUV.
  • the electronic device 100 may include 1 or N cameras 193, where N is a positive integer greater than 1.
  • Internal memory 121 may be used to store computer executable program code, which includes instructions.
  • the processor 110 executes instructions stored in the internal memory 121 to execute various functional applications and data processing of the electronic device 100 .
  • the internal memory 121 may include a program storage area and a data storage area.
  • The program storage area can store an operating system, the software code of at least one application program, etc.
  • the storage data area may store data generated during use of the electronic device 100 (such as captured images, recorded videos, etc.).
  • the internal memory 121 may include high-speed random access memory, and may also include non-volatile memory, such as at least one disk storage device, flash memory device, universal flash storage (UFS), etc.
  • the external memory interface 120 can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the electronic device.
  • the external memory card communicates with the processor 110 through the external memory interface 120 to implement the data storage function. For example, save pictures, videos, etc. files on an external memory card.
  • the electronic device 100 can implement audio functions through the audio module 170, the speaker 170A, the receiver 170B, the microphone 170C, the headphone interface 170D, and the application processor. Such as music playback, recording, etc.
  • the sensor module 180 may include a pressure sensor 180A, an acceleration sensor 180B, a touch sensor 180C, etc.
  • the pressure sensor 180A is used to sense pressure signals and can convert the pressure signals into electrical signals.
  • pressure sensor 180A may be disposed on display screen 194 .
  • The touch sensor 180C is also known as a "touch panel". The touch sensor 180C can be disposed on the display screen 194, and the touch sensor 180C and the display screen 194 form a touch screen, also called a "touchscreen".
  • the touch sensor 180C is used to detect a touch operation on or near the touch sensor 180C.
  • the touch sensor can pass the detected touch operation to the application processor to determine the touch event type.
  • Visual output related to the touch operation may be provided through display screen 194 .
  • the touch sensor 180C may also be disposed on the surface of the electronic device 100 at a location different from that of the display screen 194 .
  • the buttons 190 include a power button, a volume button, etc.
  • Key 190 may be a mechanical key. It can also be a touch button.
  • the electronic device 100 may receive key inputs and generate key signal inputs related to user settings and function control of the electronic device 100 .
  • the motor 191 can generate vibration prompts.
  • the motor 191 can be used for vibration prompts for incoming calls and can also be used for touch vibration feedback.
  • touch operations for different applications (such as taking pictures, audio playback, etc.) can correspond to different vibration feedback effects.
  • the touch vibration feedback effect can also be customized.
  • the indicator 192 may be an indicator light, which may be used to indicate charging status, power changes, or may be used to indicate messages, missed calls, notifications, etc.
  • the SIM card interface 195 is used to connect a SIM card. The SIM card can be connected to and separated from the electronic device 100 by inserting it into the SIM card interface 195 or pulling it out from the SIM card interface 195 .
  • The structure shown in FIG. 1 does not constitute a specific limitation on the electronic device 100. The electronic device may include more or fewer components than shown in the figure, some components may be combined or split, or the components may be arranged differently.
  • the combination/connection relationship between the components in Figure 1 can also be adjusted and modified.
  • Figure 2 is a software structure block diagram of an electronic device provided by an embodiment of the present application.
  • the software structure of electronic equipment can be a layered architecture.
  • the software can be divided into several layers, and each layer has clear roles and division of labor.
  • the layers communicate through software interfaces.
  • the operating system is divided into four layers, from top to bottom: application layer, application framework layer (framework, FWK), runtime (runtime) and system library, and kernel layer.
  • the application layer can include a series of application packages. As shown in Figure 2, the application layer can include cameras, settings, skin modules, user interface (UI), third-party applications, etc. Among them, third-party applications can include gallery, calendar, calls, maps, navigation, WLAN, Bluetooth, music, video, short messages, etc.
  • the application layer may include a target installation package of a target application that the electronic device requests to download from the server, and the function files and layout files in the target installation package are adapted to the electronic device.
  • the application framework layer provides an application programming interface (API) and programming framework for applications in the application layer.
  • the application framework layer can include some predefined functions. As shown in Figure 2, the application framework layer can include window manager, content provider, view system, phone manager, resource manager, and notification manager.
  • a window manager is used to manage window programs.
  • the window manager can obtain the display size, determine whether there is a status bar, lock the screen, capture the screen, etc.
  • Content providers are used to store and retrieve data and make this data accessible to applications. Said data can include videos, images, audio, calls made and received, browsing history and bookmarks, phone books, etc.
  • the view system includes visual controls, such as controls that display text, controls that display pictures, etc.
  • a view system can be used to build applications.
  • the display interface can be composed of one or more views.
  • a display interface including a text message notification icon may include a view for displaying text and a view for displaying pictures.
  • the telephone manager is used to provide the communication functions of the electronic device, for example, call status management (including connected, hung up, etc.).
  • the resource manager provides various resources to applications, such as localized strings, icons, pictures, layout files, video files etc.
  • the notification manager allows applications to display notification information in the status bar, which can be used to convey notification-type messages and can automatically disappear after a short stay without user interaction.
  • the notification manager is used to notify download completion, message reminders, etc.
  • the notification manager can also present notifications that appear in the status bar at the top of the system in the form of charts or scrolling text, such as notifications for applications running in the background, or notifications that appear on the screen in the form of dialog windows. For example, text information is prompted in the status bar, a beep sounds, the electronic device vibrates, or the indicator light flashes.
  • the runtime includes core libraries and virtual machines.
  • the runtime is responsible for the scheduling and management of the operating system.
  • the core library consists of two parts: one part is the function interfaces that the Java language needs to call, and the other part is the core library of the operating system.
  • the application layer and application framework layer run in virtual machines.
  • the virtual machine converts the Java files of the application layer and the application framework layer into binary files and executes them.
  • the virtual machine is used to perform object life cycle management, stack management, thread management, security and exception management, and garbage collection and other functions.
  • System libraries can include multiple functional modules. For example: surface manager (surface manager), media libraries (media libraries), three-dimensional graphics processing libraries (for example: OpenGL ES), two-dimensional graphics engines (for example: SGL), image processing libraries, etc.
  • the surface manager is used to manage the display subsystem and provides the fusion of 2D and 3D layers for multiple applications.
  • the media library supports playback and recording of a variety of commonly used audio and video formats, as well as static image files, etc.
  • the media library can support a variety of audio and video encoding formats, such as: MPEG4, H.264, MP3, AAC, AMR, JPG, PNG, etc.
  • the 3D graphics processing library is used to implement 3D graphics drawing, image rendering, composition, and layer processing.
  • 2D Graphics Engine is a drawing engine for 2D drawing.
  • the kernel layer is the layer between hardware and software.
  • the kernel layer contains at least display driver, camera driver, audio driver, and sensor driver.
  • the hardware layer can include various types of sensors, such as acceleration sensors, gyroscope sensors, touch sensors, etc.
  • the structure shown in Figure 1 and Figure 2 is only an example of the electronic device provided by the embodiments of the present application, and does not limit the electronic device provided by the embodiments of the present application.
  • the electronic device may have more or fewer devices or modules than in the structure shown in Figure 1 or Figure 2.
  • the audio track marking method provided by the embodiment of the present application is introduced below.
  • the electronic device can record audio.
  • the user can trigger the electronic device to record audio.
  • the audio recorded by the electronic device can be displayed on the display interface of the electronic device in the form of an audio track, and different positions on the audio track represent audio content corresponding to different times.
  • FIG. 3 is a schematic diagram of a display interface of an electronic device displaying an audio track according to an embodiment of the present application. Referring to Figure 3, the electronic device can display the audio track of the recorded audio. As the audio recording progresses, the duration of the audio track will also increase.
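The correspondence described above, in which different positions on the displayed track represent audio content at different times, can be sketched as a simple linear mapping. This is an illustrative assumption, not the patent's implementation; all names and parameters (`track_width_px`, `duration_s`, etc.) are hypothetical:

```python
# Sketch: linear mapping between a horizontal position on the displayed
# audio track and the audio time it represents. Names are hypothetical.

def position_to_time(x_px: float, track_width_px: float, duration_s: float) -> float:
    """Convert a horizontal track position (pixels) to an audio timestamp (seconds)."""
    x_px = min(max(x_px, 0.0), track_width_px)  # clamp to the track bounds
    return duration_s * x_px / track_width_px

def time_to_position(t_s: float, track_width_px: float, duration_s: float) -> float:
    """Convert an audio timestamp (seconds) back to a track position (pixels)."""
    t_s = min(max(t_s, 0.0), duration_s)
    return track_width_px * t_s / duration_s

# A tap halfway along an 800 px track of a 20:30 (1230 s) recording:
print(position_to_time(400, 800, 1230))  # 615.0 seconds, i.e. 10:15
```

The same mapping also covers playback: clicking a position on the track yields the timestamp at which playback should start.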
  • the electronic device can also play audio.
  • the audio played by the electronic device can also be displayed on the display interface of the electronic device in the form of an audio track.
  • the display interface shown in Figure 3 can be an audio track of audio played by the electronic device, and the user can click a position on the audio track to trigger the electronic device to play the content at the time corresponding to the position in the audio.
  • FIG. 4 is a schematic diagram of a user triggering a first operation according to an embodiment of the present application.
  • the first operation can be a setting gesture triggered by the user.
  • the first operation can be a double-click on the screen or a long press on the screen; or the first operation can be the user clicking a control in the display interface.
  • for example, the user can click the "mark mode" control in the interface shown in Figure 4 to trigger the first operation; or the first operation can be triggered after the user presses a button on the stylus body or performs a set gesture with the stylus.
  • the electronic device may display the first position corresponding to the first operation on the audio track as the first display style in response to the first operation triggered by the user.
  • FIG. 5 is a schematic diagram of displaying a mark at the first position provided by an embodiment of the present application. Referring to Figure 5, in response to the first operation triggered by the user, the electronic device determines the first position corresponding to the first operation and displays the first position on the audio track in a first display style; in Figure 5, the first display style is an arrow icon displayed at the first position.
  • after determining the first position corresponding to the first operation, the electronic device can mark the audio track starting from the first position and display the audio track after the first position in the first display style.
  • the first display style may be a display style that is different from the default display style of the audio track.
  • the first display style may be a background color different from the default display style, or the first display style may be a track shape or track color different from the default display style, etc.
  • FIG. 6 is a schematic diagram of a display of an audio track provided by an embodiment of the present application.
  • the electronic device displays the first area between the first position and the second position on the audio track as the first display style, and displays other areas as the default display style.
  • the first display style has a different background color than the default display style.
  • the electronic device can also determine an area on the audio track corresponding to an operation triggered by the user. If the user triggers a sliding operation on the audio track, the electronic device can determine the area corresponding to the sliding operation and mark the area. That is to say, the embodiment of the present application does not limit the operation mode of the user-triggered mark.
  • the electronic device may mark the first area on the audio track in response to the first operation and the second operation triggered by the user during the process of recording audio or playing audio.
  • the electronic device may, in response to the first operation triggered by the user, display the audio track area after the first position corresponding to the first operation in the first display style as the audio is recorded or played, and may stop marking in response to a second operation triggered by the user.
  • the electronic device can determine the first position corresponding to the first operation in response to the first operation triggered by the user, determine the second position corresponding to the second operation in response to the second operation triggered by the user, and then display the first area between the first position and the second position in the first display style.
  • that is to say, the embodiments of the present application do not limit the manner in which the electronic device displays the first area in the first display style.
  • the electronic device can mark the first position of the first operation triggered by the user, and can also mark the first area determined by the first operation and the second operation triggered by the user.
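The two marking behaviors above (a point mark at the first position, or a first area bounded by the first and second operations) can be sketched as a small state holder. This is a minimal illustrative sketch, not the patent's implementation; all class and method names are assumptions:

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class TrackMark:
    start_s: float                 # first position, in seconds
    end_s: Optional[float] = None  # second position; None for a point mark

class TrackMarker:
    """Records marks triggered by the user's first/second operations."""

    def __init__(self) -> None:
        self.marks: List[TrackMark] = []
        self._open: Optional[TrackMark] = None

    def first_operation(self, t_s: float) -> None:
        """Begin a mark at the first position (shown in the first display style)."""
        self._open = TrackMark(start_s=t_s)
        self.marks.append(self._open)

    def second_operation(self, t_s: float) -> None:
        """End the open mark at the second position, closing the first area."""
        if self._open is not None and t_s > self._open.start_s:
            self._open.end_s = t_s
            self._open = None

marker = TrackMarker()
marker.first_operation(10.0)   # user triggers the first operation at 0:10
marker.second_operation(25.0)  # user ends the mark at 0:25
print(marker.marks)
```

A mark whose `end_s` remains `None` corresponds to the point-mark case; a closed mark corresponds to the first area displayed in the first display style.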
  • the electronic device can also perform speech transcription on the audio, convert the audio content into text content, and display the text content in the display interface.
  • the electronic device can record and transcribe the voice simultaneously, or it can transcribe the audio after the electronic device completes the recording and obtains the audio.
  • when the electronic device performs speech transcription on the audio, it can mark the speech transcription content corresponding to the first position on the audio track and display the speech transcription content corresponding to the first position in a second display style.
  • the speech transcription content corresponding to the first position can be the speech transcription content of the audio corresponding to an audio track area of a preset length that includes the first position, and the first position can be the starting position, ending position, or an intermediate position of that audio track area.
  • the second display style is a display style that is different from the default display style of the speech transcribed content. For example, the second display style is a different font, text color, background color, etc. from the default display style of the speech transcribed content.
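The "preset length" track area around the first position can be computed as follows. This is a minimal sketch assuming the window length and anchor placement are configuration values; the function and parameter names are hypothetical:

```python
def mark_window(first_pos_s: float, preset_len_s: float,
                anchor: str = "middle") -> tuple:
    """Return the (start, end) of the track area whose transcription is marked.

    The first position may sit at the start, end, or middle of the area,
    matching the three placements described in the text.
    """
    if anchor == "start":
        start = first_pos_s
    elif anchor == "end":
        start = first_pos_s - preset_len_s
    else:  # "middle"
        start = first_pos_s - preset_len_s / 2
    start = max(start, 0.0)  # never extend before the beginning of the audio
    return (start, start + preset_len_s)

print(mark_window(60.0, 10.0))  # (55.0, 65.0): first position mid-area
```

The transcription text falling inside this window is then what gets the second display style.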
  • FIG. 7 is a schematic diagram of marking the speech transcription content corresponding to the first position provided by an embodiment of the present application.
  • the speech transcription content corresponding to the first position may be the speech transcription content corresponding to area A shown in FIG. 7 .
  • the first position is located in the middle of area A. It can be understood that the first position can also be located at the starting position or the end position of area A.
  • the preset length of area A can be an empirical value set by a person skilled in the art.
  • the voice transcription content corresponding to the first position is "January 1st".
  • the electronic device displays the voice transcription content corresponding to the first position in a second display style.
  • the second display style is a background color different from the default display style of the voice transcription content.
  • the electronic device can also mark the speech transcription content corresponding to the first area on the audio track, and display the speech transcription content corresponding to the first area as the second display style; wherein, the The speech transcription content corresponding to a region includes the speech transcription content of the audio corresponding to the first region.
  • the second display style is a display style that is different from the default display style of the speech transcribed content.
  • the second display style is a different font, text color, background color, etc. from the default display style of the speech transcribed content.
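Selecting which transcription text corresponds to a marked first area amounts to an interval-overlap test against timestamped transcript segments. The segment format `(start_s, end_s, text)` is an assumption for illustration, not the patent's data model:

```python
from typing import List, Tuple

# Each segment: (start_s, end_s, text) -- an assumed transcript format.
Segment = Tuple[float, float, str]

def transcript_for_area(segments: List[Segment],
                        area_start_s: float, area_end_s: float) -> str:
    """Join the text of every segment that overlaps the marked area."""
    return " ".join(text for start, end, text in segments
                    if end > area_start_s and start < area_end_s)

segments = [(0.0, 4.0, "The meeting will start"),
            (4.0, 7.0, "at 2 pm"),
            (7.0, 9.0, "on January 1st")]
print(transcript_for_area(segments, 3.5, 8.0))
```

The joined text is what the electronic device would then render in the second display style.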
  • FIG. 8 is a schematic diagram of marking the speech transcription content corresponding to the first area provided by an embodiment of the present application.
  • the voice transcription content corresponding to the first area is "The meeting will start at 2 pm on January 1st".
  • the electronic device displays the voice transcription content corresponding to the first area in a second display style.
  • for example, the second display style is a background color different from the default display style of the speech transcription content.
  • the first display style and the second display style may be the same to represent the corresponding relationship between the first area in the audio track and the speech transcription content.
  • the first display style and the second display style are the same.
  • the first display style and the second display style may also be different.
  • Figure 9 is a schematic diagram of an interface for voice transcription of audio provided by an embodiment of the present application.
  • the first display style may be a background color different from the default display style of the audio track.
  • the second display style is a font that is different from the default display style of the speech transcription content.
  • the display styles of different marked audio track areas may be the same or different. That is to say, the multiple marked audio track areas correspond to at least one display style, and the at least one display style can be customized by the user. For example, multiple marked audio track areas may all be displayed in the first display style, or audio track areas marked by different users may correspond to different display styles.
  • FIG. 10 is a schematic diagram showing a track area containing multiple marks on a track provided by an embodiment of the present application.
  • area A on the audio track in Figure 10 is the audio track area marked by user 1
  • area B and area C are the audio track areas marked by user 2
  • area D is the audio track area marked by user 3.
  • audio track areas marked by different users are displayed in different styles.
  • the electronic device may respond to a third operation triggered by the user and add an annotation area at a third position corresponding to the third operation.
  • the third operation can be triggered by the user through setting gestures, controls in the display interface, or a stylus.
  • the electronic device can display the annotation editing interface in the display interface, receive the annotation content input by the user on the annotation editing interface, and display the annotation content in the annotation area.
  • the annotation content input by the user in the annotation editing interface includes but is not limited to text, pictures, audio, and video.
  • the electronic device can display the voice transcription interface and the annotation editing interface in split screens, or display a floating window on the voice transcription interface and display the annotation editing interface in the floating window.
  • FIG. 11 is a schematic diagram of the first annotation editing interface provided by an embodiment of the present application.
  • the left area of the display screen of the electronic device displays a speech transcription interface
  • the right area of the display screen of the electronic device displays an annotation editing interface.
  • the user can input annotation content on the annotation editing interface.
  • FIG. 12 is a schematic diagram of yet another annotation editing interface provided by an embodiment of the present application.
  • the electronic device displays the voice transcription interface
  • a floating window can be displayed on the voice transcription interface
  • the annotation editing interface can be displayed in the floating window, and the user can enter annotation content in the annotation editing interface.
  • the electronic device can display the annotation at a third position corresponding to the third operation.
  • the electronic device can display the annotation after the user clicks on the third position, and hide the annotation after the user clicks on other positions in the display interface except the third position; or the electronic device can continuously display the annotation on the third position.
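The show-on-click / hide-on-click-elsewhere behavior just described can be sketched as a tiny piece of state. This is a hedged illustration with assumed names, not the patent's implementation:

```python
class AnnotationDisplay:
    """Shows an annotation when its position is clicked, hides it otherwise."""

    def __init__(self, annotations: dict) -> None:
        self.annotations = annotations  # maps a track position to annotation text
        self.visible_at = None          # position whose annotation is shown

    def on_click(self, position) -> str:
        """Return the annotation to display for this click, or '' if hidden."""
        if position in self.annotations:
            self.visible_at = position
            return self.annotations[position]
        self.visible_at = None          # a click elsewhere hides the annotation
        return ""

display = AnnotationDisplay({"third_position": "Meeting Time"})
print(display.on_click("third_position"))  # Meeting Time
display.on_click("elsewhere")              # hides the annotation again
```

The continuous-display variant mentioned in the text would simply skip the hide branch.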
  • FIG. 13 is a schematic diagram of displaying annotations on an audio track according to an embodiment of the present application. Referring to Figure 13, after the user inputs the text "Meeting Time" in the annotation editing area shown in Figure 11 or Figure 12, the electronic device can display an annotation containing the text "Meeting Time” at the third position, thereby enhancing the key content.
  • the function of the annotation is to make it easy for the user to find the audio content or speech transcription content.
  • the electronic device may display the audio track at the third position in a third display style.
  • the third display style is different from the first display style, so as to distinguish audio track areas to which the user has added annotations from audio track areas that the user has not annotated.
  • FIG. 14 is a schematic diagram of a mark display style provided by an embodiment of the present application.
  • the audio track area in the third display style indicates that the user has added annotations to this area
  • the audio track area in the first display style indicates that the user has marked this area but has not added annotations to this area.
  • the audio track area has the same display style as the voice transcription content corresponding to the audio track area, thereby indicating the corresponding relationship between the audio track area and the voice transcription content.
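Choosing among the default, first, and third display styles for a track area then reduces to a simple precedence rule. The style names below are placeholders for illustration only:

```python
def area_display_style(is_marked: bool, has_annotation: bool) -> str:
    """Pick the display style for a track area; annotated areas take precedence."""
    if has_annotation:
        return "third_style"   # user added an annotation to this area
    if is_marked:
        return "first_style"   # user marked this area but did not annotate it
    return "default_style"     # unmarked track

print(area_display_style(is_marked=True, has_annotation=True))   # third_style
print(area_display_style(is_marked=True, has_annotation=False))  # first_style
```

Rendering the corresponding transcription text with the same returned style would realize the track/transcription correspondence described above.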
  • in this display, the annotation itself is hidden.
  • Figure 15 is a schematic diagram showing annotations on an audio track provided by an embodiment of the present application. Referring to Figure 15, after the user clicks on the audio track area where comments are added, the electronic device can display the comments added by the user above the audio track area.
  • FIG. 16 is a schematic diagram of an interface for editing voice transcription content provided by an embodiment of the present application.
  • the electronic device can display a keyboard interface in the display interface, and the user can edit the speech transcription content through the keyboard interface, such as deleting, modifying, or adding text.
  • the user can also edit the voice-transcribed content using a stylus.
  • the embodiment of the present application does not limit the method of editing the voice-transcribed content.
  • the electronic device when it displays the voice transcription interface, it can also display the minutes editing interface in split screen, and the user can edit the minutes in the minutes editing interface.
  • the minutes editing interface may include the target speech transcribed content and/or playback controls corresponding to the target speech transcribed content.
  • the target speech transcription content includes the speech transcription content corresponding to the marked audio track area and/or the speech transcription content corresponding to the annotated audio track area.
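Collecting the target speech transcription content for the minutes editing interface amounts to filtering the marked or annotated areas and keeping their transcriptions in track order. A minimal sketch under assumed data structures:

```python
from typing import List, NamedTuple

class TrackArea(NamedTuple):
    start_s: float
    end_s: float
    marked: bool      # user marked this area
    annotated: bool   # user added an annotation to this area
    transcript: str

def minutes_candidates(areas: List[TrackArea]) -> List[str]:
    """Transcription for every marked or annotated area, in track order."""
    return [a.transcript for a in sorted(areas, key=lambda a: a.start_s)
            if a.marked or a.annotated]

areas = [
    TrackArea(0, 10, False, True, "next meeting"),
    TrackArea(12, 30, False, True, "The meeting will start at 2:30 pm on January 1st"),
    TrackArea(35, 50, True, False, "The meeting location is 101"),
]
print(minutes_candidates(areas))
```

Each returned string could be shown in the minutes editing interface together with a playback control for its `(start_s, end_s)` clip.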
  • FIG. 17 is a schematic diagram showing a record editing interface provided by an embodiment of the present application.
  • the left area of the display screen of the electronic device displays the speech transcription interface
  • the right area of the display screen of the electronic device displays the minutes editing interface.
  • the electronic device can display, in the minutes editing interface, the voice transcription content corresponding to the marked audio track areas in the audio track and the voice transcription content corresponding to the annotated audio track areas.
  • the user triggers the electronic device to add comments in area A and area B, and marks area C.
  • the electronic device displays, in the minutes editing interface, the voice transcription content corresponding to area A ("next meeting"), the voice transcription content corresponding to area B ("The meeting will start at 2:30 pm on January 1st"), and the voice transcription content corresponding to area C ("The meeting location is 101"). Users can edit the minutes in the minutes editing interface. Because the electronic device displays the speech transcription content corresponding to the audio track areas annotated or marked by the user in the minutes editing interface, users can directly use this content to edit the minutes, which simplifies operation and improves the user experience.
  • FIG. 18 is a schematic diagram of the display of yet another minutes editing interface provided by an embodiment of the present application.
  • the electronic device displays the voice transcription content corresponding to area A on the audio track as "next meeting” and the voice transcription content corresponding to area B as "the meeting will start at 2:30 pm on January 1" on the minutes editing interface.
  • the voice transcription content corresponding to area C is "The meeting location is 101”
  • the electronic device displays a playback control after each item of voice transcription content; when the user clicks the playback control following "The meeting will start at 2:30 pm on January 1", the electronic device plays the audio clip corresponding to area B on the audio track.
  • the display interfaces in the above embodiments are only examples and are not limiting; the display interface may also take other display forms.
  • the electronic device can display a sliding control, and the user can control the movement of the sliding control to view the voice transcription content that is not displayed in the current interface.
  • Figure 19 is a schematic diagram showing a display of speech transcribed content provided by an embodiment of the present application. Referring to Figure 19, when there is a large amount of voice transcription content and the electronic device cannot display all of it in one interface, the user can trigger a movement operation on the sliding control on the right side of the display interface, and the electronic device adjusts the speech transcription content displayed in the current display interface according to the distance of the movement operation triggered by the user.
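The slider behavior just described, where the displayed transcription follows the movement distance, can be sketched as a clamped scroll offset. The layout parameters here are assumptions for illustration:

```python
from typing import List

def lines_after_scroll(lines: List[str], line_height_px: int,
                       viewport_height_px: int, scroll_px: int) -> List[str]:
    """Return the transcription lines visible after the slider moves by scroll_px."""
    max_scroll = max(len(lines) * line_height_px - viewport_height_px, 0)
    scroll_px = min(max(scroll_px, 0), max_scroll)   # clamp to the valid range
    first = scroll_px // line_height_px              # first visible line index
    count = -(-viewport_height_px // line_height_px) # ceiling division
    return lines[first:first + count]

lines = [f"line {i}" for i in range(20)]
print(lines_after_scroll(lines, 20, 100, 130))  # 5 lines starting at line 6
```

Dragging the slider farther than the content allows simply pins the view to the last page, which matches the usual scroll-bar behavior.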
  • FIG 20 is a schematic diagram of the first audio track display method provided by the embodiment of the present application.
  • the electronic device can display movement controls in the audio track display area.
  • the movement controls on the left can control the audio track to move to the left, and the movement controls on the right can control the audio track to move to the right.
  • the electronic device can display different areas of the audio track in response to user-triggered actions on the mobile controls.
  • Figure 21 is a schematic diagram of yet another audio track display method provided by an embodiment of the present application.
  • the electronic device can compress and display the audio track of the audio with a duration of 20:30 in the current interface.
  • the user can click on the audio track to trigger playback of the audio at the clicked position, or drag the progress bar on the audio track to select the audio to be played.
  • this application also provides an audio track marking method, which can be executed by an electronic device, and the electronic device can have the structure shown in Figure 1 and/or Figure 2 .
  • Figure 22 is a flow chart of an audio track marking method provided by an embodiment of the present application. Referring to Figure 22, the method includes the following steps:
  • S2201: When recording audio or playing the audio, display the audio track of the audio on the display.
  • S2202: In response to a first operation triggered by the user, display the first position corresponding to the first operation on the audio track in a first display style.
  • the first operation is used to indicate marking the first position of the audio track.
  • the present application also provides an electronic device.
  • the electronic device includes multiple functional modules; the multiple functional modules interact to realize the functions performed by the electronic device in the methods described in the embodiments of the present application.
  • the multiple functional modules can be implemented based on software, hardware, or a combination of software and hardware, and the multiple functional modules can be arbitrarily combined or divided based on specific implementation.
  • this application also provides an electronic device.
  • the electronic device includes at least one processor and at least one memory. Computer program instructions are stored in the at least one memory; when the electronic device runs, the at least one processor executes the computer program instructions to perform the functions performed by the electronic device in the methods described in the embodiments of this application.
  • this application also provides a computer program product containing instructions.
  • when the computer program product is run on a computer, it causes the computer to execute the methods described in the embodiments of this application.
  • the present application also provides a computer-readable storage medium.
  • a computer program is stored in the computer-readable storage medium.
  • when the computer program is executed by a computer, the computer is caused to execute the methods described in the embodiments of the present application.
  • this application also provides a chip, which is used to read the computer program stored in the memory and implement the methods described in the embodiments of this application.
  • this application provides a chip system.
  • the chip system includes a processor and is used to support a computer device to implement the methods described in the embodiments of this application.
  • the chip system further includes a memory, and the memory is used to store necessary programs and data of the computer device.
  • the chip system may be composed of chips, or may include chips and other discrete devices.
  • embodiments of the present application may be provided as methods, systems, or computer program products. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment that combines software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having computer-usable program code embodied therein.
  • These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing apparatus to operate in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means that implement the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.
  • These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operating steps to be performed on the computer or other programmable device to produce computer-implemented processing, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more processes of the flowchart and/or one or more blocks of the block diagram.

Abstract

This application provides an audio track marking method and an electronic device. In the method, when recording or playing audio, the electronic device displays the audio track of the audio on the display. In response to a first operation triggered by the user, the electronic device displays a first position on the audio track corresponding to the first operation in a first display style, wherein the first operation is used to indicate that the first position of the audio track is to be marked. With this solution, the electronic device can display the first position on the audio track, marked at the user's trigger, in the first display style, thereby quickly marking key audio positions in the audio track, making it convenient for the user to find key audio at any time and improving the user experience.

Description

An audio track marking method and electronic device
Cross-reference to related application
This application claims priority to Chinese patent application No. 202210633882.X, filed with the Chinese Patent Office on June 6, 2022 and entitled "An audio track marking method and electronic device", the entire contents of which are incorporated herein by reference.
Technical field
This application relates to the field of terminal technology, and in particular to an audio track marking method and an electronic device.
Background
Speech transcription is a technology that converts the speech content in audio into text; it is widely used in meeting scenarios and online learning scenarios. For example, when speech transcription is applied in a meeting scenario, the meeting recording can be transcribed and the meeting speech content converted into text, which facilitates recording and reviewing the meeting content.
However, when a user needs to confirm or mark a certain piece of transcribed content, the user has to listen to the recording repeatedly to find the corresponding speech position, which is cumbersome and inefficient.
Summary
This application provides an audio track marking method and an electronic device, so as to provide a way to quickly mark key content on an audio track.
In a first aspect, this application provides an audio track marking method. The method may be executed by an electronic device and includes: when recording audio or playing the audio, displaying the audio track of the audio on a display; in response to a first operation triggered by a user, displaying a first position on the audio track corresponding to the first operation in a first display style; wherein the first operation is used to indicate that the first position of the audio track is to be marked.
Based on the above method, when recording or playing audio, the electronic device displays the audio track of the audio on the display, and the user can trigger the first operation at any time; the electronic device can display the first position on the audio track corresponding to the first operation in the first display style, thereby quickly marking key audio positions in the audio track, making it convenient for the user to find key audio at any time and improving the user experience.
In a possible design, after displaying the first position corresponding to the first operation on the audio track in the first display style in response to the first operation triggered by the user, the method further includes: in response to a second operation triggered by the user, displaying a first area on the audio track between the first position and a second position corresponding to the second operation in the first display style; wherein the second operation is used to indicate that marking ends at the second position of the audio track, and the second position is after the first position.
With this design, the electronic device can determine the first area in the audio track according to the first operation and the second operation triggered by the user and display the first area in the first display style, thereby marking the area in the audio track corresponding to the key audio and making it easy for the user to find it at any time.
In a possible design, the first display style includes at least one of the following: a background color different from the default display style of the audio track; a track shape different from the default display style of the audio track; and a track color different from the default display style of the audio track.
In a possible design, the method further includes: performing speech transcription on the audio and displaying the speech transcription content corresponding to the audio; and displaying, in the speech transcription content, the speech transcription content corresponding to the first position in a second display style. The speech transcription content corresponding to the first position may be the speech transcription content of the audio corresponding to an audio track area of a preset length that includes the first position, and the first position may be the starting position, ending position, or an intermediate position of that audio track area.
In a possible design, the method further includes: performing speech transcription on the audio and displaying the speech transcription content corresponding to the audio; and displaying, in the speech transcription content, the speech transcription content corresponding to the first area in a second display style. The speech transcription content corresponding to the first area includes the speech transcription content of the audio corresponding to the first area.
With the above designs, the electronic device can also mark the speech transcription content corresponding to the first position or first area marked by the user, so that the key content is highlighted and easy for the user to view.
In a possible design, the second display style includes at least one of the following: a font different from the default display style of the speech transcription content; a text color different from the default display style of the speech transcription content; and a background color different from the default display style of the speech transcription content.
In a possible design, the method further includes: in response to a third operation triggered by the user, adding an annotation area at a third position corresponding to the third operation; and displaying an annotation editing interface on the display, receiving annotation content input by the user on the annotation editing interface, and displaying the annotation content in the annotation area.
With this design, the electronic device can respond to the third operation triggered by the user by displaying the annotation input by the user at the third position, thereby marking the key content and making it easy for the user to find the audio content or speech transcription content.
In a possible design, the method further includes: displaying the area on the audio track corresponding to the third position in a third display style.
In a possible design, displaying the annotation editing interface on the display includes: displaying a speech transcription interface and the annotation editing interface on the display in split screen; or displaying a floating window on the speech transcription interface displayed on the display, and displaying the annotation editing interface in the floating window; wherein the speech transcription interface includes the audio track of the audio and the speech transcription content corresponding to the audio.
With this design, the electronic device can display the annotation editing interface in split screen or in a floating window, flexibly receiving the annotation content input by the user.
In a possible design, the method further includes: displaying a speech transcription interface and a minutes editing interface on the display in split screen; wherein the speech transcription interface includes the audio track of the audio and the speech transcription content corresponding to the audio, and the minutes editing interface includes target speech transcription content and/or playback controls corresponding to the target speech transcription content; the target speech transcription content includes the speech transcription content corresponding to marked audio track areas in the audio track and/or the speech transcription content corresponding to annotated audio track areas, and the playback control corresponding to the target speech transcription content is used to play the audio clip corresponding to the target speech transcription content.
With this design, the electronic device can also display the speech transcription interface and the minutes editing interface in split screen. The minutes editing interface can include the speech transcription content corresponding to marked audio track areas and/or the speech transcription content corresponding to annotated audio track areas, and the user can directly use this content to edit the minutes, which simplifies operation and improves the user experience. In addition, the minutes editing interface can also include playback controls corresponding to the target speech transcription content, and the user can click a playback control to repeatedly listen to the audio content corresponding to a marked audio track area at any time, improving the user experience.
In a possible design, when the audio track includes multiple marked audio track areas, the multiple marked audio track areas correspond to at least one display style, and the at least one display style is user-defined.
For example, audio track areas marked by different users correspond to different display styles.
In a possible design, the method further includes: in response to a movement operation triggered by the user on a sliding control in the display interface, adjusting the speech transcription content displayed in the display interface according to the distance corresponding to the movement operation.
In a second aspect, this application provides an electronic device. The electronic device includes multiple functional modules; the multiple functional modules interact to implement the methods shown in the first aspect and its embodiments. The multiple functional modules can be implemented based on software, hardware, or a combination of software and hardware, and the multiple functional modules can be arbitrarily combined or divided based on the specific implementation.
In a third aspect, this application provides an electronic device including at least one processor and at least one memory, where computer program instructions are stored in the at least one memory; when the electronic device runs, the at least one processor executes the methods shown in the first aspect and its embodiments.
In a fourth aspect, this application further provides a computer program product containing instructions which, when run on a computer, cause the computer to execute the methods shown in the first aspect and its embodiments.
In a fifth aspect, this application further provides a computer-readable storage medium storing a computer program which, when executed by a computer, causes the computer to execute the methods shown in the first aspect and its embodiments.
In a sixth aspect, this application further provides a chip for reading a computer program stored in a memory and executing the methods shown in the first aspect and its embodiments.
In a seventh aspect, this application further provides a chip system. The chip system includes a processor for supporting a computer device in implementing the methods shown in the first aspect and its embodiments. In a possible design, the chip system further includes a memory for storing the programs and data necessary for the computer device. The chip system may be composed of chips, or may include chips and other discrete devices.
附图说明
图1为本申请实施例提供的一种电子设备的结构示意图;
图2为本申请实施例提供的一种电子设备的软件结构框图;
图3为本申请实施例提供的一种电子设备显示音频音轨的显示界面示意图;
图4为本申请实施例提供的一种用户触发第一操作的示意图;
图5为本申请实施例提供的一种对第一位置标记的显示示意图;
图6为本申请实施例提供的一种音轨的显示示意图;
图7为本申请实施例提供的一种对第一位置对应的语音转写内容进行标记的示意图;
图8为本申请实施例提供的一种对第一区域对应的语音转写内容进行标记的示意图;
图9为本申请实施例提供的一种对音频进行语音转写的界面示意图;
图10为本申请实施例提供的一种音轨上包含多个标记的音轨区域的显示示意图;
图11为本申请实施例提供的第一种批注编辑界面的示意图;
图12为本申请实施例提供的又一种批注编辑界面的示意图;
图13为本申请实施例提供的一种在音轨上显示批注的示意图;
图14为本申请实施例提供的一种标记显示样式的示意图;
图15为本申请实施例提供的一种音轨上批注的显示示意图;
图16为本申请实施例提供的一种编辑语音转写内容的界面示意图;
图17为本申请实施例提供的一种纪要编辑界面的显示示意图;
图18为本申请实施例提供的又一种纪要编辑界面的显示示意图;
图19为本申请实施例提供的一种语音转写内容的显示示意图;
图20为本申请实施例提供的第一种音轨显示方式的示意图;
图21为本申请实施例提供的又一种音轨显示方式的示意图;
图22为本申请实施例提供的一种音轨标记方法的流程图。
具体实施方式
为了使本申请实施例的目的、技术方案和优点更加清楚,下面将结合附图对本申请实施例作进一步地详细描述。其中,在本申请实施例的描述中,术语“第一”、“第二”仅用于描述目的,而不能理解为指示或暗示相对重要性或者隐含指明所指示的技术特征的数量。由此,限定有“第一”、“第二”的特征可以明示或者隐含地包括一个或者更多个该特征。
应理解,本申请实施例中“至少一个”是指一个或者多个,“多个”是指两个或两个以上。“和/或”,描述关联对象的关联关系,表示可以存在三种关系,例如,A和/或B,可以表示:单独存在A,同时存在A和B,单独存在B的情况,其中A、B可以是单数或者复数。字符“/”一般表示前后关联对象是一种“或”的关系。“以下至少一(项)个”或其类似表达,是指的这些项中的任意组合,包括单项(个)或复数项(个)的任意组合。例如,a、b或c中的至少一项(个),可以表示:a,b,c,a和b,a和c,b和c,或a、b和c,其中a、b、c可以是单个,也可以是多个。
语音转写是指将音频中的语音内容转写为文本的一种技术,在会议场景、在线学习场景中均得到广泛应用。如语音转写应用在会议场景时,可以对会议录音进行转写,将会议语音内容转写为文本,从而便于记录与查看会议内容。
但是,当用户需要对某段语音转写内容进行确认或者标记时,需要反复听录音寻找对应的语音位置,操作繁琐且效率较低。
基于以上问题,本申请提供一种音轨标记方法,用以提供一种在音轨上快捷标记重点内容的方法。
本申请实施例提供的音轨标记方法可以应用于电子设备。以下介绍电子设备,以及使用这样的电子设备的实施例。本申请实施例的电子设备例如可以为平板电脑、手机、车载设备、增强现实(augmented reality,AR)/虚拟现实(virtual reality,VR)设备、笔记本电脑、超级移动个人计算机(ultra-mobile personal computer,UMPC)、上网本、个人数字助理(personal digital assistant,PDA)、可穿戴设备、物联网(internet of things,IoT)设备、车机等,本申请实施例对电子设备的具体类型不作任何限制。在一些实施例中,电子设备可以支持手写笔。
图1为本申请实施例提供的一种电子设备100的结构示意图。如图1所示,电子设备100可以包括处理器110,外部存储器接口120,内部存储器121,通用串行总线(universal serial bus,USB)接口130,充电管理模块140,电源管理模块141,电池142,天线1,天线2,移动通信模块150,无线通信模块160,音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,传感器模块180,按键190,马达191,指示器192,摄像头193,显示屏194,以及用户标识模块(subscriber identification module,SIM)卡接口195等。
处理器110可以包括一个或多个处理单元,例如:处理器110可以包括应用处理器(application processor,AP),调制解调处理器,图形处理器(graphics processing unit,GPU),图像信号处理器(image signal processor,ISP),控制器,存储器,视频编解码器,数字信号处理器(digital signal processor,DSP),基带处理器,和/或神经网络处理器(neural-network processing unit,NPU)等。其中,不同的处理单元可以是独立的器件,也可以集成在一个或多个处理器中。其中,控制器可以是电子设备100的神经中枢和指挥中心。控制器可以根据指令操作码和时序信号,产生操作控制信号,完成取指令和执行指令的控制。处理器110中还可以设置存储器,用于存储指令和数据。在一些实施例中,处理器110中的存储器为高速缓冲存储器。该存储器可以保存处理器110刚用过或循环使用的指令或数据。如果处理器110需要再次使用该指令或数据,可从所述存储器中直接调用。避免了重复存取,减少了处理器110的等待时间,因而提高了系统的效率。
USB接口130是符合USB标准规范的接口,具体可以是Mini USB接口,Micro USB接口,USB Type C接口等。USB接口130可以用于连接充电器为电子设备100充电,也可以用于电子设备100与外围设备之间传输数据。充电管理模块140用于从充电器接收充电输入。电源管理模块141用于连接电池142,充电管理模块140与处理器110。电源管理模块141接收电池142和/或充电管理模块140的输入,为处理器110,内部存储器121,外部存储器,显示屏194,摄像头193,和无线通信模块160等供电。
电子设备100的无线通信功能可以通过天线1,天线2,移动通信模块150,无线通信模块160,调制解调处理器以及基带处理器等实现。天线1和天线2用于发射和接收电磁波信号。电子设备100中的每个天线可用于覆盖单个或多个通信频带。不同的天线还可以复用,以提高天线的利用率。例如:可以将天线1复用为无线局域网的分集天线。在另外一些实施例中,天线可以和调谐开关结合使用。
移动通信模块150可以提供应用在电子设备100上的包括2G/3G/4G/5G等无线通信的解决方案。移动通信模块150可以包括至少一个滤波器,开关,功率放大器,低噪声放大器(low noise amplifier,LNA)等。移动通信模块150可以由天线1接收电磁波,并对接收的电磁波进行滤波,放大等处理,传送至调制解调处理器进行解调。移动通信模块150还可以对经调制解调处理器调制后的信号放大,经天线1转为电磁波辐射出去。在一些实施例中,移动通信模块150的至少部分功能模块可以被设置于处理器110中。在一些实施例中,移动通信模块150的至少部分功能模块可以与处理器110的至少部分模块被设置在同一个器件中。
无线通信模块160可以提供应用在电子设备100上的包括无线局域网(wireless local area networks,WLAN)(如无线保真(wireless fidelity,Wi-Fi)网络),蓝牙(bluetooth,BT),全球导航卫星系统(global navigation satellite system,GNSS),调频(frequency modulation,FM),近距离无线通信技术(near field communication,NFC),红外技术(infrared,IR)等无线通信的解决方案。无线通信模块160可以是集成至少一个通信处理模块的一个或多个器件。无线通信模块160经由天线2接收电磁波,将电磁波信号调频以及滤波处理,将处理后的信号发送到处理器110。无线通信模块160还可以从处理器110接收待发送的信号,对其进行调频,放大,经天线2转为电磁波辐射出去。
在一些实施例中,电子设备100的天线1和移动通信模块150耦合,天线2和无线通信模块160耦合,使得电子设备100可以通过无线通信技术与网络以及其他设备通信。所述无线通信技术可以包括全球移动通讯系统(global system for mobile communications,GSM),通用分组无线服务(general packet radio service,GPRS),码分多址接入(code division multiple access,CDMA),宽带码分多址(wideband code division multiple access,WCDMA),时分码分多址(time-division code division multiple access,TD-SCDMA),长期演进(long term evolution,LTE),BT,GNSS,WLAN,NFC,FM,和/或IR技术等。所述GNSS可以包括全球卫星定位系统(global positioning system,GPS),全球导航卫星系统(global navigation satellite system,GLONASS),北斗卫星导航系统(beidou navigation satellite system,BDS),准天顶卫星系统(quasi-zenith satellite system,QZSS)和/或星基增强系统(satellite based augmentation systems,SBAS)。
显示屏194用于显示应用的显示界面,例如显示电子设备100上安装的应用的显示页面等。显示屏194包括显示面板。显示面板可以采用液晶显示屏(liquid crystal display,LCD),有机发光二极管(organic light-emitting diode,OLED),有源矩阵有机发光二极管(active-matrix organic light emitting diode,AMOLED),柔性发光二极管(flex light-emitting diode,FLED),Mini LED,Micro LED,Micro OLED,量子点发光二极管(quantum dot light emitting diodes,QLED)等。在一些实施例中,电子设备100可以包括1个或N个显示屏194,N为大于1的正整数。
摄像头193用于捕获静态图像或视频。物体通过镜头生成光学图像投射到感光元件。感光元件可以是电荷耦合器件(charge coupled device,CCD)或互补金属氧化物半导体(complementary metal-oxide-semiconductor,CMOS)光电晶体管。感光元件把光信号转换成电信号,之后将电信号传递给ISP转换成数字图像信号。ISP将数字图像信号输出到DSP加工处理。DSP将数字图像信号转换成标准的RGB,YUV等格式的图像信号。在一些实施例中,电子设备100可以包括1个或N个摄像头193,N为大于1的正整数。
内部存储器121可以用于存储计算机可执行程序代码,所述可执行程序代码包括指令。处理器110通过运行存储在内部存储器121的指令,从而执行电子设备100的各种功能应用以及数据处理。内部存储器121可以包括存储程序区和存储数据区。其中,存储程序区可存储操作系统,以及至少一个应用程序的软件代码等。存储数据区可存储电子设备100使用过程中所产生的数据(例如拍摄的图像、录制的视频等)等。此外,内部存储器121可以包括高速随机存取存储器,还可以包括非易失性存储器,例如至少一个磁盘存储器件,闪存器件,通用闪存存储器(universal flash storage,UFS)等。
外部存储器接口120可以用于连接外部存储卡,例如Micro SD卡,实现扩展电子设备的存储能力。外部存储卡通过外部存储器接口120与处理器110通信,实现数据存储功能。例如将图片,视频等文件保存在外部存储卡中。
电子设备100可以通过音频模块170,扬声器170A,受话器170B,麦克风170C,耳机接口170D,以及应用处理器等实现音频功能。例如音乐播放,录音等。
其中,传感器模块180可以包括压力传感器180A,加速度传感器180B,触摸传感器180C等。
压力传感器180A用于感受压力信号,可以将压力信号转换成电信号。在一些实施例中,压力传感器180A可以设置于显示屏194。
触摸传感器180C,也称“触控面板”。触摸传感器180C可以设置于显示屏194,由触摸传感器180C与显示屏194组成触摸屏,也称“触控屏”。触摸传感器180C用于检测作用于其上或附近的触摸操作。触摸传感器可以将检测到的触摸操作传递给应用处理器,以确定触摸事件类型。可以通过显示屏194提供与触摸操作相关的视觉输出。在另一些实施例中,触摸传感器180C也可以设置于电子设备100的表面,与显示屏194所处的位置不同。
按键190包括开机键,音量键等。按键190可以是机械按键。也可以是触摸式按键。电子设备100可以接收按键输入,产生与电子设备100的用户设置以及功能控制有关的键信号输入。马达191可以产生振动提示。马达191可以用于来电振动提示,也可以用于触摸振动反馈。例如,作用于不同应用(例如拍照,音频播放等)的触摸操作,可以对应不同的振动反馈效果。触摸振动反馈效果还可以支持自定义。指示器192可以是指示灯,可以用于指示充电状态,电量变化,也可以用于指示消息,未接来电,通知等。SIM卡接口195用于连接SIM卡。SIM卡可以通过插入SIM卡接口195,或从SIM卡接口195拔出,实现与电子设备100的接触和分离。
可以理解的是,图1所示的部件并不构成对电子设备100的具体限定,电子设备还可以包括比图示更多或更少的部件,或者组合某些部件,或者拆分某些部件,或者不同的部件布置。此外,图1中的部件之间的组合/连接关系也是可以调整修改的。
图2为本申请实施例提供的一种电子设备的软件结构框图。如图2所示,电子设备的软件结构可以是分层架构,例如可以将软件分成若干个层,每一层都有清晰的角色和分工。层与层之间通过软件接口通信。在一些实施例中,将操作系统分为四层,从上至下分别为应用程序层,应用程序框架层(framework,FWK),运行时(runtime)和系统库,以及内核层。
应用程序层可以包括一系列应用程序包(application package)。如图2所示,应用程序层可以包括相机、设置、皮肤模块、用户界面(user interface,UI)、三方应用程序等。其中,三方应用程序可以包括图库,日历,通话,地图,导航,WLAN,蓝牙,音乐,视频,短信息等。在本申请实施例中,应用程序层可以包括电子设备从服务器请求下载的目标应用的目标安装包,该目标安装包中的功能文件和布局文件适配于电子设备。
应用程序框架层为应用程序层的应用程序提供应用编程接口(application programming interface,API)和编程框架。应用程序框架层可以包括一些预先定义的函数。如图2所示,应用程序框架层可以包括窗口管理器,内容提供器,视图系统,电话管理器,资源管理器,通知管理器。
窗口管理器用于管理窗口程序。窗口管理器可以获取显示屏大小,判断是否有状态栏,锁定屏幕,截取屏幕等。内容提供器用来存放和获取数据,并使这些数据可以被应用程序访问。所述数据可以包括视频,图像,音频,拨打和接听的电话,浏览历史和书签,电话簿等。
视图系统包括可视控件,例如显示文字的控件,显示图片的控件等。视图系统可用于构建应用程序。显示界面可以由一个或多个视图组成的。例如,包括短信通知图标的显示界面,可以包括显示文字的视图以及显示图片的视图。
电话管理器用于提供电子设备的通信功能。例如通话状态的管理(包括接通,挂断等)。
资源管理器为应用程序提供各种资源,比如本地化字符串,图标,图片,布局文件,视频文件等等。
通知管理器使应用程序可以在状态栏中显示通知信息,可以用于传达告知类型的消息,可以短暂停留后自动消失,无需用户交互。比如通知管理器被用于告知下载完成,消息提醒等。通知管理器还可以是以图表或者滚动条文本形式出现在系统顶部状态栏的通知,例如后台运行的应用程序的通知,还可以是以对话窗口形式出现在屏幕上的通知。例如在状态栏提示文本信息,发出提示音,电子设备振动,指示灯闪烁等。
运行时包括核心库和虚拟机。运行时负责操作系统的调度和管理。
核心库包含两部分:一部分是java语言需要调用的功能函数,另一部分是操作系统的核心库。应用程序层和应用程序框架层运行在虚拟机中。虚拟机将应用程序层和应用程序框架层的java文件执行为二进制文件。虚拟机用于执行对象生命周期的管理,堆栈管理,线程管理,安全和异常的管理,以及垃圾回收等功能。
系统库可以包括多个功能模块。例如:表面管理器(surface manager),媒体库(media libraries),三维图形处理库(例如:OpenGL ES),二维图形引擎(例如:SGL)、图像处理库等。
表面管理器用于对显示子系统进行管理,并且为多个应用程序提供了2D和3D图层的融合。
媒体库支持多种常用的音频,视频格式回放和录制,以及静态图像文件等。媒体库可以支持多种音视频编码格式,例如:MPEG4,H.264,MP3,AAC,AMR,JPG,PNG等。
三维图形处理库用于实现三维图形绘图,图像渲染,合成,和图层处理等。
2D图形引擎是2D绘图的绘图引擎。
内核层是硬件和软件之间的层。内核层至少包含显示驱动,摄像头驱动,音频驱动,传感器驱动。
硬件层可以包括各类传感器,例如加速度传感器、陀螺仪传感器、触摸传感器等。
需要说明的是,图1和图2所示的结构仅作为本申请实施例提供的电子设备的一种示例,并不能对本申请实施例提供的电子设备进行任何限定,具体实施中,电子设备可以具有比图1或图2所示的结构中更多或更少的器件或模块。
下面对本申请实施例提供的音轨标记方法进行介绍。
在本申请一些实施例中,电子设备可以录制音频,例如,用户可以触发电子设备录制音频。电子设备已录制的音频可以以音轨的形式显示在电子设备的显示界面上,音轨上的不同位置表示不同时间对应的音频内容。图3为本申请实施例提供的一种电子设备显示音频音轨的显示界面示意图。参考图3,电子设备可以显示已录制的音频的音轨,随着录制音频的进行,音轨的时长也会增加。
在本申请的另一些实施例中,电子设备还可以播放音频。电子设备播放的音频也可以以音轨的形式显示在电子设备的显示界面上。例如,图3所示的显示界面中显示的也可以是电子设备播放的音频的音轨,用户可以点击音轨上的某一位置,以触发电子设备播放音频中该位置对应的时间处的内容。
电子设备在显示界面中显示音轨时,响应于用户触发的第一操作,确定第一操作对应的第一位置。第一操作可以用于指示在音轨的第一位置处对音轨进行标记。图4为本申请实施例提供的一种用户触发第一操作的示意图。参考图4,第一操作可以为用户触发的设定手势,如双击屏幕或长按屏幕;或者第一操作可以为用户点击显示界面或键盘界面中的控件,如用户可以点击图4所示界面中的“标记模式”控件,以触发第一操作;或者第一操作可以为用户通过手写笔笔身的按键或对手写笔执行设定手势触发的。
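作为示意,“双击屏幕”这类设定手势通常可通过相邻两次点击的时间间隔来判定。以下 Python 示例中的 0.3 秒阈值与 is_double_tap 名称均为本文假设,具体判定阈值依实现而定:

```python
DOUBLE_TAP_INTERVAL = 0.3  # 秒;假设的双击判定阈值

def is_double_tap(tap_times):
    # tap_times: 按时间先后排列的点击时间戳(秒)
    # 最近两次点击间隔不超过阈值时,判定为双击
    if len(tap_times) < 2:
        return False
    return tap_times[-1] - tap_times[-2] <= DOUBLE_TAP_INTERVAL
```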
在本申请一些实施方式中,电子设备可以响应于用户触发的第一操作,将音轨上第一操作对应的第一位置显示为第一显示样式。例如,图5为本申请实施例提供的一种对第一位置标记的显示示意图。参考图5,电子设备响应于用户触发的第一操作,确定第一操作对应的第一位置,电子设备将音轨上的第一位置显示为第一显示样式,如图5中第一显示样式为在第一位置上显示箭头图标。
在本申请另一些实施方式中,电子设备在确定出第一操作对应的第一位置后,可以从音轨的第一位置处开始对音轨进行标记,将第一位置之后的音轨显示为第一显示样式。第一显示样式可以为区别于音轨的默认显示样式的一种显示样式。如第一显示样式可以为与默认显示样式不同的背景颜色,或者第一显示样式可以为与默认显示样式不同的音轨形状或音轨颜色等。
电子设备响应于用户触发的第二操作,确定第二操作对应的第二位置。第二操作用于指示在音轨的第二位置处停止对音轨标记。例如,图6为本申请实施例提供的一种音轨的显示示意图。参考图6,电子设备将音轨上的第一位置到第二位置之间的第一区域显示为第一显示样式,其它区域显示为默认显示样式。第一显示样式与默认显示样式的背景颜色不同。其中,用户触发第二操作的方式可以参见用户触发第一操作的方式,重复之处不再赘述。
可以理解的是,在一些实施方式中,电子设备还可以根据用户触发的一个操作确定音轨上该操作对应的区域。如用户在音轨上触发滑动操作,电子设备可以确定滑动操作对应的区域,并对该区域进行标记。也就是说,本申请实施例对用户触发标记的操作方式并不进行限定。
需要说明的是,电子设备可以在录制音频或播放音频的过程中,响应于用户触发的第一操作和第二操作,对音轨上的第一区域进行标记。实施中,电子设备可以响应于用户触发的第一操作,随着音频录制或音频播放,将第一操作对应的第一位置之后的音轨区域显示为第一显示样式,并且电子设备响应于用户触发的第二操作停止标记。或者电子设备可以响应于用户触发的第一操作,确定第一操作对应的第一位置,并响应于用户触发的第二操作,确定第二操作对应的第二位置,再将第一位置和第二位置之间的第一区域显示为第一显示样式。也就是说,本申请实施例对电子设备将第一区域显示为第一显示样式的方式并不作限定。
通过上述介绍,本申请实施例提供的音轨标记方法中,电子设备可以对用户触发的第一操作的第一位置进行标记,还可以对用户触发的第一操作和第二操作确定出的第一区域进行标记。
本申请实施例中,电子设备还可以对音频进行语音转写,将音频内容转换为文本内容,并在显示界面中显示文本内容。电子设备可以同步进行录音和语音转写,也可以在电子设备录音结束得到音频后,再对音频进行语音转写。
一种可选的实施方式中,电子设备在对音频进行语音转写时,可以对音轨上第一位置对应的语音转写内容进行标记,将第一位置对应的语音转写内容显示为第二显示样式;其中,第一位置对应的语音转写内容可以为包括第一位置在内的预设长度的音轨区域对应的音频的语音转写内容,第一位置可以位于该音轨区域的起始位置、终止位置或中间位置。第二显示样式为区别于语音转写内容的默认显示样式的显示样式,如第二显示样式为与语音转写内容的默认显示样式不同的字体、文字颜色、背景颜色等。
举例来说,图7为本申请实施例提供的一种对第一位置对应的语音转写内容进行标记的示意图。参考图7,第一位置对应的语音转写内容可以为图7所示的区域A对应的语音转写内容。在图7中,第一位置位于区域A的中间位置,可以理解的是,第一位置还可以位于区域A的起始位置或终止位置,区域A的预设长度可以为技术人员的经验数值。如图7中,第一位置对应的语音转写内容为“1月1日”,电子设备将第一位置对应的语音转写内容显示为第二显示样式,如第二显示样式为与语音转写内容的默认显示样式不同的背景颜色。
另一种可选的实施方式中,电子设备还可以对音轨上第一区域对应的语音转写内容进行标记,将第一区域对应的语音转写内容显示为第二显示样式;其中,第一区域对应的语音转写内容包括第一区域对应的音频的语音转写内容。第二显示样式为区别于语音转写内容的默认显示样式的显示样式,如第二显示样式为与语音转写内容的默认显示样式不同的字体、文字颜色、背景颜色等。
举例来说,图8为本申请实施例提供的一种对第一区域对应的语音转写内容进行标记的示意图。参考图8,第一区域对应的语音转写内容为“会议将于1月1日下午2点开始”,电子设备将第一区域对应的语音转写内容显示为第二显示样式,如第二显示样式为与语音转写内容的默认显示样式不同的背景颜色。可选的,第一显示样式与第二显示样式可以相同,以表示音轨中的第一区域与语音转写内容的对应关系,如图8中第一显示样式与第二显示样式相同。第一显示样式与第二显示样式也可以不同,如图9为本申请实施例提供的一种对音频进行语音转写的界面示意图,参考图9,第一显示样式可以为与音轨的默认显示样式不同的背景颜色,第二显示样式为与语音转写内容的默认显示样式不同的字体。
在本申请一些实施例中,当音轨上包括多个标记的音轨区域时,不同的音轨区域的显示样式可以相同或不同,也就是说,多个标记的音轨区域对应至少一种显示样式,至少一种显示样式可以为用户自定义的。例如,多个标记的音轨区域可以均显示为第一显示样式,或者不同用户触发标记的音轨区域可以对应不同的显示样式。
举例来说,图10为本申请实施例提供的一种音轨上包含多个标记的音轨区域的显示示意图。参考图10,图10中音轨上的区域A为用户1标记的音轨区域,区域B和区域C为用户2标记的音轨区域,区域D为用户3标记的音轨区域,不同用户触发标记的音轨区域的显示样式不同。
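作为示意,“不同用户触发标记的音轨区域对应不同的显示样式”可以通过按用户分配样式来实现。以下 Python 示例中的颜色表 PALETTE 与 assign_styles 名称均为说明性假设:

```python
PALETTE = ["#FFD54F", "#4FC3F7", "#81C784", "#E57373"]  # 假设的样式颜色表

def assign_styles(regions):
    # regions: (用户, 起始秒, 终止秒) 列表
    # 同一用户标记的音轨区域共用一种显示样式,不同用户的样式不同
    user_style = {}
    result = []
    for user, start, end in regions:
        if user not in user_style:
            user_style[user] = PALETTE[len(user_style) % len(PALETTE)]
        result.append((user, start, end, user_style[user]))
    return result
```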
本申请一些实施方式中,电子设备可以响应于用户触发的第三操作,在第三操作对应的第三位置处增加批注区域。第三操作可以为用户通过设定手势、显示界面中的控件或者手写笔触发的。电子设备在检测到用户触发的第三操作后,可以在显示界面中显示批注编辑界面,并接收用户在批注编辑界面输入的批注内容,将批注内容显示在批注区域中。可选的,用户在批注编辑界面输入的批注内容包括但不限于文字、图片、音频、视频。
可选地,电子设备可以分屏显示语音转写界面与批注编辑界面,或者在语音转写界面上显示悬浮窗口,并在悬浮窗口中显示批注编辑界面。
例如,图11为本申请实施例提供的第一种批注编辑界面的示意图。参考图11,电子设备的显示屏的左边区域显示语音转写界面,电子设备的显示屏的右边区域显示批注编辑界面,用户可以在批注编辑界面输入批注内容。
又例如,图12为本申请实施例提供的又一种批注编辑界面的示意图。参考图12,电子设备在显示语音转写界面时,响应于用户触发的第三操作,可以在语音转写界面上显示一个悬浮窗口,并在该悬浮窗口中显示批注编辑界面。用户可以在批注编辑界面输入批注内容。
电子设备在接收用户输入的批注内容后,可以在第三操作对应的第三位置处显示批注。可选地,电子设备可以在用户点击第三位置后显示批注,并在用户点击显示界面中除第三位置之外的其它位置后,隐藏批注;或者电子设备可以在第三位置处持续显示批注。例如,图13为本申请实施例提供的一种在音轨上显示批注的示意图。参考图13,用户在图11或图12所示的批注编辑区域输入文本“会议时间”后,电子设备可以在第三位置处显示包含文本“会议时间”的批注,从而起到对重点内容进行标注的作用,便于用户查找音频内容或语音转写内容。
在一些实施例中,电子设备在检测到用户触发的第三操作后,可以以第三显示样式显示第三位置处的音轨,第三显示样式与第一显示样式不同,以区分用户添加了批注的音轨区域以及用户未添加批注的音轨区域。
举例来说,图14为本申请实施例提供的一种标记显示样式的示意图。参考图14,第三显示样式的音轨区域表示用户在该区域添加了批注,第一显示样式的音轨区域表示用户标记了该区域,但未在该区域添加批注。通过图14可以看出,音轨区域与音轨区域对应的语音转写内容的显示样式一致,从而可以表示音轨区域和语音转写内容的对应关系。在图14中,当用户未点击添加批注的音轨区域时,批注是隐藏的。图15为本申请实施例提供的一种音轨上批注的显示示意图。参考图15,用户点击添加批注的音轨区域后,电子设备可以在该音轨区域上方显示用户添加的批注。
另外,在本申请实施例提供的音轨标记方法中,用户可以随时对语音转写内容进行编辑。例如,图16为本申请实施例提供的一种编辑语音转写内容的界面示意图。参考图16,电子设备可以在显示界面中显示键盘界面,用户可以通过键盘界面对语音转写内容进行编辑,如对文字进行删除、修改或增加等操作。当然,用户还可以通过手写笔对语音转写内容进行编辑,本申请实施例对编辑语音转写内容的方式不做限定。
本申请一些实施例中,电子设备在显示语音转写界面时,还可以分屏显示纪要编辑界面,用户可以在纪要编辑界面编辑纪要。可选地,纪要编辑界面中可以包括目标语音转写内容和/或目标语音转写内容对应的播放控件。其中,目标语音转写内容包括标记的音轨区域对应的语音转写内容和/或批注的音轨区域对应的语音转写内容。当纪要编辑界面中显示语音转写内容对应的播放控件时,用户点击语音转写内容对应的播放控件后,电子设备可以播放语音转写内容对应的音频片段。
例如,图17为本申请实施例提供的一种纪要编辑界面的显示示意图。参考图17,电子设备的显示屏的左边区域显示语音转写界面,电子设备的显示屏的右边区域显示纪要编辑界面。用户在语音转写界面触发结束标记的第二操作或者添加批注的第三操作后,电子设备可以将音轨中标记的音轨区域对应的语音转写内容和批注的音轨区域对应的语音转写内容显示在纪要编辑界面。如图17中,用户触发电子设备在区域A和区域B添加批注,并标记了区域C,则电子设备将区域A对应的语音转写内容“下一次会议”、区域B对应的语音转写内容“会议将于1月1日下午2点半开始”以及区域C对应的语音转写内容“会议地点为101”显示在纪要编辑界面,用户可以在纪要编辑界面编辑纪要。由于电子设备已将用户添加批注或者标记的音轨区域对应的语音转写内容显示在纪要编辑界面,用户可以直接利用这些内容进行纪要编辑,便于用户操作,提升用户体验。
又例如,图18为本申请实施例提供的又一种纪要编辑界面的显示示意图。参考图18,电子设备在纪要编辑界面显示音轨上区域A对应的语音转写内容“下一次会议”、区域B对应的语音转写内容“会议将于1月1日下午2点半开始”以及区域C对应的语音转写内容“会议地点为101”,并且电子设备在每个语音转写内容后显示一个播放控件,用户点击“会议将于1月1日下午2点半开始”后的播放控件后,电子设备播放音轨上区域B对应的音频片段。通过该设计,用户可以随时重复收听标记的音轨区域对应的音频内容,提升用户体验。
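作为示意,播放控件播放“标记的音轨区域对应的音频片段”,本质上是按区域起止时间截取音频采样。以下 Python 示例中的 audio_slice 名称与采样表示方式均为说明性假设:

```python
def audio_slice(samples, sample_rate, start_sec, end_sec):
    # samples: 按时间顺序排列的音频采样序列
    # 按标记区域的起止时间(秒)换算为采样下标并截取对应片段
    i = int(start_sec * sample_rate)
    j = int(end_sec * sample_rate)
    return samples[i:j]
```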
需要说明的是,上述实施例中各个显示界面的附图仅作为示例而非限定,显示界面还可以具有其它显示形式。例如,当语音转写内容较多时,电子设备可以显示滑动控件,用户可以控制滑动控件移动,以查阅未显示在当前界面中的语音转写内容。如图19为本申请实施例提供的一种语音转写内容的显示示意图。参考图19,当语音转写内容较多,电子设备在一个界面中无法完整显示全部内容时,用户可以在显示界面右侧的滑动控件上触发移动操作,电子设备根据用户触发的移动操作的距离调整当前显示界面中显示的语音转写内容,以便用户查阅。
又例如,本申请实施例还提供两种音频的音轨显示方式。图20为本申请实施例提供的第一种音轨显示方式的示意图。参考图20,电子设备可以在音轨显示区域显示移动控件,左侧的移动控件可以控制音轨向左移动,右侧的移动控件可以控制音轨向右移动。电子设备可以响应于用户在移动控件上触发的操作,显示音轨的不同区域。
图21为本申请实施例提供的又一种音轨显示方式的示意图。参考图21,电子设备可以将时长为20:30的音频的音轨压缩显示在当前界面中,用户可以点击音轨以触发播放点击位置处的音频,或者用户可以拖动音轨上进度条选择播放的音频。
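作为示意,在压缩显示的音轨上点击以触发播放时,需要把点击横坐标换算为音频播放时间。以下 Python 示例(click_to_time 名称与线性换算方式均为本文假设)对应图21所示的交互:

```python
def click_to_time(x, track_width, duration_sec):
    # 将点击横坐标 x(像素)按音轨显示宽度线性换算为播放时间(秒)
    x = min(max(x, 0.0), float(track_width))  # 越界点击收拢到音轨范围内
    return x / track_width * duration_sec
```

例如对时长20:30(1230秒)的音轨,点击显示宽度正中间即对应播放时间10:15。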
基于以上实施例,本申请还提供一种音轨标记方法,该方法可以由电子设备执行,该电子设备可以具有图1和/或图2所示的结构。图22为本申请实施例提供的一种音轨标记方法的流程图。参考图22,该方法包括以下步骤:
S2201:在录制音频或播放音频时,在显示屏中显示音频的音轨。
S2202:响应于用户触发的第一操作,将音轨上第一操作对应的第一位置显示为第一显示样式。
其中,第一操作用于指示对音轨的第一位置进行标记。
需要说明的是,本申请图22所示的音轨标记方法在具体实施时可以参见本申请上述各实施例,重复之处不再赘述。
基于以上实施例,本申请还提供一种电子设备,所述电子设备包括多个功能模块;所述多个功能模块相互作用,实现本申请实施例所描述的各方法中电子设备所执行的功能。所述多个功能模块可以基于软件、硬件或软件和硬件的结合实现,且所述多个功能模块可以基于具体实现进行任意组合或分割。
基于以上实施例,本申请还提供一种电子设备,该电子设备包括至少一个处理器和至少一个存储器,所述至少一个存储器中存储计算机程序指令,所述电子设备运行时,所述至少一个处理器执行本申请实施例所描述的各方法中电子设备所执行的功能。
基于以上实施例,本申请还提供一种包含指令的计算机程序产品,当所述计算机程序产品在计算机上运行时,使得所述计算机执行本申请实施例所描述的各方法。
基于以上实施例,本申请还提供一种计算机可读存储介质,所述计算机可读存储介质中存储有计算机程序,当所述计算机程序被计算机执行时,使得所述计算机执行本申请实施例所描述的各方法。
基于以上实施例,本申请还提供了一种芯片,所述芯片用于读取存储器中存储的计算机程序,实现本申请实施例所描述的各方法。
基于以上实施例,本申请提供了一种芯片系统,该芯片系统包括处理器,用于支持计算机装置实现本申请实施例所描述的各方法。在一种可能的设计中,所述芯片系统还包括存储器,所述存储器用于保存该计算机装置必要的程序和数据。该芯片系统,可以由芯片构成,也可以包含芯片和其他分立器件。
本领域内的技术人员应明白,本申请的实施例可提供为方法、系统、或计算机程序产品。因此,本申请可采用完全硬件实施例、完全软件实施例、或结合软件和硬件方面的实施例的形式。而且,本申请可采用在一个或多个其中包含有计算机可用程序代码的计算机可用存储介质(包括但不限于磁盘存储器、CD-ROM、光学存储器等)上实施的计算机程序产品的形式。
本申请是参照根据本申请的方法、设备(系统)、和计算机程序产品的流程图和/或方框图来描述的。应理解可由计算机程序指令实现流程图和/或方框图中的每一流程和/或方框、以及流程图和/或方框图中的流程和/或方框的结合。可提供这些计算机程序指令到通用计算机、专用计算机、嵌入式处理机或其他可编程数据处理设备的处理器以产生一个机器,使得通过计算机或其他可编程数据处理设备的处理器执行的指令产生用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的装置。
这些计算机程序指令也可存储在能引导计算机或其他可编程数据处理设备以特定方式工作的计算机可读存储器中,使得存储在该计算机可读存储器中的指令产生包括指令装置的制造品,该指令装置实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能。
这些计算机程序指令也可装载到计算机或其他可编程数据处理设备上,使得在计算机或其他可编程设备上执行一系列操作步骤以产生计算机实现的处理,从而在计算机或其他可编程设备上执行的指令提供用于实现在流程图一个流程或多个流程和/或方框图一个方框或多个方框中指定的功能的步骤。
显然,本领域的技术人员可以对本申请进行各种改动和变型而不脱离本申请的保护范围。这样,倘若本申请的这些修改和变型属于本申请权利要求及其等同技术的范围之内,则本申请也意图包含这些改动和变型在内。

Claims (16)

  1. 一种音轨标记方法,其特征在于,所述方法包括:
    在录制音频或播放音频时,在显示屏中显示音频的音轨;
    响应于用户触发的第一操作,将所述音轨上所述第一操作对应的第一位置显示为第一显示样式;
    其中,所述第一操作用于指示对所述音轨的第一位置进行标记。
  2. 如权利要求1所述的方法,其特征在于,在所述响应于用户触发的第一操作,将所述音轨上第一操作对应的第一位置显示为第一显示样式之后,所述方法还包括:
    响应于所述用户触发的第二操作,将所述音轨上的所述第一位置到所述第二操作对应的第二位置之间的第一区域显示为所述第一显示样式;
    其中,所述第二操作用于指示在所述音轨的所述第二位置处结束标记,所述第二位置在所述第一位置之后。
  3. 如权利要求1所述的方法,其特征在于,所述方法还包括:
    对所述音频进行语音转写,显示所述音频对应的语音转写内容;
    将所述语音转写内容中所述第一位置对应的语音转写内容显示为第二显示样式。
  4. 如权利要求2所述的方法,其特征在于,所述方法还包括:
    对所述音频进行语音转写,显示所述音频对应的语音转写内容;
    将所述语音转写内容中所述第一区域对应的语音转写内容显示为第二显示样式。
  5. 如权利要求1-4任一项所述的方法,其特征在于,所述第一显示样式包括以下至少一项:
    与所述音轨的默认显示样式不同的背景颜色;
    与所述音轨的默认显示样式不同的音轨形状;
    与所述音轨的默认显示样式不同的音轨颜色。
  6. 如权利要求3或4所述的方法,其特征在于,所述第二显示样式包括以下至少一项:
    与所述语音转写内容的默认显示样式不同的字体;
    与所述语音转写内容的默认显示样式不同的文字颜色;
    与所述语音转写内容的默认显示样式不同的背景颜色。
  7. 如权利要求1-6任一项所述的方法,其特征在于,所述方法还包括:
    响应于所述用户触发的第三操作,在所述第三操作对应的第三位置处增加批注区域;
    在所述显示屏上显示批注编辑界面,接收所述用户在所述批注编辑界面输入的批注内容,并在所述批注区域显示所述批注内容。
  8. 如权利要求7所述的方法,其特征在于,所述方法还包括:
    将所述音轨上所述第三位置显示为第三显示样式。
  9. 如权利要求7或8所述的方法,其特征在于,所述在所述显示屏上显示批注编辑界面,包括:
    在所述显示屏上分屏显示语音转写界面和所述批注编辑界面;或者
    在所述显示屏上显示的语音转写界面上显示悬浮窗口,在所述悬浮窗口中显示所述批注编辑界面;
    其中,所述语音转写界面包括所述音频的音轨和所述音频对应的语音转写内容。
  10. 如权利要求1-6任一项所述的方法,其特征在于,所述方法还包括:
    在所述显示屏上分屏显示语音转写界面和纪要编辑界面;
    其中,所述语音转写界面包括所述音频的音轨和所述音频对应的语音转写内容,所述纪要编辑界面包括目标语音转写内容和/或所述目标语音转写内容对应的播放控件;所述目标语音转写内容包括所述音轨中标记的音轨区域对应的语音转写内容和/或批注的音轨区域对应的语音转写内容,所述目标语音转写内容对应的播放控件用于播放所述目标语音转写内容对应的音频片段。
  11. 如权利要求1-10任一项所述的方法,其特征在于,
    当所述音轨上包括多个标记的音轨区域时,所述多个标记的音轨区域对应至少一种显示样式,所述至少一种显示样式为用户自定义的。
  12. 如权利要求3-11任一项所述的方法,其特征在于,所述方法还包括:
    响应于所述用户在显示界面中的滑动控件上触发的移动操作,根据所述移动操作对应的距离调整所述显示界面中显示的语音转写内容。
  13. 一种电子设备,其特征在于,包括至少一个处理器,所述至少一个处理器与至少一个存储器耦合,所述至少一个处理器用于读取所述至少一个存储器所存储的计算机程序,以执行如权利要求1-12中任一所述的方法。
  14. 一种电子设备,其特征在于,包括多个功能模块;所述多个功能模块相互作用,实现如权利要求1-12中任一所述的方法。
  15. 一种计算机可读存储介质,其特征在于,所述计算机可读存储介质中存储有指令,当其在计算机上运行时,使得计算机执行如权利要求1-12中任一所述的方法。
  16. 一种包含指令的计算机程序产品,其特征在于,当所述计算机程序产品在计算机上运行时,使得计算机执行如权利要求1-12中任一所述的方法。
PCT/CN2023/096664 2022-06-06 2023-05-26 一种音轨标记方法及电子设备 WO2023236794A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210633882.X 2022-06-06
CN202210633882.XA CN115237316A (zh) 2022-06-06 2022-06-06 一种音轨标记方法及电子设备


