WO2021093333A1 - Audio playback method, electronic device, and storage medium - Google Patents

Audio playback method, electronic device, and storage medium

Info

Publication number
WO2021093333A1
WO2021093333A1 PCT/CN2020/097534 CN2020097534W
Authority
WO
WIPO (PCT)
Prior art keywords
segmentation
sentence
audio file
playback
audio
Prior art date
Application number
PCT/CN2020/097534
Other languages
English (en)
French (fr)
Inventor
高翔
孙静
Original Assignee
网易(杭州)网络有限公司 (NetEase (Hangzhou) Network Co., Ltd.)
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 网易(杭州)网络有限公司 (NetEase (Hangzhou) Network Co., Ltd.)
Publication of WO2021093333A1
Priority to US 17/663,225, published as US20220269724A1



Classifications

    • G11B 27/10 Indexing; Addressing; Timing or synchronising; Measuring tape travel
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F 16/63 Querying (information retrieval of audio data)
    • G06F 16/68 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
    • G06F 16/685 Retrieval using metadata automatically derived from the content, e.g. automatically derived transcript of audio data, e.g. lyrics
    • G06F 16/686 Retrieval using information manually generated, e.g. tags, keywords, comments, title or artist information, time, location or usage information, user ratings
    • G10L 13/08 Text analysis or generation of parameters for speech synthesis out of text, e.g. grapheme to phoneme translation, prosody generation or stress or intonation determination
    • G10L 15/26 Speech to text systems
    • G10L 2013/083 Special characters, e.g. punctuation marks

Definitions

  • the present disclosure relates to the field of audio technology, and in particular, to an audio playback method, electronic equipment, and storage medium.
  • A specific audio segment sometimes needs to be played repeatedly: for example, a certain segment of the audio needs to be re-listened to for learning or interest, or playback needs to return to an earlier point because the content was not heard clearly.
  • The prior art for implementing this function suffers from problems such as inaccurate positioning, high operating cost, low efficiency, a fixed return interval that may not match what the user needs, and low flexibility and accuracy.
  • The purpose of the present disclosure is to provide an audio playback method, electronic device, and storage medium, so as to overcome, at least to a certain extent, one or more problems caused by the limitations and defects of the related art.
  • an audio playback method including: recognizing an audio file to be played as a text file containing segmentation symbols;
  • When the user wants to re-listen to a certain piece of audio content, a single operation on the terminal accurately locates the appropriate target playback point; there is no need to repeatedly slide the playback progress bar for positioning, so the operation is simple and convenient.
  • Sentence segmentation is not limited by the audio playback speed: even if the playback speed is slow, accurate sentence segmentation can be achieved, and an appropriate playback position can then be located for re-listening.
  • determining a target playback point according to the current playback position of the audio file and the position of the segmentation mark includes:
  • searching for the segmentation mark immediately preceding the current playback position of the audio file, and determining the position of that segmentation mark in the audio file as the target playback point.
  • In this way, each time the operation of returning to the previous sentence is triggered, the beginning of the sentence preceding the current playback position can be accurately located, and the user can find the target playback point for playback with a simple, convenient operation.
  • the segmentation symbol includes a first segmentation symbol and a second segmentation symbol
  • determining a target playback point according to the current playback position of the audio file and the position of the sentence segmentation mark includes:
  • the first segmentation symbol is the segmentation symbol immediately preceding the text character in the text file;
  • the second segmentation symbol is the segmentation symbol immediately preceding the first segmentation symbol in the text file.
  • In this way, the user's replay intention can be judged by comparing the character spacing between the text character corresponding to the real-time playback position of the audio file and the preceding segmentation symbol, and the most accurate target playback point can be found intelligently. There is no need to repeatedly trigger the return operation to accurately locate the audio position the user wants to re-hear, which further improves the convenience of the operation and thereby enhances the user experience.
  • the sentence segmentation mark includes a first sentence segmentation mark and a second sentence segmentation mark
  • determining a target playback point according to the current playback position of the audio file and the position of the sentence segmentation mark includes:
  • the first sentence segmentation mark is the previous sentence segmentation mark adjacent to the current playback position in the audio file
  • the second sentence segmentation mark is the previous sentence segmentation mark adjacent to the first sentence segmentation mark in the audio file
  • In this way, the user's replay intention can be judged, and the most accurate target playback point can be found intelligently. There is no need to repeatedly trigger the return operation to accurately locate the audio position the user wants to re-hear, which further improves the convenience of the operation and thereby enhances the user experience.
  • an audio playback method including: in response to a second trigger operation, detecting whether the speech rate of an audio file to be played is less than a preset speech rate;
  • the generating a corresponding segmentation mark in the audio file according to the pause duration of the audio includes:
  • an electronic device including:
  • a memory configured to store executable instructions of the processor;
  • the processor is configured to execute the above audio playback method by executing the executable instruction.
  • a computer-readable storage medium having a computer program stored thereon, and when the computer program is executed by a processor, the above-mentioned audio playback method is implemented.
  • Fig. 1 is a schematic diagram of a system architecture of an audio playing method according to an exemplary embodiment of the present disclosure
  • Fig. 2 is a flowchart of an audio playing method according to an exemplary embodiment of the present disclosure
  • FIG. 3 is a schematic diagram of an interface of audio application software in an audio playback application scenario according to an exemplary embodiment of the present disclosure
  • FIG. 4 is a flowchart of an audio playback method according to another exemplary embodiment of the present disclosure.
  • FIG. 5 is a schematic diagram of determining a target playback point in an audio playback process of an exemplary embodiment of the present disclosure
  • FIG. 6 is a schematic diagram of determining a target playback point in an audio playback process of another exemplary embodiment of the present disclosure
  • Fig. 7 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present disclosure.
  • This method can reduce the user's operating burden to a certain extent, but it is often prone to misrecognition, especially when the audio playback speed is slow. If audio pauses cannot be accurately identified, the beginning of the sentence the user wants to re-hear cannot be accurately located. Moreover, pause-based recognition cannot be adjusted in a targeted manner according to the speech-rate environment of the playback, so intelligent sentence segmentation cannot be realized.
  • this exemplary embodiment provides a new technical solution.
  • the technical solutions of the embodiments of the present disclosure are described in detail below:
  • Fig. 1 shows a schematic diagram of a system architecture of an exemplary application environment in which an audio playback method according to an embodiment of the present disclosure can be applied.
  • the system architecture 100 may include one or more of terminal devices 101, 102, and 103, a network 104 and a server 105.
  • the network 104 is used to provide a medium for communication links between the terminal devices 101, 102, 103 and the server 105.
  • the network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, and so on.
  • the terminal devices 101, 102, 103 may be various electronic devices with display screens, including but not limited to desktop computers, portable computers, smart phones, tablet computers, and so on. It should be understood that the numbers of terminal devices, networks, and servers in FIG. 1 are merely illustrative. According to implementation needs, there can be any number of terminal devices, networks, and servers.
  • the server 105 may be a server cluster composed of multiple servers.
  • the audio playback method provided in the embodiments of the present disclosure may be executed by the terminal devices 101, 102, 103, and correspondingly, the audio playback device may also be provided in the terminal devices 101, 102, 103.
  • the audio playback method provided by the embodiments of the present disclosure can also be executed by the terminal devices 101, 102, 103 and the server 105 together. Accordingly, the audio playback device can be set in the terminal devices 101, 102, 103 and the server 105.
  • the audio playing method provided by the embodiment of the present disclosure may also be executed by the server 105, and accordingly, the audio playing device may be set in the server 105, which is not particularly limited in this exemplary embodiment.
  • One aspect of the embodiments of the present disclosure provides an audio playback method, which can be applied to one or more of the aforementioned terminal devices 101, 102, 103, can also be applied to the aforementioned server 105, and can also be applied jointly to the terminal devices 101, 102, 103 and the server 105.
  • the audio playback method includes:
  • Step S210: Recognize the audio file to be played as a text file containing segmentation symbols;
  • Step S220: According to the correspondence between the audio file and the text file, generate a segmentation mark at the position corresponding to each segmentation symbol in the audio file;
  • Step S230: In response to a trigger operation, determine a target playback point according to the current playback position of the audio file and the positions of the segmentation marks;
  • Step S240: Play the audio file from the target playback point.
  • In this way, the audio can be accurately positioned to the place the user wants to re-hear without increasing the complexity of the user's operation, realizing more accurate repeated playback.
  • step S210 the audio file to be played is recognized as a text file containing segmentation symbols.
  • the audio file to be played is a file storing audio data.
  • the audio file may be music, teaching voice, or recording (such as a voice message sent by a user in an instant messaging tool), which is not particularly limited in this example embodiment.
  • the text file is a text file obtained by performing voice recognition on the above audio file, and includes text characters corresponding to the audio content.
  • Speech recognition can be realized by using a recognition algorithm known in the prior art.
  • In one implementation, the speech recognition process may proceed as follows: first, preprocessing operations such as pre-emphasis, windowing, framing, and end-point detection are performed on the audio data in the aforementioned audio file; then the preprocessed audio data is analyzed and the required features are extracted; finally, a discrete hidden Markov model trained on samples recognizes the feature-extracted speech signal, yielding the text file corresponding to the audio file.
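  • As a toy illustration of the pre-emphasis and framing steps above (a simplified sketch with invented sample values; a real pipeline would also apply a window function and end-point detection before feature extraction):

```python
def pre_emphasis(samples, alpha=0.97):
    # y[n] = x[n] - alpha * x[n-1]: boosts high frequencies so later
    # feature extraction sees a flatter spectrum.
    return [samples[0]] + [samples[n] - alpha * samples[n - 1]
                           for n in range(1, len(samples))]

def split_frames(samples, frame_len, hop):
    # Overlapping frames; each frame is subsequently windowed and
    # analyzed to extract the features fed to the recognizer.
    return [samples[i:i + frame_len]
            for i in range(0, len(samples) - frame_len + 1, hop)]

frames = split_frames(pre_emphasis([0.0, 1.0, 1.0, 0.5, 0.2, 0.0]), 4, 2)
# two overlapping 4-sample frames from the 6 emphasized samples
```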
  • the audio application interface includes a text display area 307, which can be used to display the content of the recognized text file.
  • the segmentation symbol is used to segment the above text file.
  • The segmentation symbol can be a symbol such as a comma, semicolon, or period in the text file, or another symbol that can serve a segmentation role; this example embodiment imposes no special restriction on this.
  • step S220 according to the correspondence between the audio file and the text file, a segmentation mark is generated at a position corresponding to the segmentation symbol in the audio file.
  • the correspondence between the audio file and the text file may be a one-to-one correspondence between the audio content and the characters of the recognized text during the speech recognition process.
  • the characters can be text characters or numeric characters.
  • The segmentation mark may be a special mark used to identify the above-mentioned segmentation symbol. It can be a specific special character inserted at the segmentation symbol, a dot mark at the corresponding position in the sound track of the above audio file, or another type of special mark that can identify the segmentation symbol; this example embodiment imposes no particular limitation on this.
  • step S230 in response to a trigger operation, a target playback point is determined according to the current playback position of the audio file and the position of the sentence segmentation mark.
  • The trigger operation can be the user's touch operation on the terminal device (for example, clicking a control on the touch screen or sliding in the display area), a non-touch operation (for example, clicking a control with a mouse or pressing a mechanical button), or a trigger operation based on preset interaction conditions (for example, shaking or voice input).
  • the audio application interface includes a return key 301, a playback pause key 303, and a playback progress bar 305.
  • For example, the trigger operation is the user pressing the return key 301 on the audio application playback interface, which issues a request to return to the previous sentence.
  • the current playback position of the audio file may be the position where a certain frame of the audio file is currently played, specifically, it may be the real-time playback position corresponding to the audio frame on the playback progress bar of the audio player.
  • an audio file contains four sentences A, B, C, and D. At this moment, the audio file is playing to the beginning of sentence B, and the beginning of sentence B is the above-mentioned current position.
  • the target playback point may be the starting point in the audio file of the part to be played repeatedly in the audio file.
  • an audio file contains four sentences A, B, C, and D.
  • the start point of B is the target playback point.
  • the sentence break mark immediately before the current playback position is used as the target playback point.
  • For example, the audio content to be played is "sentence A, sentence B. sentence C, sentence D." If the current playback position is the beginning of sentence C, then the segmentation symbol "." before the current playback position, at the end of sentence B, or the segmentation symbol "," at the end of sentence A, may be determined as the target playback point.
  • This example implementation does not make a special limitation on this.
  • Step S240 Play the audio file from the target playback point.
  • the audio file is played from the target playback point, and the audio file may be returned to the target playback point at the position corresponding to the playback progress bar for playback. For example, if the audio frame of the target playback point corresponds to the position of 1 minute and 30 seconds on the playback progress bar, the playback starts from 1 minute and 30 seconds.
  • It should be noted that steps S210 and S220 in this exemplary embodiment can be executed before the audio is played (for example: the server first recognizes the audio file to be played as a text file and generates segmentation marks in the audio file; then, during playback, when the terminal device detects the user's trigger operation, it determines a target playback point according to the current playback position and the positions of the segmentation marks, and plays the audio file from the target playback point). They can also be executed when the audio is played (for example: when the user triggers the control to play the audio, speech recognition starts, the audio file to be played is recognized as a text file, and segmentation marks are generated in the audio file; then, when the terminal device detects the user's trigger operation, the target playback point is determined according to the current playback position and the positions of the segmentation marks, and the audio file is played from the target playback point). This exemplary embodiment is not particularly limited in this regard.
  • the audio file to be played is recognized as a text file containing segmentation symbols; according to the corresponding relationship between the audio file and the text file, the position corresponding to the segmentation symbol in the audio file A segmentation mark is generated at the location; in response to a trigger operation, a target playback point is determined according to the current playback position of the audio file and the location of the segmentation mark; and the audio file is played from the target playback point.
  • Sentence segmentation is not limited by the audio playback speed: even if the playback speed is slow, accurate sentence segmentation can be achieved, and an appropriate playback position can then be located for re-listening.
  • the recognizing the audio file to be played as a text file containing segmentation symbols includes:
  • Specifically, the audio file to be played is recognized as a text file, the text file is divided into a plurality of sub-text files in sentence units by a preset sentence segmentation model, and a segmentation symbol is marked at the end of each sub-text file, thereby generating a text file containing segmentation symbols.
  • the sentence segmentation model is used to divide the file into multiple sub-text files.
  • Each sub-text file can be regarded as a sentence.
  • A segmentation symbol is added to the end of each sub-text file to form a text file containing segmentation symbols.
  • In this example embodiment, the sentence segmentation model constructs training samples based on the characteristic attributes of vocabulary and is obtained through CRF algorithm training.
  • the sentence model can be trained in advance according to the characteristics of different fields, such as the financial field, the communication field, the electric power field, and the daily life field.
  • Vocabulary feature attributes can include the inherent attributes of the vocabulary (such as verbs, nouns, adjectives, adverbs, prepositions, modal particles, etc.), the sentence-role attributes of the vocabulary (such as subject, predicate, object, attributive, adverbial, etc.), and the semantic attributes of the vocabulary in different fields.
  • The CRF (Conditional Random Field) algorithm is a probability-based algorithm. Training samples are constructed according to the characteristic attributes of the vocabulary; the sentence segmentation model for a specific field, obtained through CRF training, can calculate the probability that a position in the text forms a sentence boundary based on the pause patterns of words carrying pause information in that field, and segment sentences accordingly.
  • In this example embodiment, candidate sentence segmentation positions of the text file are evaluated by the sentence segmentation model; when the confidence of a segmentation position is greater than a preset confidence threshold, that position is determined as a target segmentation position, and the text file is divided into sub-text files in sentence units according to the target segmentation positions.
  • Specifically, the sentence segmentation model divides the text file into characters and phrases and reads them from the text file in sequence. For example, if the recognized content is "I will go home after work", then "I", "off work", "after", and "go home" are read in turn.
  • If the sentence segmentation model analyzes the confidence of a sentence boundary at the end of "I off work" as 0.2, while the preset confidence threshold is 0.8, it continues to read the next character or phrase ("after"), and so on. When the text "I go home after work" has been read, the model analyzes the confidence of a boundary at the end of the text as 0.9, which exceeds the preset threshold of 0.8, so the end of "go home" can be determined as the target segmentation position.
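  • The incremental reading loop described above can be sketched as follows. The trained CRF model is replaced here by a toy confidence table; only the thresholding logic reflects the text:

```python
def find_boundary(tokens, confidence, threshold=0.8):
    # Read characters/phrases in sequence; commit a sentence boundary as
    # soon as the model's confidence for the current prefix exceeds the
    # preset threshold (0.8 in the example above).
    for i in range(1, len(tokens) + 1):
        if confidence(tokens[:i]) > threshold:
            return i  # boundary falls after the i-th token
    return len(tokens)  # no confident boundary: keep the rest as one sentence

tokens = ["I", "off work", "after", "go home"]
scores = {1: 0.1, 2: 0.2, 3: 0.3, 4: 0.9}  # toy stand-in for CRF output
boundary = find_boundary(tokens, lambda prefix: scores[len(prefix)])
# boundary == 4: the sentence ends after "go home"
```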
  • the correspondence between the audio file and the text file includes:
  • In the process of recognizing the audio file to be played as a text file, a correspondence is established on the time axis between the audio file and the recognized characters of the text file.
  • the audio file and the text file are analyzed, and the corresponding relationship between the audio file and the characters of the text file on the time axis in the speech recognition process is obtained. For example, a certain character in a text file corresponds to a certain second of audio content on the playback progress bar.
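  • One simple realization of this correspondence is an alignment list pairing each recognized character with its time on the progress bar; the segmentation marks of step S220 are then just the times of the segmentation symbols. All characters and times below are invented for illustration:

```python
# (character, start_time_in_seconds) pairs produced during recognition
alignment = [("I", 0.0), (" ", 0.2), ("g", 0.4), ("o", 0.6), (".", 0.8),
             ("B", 1.2), ("y", 1.4), ("e", 1.6), (".", 1.8)]

def mark_times(alignment, symbols=(",", ".", ";")):
    # Playback time of each segmentation symbol in the recognized text:
    # these become the audio-side segmentation marks.
    return [t for ch, t in alignment if ch in symbols]

times = mark_times(alignment)  # [0.8, 1.8]
```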
  • determining a target playback point according to the current playback position of the audio file and the position of the segmentation mark includes:
  • searching for the segmentation mark immediately preceding the current playback position of the audio file, and determining the position of that segmentation mark in the audio file as the target playback point.
  • For example, the segmentation mark of sentence A is found, and the position of that mark is used as the target playback point. In this way, each time the operation of returning to the previous sentence is triggered, the beginning of the sentence preceding the current playback position can be accurately located, and the user can find the target playback point for playback with a simple, convenient operation. It should be noted that the above scenario is only an exemplary description and does not limit the protection scope of this exemplary implementation in any way.
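  • The lookup of the preceding mark can be done with a binary search over the mark times; triggering the return operation repeatedly then walks back one sentence at a time. A sketch with illustrative times:

```python
from bisect import bisect_left

def previous_mark(marks, current_pos):
    # marks: ascending playback times (seconds) of the segmentation marks.
    i = bisect_left(marks, current_pos)
    return marks[i - 1] if i > 0 else 0.0  # nothing earlier: file start

marks = [6.0, 12.5, 19.0]            # marks ending sentences A, B, C
first = previous_mark(marks, 15.0)   # playing inside sentence C -> 12.5
again = previous_mark(marks, first)  # triggered again -> 6.0 (one sentence back)
```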
  • the segmentation symbol includes a first segmentation symbol and a second segmentation symbol
  • determining a target playback point according to the current playback position of the audio file and the position of the sentence segmentation mark includes:
  • the first segmentation symbol is the segmentation symbol immediately preceding the text character in the text file;
  • the second segmentation symbol is the segmentation symbol immediately preceding the first segmentation symbol in the text file.
  • the audio playback progress bar is currently playing to the beginning of sentence C.
  • a trigger operation it will be based on the corresponding relationship between the audio file and the text file.
  • the position of the segmentation mark of sentence A (that is, the beginning of sentence B) is used as the target playback point.
  • the current playback reaches the position 501 at the beginning of sentence C, it corresponds to the second character at the beginning of text C (shown by the dotted line of text content "not required” in Figure 5), and the preset character spacing is 3 characters.
  • the segmentation symbol of text A is searched forward.
  • the segmentation symbol of text A corresponds to the segmentation mark of sentence A. Therefore, the position 505 where the segmentation mark of sentence A is located is determined as the target playback point.
  • the segmentation symbol of text B is found in the text file, then the segmentation mark of sentence B in the audio file is found correspondingly, and the position of sentence B's segmentation mark (that is, the beginning of sentence C) is used as the target playback point.
  • the current playback reaches the position 502 of sentence C, it corresponds to the 9th character at the beginning of text C (shown by the dotted line of the text content "computer" in Figure 5), and the preset character spacing is 3
  • the segmentation symbol of text B is searched forward.
  • the segmentation symbol of text B corresponds to the segmentation mark of sentence B. Therefore, the position 503 of the segmentation mark of sentence B is determined as the target playback point.
  • In this way, the user's replay intention can be judged by comparing the character spacing between the text character corresponding to the real-time playback position of the audio file and the preceding segmentation symbol, and the most accurate target playback point can be found intelligently; the user does not need to repeatedly trigger the return operation to accurately locate the audio position they want to re-hear, which further improves the convenience of the operation and thereby enhances the user experience.
  • the above scenario is only an exemplary description, and does not limit the protection scope of the exemplary implementation manner in any way.
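  • The character-spacing rule of this embodiment can be sketched as follows (positions are character indices in the text file; the preset spacing of 3 and the symbol indices are illustrative):

```python
def choose_target_symbol(symbol_positions, current_char, min_spacing=3):
    # symbol_positions: ascending character indices of segmentation symbols.
    prev = [p for p in symbol_positions if p < current_char]
    if not prev:
        return None  # no preceding symbol: replay from the file start
    if current_char - prev[-1] < min_spacing and len(prev) >= 2:
        # Just past a sentence start: the user likely wants the sentence
        # before it, so use the second preceding segmentation symbol.
        return prev[-2]
    return prev[-1]  # otherwise the first preceding segmentation symbol

# Symbols end texts A and B at indices 10 and 25.
near_start = choose_target_symbol([10, 25], 27)  # 2nd char of C -> 10 (text A)
deeper_in = choose_target_symbol([10, 25], 34)   # 9th char of C -> 25 (text B)
```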
  • the sentence segmentation mark includes a first sentence segmentation mark and a second sentence segmentation mark
  • determining a target playback point according to the current playback position of the audio file and the position of the sentence segmentation mark includes:
  • the first sentence segmentation mark is the previous sentence segmentation mark adjacent to the current playback position in the audio file
  • the second sentence segmentation mark is the previous sentence segmentation mark adjacent to the first sentence segmentation mark in the audio file
  • the audio playback progress bar is currently playing to the beginning of sentence C.
  • In this case, it is judged whether the interval between the playback time corresponding to the current position on the audio playback progress bar and the segmentation mark of sentence B is longer than the preset time interval. If it is within the preset time interval, the segmentation marks of sentence B and sentence A are found in turn, and the position of sentence A's segmentation mark (that is, the beginning of sentence B) is used as the target playback point.
  • As shown in Fig. 6, if the current playback reaches position 601 at the second second of sentence C (the 0:19 position on the playback progress bar in Fig. 6), which is within the preset time interval of sentence B's segmentation mark, the segmentation mark of sentence A is found forward, and position 605 of sentence A's segmentation mark (corresponding to the 0:06 position on the playback progress bar in Fig. 6) is determined as the target playback point.
  • Otherwise, the segmentation mark of sentence B is found, and the position of sentence B's segmentation mark (that is, the beginning of sentence C) is used as the target playback point.
  • In this way, the user's replay intention can be judged, and the most accurate target playback point can be found intelligently. There is no need to repeatedly trigger the return operation to accurately locate the audio position the user wants to re-hear, which further improves the convenience of the operation and thereby enhances the user experience. It should be noted that the above scenario is only an exemplary description and does not limit the protection scope of this exemplary implementation in any way.
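  • The time-interval variant is analogous, comparing playback times instead of character indices. A sketch mirroring the Fig. 6 description (sentence B's mark time and the 3-second interval are assumed for illustration):

```python
def target_playback_point(mark_times, current_t, preset_interval=3.0):
    # mark_times: ascending playback times (seconds) of segmentation marks.
    prev = [t for t in mark_times if t < current_t]
    if not prev:
        return 0.0  # nothing earlier: replay from the beginning
    if current_t - prev[-1] <= preset_interval and len(prev) >= 2:
        # Within the preset interval of the last mark: go back one more
        # sentence, to the mark before it.
        return prev[-2]
    return prev[-1]

# Marks of sentences A and B at 0:06 and (assumed) 0:17; playing at 0:19.
target = target_playback_point([6.0, 17.0], 19.0)  # -> 6.0, i.e. 0:06
```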
  • the method further includes:
  • the audio file to be played is played from the beginning.
  • sentence A Take an audio file containing four sentences A, B, C, and D as an example.
  • the first sentence of the audio file (that is, sentence A) is not marked with a sentence break mark.
  • sentence A When a trigger operation is detected, if the current playback Is sentence A, then sentence A will be played from the beginning.
  • the sentence break symbol is a comma, a period, or a semicolon.
  • the sentence segmentation symbol may be a symbol such as a comma, a period, or a semicolon, or other symbols that can function as a sentence segmentation, and this example embodiment does not make a special limitation.
  • an audio playback method is provided.
  • the audio playback method can be applied to one or more of the aforementioned terminal devices 101, 102, 103, to the aforementioned server 105, or to both the terminal devices 101, 102, 103 and the server 105.
  • the audio playback method includes:
  • Step S410: in response to the second trigger operation, detecting whether the speech rate of the audio file to be played is lower than a preset speech rate;
  • Step S420: if so, recognizing the audio file to be played as a text file containing segmentation symbols;
  • Step S430: generating segmentation marks in the audio file at positions corresponding to the segmentation symbols according to the correspondence between the audio file and the text file;
  • Step S440: if not, generating corresponding segmentation marks in the audio file according to the pause durations of the audio;
  • Step S450: in response to the first trigger operation, determining a target playback point according to the current playback position of the audio file and the positions of the segmentation marks;
  • Step S460: playing the audio file from the target playback point.
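The branching of steps S410-S460 can be sketched in Python as follows. This is an illustrative sketch only: the dictionary-based audio representation, the threshold of 4.0 characters per second, and all helper functions are hypothetical stand-ins, not part of the disclosure.

```python
def estimate_speech_rate(audio):
    # hypothetical measure: recognized characters per second of audio
    return audio["chars"] / audio["duration"]

def marks_from_text(audio, sentences):
    # text-based segmentation (steps S420-S430): place a mark at the
    # cumulative end time of each sentence, assuming a uniform
    # per-character duration (illustrative only)
    per_char = audio["duration"] / audio["chars"]
    marks, t = [], 0.0
    for s in sentences:
        t += len(s) * per_char
        marks.append(round(t, 2))
    return marks

def marks_from_pauses(audio, min_pause=0.5):
    # pause-based segmentation (step S440): a mark wherever a detected
    # silent gap (start, length) lasts longer than min_pause seconds
    return [start for start, length in audio["pauses"] if length > min_pause]

def segmentation_marks(audio, preset_rate=4.0):
    if estimate_speech_rate(audio) < preset_rate:        # step S410: slow speech
        return marks_from_text(audio, audio["sentences"])  # steps S420-S430
    return marks_from_pauses(audio)                      # step S440
```

Steps S450-S460 then consume the returned mark positions to pick and seek to a target playback point.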
  • In step S410, in response to the second trigger operation, it is detected whether the speech rate of the audio file to be played is lower than the preset speech rate.
  • the second trigger operation may be a touch operation by the user on the terminal device (for example, tapping a control on the touch screen or sliding within the display area), a non-touch operation (for example, clicking a control with a mouse or pressing a mechanical button), or a trigger operation based on preset interaction conditions (for example, shaking or voice input).
  • In a specific application scenario, the second trigger operation is the user's operation of the play/pause key 303 on the playback interface of the audio application to play the audio file.
  • In steps S420-S430, if so, the audio file to be played is recognized as a text file containing segmentation symbols, and segmentation marks are generated in the audio file at the positions corresponding to the segmentation symbols according to the correspondence between the audio file and the text file.
  • That is, when the speech rate is lower than the preset speech rate, the segmentation marks of the audio file are determined by speech recognition: based on the correspondence between the audio file and the text file, segmentation marks are generated at the positions in the audio file corresponding to the segmentation symbols.
  • the correspondence between the audio file and the text file may be a one-to-one correspondence between the audio content and the characters of the recognized text during the speech recognition process.
  • the characters can be text characters or numeric characters.
  • the segmentation mark may be a special mark used to identify the above-mentioned segmentation symbol.
  • For example, the segmentation mark may be a specific special character inserted at the segmentation symbol, a dot marked at the position of the segmentation symbol in the sound track of the audio file, or another kind of special mark that can identify the segmentation symbol; this example embodiment makes no special limitation in this regard.
  • step S440 if not, generate a corresponding segmentation mark in the audio file according to the pause duration of the audio;
  • That is, when the speech rate is not lower than the preset speech rate, the segmentation marks of the audio file are determined from the pause durations of the audio. For example, when the silent duration of the audio is detected to be greater than a preset threshold, a specific special character is inserted into the silent segment to form a segmentation mark.
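One way to realize pause-based marks is to scan a per-frame amplitude envelope for silent runs longer than a threshold. The sketch below assumes a hypothetical envelope sampled at 100 frames per second; the silence and pause thresholds are illustrative values, not values from the disclosure.

```python
def pause_marks(envelope, frame_rate=100, silence=0.02, min_pause=0.5):
    """Return the start time (seconds) of every silent stretch in the
    per-frame amplitude envelope that lasts at least min_pause seconds;
    these are the positions where segmentation marks would be generated."""
    marks = []
    run_start = None
    for i, amp in enumerate(envelope + [1.0]):  # sentinel closes a trailing run
        if amp < silence:
            if run_start is None:
                run_start = i                    # a silent run begins here
        elif run_start is not None:
            if (i - run_start) / frame_rate >= min_pause:
                marks.append(run_start / frame_rate)
            run_start = None                     # run ended (too short, or already marked)
    return marks
```

For example, an envelope with a 0.6 s silent gap after one second of speech yields a single mark at 1.0 s.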
  • In step S450, in response to the first trigger operation, a target playback point is determined according to the current playback position of the audio file and the positions of the segmentation marks.
  • the first trigger operation may be a touch operation by the user on the terminal device (for example, tapping a control on the touch screen or sliding within the display area), a non-touch operation (for example, clicking a control with a mouse or pressing a mechanical button), or a trigger operation based on preset interaction conditions (for example, shaking or voice input).
  • the trigger operation is a trigger operation of the return key 301 by the user on the playback interface of the audio application to issue a request to return to the previous sentence.
  • the current playback position of the audio file may be the position where a certain frame of the audio file is currently played, specifically, it may be the real-time playback position corresponding to the audio frame on the playback progress bar of the audio player.
  • an audio file contains four sentences A, B, C, and D. At this moment, the audio file is playing to the beginning of sentence B, and the beginning of sentence B is the above-mentioned current position.
  • the target playback point may be the starting point, within the audio file, of the part to be played repeatedly.
  • For example, suppose an audio file contains four sentences A, B, C, and D, and the file is to be replayed from sentence B;
  • the starting point of sentence B is then the target playback point.
  • Determining a target playback point according to the current playback position of the audio file and the positions of the segmentation marks may mean taking the segmentation mark immediately preceding the current playback position as the target playback point, or taking the second-nearest preceding segmentation mark as the target playback point.
  • For example, suppose the played audio content is "sentence A, sentence B. sentence C, sentence D."
  • and the current playback position is the beginning of sentence C.
  • Then either the segmentation symbol "." at the end of sentence B or the segmentation symbol "," at the end of sentence A, both before the current playback position, is determined as the target playback point.
  • This example implementation does not make a special limitation on this.
  • Step S460 Play the audio file from the target playback point.
  • After the target playback point is located, the audio file is played from the target playback point; that is, playback may be returned to the position on the playback progress bar corresponding to the target playback point. For example, if the audio frame of the target playback point corresponds to the 1 minute 30 second position on the playback progress bar, playback starts from 1 minute 30 seconds.
  • In this way, the audio playback method of this exemplary embodiment can intelligently adapt to various complicated speech-rate environments, taking into account both the accuracy and the efficiency of sentence segmentation and improving the user experience.
  • the recognizing the audio file to be played as a text file containing segmentation symbols includes:
  • the audio file to be played is recognized as a text file, the text file is divided into multiple sentence-level sub-text files through a preset sentence model, and a segmentation symbol is marked at the end of each sub-text file to generate a text file containing segmentation symbols.
  • the sentence segmentation model is used to divide the file into multiple sub-text files.
  • Each sub-text file can be regarded as a sentence.
  • a segmentation symbol is added to the end of each sub-text file to form a text file containing segmentation symbols.
  • the sentence model constructs training samples from the characteristic attributes of vocabulary and is obtained through CRF algorithm training.
  • the sentence model can be trained in advance according to the characteristics of different fields, such as the financial field, the communication field, the electric power field, and the daily life field.
  • Vocabulary feature attributes can include the inherent attributes of the vocabulary (such as verbs, nouns, adjectives, adverbs, prepositions, modal particles, etc.), the sentence attributes of the vocabulary (such as subject, predicate, object, attributive, adverbial, etc.), and the vocabulary in different fields Semantic attributes.
  • the CRF (Conditional Random Field) algorithm is a probability-based algorithm. Training samples are constructed from the characteristic attributes of vocabulary, and the sentence model trained with the CRF algorithm for a specific field can compute, from the text content, the probability that a position forms a sentence boundary, according to the pause patterns of words carrying pause information in that field, and then segment sentences accordingly.
  • Optionally, the target segmentation positions of the text file are determined according to the sentence model: when the confidence of a candidate segmentation position in the text file is greater than a preset confidence threshold, that position is determined as a target segmentation position, and the text file is divided into sentence-level sub-text files according to the target segmentation positions.
  • After the audio file is recognized as a text file, the segmentation model divides the text file into characters and phrases and reads them from the text file in turn. For example, if the recognized content is "I go home after I get off work", the tokens "I", "get off work", "after", and "go home" are read in turn.
  • When the content "I get off work" has been read, the segmentation model computes a confidence of 0.2 for a segmentation position at the end of this text; since the preset confidence threshold is 0.8, the next character or phrase, "after", is read, and so on. When the text "I go home after I get off work" has been read, the segmentation model computes a confidence of 0.9 for a segmentation position at its end, which exceeds the preset threshold of 0.8, so the end of "go home" can be determined as the target segmentation position.
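The incremental reading loop described above can be sketched as follows, with a toy confidence function standing in for the trained CRF sentence model; the tokens and scores mirror the example in the text, but the function itself is a hypothetical stub.

```python
def segment(tokens, boundary_confidence, threshold=0.8):
    """Read tokens one by one; whenever the model's confidence that the
    current position ends a sentence exceeds the threshold, cut there."""
    sentences, current = [], []
    for tok in tokens:
        current.append(tok)
        if boundary_confidence(current) > threshold:
            sentences.append(current)   # confidence exceeded: sentence boundary
            current = []
    if current:                         # flush any unfinished tail
        sentences.append(current)
    return sentences

def toy_confidence(prefix):
    """Stand-in for the CRF model: high confidence only after 'go home'."""
    return 0.9 if prefix[-1] == "go home" else 0.2
```

In practice `boundary_confidence` would be the field-specific CRF model's boundary probability rather than this lookup.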
  • the correspondence between the audio file and the text file includes:
  • In the process of recognizing the audio file to be played as a text file, the audio file establishes a correspondence on the time axis with the recognized characters of the text file.
  • the audio file and the text file are analyzed, and the corresponding relationship between the audio file and the characters of the text file on the time axis in the speech recognition process is obtained. For example, a certain character in a text file corresponds to a certain second of audio content on the playback progress bar.
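Assuming the recognizer emits a start time for each recognized character (a hypothetical output format, not specified by the disclosure), the two-way correspondence between characters and the time axis might be kept as below.

```python
import bisect

def build_alignment(char_times):
    """char_times: list of (character, start_seconds) pairs produced during
    recognition. Returns two lookups: character index -> time on the axis,
    and playback time -> index of the character currently being spoken."""
    starts = [t for _, t in char_times]

    def time_of(index):
        return char_times[index][1]

    def index_at(t):
        # last character whose start time is <= t
        return max(bisect.bisect_right(starts, t) - 1, 0)

    return time_of, index_at
```

`index_at` is what the later embodiments need to map the progress-bar position back to a text character.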
  • determining a target playback point according to the current playback position of the audio file and the position of the segmentation mark includes:
  • search for the previous segmentation mark adjacent to the current playback position of the audio file and determine the position of the previous segmentation mark in the audio file as the target playback point.
  • Take an audio file containing four sentences A, B, C, and D as an example: if the file is currently playing sentence D and is to be returned to the beginning of sentence B, three trigger operations are required, finding the segmentation marks of sentences C, B, and A in turn; the position of the found segmentation mark of sentence A is then used as the target playback point. In this way, each time the return-to-previous-sentence operation is triggered, playback is accurately positioned at the beginning of the sentence preceding the current playback position, and the user can find the target playback point with a simple, convenient operation. It should be noted that the above scenario is only an exemplary description and does not limit the protection scope of this example embodiment in any way.
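With the mark positions kept sorted, the "previous segmentation mark" lookup is a binary search. In this sketch a mark exactly at the current position counts as already reached, so the search steps one sentence further back, and 0.0 stands for the start of the file when no mark precedes the position (both conventions are illustrative assumptions).

```python
import bisect

def previous_mark(marks, position):
    """Return the segmentation mark strictly before the current playback
    position, or 0.0 (start of file) when none exists -- the rule used to
    pick the target playback point for 'return to previous sentence'."""
    i = bisect.bisect_left(marks, position)
    return marks[i - 1] if i > 0 else 0.0
```

Triggering the operation repeatedly just feeds the returned value back in as the new position, walking back one sentence per trigger.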
  • the segmentation symbol includes a first segmentation symbol and a second segmentation symbol
  • determining a target playback point according to the current playback position of the audio file and the position of the sentence segmentation mark includes:
  • the first segmentation symbol is the preceding segmentation symbol adjacent to the text character in the text file
  • the second segmentation symbol is the preceding segmentation symbol adjacent to the first segmentation symbol in the text file.
  • For example, take an audio file containing four sentences A, B, C, and D, with the audio playback progress bar currently at the beginning of sentence C.
  • When a trigger operation is detected, the text character corresponding to the current playback position is determined according to the correspondence between the audio file and the text file. If that character is within the preset character spacing of the preceding segmentation symbol, the segmentation symbols of sentence B and then sentence A are found in turn in the text file, and the corresponding segmentation marks are located in the audio file.
  • The position of the segmentation mark of sentence A (that is, the beginning of sentence B) is then used as the target playback point.
  • As shown in Fig. 5, if the current playback reaches position 501 at the beginning of sentence C, it corresponds to the 2nd character at the beginning of text C (shown by the dotted line at the text content "not required" in Fig. 5), while the preset character spacing is 3 characters.
  • the segmentation symbol of text A is searched forward.
  • the segmentation symbol of text A corresponds to the segmentation mark of sentence A. Therefore, the position 505 where the segmentation mark of sentence A is located is determined as the target playback point.
  • Similarly, if the character is not within the preset character spacing, the segmentation symbol of text B is found in the text file, and the corresponding segmentation mark of sentence B is then located in the audio file;
  • the position of that mark (that is, the beginning of sentence C) is used as the target playback point.
  • As shown in Fig. 5, if the current playback reaches position 502 in sentence C, it corresponds to the 9th character at the beginning of text C (shown by the dotted line at the text content "computer" in Fig. 5), while the preset character spacing is 3 characters.
  • the segmentation symbol of text B is searched forward.
  • the segmentation symbol of text B corresponds to the segmentation mark of sentence B. Therefore, the position 503 of the segmentation mark of sentence B is determined as the target playback point.
  • In this way, the user's replay intention is judged by comparing the character spacing between the text character corresponding to the real-time playback position of the audio file and the preceding segmentation symbol, so the most accurate target playback point can be found intelligently; the user can accurately position playback at the audio position to be reheard without repeatedly triggering the return operation, which further improves operational convenience and enhances the user experience.
  • the above scenario is only an exemplary description, and does not limit the protection scope of the exemplary implementation manner in any way.
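The character-spacing decision above can be sketched as a choice between the first and second preceding segmentation symbols. The symbol indices and the preset spacing of 3 characters loosely mirror the Fig. 5 example, but the concrete numbers are hypothetical.

```python
import bisect

def target_by_char_spacing(symbol_positions, char_index, preset_spacing=3):
    """symbol_positions: sorted character indices of segmentation symbols in
    the recognized text. char_index: character corresponding to the current
    playback position. Returns the index of the symbol whose segmentation
    mark becomes the target playback point, or None (play from the start)."""
    i = bisect.bisect_left(symbol_positions, char_index)
    first = i - 1                       # nearest symbol before the character
    if first < 0:
        return None                     # no preceding symbol at all
    if char_index - symbol_positions[first] > preset_spacing:
        return symbol_positions[first]  # deep into the sentence: replay it
    second = first - 1                  # just past the boundary: one sentence further back
    return symbol_positions[second] if second >= 0 else None
```

With symbols at indices 10 (end of text A) and 24 (end of text B): two characters into text C the second symbol (10) wins; nine characters in, the first symbol (24) wins.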
  • the sentence segmentation mark includes a first sentence segmentation mark and a second sentence segmentation mark
  • determining a target playback point according to the current playback position of the audio file and the position of the sentence segmentation mark includes:
  • the first sentence segmentation mark is the previous sentence segmentation mark adjacent to the current playback position in the audio file
  • the second sentence segmentation mark is the previous sentence segmentation mark adjacent to the first sentence segmentation mark in the audio file
  • For example, take an audio file containing four sentences A, B, C, and D, with the audio playback progress bar currently at the beginning of sentence C.
  • When a trigger operation is detected, it is judged whether the time interval between the playback time corresponding to the current playback position on the progress bar and the playback time of the segmentation mark of sentence B is greater than the preset time interval.
  • If it is within the preset time interval, the segmentation marks of sentence B and then sentence A are found in turn, and the position of the segmentation mark of sentence A (that is, the beginning of sentence B) is used as the target playback point.
  • As shown in Fig. 6, if the current playback reaches position 601 in the second second of sentence C (the 0:19 position on the playback progress bar in Fig. 6), which is within the preset time interval,
  • the segmentation marks of sentences B and A are found forward at this point,
  • and position 605 of the segmentation mark of sentence A (corresponding to the 0:06 position on the playback progress bar in Fig. 6) is determined as the target playback point.
  • Otherwise, if the interval is greater than the preset time interval, only the segmentation mark of sentence B is found, and the position of that mark (that is, the beginning of sentence C) is used as the target playback point.
  • In this way, the user's replay intention is judged by comparing the time interval between the playback time of the real-time playback position of the audio file and the playback time of the preceding segmentation mark, so the most accurate target playback point can be found intelligently; the user can accurately position playback at the audio position to be reheard without repeatedly triggering the return operation, which further improves operational convenience and enhances the user experience. It should be noted that the above scenario is only an exemplary description and does not limit the protection scope of this example embodiment in any way.
  • the method further includes:
  • In an exemplary embodiment, when no segmentation mark precedes the current playback position, the audio file to be played is played from the beginning.
  • Take an audio file containing four sentences A, B, C, and D as an example.
  • The first sentence of the audio file (that is, sentence A) is not preceded by a segmentation mark.
  • Therefore, when a trigger operation is detected while sentence A is currently playing, sentence A is played from its beginning.
  • the sentence break symbol is a comma, a period, or a semicolon.
  • the sentence segmentation symbol may be a symbol such as a comma, a period, or a semicolon, or other symbols that can function as a sentence segmentation, and this example embodiment does not make a special limitation.
  • the generating a corresponding segmentation mark in the audio file according to the pause duration of the audio includes:
  • A pause duration threshold may be set in advance. During the playback of the audio file, when the duration of a silent segment of the audio is detected to be greater than the preset pause duration threshold, a specific special character is inserted into the silent segment to form a segmentation mark.
  • FIG. 7 is a schematic structural diagram of an electronic device provided by an embodiment of the disclosure.
  • the electronic device 700 of this embodiment includes a processor 701 and a memory 702, wherein the memory 702 is configured to store computer-executable instructions, and the processor 701 is configured to execute the computer-executable instructions stored in the memory to implement the steps performed in the above embodiments. For details, reference may be made to the relevant description in the above method embodiments.
  • the embodiments of the present disclosure also provide a computer-readable storage medium in which computer-executable instructions are stored, and when the processor executes the computer-executable instructions, the above-mentioned data processing method is implemented.
  • the disclosed device and method may be implemented in other ways.
  • the device embodiments described above are merely illustrative.
  • the division of the modules is only a logical function division; in actual implementation there may be other divisions, for example, multiple modules may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the displayed or discussed mutual coupling or direct coupling or communication connection may be indirect coupling or communication connection through some interfaces, devices or modules, and may be in electrical, mechanical or other forms.
  • the functional modules in the various embodiments of the present disclosure may be integrated into one processing unit, or each module may exist alone physically, or two or more modules may be integrated into one unit.
  • the units formed by the above modules can be implemented in the form of hardware, or in the form of hardware plus software functional units.
  • the above-mentioned integrated modules implemented in the form of software functional modules may be stored in a computer readable storage medium.
  • the above-mentioned software function module is stored in a storage medium and includes several instructions to enable a computer device (which may be a personal computer, a server, or a network device, etc.) or a processor to execute part of the steps of the methods of the various embodiments of the present disclosure.
  • The processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), etc.
  • the general-purpose processor may be a microprocessor or the processor may also be any conventional processor or the like.
  • the steps of the methods disclosed in connection with the present disclosure may be directly embodied as being executed and completed by a hardware processor, or executed and completed by a combination of hardware and software modules in the processor.
  • the memory may include high-speed RAM, and may also include non-volatile memory (NVM) such as at least one magnetic disk memory; it may also be a USB flash drive, a removable hard disk, a read-only memory, a magnetic disk, or an optical disk.
  • the bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, etc.
  • the bus can be divided into address bus, data bus, control bus and so on.
  • the buses in the drawings of the present disclosure are not limited to only one bus or one type of bus.
  • the above-mentioned storage medium can be realized by any type of volatile or non-volatile storage device or a combination thereof, such as static random-access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disk.
  • The storage medium may be any available medium that can be accessed by a general-purpose or special-purpose computer.
  • An exemplary storage medium is coupled to the processor, so that the processor can read information from the storage medium and write information to the storage medium.
  • the storage medium may also be an integral part of the processor.
  • the processor and the storage medium may be located in an application-specific integrated circuit (ASIC).
  • the processor and the storage medium may also exist as discrete components in the electronic device or the main control device.
  • a person of ordinary skill in the art can understand that all or part of the steps in the foregoing method embodiments can be implemented by a program instructing relevant hardware.
  • the aforementioned program can be stored in a computer readable storage medium. When the program is executed, it executes the steps including the foregoing method embodiments; and the foregoing storage medium includes: ROM, RAM, magnetic disk, or optical disk and other media that can store program codes.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Multimedia (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Library & Information Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Acoustics & Sound (AREA)
  • Human Computer Interaction (AREA)
  • User Interface Of Digital Computer (AREA)
  • Machine Translation (AREA)
  • Document Processing Apparatus (AREA)

Abstract

The present disclosure provides an audio playback method, an electronic device, and a computer-readable storage medium. The method includes: recognizing an audio file to be played as a text file containing segmentation symbols; generating segmentation marks in the audio file at positions corresponding to the segmentation symbols according to the correspondence between the audio file and the text file; in response to a trigger operation, determining a target playback point according to the current playback position of the audio file and the positions of the segmentation marks; and playing the audio file from the target playback point. The audio playback method, electronic device, and computer-readable storage medium of the present disclosure can accurately position the audio at the point the user wants to rehear without increasing the complexity of the user's operations, achieving relatively precise repeat playback.

Description

Audio playback method, electronic device, and storage medium
Cross-reference to related applications
The present disclosure claims priority to the Chinese patent application No. 201911112611.4, filed on November 14, 2019 and entitled "Audio playback method and apparatus, storage medium, and electronic device", and the Chinese patent application No. 202010042918.8, filed on January 15, 2020 and entitled "Audio playback method, electronic device, and storage medium", the entire contents of which are incorporated herein by reference.
Technical field
The present disclosure relates to the field of audio technology, and in particular to an audio playback method, an electronic device, and a storage medium.
Background
With the development of communication technology, most terminals now support audio playback to meet users' learning, work, and entertainment needs.
In some cases, a specific audio segment needs to be played repeatedly; for example, a user may want to rehear a passage for study or interest, or may need to return to a passage that was not heard clearly. Existing techniques for this purpose suffer from imprecise positioning, high operation cost, low efficiency, fixed return durations that do not match what the user needs, and low flexibility and accuracy.
It should be noted that the information disclosed in the background section above is only for enhancing the understanding of the background of the present disclosure, and therefore may include information that does not constitute prior art known to those of ordinary skill in the art.
Summary
The purpose of the present disclosure is to provide an audio playback method, an electronic device, and a storage medium, so as to overcome, at least to a certain extent, one or more problems caused by the limitations and defects of the related art.
According to one aspect of the present disclosure, an audio playback method is provided, including: recognizing an audio file to be played as a text file containing segmentation symbols;
generating segmentation marks in the audio file at positions corresponding to the segmentation symbols according to the correspondence between the audio file and the text file;
in response to a trigger operation, determining a target playback point according to the current playback position of the audio file and the positions of the segmentation marks; and
playing the audio file from the target playback point.
In this example embodiment, when the user wants to rehear a piece of audio content, a simple operation on the terminal accurately positions playback at a suitable target playback point, without repeatedly sliding the playback progress bar. In addition, because the audio file to be played is recognized as a text file containing segmentation symbols and segmentation is performed on the text, segmentation is not constrained by the playback speech rate; sentences can be segmented accurately even at a slow speech rate, and playback can then be positioned at a suitable point for rehearing.
In an exemplary embodiment of the present disclosure, determining a target playback point in response to a trigger operation according to the current playback position of the audio file and the positions of the segmentation marks includes:
in response to a trigger operation, searching for the segmentation mark immediately preceding the current playback position of the audio file, and determining the position of that preceding segmentation mark in the audio file as the target playback point.
In this example embodiment, each time the return-to-previous-sentence operation is triggered, playback is accurately positioned at the beginning of the sentence preceding the current playback position; the user can find the target playback point with a simple, convenient operation.
In an exemplary embodiment of the present disclosure, the segmentation symbols include a first segmentation symbol and a second segmentation symbol;
determining a target playback point in response to a trigger operation according to the current playback position of the audio file and the positions of the segmentation marks includes:
in response to a trigger operation, determining, in the text file, the text character corresponding to the current playback position according to the correspondence between the audio file and the text file;
judging whether the character spacing between the text character and the segmentation symbol is greater than a preset character spacing;
if so, determining the playback position on the time axis of the segmentation mark corresponding to the first segmentation symbol as the target playback point;
if not, determining the playback position on the time axis of the segmentation mark corresponding to the second segmentation symbol as the target playback point;
wherein the first segmentation symbol is the segmentation symbol immediately preceding the text character in the text file, and the second segmentation symbol is the segmentation symbol immediately preceding the first segmentation symbol in the text file.
In this example embodiment, the user's replay intention is judged by comparing the character spacing between the text character corresponding to the real-time playback position of the audio file and the preceding segmentation symbol, so the most accurate target playback point can be found intelligently; the user can accurately position playback at the audio position to be reheard without repeatedly triggering the return operation, which further improves operational convenience and enhances the user experience.
In an exemplary embodiment of the present disclosure, the segmentation marks include a first segmentation mark and a second segmentation mark;
determining a target playback point in response to a trigger operation according to the current playback position of the audio file and the positions of the segmentation marks includes:
in response to a trigger operation, judging whether the time interval between the playback time corresponding to the current playback position and the playback time corresponding to the first segmentation mark is greater than a preset time interval;
if so, determining the playback position of the first segmentation mark on the time axis as the target playback point;
if not, determining the playback position of the second segmentation mark on the time axis as the target playback point;
wherein the first segmentation mark is the segmentation mark immediately preceding the current playback position in the audio file, and the second segmentation mark is the segmentation mark immediately preceding the first segmentation mark in the audio file.
In this example embodiment, the user's replay intention is judged by comparing the time interval between the playback time of the real-time playback position of the audio file and the playback time of the preceding segmentation mark, so the most accurate target playback point can be found intelligently; the user can accurately position playback at the audio position to be reheard without repeatedly triggering the return operation, which further improves operational convenience and enhances the user experience.
According to another aspect of the present disclosure, an audio playback method is provided, including: in response to a second trigger operation, detecting whether the speech rate of an audio file to be played is lower than a preset speech rate;
if so, recognizing the audio file to be played as a text file containing segmentation symbols;
generating segmentation marks in the audio file at positions corresponding to the segmentation symbols according to the correspondence between the audio file and the text file;
if not, generating corresponding segmentation marks in the audio file according to the pause durations of the audio;
in response to a first trigger operation, determining a target playback point according to the current playback position of the audio file and the positions of the segmentation marks; and
playing the audio file from the target playback point.
In an exemplary embodiment of the present disclosure, generating corresponding segmentation marks in the audio file according to the pause durations of the audio includes:
when a pause in the audio is detected to last longer than a preset duration, generating a corresponding segmentation mark in the audio file.
According to another aspect of the present disclosure, an electronic device is provided, including:
a processor and a display apparatus; and
a memory configured to store executable instructions of the processor;
wherein the processor is configured to perform the above audio playback method by executing the executable instructions.
According to another aspect of the present disclosure, a computer-readable storage medium is provided, on which a computer program is stored; when the computer program is executed by a processor, the above audio playback method is implemented.
It should be understood that the above general description and the following detailed description are only exemplary and explanatory, and do not limit the present disclosure.
Brief description of the drawings
The above and other features and advantages of the present disclosure will become more apparent from the detailed description of its exemplary embodiments with reference to the accompanying drawings. Obviously, the drawings described below are only some embodiments of the present disclosure; other drawings can be obtained from them by those of ordinary skill in the art without creative effort. In the drawings:
Fig. 1 is a schematic diagram of the system architecture of an audio playback method according to an exemplary embodiment of the present disclosure;
Fig. 2 is a flowchart of an audio playback method according to an exemplary embodiment of the present disclosure;
Fig. 3 is a schematic diagram of the interface of audio application software in an audio playback application scenario according to an exemplary embodiment of the present disclosure;
Fig. 4 is a flowchart of an audio playback method according to another exemplary embodiment of the present disclosure;
Fig. 5 is a schematic diagram of determining a target playback point in an audio playback flow according to an exemplary embodiment of the present disclosure;
Fig. 6 is a schematic diagram of determining a target playback point in an audio playback flow according to another exemplary embodiment of the present disclosure;
Fig. 7 is a schematic structural diagram of an electronic device according to an exemplary embodiment of the present disclosure.
Detailed description
It should be noted that the embodiments of the present disclosure and the features in the embodiments may be combined with each other without conflict. The present disclosure will be described in detail below with reference to the drawings and in conjunction with the embodiments.
To enable those skilled in the art to better understand the solutions of the present disclosure, the technical solutions in the embodiments of the present disclosure will be described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present disclosure. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present disclosure without creative effort shall fall within the protection scope of the present disclosure.
It should be noted that the terms "first", "second", and the like in the specification, claims, and drawings of the present disclosure are used to distinguish similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that data so used are interchangeable where appropriate, so that the embodiments of the present disclosure described herein can be implemented. In addition, the terms "including" and "having" and any variations thereof are intended to cover non-exclusive inclusion; for example, a process, method, system, product, or device that includes a series of steps or units is not necessarily limited to those steps or units expressly listed, but may include other steps or units not expressly listed or inherent to such a process, method, product, or device.
It should be noted that the drawings are only schematic illustrations of the present disclosure and are not necessarily drawn to scale. The same reference numerals in the drawings denote the same or similar parts, and their repeated description will be omitted. Some of the block diagrams shown in the drawings are functional entities and do not necessarily correspond to physically or logically independent entities. These functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor apparatuses and/or microcontroller apparatuses.
In the course of long-term research and development, the applicant found that audio playback methods in the prior art have the following shortcomings:
1. When returning by sliding the audio playback progress bar with a gesture, the user can only slide by feel and often needs several attempts to locate a suitable playback position, which is inefficient and gives a poor user experience;
2. Returning to the previous audio pause for playback by recognizing pause durations can reduce the user's operational burden to some extent, but this approach is prone to misrecognition. In particular, when the playback speech rate is slow, pauses cannot be recognized accurately, so playback cannot be accurately positioned at the beginning of the sentence the user wants to rehear. Moreover, the pause-recognition method cannot be adjusted to the speech-rate environment of the playback, so intelligent segmentation cannot be achieved;
3. After each sentence finishes playing, when the user wants to rehear it, the progress bar has often already reached the beginning of the next sentence. Positioning to the previous pause at that moment does not reach the content the user wants to rehear, and the return operation has to be repeated, which is cumbersome.
To solve the above problems, this example embodiment provides a new technical solution, described in detail below:
Fig. 1 shows a schematic diagram of the system architecture of an exemplary application environment to which an audio playback method of an embodiment of the present disclosure can be applied.
As shown in Fig. 1, the system architecture 100 may include one or more of terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is the medium used to provide communication links between the terminal devices 101, 102, 103 and the server 105, and may include various connection types, such as wired or wireless communication links, or fiber-optic cables. The terminal devices 101, 102, 103 may be various electronic devices with display screens, including but not limited to desktop computers, portable computers, smartphones, and tablets. It should be understood that the numbers of terminal devices, networks, and servers in Fig. 1 are merely illustrative; there may be any number of each according to implementation needs. For example, the server 105 may be a server cluster composed of multiple servers.
The audio playback method provided by the embodiments of the present disclosure may be executed by the terminal devices 101, 102, 103, and correspondingly the audio playback apparatus may be provided in the terminal devices 101, 102, 103. The method may also be executed jointly by the terminal devices 101, 102, 103 and the server 105, with the apparatus correspondingly provided in both. In addition, the method may also be executed by the server 105, with the apparatus provided in the server 105; this exemplary embodiment makes no special limitation in this regard.
In one aspect of the embodiments of the present disclosure, an audio playback method is provided, which may be applied to one or more of the above terminal devices 101, 102, 103, to the above server 105, or to both the terminal devices 101, 102, 103 and the server 105. As shown in Fig. 2, the audio playback method includes:
Step S210: recognizing an audio file to be played as a text file containing segmentation symbols;
Step S220: generating segmentation marks in the audio file at positions corresponding to the segmentation symbols according to the correspondence between the audio file and the text file;
Step S230: in response to a trigger operation, determining a target playback point according to the current playback position of the audio file and the positions of the segmentation marks;
Step S240: playing the audio file from the target playback point.
The audio playback method of this exemplary embodiment can accurately position the audio at the point the user wants to rehear without increasing the complexity of the user's operations, achieving relatively precise repeat playback.
The steps of the audio playback method of this exemplary embodiment are further described below.
In step S210, the audio file to be played is recognized as a text file containing segmentation symbols.
The audio file to be played is a file that stores audio data. For example, the audio file may be music, teaching audio, or a recording (such as a voice message sent by a user in an instant messaging tool); this example embodiment makes no special limitation in this regard.
The text file is obtained by performing speech recognition on the above audio file and includes text characters corresponding to the audio content. Speech recognition may be implemented with recognition algorithms known in the prior art. In an optional embodiment, the speech recognition process may be implemented as follows: first, preprocessing operations such as pre-emphasis, windowing and framing, and endpoint detection are performed on the audio data in the audio file; then the preprocessed audio data is analyzed and the required features are extracted; finally, a discrete hidden Markov model trained on samples performs speech recognition on the feature-extracted speech signal to obtain the text file corresponding to the audio file. In a specific application scenario, as shown in Fig. 3, the audio application interface includes a text display area 307 that can display the content of the recognized text file.
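The first two preprocessing steps mentioned above, pre-emphasis and framing, can be sketched in a few lines. The 0.97 coefficient and the frame sizes are common illustrative defaults, not values specified by the disclosure.

```python
def preemphasize(samples, alpha=0.97):
    """Pre-emphasis filter y[n] = x[n] - alpha * x[n-1], boosting high
    frequencies of the speech signal before feature extraction."""
    return [samples[0]] + [samples[n] - alpha * samples[n - 1]
                           for n in range(1, len(samples))]

def frame(samples, size=4, step=2):
    """Split the signal into overlapping analysis windows (framing);
    real systems use e.g. 25 ms windows with a 10 ms step."""
    return [samples[i:i + size]
            for i in range(0, len(samples) - size + 1, step)]
```

Endpoint detection and the HMM recognizer itself would then operate on features computed per frame.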
The segmentation symbols are used to segment the above text file into sentences. For example, a segmentation symbol may be a comma, semicolon, or period in the text file, or another symbol that can serve to delimit sentences; this example embodiment makes no special limitation in this regard.
In step S220, segmentation marks are generated in the audio file at the positions corresponding to the segmentation symbols according to the correspondence between the audio file and the text file.
The correspondence between the audio file and the text file may be the one-to-one correspondence established during speech recognition between the audio content and the characters of the recognized text. The characters may be text characters or numeric characters.
A segmentation mark may be a special mark used to identify the above segmentation symbol. For example, the segmentation mark may be a specific special character inserted at the segmentation symbol, a dot marked at the position of the segmentation symbol in the sound track of the audio file, or another kind of special mark that can identify the segmentation symbol; this example embodiment makes no special limitation in this regard.
In step S230, in response to a trigger operation, a target playback point is determined according to the current playback position of the audio file and the positions of the segmentation marks.
The trigger operation may be a touch operation by the user on the terminal device (for example, tapping a control on the touch screen or sliding within the display area), a non-touch operation (for example, clicking a control with a mouse or pressing a mechanical button), or a trigger operation based on preset interaction conditions (for example, shaking or voice input). In a specific application scenario, as shown in Fig. 3, the audio application interface includes a return key 301, a play/pause key 303, and a playback progress bar 305; the trigger operation is the user's operation of the return key 301 on the playback interface of the audio application to issue a request to return to the previous sentence.
The current playback position of the audio file may be the position of the frame of the audio file currently being played; specifically, it may be the real-time playback position on the playback progress bar of the audio player corresponding to that audio frame. For example, if an audio file contains four sentences A, B, C, and D and is currently playing at the beginning of sentence B, the beginning of sentence B is the above current position.
The target playback point may be the starting point, within the audio file, of the part to be played repeatedly. For example, if an audio file contains four sentences A, B, C, and D and is to be replayed from sentence B, the starting point of B is the target playback point.
Determining a target playback point according to the current playback position of the audio file and the positions of the segmentation marks may mean taking the segmentation mark immediately preceding the current playback position as the target playback point, or taking the second-nearest preceding segmentation mark as the target playback point. For example, if the played audio content is "sentence A, sentence B. sentence C, sentence D." and the current playback position is the beginning of sentence C, then either the segmentation symbol "." at the end of sentence B or the segmentation symbol "," at the end of sentence A, both before the current playback position, is determined as the target playback point. This example embodiment makes no special limitation in this regard.
In step S240, the audio file is played from the target playback point.
After the target playback point is located, playing the audio file from the target playback point may mean returning the audio file to the position on the playback progress bar corresponding to the target playback point. For example, if the audio frame of the target playback point corresponds to the 1 minute 30 second position on the progress bar, playback starts from 1 minute 30 seconds.
It should be noted that steps S210 and S220 of this example embodiment may each be performed before audio playback (for example, the audio file to be played is first recognized as a text file on the server and segmentation marks are generated in the audio file; then, during playback, when the terminal device detects the user's trigger operation, it determines a target playback point according to the current playback position of the audio file and the positions of the segmentation marks and plays the audio file from that point), or may each be performed during playback (for example, when it is detected that the user triggers the control for playing the audio, speech recognition is started, the audio file to be played is recognized as a text file, and segmentation marks are generated in the audio file; thereafter, when the terminal device detects the user's trigger operation, it determines a target playback point according to the current playback position and the positions of the segmentation marks and plays the audio file from that point). This example embodiment makes no special limitation in this regard.
In this example embodiment, the audio file to be played is recognized as a text file containing segmentation symbols; segmentation marks are generated in the audio file at positions corresponding to the segmentation symbols according to the correspondence between the audio file and the text file; in response to a trigger operation, a target playback point is determined according to the current playback position of the audio file and the positions of the segmentation marks; and the audio file is played from the target playback point. In this way, when the user wants to rehear a piece of audio content, a simple operation on the terminal accurately positions playback at a suitable target playback point, without repeatedly sliding the playback progress bar. In addition, because segmentation is performed on the recognized text rather than on the audio, it is not constrained by the playback speech rate: sentences can be segmented accurately even at a slow speech rate, and playback can then be positioned at a suitable point for rehearing.
In an exemplary embodiment of the present disclosure, recognizing the audio file to be played as a text file containing segmentation symbols includes:
recognizing the audio file to be played as a text file, dividing the text file into multiple sentence-level sub-text files through a preset sentence model, and marking a segmentation symbol at the end of each sub-text file, to generate a text file containing segmentation symbols.
The sentence model divides the text file into multiple sub-text files, each of which can be regarded as one sentence; a segmentation symbol is appended at the end of each sub-text file to form a text file containing segmentation symbols.
In an exemplary embodiment of the present disclosure, the sentence model is obtained by constructing training samples from the feature attributes of words and training with the CRF algorithm.
Sentence models may be trained in advance for different domains, such as finance, telecommunications, electric power, and daily life. The feature attributes of words may include their inherent attributes (such as verb, noun, adjective, adverb, preposition, modal particle), their syntactic attributes within a sentence (such as subject, predicate, object, attributive, adverbial), and their semantic attributes in different domains.
The CRF (Conditional Random Field) algorithm is a probability-based algorithm. A domain-specific sentence model, trained with the CRF algorithm on training samples constructed from the feature attributes of words, can compute from the text content the probability that a position forms a sentence boundary, based on the pause patterns of words carrying pause information in different domains, and segment sentences accordingly.
Optionally, the sentence model determines target segmentation positions of the text file: when the confidence of a candidate segmentation position in the text file exceeds a preset confidence, that position is determined as a target segmentation position, and the text file is divided into sentence-level sub-text files according to the target segmentation positions.
After the audio file is recognized as a text file, the sentence model divides the text file into characters and phrases and reads them from the text file in sequence. For example, if the recognized content is "我下班后回家" ("I go home after work"), the model reads "我", "下班", "后", and "回家" in turn. When the content read is "我下班", the model evaluates the confidence of a segmentation position at the end of "下班" as 0.2; since the preset confidence is 0.8, it continues to read the next character or phrase "后", and so on. When the text read reaches "我下班后回家", the model evaluates the confidence of a segmentation position at its end as 0.9, exceeding the preset confidence of 0.8, so the end of "回家" is determined as a target segmentation position.
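The confidence-threshold segmentation loop described above can be sketched roughly as below. A real system would score candidate boundaries with a trained CRF model; here a stand-in lookup table plays that role for the example text, and all names are illustrative assumptions:

```python
# Illustrative sketch of the confidence-threshold segmentation loop.
# `confidence` stands in for a trained CRF model's boundary score.

def segment(tokens, confidence, threshold=0.8):
    """Greedily grow a sentence token by token; cut whenever the
    boundary confidence at the current end exceeds `threshold`."""
    sentences, current = [], []
    for token in tokens:
        current.append(token)
        if confidence.get(tuple(current), 0.0) > threshold:
            sentences.append("".join(current))
            current = []
    if current:                      # flush any unfinished tail
        sentences.append("".join(current))
    return sentences

tokens = ["我", "下班", "后", "回家"]
conf = {("我", "下班"): 0.2, ("我", "下班", "后", "回家"): 0.9}
result = segment(tokens, conf)       # the whole phrase becomes one sentence
```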
In an exemplary embodiment of the present disclosure, the correspondence between the audio file and the text file includes:
the correspondence established, during recognition of the audio file to be played as a text file, between the audio file on the timeline and the characters of the recognized text file.
After the text file corresponding to the audio file is obtained through speech recognition, the audio file and the text file are analyzed to derive the correspondence established during recognition between the audio file on the timeline and each character of the text file. For example, a certain character in the text file corresponds to the audio content at a certain second on the playback progress bar.
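One possible (assumed) representation of this character-to-timeline correspondence is a list of (character, start-time) pairs emitted by the recognizer, from which lookups in either direction can be derived; the structure and names below are illustrative only:

```python
# Illustrative character/timeline alignment; all names are assumptions.
recognized = [("你", 0.0), ("好", 0.4), ("吗", 0.9)]  # (char, start seconds)

def build_alignment(recognized):
    # map character index -> start time on the audio timeline
    return {i: t for i, (_, t) in enumerate(recognized)}

def char_at_time(recognized, t):
    # index of the last character whose start time is <= t, else None
    idx = None
    for i, (_, start) in enumerate(recognized):
        if start <= t:
            idx = i
    return idx
```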
In an exemplary embodiment of the present disclosure, determining a target playback point in response to a trigger operation, according to the current playback position of the audio file and the positions of the segmentation markers, includes:
in response to a trigger operation, finding the segmentation marker immediately preceding the current playback position of the audio file, and determining the position of that preceding segmentation marker in the audio file as the target playback point.
Take an audio file containing four sentences A, B, C, and D as an example. If playback has reached sentence C, then when a trigger operation is detected, the segmentation marker before sentence C, i.e., the position of the segmentation marker at the end of sentence B, is taken as the target playback point. The target playback point may also be located by repeating this search for the preceding segmentation marker. Suppose the audio file is currently playing sentence D and the target playback point is sentence B, i.e., playback is to restart from the beginning of sentence B; then three trigger operations are needed, successively locating the segmentation markers of sentence C, sentence B, and sentence A, and the position of the located segmentation marker of sentence A is taken as the target playback point. In this way, each trigger of the "back to previous sentence" operation accurately locates the beginning of the sentence preceding the current playback position, so the user can find the target playback point with simple operations; the operation is convenient. It should be noted that the above scenario is merely an exemplary illustration and does not limit the scope of protection of this exemplary embodiment in any way.
In an exemplary embodiment of the present disclosure, the segmentation symbols include a first segmentation symbol and a second segmentation symbol;
determining a target playback point in response to a trigger operation, according to the current playback position of the audio file and the positions of the segmentation markers, includes:
in response to a trigger operation, determining, in the text file, the text character corresponding to the current playback position according to the correspondence between the audio file and the text file;
judging whether the character interval between the text character and the segmentation symbol is greater than a preset character interval;
if so, determining the playback position on the timeline of the segmentation marker corresponding to the first segmentation symbol as the target playback point;
if not, determining the playback position on the timeline of the segmentation marker corresponding to the second segmentation symbol as the target playback point;
wherein the first segmentation symbol is the segmentation symbol immediately preceding the text character in the text file, and the second segmentation symbol is the segmentation symbol immediately preceding the first segmentation symbol in the text file.
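The branch above — fall back one extra sentence when the current character is too close to the preceding segmentation symbol — might be sketched as follows; representing symbols by their character indices and the `max_gap` parameter are assumptions for illustration:

```python
# Illustrative sketch of the first/second-symbol decision; all names
# and the gap threshold are assumptions, not from the patent.

def choose_marker(punct_positions, char_index, max_gap=3):
    """punct_positions: sorted character indices of segmentation
    symbols. Returns the index of the chosen symbol, or None to
    signal playback from the beginning of the file."""
    previous = [p for p in punct_positions if p < char_index]
    if not previous:
        return None                      # no earlier symbol at all
    first = previous[-1]                 # nearest preceding symbol
    if char_index - first > max_gap:
        return first                     # far enough: replay this sentence
    # too close to the sentence start: go one sentence further back
    return previous[-2] if len(previous) >= 2 else None
```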
Take an audio file containing four sentences A, B, C, and D as an example, with the playback progress bar currently at the beginning of sentence C. When a trigger operation is detected, the text character of sentence C reached by the current playback is determined according to the correspondence between the audio file and the text file. If it lies within the preset character interval, the segmentation symbols of sentence B and then sentence A are looked up in the text file, the corresponding segmentation markers of sentences B and A are found in the audio file, and the position of the segmentation marker of sentence A (i.e., the beginning of sentence B) is taken as the target playback point. As shown in FIG. 5, if playback is currently at position 501 near the beginning of sentence C, corresponding to the 2nd character from the start of text C (indicated by the dashed line at the text "不要" in FIG. 5), and the preset character interval is 3 characters, then the segmentation symbol of text A is found backward; that symbol corresponds to the segmentation marker of sentence A, so position 505 of the segmentation marker of sentence A is determined as the target playback point. Similarly, if it does not lie within the preset character interval, the segmentation symbol of text B is found in the text file, the corresponding segmentation marker of sentence B is found in the audio file, and the position of the segmentation marker of sentence B (i.e., the beginning of sentence C) is taken as the target playback point. As shown in FIG. 5, if playback is currently at position 502 in sentence C, corresponding to the 9th character from the start of text C (indicated by the dashed line at the text "电脑" in FIG. 5), and the preset character interval is 3 characters, then the segmentation symbol of text B is found backward; that symbol corresponds to the segmentation marker of sentence B, so position 503 of the segmentation marker of sentence B is determined as the target playback point.
In this exemplary embodiment, the user's replay intent is judged by comparing the character interval between the text character corresponding to the real-time playback position of the audio file and the preceding segmentation symbol, so the most accurate target playback point can be found intelligently: the user can accurately locate and replay the audio position to be reheard without repeatedly triggering the back operation, which further improves convenience of operation and enhances user experience. It should be noted that the above scenario is merely an exemplary illustration and does not limit the scope of protection of this exemplary embodiment in any way.
In an exemplary embodiment of the present disclosure, the segmentation markers include a first segmentation marker and a second segmentation marker;
determining a target playback point in response to a trigger operation, according to the current playback position of the audio file and the positions of the segmentation markers, includes:
in response to a trigger operation, judging whether the time interval between the playback time corresponding to the current playback position and the playback time corresponding to the first segmentation marker is greater than a preset time interval;
if so, determining the playback position of the first segmentation marker on the timeline as the target playback point;
if not, determining the playback position of the second segmentation marker on the timeline as the target playback point;
wherein the first segmentation marker is the segmentation marker immediately preceding the current playback position in the audio file, and the second segmentation marker is the segmentation marker immediately preceding the first segmentation marker in the audio file.
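The time-interval variant admits an analogous sketch; the function name and the representation of marker times are illustrative assumptions, with the numbers chosen to match the FIG. 6 example (markers at 0:06 and 0:17, preset interval 3 s):

```python
# Illustrative sketch of the first/second-marker decision by time
# interval; names and the gap threshold are assumptions.

def choose_time_marker(marker_times, current_time, max_gap=3.0):
    """marker_times: sorted marker times in seconds. Returns the
    target time, or 0.0 when no earlier marker exists."""
    previous = [m for m in marker_times if m < current_time]
    if not previous:
        return 0.0                       # play from the beginning
    first = previous[-1]                 # nearest preceding marker
    if current_time - first > max_gap:
        return first                     # far into the sentence: replay it
    # just past the sentence start: jump one sentence further back
    return previous[-2] if len(previous) >= 2 else 0.0
```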
Take an audio file containing four sentences A, B, C, and D as an example, with the playback progress bar currently at the beginning of sentence C. When a trigger operation is detected, it is judged whether the interval between the playback time corresponding to the current playback position on the progress bar and the playback time corresponding to the segmentation marker of sentence B exceeds the preset time interval. If it lies within the preset time interval, the segmentation markers of sentence B and then sentence A are located, and the position of the segmentation marker of sentence A (i.e., the beginning of sentence B) is taken as the target playback point. As shown in FIG. 6, if playback is currently at position 601, 2 seconds into sentence C (position 0:19 on the progress bar in FIG. 6), and the preset time interval is 3 seconds, the segmentation marker of sentence A is found backward, and its position 605 (corresponding to position 0:06 on the progress bar in FIG. 6) is determined as the target playback point. Similarly, if it does not lie within the preset time interval, the segmentation marker of sentence B is located, and its position (i.e., the beginning of sentence C) is taken as the target playback point. As shown in FIG. 6, if playback is currently at position 602, 10 seconds into sentence C (position 0:27 on the progress bar in FIG. 6), and the preset time interval is 3 seconds, the segmentation marker of sentence B is found backward, and its position 603 (position 0:17 on the progress bar in FIG. 6) is determined as the target playback point.
In this exemplary embodiment, the user's replay intent is judged by comparing the time interval between the playback time of the real-time playback position of the audio file and the playback time of the preceding segmentation marker, so the most accurate target playback point can be found intelligently: the user can accurately locate and replay the audio position to be reheard without repeatedly triggering the back operation, which further improves convenience of operation and enhances user experience. It should be noted that the above scenario is merely an exemplary illustration and does not limit the scope of protection of this exemplary embodiment in any way.
In an exemplary embodiment of the present disclosure, the method further includes:
if the preceding segmentation marker cannot be found from the current playback position of the audio file, playing the audio file to be played from the beginning.
Take an audio file containing four sentences A, B, C, and D as an example: no segmentation marker precedes the first sentence of the audio file (i.e., sentence A), so when a trigger operation is detected while sentence A is currently playing, sentence A is played from the beginning.
In an exemplary embodiment of the present disclosure, the segmentation symbol is a comma, a period, or a semicolon.
The segmentation symbol may be a comma, period, semicolon, or the like, or any other symbol that can serve to delimit sentences; this exemplary embodiment imposes no particular limitation.
In another aspect of the embodiments of the present disclosure, an audio playing method is provided. The method may be applied to one or more of the terminal devices 101, 102, and 103 described above, to the server 105 described above, or to both the terminal devices 101, 102, 103 and the server 105. As shown in FIG. 4, the audio playing method includes:
Step S410: in response to a second trigger operation, detecting whether the speech rate of the audio file to be played is lower than a preset speech rate;
Step S420: if so, recognizing the audio file to be played as a text file containing segmentation symbols;
Step S430: generating, according to the correspondence between the audio file and the text file, a segmentation marker in the audio file at the position corresponding to the segmentation symbol;
Step S440: if not, generating corresponding segmentation markers in the audio file according to pause durations in the audio;
Step S450: in response to a first trigger operation, determining a target playback point according to the current playback position of the audio file and the positions of the segmentation markers;
Step S460: playing the audio file from the target playback point.
The steps of the audio playing method in this exemplary embodiment are further described below.
In step S410, in response to a second trigger operation, it is detected whether the speech rate of the audio file to be played is lower than a preset speech rate.
The second trigger operation may be a touch operation performed by the user on the terminal device (for example, tapping a control on a touch screen, or sliding within a display area), a non-touch operation (for example, clicking a control with a mouse, or pressing a mechanical button), or a trigger operation based on preset interaction conditions (for example, shaking, or voice input). In a specific application scenario, as shown in FIG. 3, the second trigger operation is the user's operation on the play/pause key 303 on the playback interface of the audio application, to play the audio file.
When the second trigger operation is detected, it is judged whether the playback speech rate of the audio file is lower than the preset speech rate.
In steps S420-S430, if so, the audio file to be played is recognized as a text file containing segmentation symbols, and according to the correspondence between the audio file and the text file, segmentation markers are generated in the audio file at the positions corresponding to the segmentation symbols.
If the playback speech rate is lower than the preset speech rate, the segmentation markers of the audio file are determined through speech recognition: according to the correspondence between the audio file and the text file, segmentation markers are generated in the audio file at the positions corresponding to the segmentation symbols.
The correspondence between the audio file and the text file may be the one-to-one correspondence, established during speech recognition, between the audio content and each character of the recognized text. A character may be a textual character or a numeric character.
The segmentation marker may be a special mark used to identify the above segmentation symbol. For example, the segmentation marker may be a specific special character inserted at the segmentation symbol, a dot mark made at the position of the segmentation symbol on the sound track of the audio file, or any other special mark capable of identifying the segmentation symbol; this exemplary embodiment imposes no particular limitation on this.
In step S440, if not, corresponding segmentation markers are generated in the audio file according to pause durations in the audio.
If the playback speech rate is higher than the preset speech rate, the segmentation markers of the audio file are determined according to the pause durations in the audio. For example, when a silent duration in the audio is detected to exceed a preset threshold, a specific special character is inserted in that silent audio segment to form a segmentation marker.
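Pause-based marker generation as just described can be sketched as a scan over frame-level voice-activity flags, emitting a marker wherever a silent run exceeds the preset duration; the frame duration, threshold, and representation are assumptions for illustration:

```python
# Illustrative pause-based segmentation-marker generation; the frame
# duration and minimum pause length are assumed values.

def markers_from_pauses(voiced, frame_seconds=0.02, min_pause=0.5):
    """voiced: list of booleans, one per audio frame (True = speech).
    Returns marker times (seconds) at the end of each pause whose
    duration reaches `min_pause`."""
    markers, run_start = [], None
    for i, v in enumerate(voiced):
        if not v and run_start is None:
            run_start = i                      # a pause begins
        elif v and run_start is not None:
            if (i - run_start) * frame_seconds >= min_pause:
                markers.append(i * frame_seconds)
            run_start = None                   # the pause ends
    return markers
```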
In step S450, in response to a first trigger operation, a target playback point is determined according to the current playback position of the audio file and the positions of the segmentation markers.
The first trigger operation may be a touch operation performed by the user on the terminal device (for example, tapping a control on a touch screen, or sliding within a display area), a non-touch operation (for example, clicking a control with a mouse, or pressing a mechanical button), or a trigger operation based on preset interaction conditions (for example, shaking, or voice input). In a specific application scenario, as shown in FIG. 3, the trigger operation is the user's operation on the back key 301 on the playback interface of the audio application, issuing a request to return to the previous sentence.
The current playback position of the audio file may be the position of the frame currently being played; specifically, it may be the real-time playback position on the playback progress bar of the audio player corresponding to that audio frame. For example, if an audio file contains four sentences A, B, C, and D, and playback has just reached the beginning of sentence B, then the beginning of sentence B is the above current position.
The target playback point may be the starting point, within the audio file, of the portion to be replayed. For example, if an audio file contains four sentences A, B, C, and D, and the file is to be replayed from B, then the starting point of B is the target playback point.
Determining a target playback point according to the current playback position of the audio file and the positions of the segmentation markers may mean taking the segmentation marker nearest before the current playback position as the target playback point, or taking the second-nearest segmentation marker before the current playback position as the target playback point. For example, if the played audio content is "sentence A, sentence B. Sentence C, sentence D.", and the current playback position is the beginning of sentence C, then either the segmentation symbol "。" at the end of sentence B or the segmentation symbol "," at the end of sentence A, both before the current playback position, may be determined as the target playback point. This exemplary embodiment imposes no particular limitation on this.
In step S460, the audio file is played from the target playback point.
After the target playback point is located, playing the audio file from the target playback point may mean returning the audio file to the position on the playback progress bar corresponding to the target playback point and playing from there. For example, if the audio frame at the target playback point corresponds to the 1 min 30 s position on the progress bar, playback starts from 1 min 30 s.
In this exemplary embodiment, the way segmentation markers are generated is selected intelligently according to the audio playback speech rate. When the speech rate is low, speech recognition is used: segmentation markers are generated in the audio file at the positions corresponding to the segmentation symbols according to the correspondence between the audio file and the text file, so segmentation positions in the audio file can be found accurately. When the speech rate is high, segmentation markers are generated by detecting audio pauses, which is more efficient. The audio playing method of this exemplary embodiment can thus adapt intelligently to varied and complex speech-rate conditions, balancing accuracy and efficiency of sentence segmentation and improving user experience.
In an exemplary embodiment of the present disclosure, recognizing the audio file to be played as a text file containing segmentation symbols includes:
recognizing the audio file to be played as a text file, dividing the text file into multiple sentence-level sub-text files through a preset sentence model, and marking a segmentation symbol at the end of each sub-text file, to generate a text file containing segmentation symbols.
The sentence model divides the text file into multiple sub-text files, each of which can be regarded as one sentence; a segmentation symbol is appended at the end of each sub-text file to form a text file containing segmentation symbols.
In an exemplary embodiment of the present disclosure, the sentence model is obtained by constructing training samples from the feature attributes of words and training with the CRF algorithm.
Sentence models may be trained in advance for different domains, such as finance, telecommunications, electric power, and daily life. The feature attributes of words may include their inherent attributes (such as verb, noun, adjective, adverb, preposition, modal particle), their syntactic attributes within a sentence (such as subject, predicate, object, attributive, adverbial), and their semantic attributes in different domains.
The CRF (Conditional Random Field) algorithm is a probability-based algorithm. A domain-specific sentence model, trained with the CRF algorithm on training samples constructed from the feature attributes of words, can compute from the text content the probability that a position forms a sentence boundary, based on the pause patterns of words carrying pause information in different domains, and segment sentences accordingly.
Optionally, the sentence model determines target segmentation positions of the text file: when the confidence of a candidate segmentation position in the text file exceeds a preset confidence, that position is determined as a target segmentation position, and the text file is divided into sentence-level sub-text files according to the target segmentation positions.
After the audio file is recognized as a text file, the sentence model divides the text file into characters and phrases and reads them from the text file in sequence. For example, if the recognized content is "我下班后回家" ("I go home after work"), the model reads "我", "下班", "后", and "回家" in turn. When the content read is "我下班", the model evaluates the confidence of a segmentation position at the end of "下班" as 0.2; since the preset confidence is 0.8, it continues to read the next character or phrase "后", and so on. When the text read reaches "我下班后回家", the model evaluates the confidence of a segmentation position at its end as 0.9, exceeding the preset confidence of 0.8, so the end of "回家" is determined as a target segmentation position.
In an exemplary embodiment of the present disclosure, the correspondence between the audio file and the text file includes:
the correspondence established, during recognition of the audio file to be played as a text file, between the audio file on the timeline and the characters of the recognized text file.
After the text file corresponding to the audio file is obtained through speech recognition, the audio file and the text file are analyzed to derive the correspondence established during recognition between the audio file on the timeline and each character of the text file. For example, a certain character in the text file corresponds to the audio content at a certain second on the playback progress bar.
In an exemplary embodiment of the present disclosure, determining a target playback point in response to a trigger operation, according to the current playback position of the audio file and the positions of the segmentation markers, includes:
in response to a trigger operation, finding the segmentation marker immediately preceding the current playback position of the audio file, and determining the position of that preceding segmentation marker in the audio file as the target playback point.
Take an audio file containing four sentences A, B, C, and D as an example. If playback has reached sentence C, then when a trigger operation is detected, the segmentation marker before sentence C, i.e., the position of the segmentation marker at the end of sentence B, is taken as the target playback point. The target playback point may also be located by repeating this search for the preceding segmentation marker. Suppose the audio file is currently playing sentence D and the target playback point is sentence B, i.e., playback is to restart from the beginning of sentence B; then three trigger operations are needed, successively locating the segmentation markers of sentence C, sentence B, and sentence A, and the position of the located segmentation marker of sentence A is taken as the target playback point. In this way, each trigger of the "back to previous sentence" operation accurately locates the beginning of the sentence preceding the current playback position, so the user can find the target playback point with simple operations; the operation is convenient. It should be noted that the above scenario is merely an exemplary illustration and does not limit the scope of protection of this exemplary embodiment in any way.
In an exemplary embodiment of the present disclosure, the segmentation symbols include a first segmentation symbol and a second segmentation symbol;
determining a target playback point in response to a trigger operation, according to the current playback position of the audio file and the positions of the segmentation markers, includes:
in response to a trigger operation, determining, in the text file, the text character corresponding to the current playback position according to the correspondence between the audio file and the text file;
judging whether the character interval between the text character and the segmentation symbol is greater than a preset character interval;
if so, determining the playback position on the timeline of the segmentation marker corresponding to the first segmentation symbol as the target playback point;
if not, determining the playback position on the timeline of the segmentation marker corresponding to the second segmentation symbol as the target playback point;
wherein the first segmentation symbol is the segmentation symbol immediately preceding the text character in the text file, and the second segmentation symbol is the segmentation symbol immediately preceding the first segmentation symbol in the text file.
Take an audio file containing four sentences A, B, C, and D as an example, with the playback progress bar currently at the beginning of sentence C. When a trigger operation is detected, the text character of sentence C reached by the current playback is determined according to the correspondence between the audio file and the text file. If it lies within the preset character interval, the segmentation symbols of sentence B and then sentence A are looked up in the text file, the corresponding segmentation markers of sentences B and A are found in the audio file, and the position of the segmentation marker of sentence A (i.e., the beginning of sentence B) is taken as the target playback point. As shown in FIG. 5, if playback is currently at position 501 near the beginning of sentence C, corresponding to the 2nd character from the start of text C (indicated by the dashed line at the text "不要" in FIG. 5), and the preset character interval is 3 characters, then the segmentation symbol of text A is found backward; that symbol corresponds to the segmentation marker of sentence A, so position 505 of the segmentation marker of sentence A is determined as the target playback point. Similarly, if it does not lie within the preset character interval, the segmentation symbol of text B is found in the text file, the corresponding segmentation marker of sentence B is found in the audio file, and the position of the segmentation marker of sentence B (i.e., the beginning of sentence C) is taken as the target playback point. As shown in FIG. 5, if playback is currently at position 502 in sentence C, corresponding to the 9th character from the start of text C (indicated by the dashed line at the text "电脑" in FIG. 5), and the preset character interval is 3 characters, then the segmentation symbol of text B is found backward; that symbol corresponds to the segmentation marker of sentence B, so position 503 of the segmentation marker of sentence B is determined as the target playback point.
In this exemplary embodiment, the user's replay intent is judged by comparing the character interval between the text character corresponding to the real-time playback position of the audio file and the preceding segmentation symbol, so the most accurate target playback point can be found intelligently: the user can accurately locate and replay the audio position to be reheard without repeatedly triggering the back operation, which further improves convenience of operation and enhances user experience. It should be noted that the above scenario is merely an exemplary illustration and does not limit the scope of protection of this exemplary embodiment in any way.
In an exemplary embodiment of the present disclosure, the segmentation markers include a first segmentation marker and a second segmentation marker;
determining a target playback point in response to a trigger operation, according to the current playback position of the audio file and the positions of the segmentation markers, includes:
in response to a trigger operation, judging whether the time interval between the playback time corresponding to the current playback position and the playback time corresponding to the first segmentation marker is greater than a preset time interval;
if so, determining the playback position of the first segmentation marker on the timeline as the target playback point;
if not, determining the playback position of the second segmentation marker on the timeline as the target playback point;
wherein the first segmentation marker is the segmentation marker immediately preceding the current playback position in the audio file, and the second segmentation marker is the segmentation marker immediately preceding the first segmentation marker in the audio file.
Take an audio file containing four sentences A, B, C, and D as an example, with the playback progress bar currently at the beginning of sentence C. When a trigger operation is detected, it is judged whether the interval between the playback time corresponding to the current playback position on the progress bar and the playback time corresponding to the segmentation marker of sentence B exceeds the preset time interval. If it lies within the preset time interval, the segmentation markers of sentence B and then sentence A are located, and the position of the segmentation marker of sentence A (i.e., the beginning of sentence B) is taken as the target playback point. As shown in FIG. 6, if playback is currently at position 601, 2 seconds into sentence C (position 0:19 on the progress bar in FIG. 6), and the preset time interval is 3 seconds, the segmentation marker of sentence A is found backward, and its position 605 (position 0:06 on the progress bar in FIG. 6) is determined as the target playback point. Similarly, if it does not lie within the preset time interval, the segmentation marker of sentence B is located, and its position (i.e., the beginning of sentence C) is taken as the target playback point. As shown in FIG. 6, if playback is currently at position 602, 10 seconds into sentence C (corresponding to position 0:27 on the progress bar in FIG. 6), and the preset time interval is 3 seconds, the segmentation marker of sentence B is found backward, and its position 603 (position 0:17 on the progress bar in FIG. 6) is determined as the target playback point.
In this exemplary embodiment, the user's replay intent is judged by comparing the time interval between the playback time of the real-time playback position of the audio file and the playback time of the preceding segmentation marker, so the most accurate target playback point can be found intelligently: the user can accurately locate and replay the audio position to be reheard without repeatedly triggering the back operation, which further improves convenience of operation and enhances user experience. It should be noted that the above scenario is merely an exemplary illustration and does not limit the scope of protection of this exemplary embodiment in any way.
In an exemplary embodiment of the present disclosure, the method further includes:
if the preceding segmentation marker cannot be found from the current playback position of the audio file, playing the audio file to be played from the beginning.
Take an audio file containing four sentences A, B, C, and D as an example: no segmentation marker precedes the first sentence of the audio file (i.e., sentence A), so when a trigger operation is detected while sentence A is currently playing, sentence A is played from the beginning.
In an exemplary embodiment of the present disclosure, the segmentation symbol is a comma, a period, or a semicolon.
The segmentation symbol may be a comma, period, semicolon, or the like, or any other symbol that can serve to delimit sentences; this exemplary embodiment imposes no particular limitation.
In an exemplary embodiment of the present disclosure, generating corresponding segmentation markers in the audio file according to pause durations in the audio includes:
when a pause duration in the audio is detected to exceed a preset duration, generating a corresponding segmentation marker in the audio file.
A pause-duration threshold may be preset; during playback of the audio file, when the duration of a silent audio segment is detected to exceed the preset pause-duration threshold, a specific special character is inserted in that silent segment to form a segmentation marker.
An embodiment of the present disclosure further provides an electronic device. FIG. 7 is a schematic structural diagram of the electronic device provided by an embodiment of the present disclosure. As shown in FIG. 7, the electronic device 700 of this embodiment includes a processor 701 and a memory 702, wherein the memory 702 is configured to store computer-executable instructions, and the processor 701 is configured to execute the computer-executable instructions stored in the memory to implement the steps executed in the above embodiments. For details, refer to the related descriptions in the above method embodiments.
An embodiment of the present disclosure further provides a computer-readable storage medium storing computer-executable instructions which, when executed by a processor, implement the data processing method described above.
In the several embodiments provided in the present disclosure, it should be understood that the disclosed devices and methods may be implemented in other ways. For example, the device embodiments described above are merely illustrative; the division into modules is only a logical functional division, and other divisions are possible in actual implementation: multiple modules may be combined or integrated into another system, and some features may be omitted or not executed. Furthermore, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices, or modules, and may be electrical, mechanical, or in other forms. In addition, the functional modules in the embodiments of the present disclosure may be integrated into one processing unit, may each exist physically on its own, or two or more modules may be integrated into one unit. A unit composed of the above modules may be implemented in hardware, or in hardware plus software functional units.
The above integrated modules implemented as software functional modules may be stored in a computer-readable storage medium. The software functional modules are stored in a storage medium and include instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute part of the steps of the methods described in the embodiments of the present disclosure. It should be understood that the processor may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), etc. A general-purpose processor may be a microprocessor, or any conventional processor. The steps of the methods disclosed in connection with the invention may be embodied directly as being executed by a hardware processor, or executed by a combination of hardware and software modules in the processor.
The memory may include high-speed RAM, and may also include non-volatile memory (NVM), such as at least one magnetic disk, and may also be a USB flash drive, a removable hard disk, a read-only memory, a magnetic disk, an optical disc, or the like. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. A bus may be divided into an address bus, a data bus, a control bus, etc. For ease of representation, the buses in the drawings of the present disclosure are not limited to only one bus or one type of bus. The storage medium may be implemented by any type of volatile or non-volatile storage device or combination thereof, such as static random-access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, a magnetic disk, or an optical disc. The storage medium may be any available medium accessible to a general-purpose or special-purpose computer.
An exemplary storage medium is coupled to the processor so that the processor can read information from, and write information to, the storage medium. Of course, the storage medium may also be an integral part of the processor. The processor and the storage medium may reside in an application-specific integrated circuit (ASIC). Of course, the processor and the storage medium may also exist as discrete components in the electronic device or the master control device.
Persons of ordinary skill in the art will understand that all or part of the steps of the above method embodiments may be accomplished by hardware related to program instructions. The aforementioned program may be stored in a computer-readable storage medium; when executed, the program performs the steps of the above method embodiments. The aforementioned storage medium includes various media capable of storing program code, such as ROM, RAM, magnetic disks, or optical discs.
Finally, it should be noted that the above embodiments are intended only to illustrate the technical solutions of the present disclosure, not to limit them. Although the present disclosure has been described in detail with reference to the foregoing embodiments, persons of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments may still be modified, or some or all of their technical features may be equivalently replaced, without causing the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present disclosure. It should be understood that the present disclosure is not limited to the precise structures described above and shown in the drawings, and various modifications and changes may be made without departing from its scope. The scope of the present disclosure is limited only by the appended claims.

Claims (21)

  1. An audio playing method, comprising: recognizing an audio file to be played as a text file containing segmentation symbols;
    generating, according to a correspondence between the audio file and the text file, a segmentation marker in the audio file at a position corresponding to the segmentation symbol;
    in response to a trigger operation, determining a target playback point according to a current playback position of the audio file and positions of the segmentation markers;
    playing the audio file from the target playback point.
  2. The method according to claim 1, wherein recognizing the audio file to be played as a text file containing segmentation symbols comprises:
    recognizing the audio file to be played as a text file, dividing the text file into multiple sentence-level sub-text files through a preset sentence model, and marking a segmentation symbol at the end of each sub-text file, to generate a text file containing segmentation symbols.
  3. The method according to claim 2, wherein the sentence model is obtained by constructing training samples from feature attributes of words and training with a CRF algorithm.
  4. The method according to claim 1, wherein the correspondence between the audio file and the text file comprises:
    a correspondence established, during recognition of the audio file to be played as a text file, between the audio file on a timeline and characters of the recognized text file.
  5. The method according to claim 1 or 4, wherein determining a target playback point in response to a trigger operation, according to the current playback position of the audio file and the positions of the segmentation markers, comprises:
    in response to a trigger operation, finding a segmentation marker immediately preceding the current playback position of the audio file, and determining a position of the preceding segmentation marker in the audio file as the target playback point.
  6. The method according to claim 1 or 4, wherein the segmentation symbols comprise a first segmentation symbol and a second segmentation symbol;
    determining a target playback point in response to a trigger operation, according to the current playback position of the audio file and the positions of the segmentation markers, comprises:
    in response to a trigger operation, determining, in the text file, a text character corresponding to the current playback position according to the correspondence between the audio file and the text file;
    judging whether a character interval between the text character and the segmentation symbol is greater than a preset character interval;
    if so, determining a playback position on the timeline of the segmentation marker corresponding to the first segmentation symbol as the target playback point;
    if not, determining a playback position on the timeline of the segmentation marker corresponding to the second segmentation symbol as the target playback point;
    wherein the first segmentation symbol is a segmentation symbol immediately preceding the text character in the text file, and the second segmentation symbol is a segmentation symbol immediately preceding the first segmentation symbol in the text file.
  7. The method according to claim 1 or 4, wherein the segmentation markers comprise a first segmentation marker and a second segmentation marker;
    determining a target playback point in response to a trigger operation, according to the current playback position of the audio file and the positions of the segmentation markers, comprises:
    in response to a trigger operation, judging whether a time interval between a playback time corresponding to the current playback position and a playback time corresponding to the first segmentation marker is greater than a preset time interval;
    if so, determining a playback position of the first segmentation marker on the timeline as the target playback point;
    if not, determining a playback position of the second segmentation marker on the timeline as the target playback point;
    wherein the first segmentation marker is a segmentation marker immediately preceding the current playback position in the audio file, and the second segmentation marker is a segmentation marker immediately preceding the first segmentation marker in the audio file.
  8. The method according to any one of claims 5-7, further comprising:
    if the preceding segmentation marker cannot be found from the current playback position of the audio file, playing the audio file to be played from the beginning.
  9. The method according to any one of claims 1-8, wherein the segmentation symbol is a comma, a period, or a semicolon.
  10. An audio playing method, comprising: in response to a second trigger operation, detecting whether a speech rate of an audio file to be played is lower than a preset speech rate;
    if so, recognizing the audio file to be played as a text file containing segmentation symbols;
    generating, according to a correspondence between the audio file and the text file, a segmentation marker in the audio file at a position corresponding to the segmentation symbol;
    if not, generating corresponding segmentation markers in the audio file according to pause durations in the audio;
    in response to a first trigger operation, determining a target playback point according to a current playback position of the audio file and positions of the segmentation markers;
    playing the audio file from the target playback point.
  11. The method according to claim 10, wherein recognizing the audio file to be played as a text file containing segmentation symbols comprises:
    recognizing the audio file to be played as a text file, dividing the text file into multiple sentence-level sub-text files through a preset sentence model, and marking a segmentation symbol at the end of each sub-text file, to generate a text file containing segmentation symbols.
  12. The method according to claim 11, wherein the sentence model is obtained by constructing training samples from feature attributes of words and training with a CRF algorithm.
  13. The method according to claim 10, wherein the correspondence between the audio file and the text file comprises:
    a correspondence established, during recognition of the audio file to be played as a text file, between the audio file on a timeline and characters of the recognized text file.
  14. The method according to claim 10 or 13, wherein determining a target playback point in response to a trigger operation, according to the current playback position of the audio file and the positions of the segmentation markers, comprises:
    in response to a trigger operation, finding a segmentation marker immediately preceding the current playback position of the audio file, and determining a position of the preceding segmentation marker in the audio file as the target playback point.
  15. The method according to claim 13, wherein the segmentation symbols comprise a first segmentation symbol and a second segmentation symbol;
    determining a target playback point in response to a trigger operation, according to the current playback position of the audio file and the positions of the segmentation markers, comprises:
    in response to a trigger operation, determining, in the text file, a text character corresponding to the current playback position according to the correspondence between the audio file and the text file;
    judging whether a character interval between the text character and the segmentation symbol is greater than a preset character interval;
    if so, determining a playback position on the timeline of the segmentation marker corresponding to the first segmentation symbol as the target playback point;
    if not, determining a playback position on the timeline of the segmentation marker corresponding to the second segmentation symbol as the target playback point;
    wherein the first segmentation symbol is a segmentation symbol immediately preceding the text character in the text file, and the second segmentation symbol is a segmentation symbol immediately preceding the first segmentation symbol in the text file.
  16. The method according to claim 10, wherein the segmentation markers comprise a first segmentation marker and a second segmentation marker;
    determining a target playback point in response to a trigger operation, according to the current playback position of the audio file and the positions of the segmentation markers, comprises:
    in response to a trigger operation, judging whether a time interval between a playback time corresponding to the current playback position and a playback time corresponding to the first segmentation marker is greater than a preset time interval;
    if so, determining a playback position of the first segmentation marker on the timeline as the target playback point;
    if not, determining a playback position of the second segmentation marker on the timeline as the target playback point;
    wherein the first segmentation marker is a segmentation marker immediately preceding the current playback position in the audio file, and the second segmentation marker is a segmentation marker immediately preceding the first segmentation marker in the audio file.
  17. The method according to any one of claims 14-16, further comprising:
    if the preceding segmentation marker cannot be found from the current playback position of the audio file, playing the audio file to be played from the beginning.
  18. The method according to any one of claims 10-17, wherein the segmentation symbol is a comma, a period, or a semicolon.
  19. The audio playing method according to claim 10, wherein generating corresponding segmentation markers in the audio file according to pause durations in the audio comprises:
    when a pause duration in the audio is detected to exceed a preset duration, generating a corresponding segmentation marker in the audio file.
  20. An electronic device, comprising:
    a processor; and
    a memory configured to store executable instructions of the processor;
    wherein the processor is configured to perform the audio playing method according to any one of claims 1-19 by executing the executable instructions.
  21. A computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the audio playing method according to any one of claims 1-19.
PCT/CN2020/097534 2019-11-14 2020-06-22 Audio playing method, electronic device, and storage medium WO2021093333A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
US17/663,225 US20220269724A1 (en) 2019-11-14 2022-05-13 Audio playing method, electronic device, and storage medium

Applications Claiming Priority (4)

Application Number Priority Date Filing Date Title
CN201911112611 2019-11-14
CN201911112611.4 2019-11-14
CN202010042918.8 2020-01-15
CN202010042918.8A CN111128254B (zh) 2019-11-14 2020-01-15 Audio playing method, electronic device, and storage medium

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US17/663,225 Continuation US20220269724A1 (en) 2019-11-14 2022-05-13 Audio playing method, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
WO2021093333A1 true WO2021093333A1 (zh) 2021-05-20

Family

ID=70490711

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/097534 WO2021093333A1 (zh) 2019-11-14 2020-06-22 Audio playing method, electronic device, and storage medium

Country Status (3)

Country Link
US (1) US20220269724A1 (zh)
CN (1) CN111128254B (zh)
WO (1) WO2021093333A1 (zh)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112686018A (zh) * 2020-12-23 2021-04-20 科大讯飞股份有限公司 Text segmentation method, apparatus, device and storage medium
CN112712825B (zh) * 2020-12-30 2022-09-23 维沃移动通信有限公司 Audio processing method and apparatus, and electronic device
CN114267358B (zh) * 2021-12-17 2023-12-12 北京百度网讯科技有限公司 Audio processing method, apparatus, device and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102497391A (zh) * 2011-11-21 2012-06-13 宇龙计算机通信科技(深圳)有限公司 Server, mobile terminal and prompting method
CN104965872A (zh) * 2015-06-11 2015-10-07 联想(北京)有限公司 Information processing method and electronic device
CN108268452A (zh) * 2018-01-15 2018-07-10 东北大学 Deep-learning-based simultaneous machine translation device and method for professional fields
CN109246472A (zh) * 2018-08-01 2019-01-18 平安科技(深圳)有限公司 Video playback method and apparatus, terminal device and storage medium
WO2019144926A1 (zh) * 2018-01-26 2019-08-01 上海智臻智能网络科技股份有限公司 Intelligent interaction method and apparatus, computer device and computer-readable storage medium
WO2019153685A1 (zh) * 2018-02-07 2019-08-15 深圳壹账通智能科技有限公司 Text processing method and apparatus, computer device and storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104038827B (zh) * 2014-06-06 2018-02-02 小米科技有限责任公司 Multimedia playback method and apparatus
CN106373598B (zh) * 2016-08-23 2018-11-13 珠海市魅族科技有限公司 Control method and apparatus for audio replay
CN107679033B (zh) * 2017-09-11 2021-12-14 百度在线网络技术(北京)有限公司 Method and apparatus for recognizing sentence segmentation positions in text
CN108989897A (zh) * 2018-08-13 2018-12-11 封雷迅 Video playback method for sentence-by-sentence repetition according to subtitles, storage device and terminal


Also Published As

Publication number Publication date
CN111128254A (zh) 2020-05-08
CN111128254B (zh) 2021-09-03
US20220269724A1 (en) 2022-08-25

Similar Documents

Publication Publication Date Title
EP3648099B1 (en) Voice recognition method, device, apparatus, and storage medium
WO2021093333A1 (zh) Audio playing method, electronic device, and storage medium
US10114809B2 (en) Method and apparatus for phonetically annotating text
JP6667504B2 (ja) Orphan utterance detection system and method
US11238854B2 (en) Facilitating creation and playback of user-recorded audio
CN108920649B (zh) Information recommendation method, apparatus, device and medium
CN110188356B (zh) Information processing method and apparatus
WO2018195783A1 (en) Input method editor
CN110287364B (zh) Voice search method, system, device and computer-readable storage medium
CN112562684B (zh) Speech recognition method and apparatus, and electronic device
WO2019179014A1 (zh) Voice message search and display method, apparatus, computer device and storage medium
US20220301547A1 (en) Method for processing audio signal, method for training model, device and medium
CN112669842A (zh) Human-machine dialogue control method, apparatus, computer device and storage medium
CN112262382A (zh) Annotation and retrieval of contextual deep bookmarks
CN107424612B (zh) Processing method, apparatus and machine-readable medium
WO2022228377A1 (zh) Recording method and apparatus, electronic device and readable storage medium
WO2022206198A1 (zh) Method, apparatus, device and medium for synchronizing audio and text
CN117253478A (zh) Voice interaction method and related apparatus
WO2022166962A1 (zh) Minutes processing method, apparatus, device and storage medium
CN107797676B (zh) Single-character input method and apparatus
CN110020429B (zh) Semantic recognition method and device
CN112712825B (zh) Audio processing method and apparatus, and electronic device
CN113901186A (zh) Telephone recording annotation method, apparatus, device and storage medium
JP4802689B2 (ja) Information recognition device and information recognition program
US11322142B2 (en) Acoustic sensing-based text input method

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20887311

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20887311

Country of ref document: EP

Kind code of ref document: A1