WO2019223134A1 - Voice message search method, apparatus, computer device, and storage medium - Google Patents

Voice message search method, apparatus, computer device, and storage medium

Info

Publication number
WO2019223134A1
Authority
WO
WIPO (PCT)
Prior art keywords
voice message
message
text
segment
voice
Prior art date
Application number
PCT/CN2018/101062
Other languages
English (en)
French (fr)
Inventor
张雨嘉
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司
Publication of WO2019223134A1

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00: User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/07: User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail characterised by the inclusion of specific contents
    • H04L51/18: Commands or executable codes
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/48: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use
    • G10L25/51: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination
    • G10L25/54: Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00 specially adapted for particular use for comparison or discrimination for retrieval
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00: User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/07: User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail characterised by the inclusion of specific contents
    • H04L51/10: Multimedia information
    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L51/00: User-to-user messaging in packet-switching networks, transmitted according to store-and-forward or real-time protocols, e.g. e-mail
    • H04L51/21: Monitoring or handling of messages
    • H04L51/216: Handling conversation history, e.g. grouping of messages in sessions or threads

Definitions

  • the present application relates to the field of data processing technologies, and in particular, to a voice message search method, device, computer device, and storage medium.
  • When a chat tool is used by elderly people or children, who may not know how to use it well, they may send a very long voice message, for example one longer than 1 minute. Likewise, a user who does not want to record several voice messages about one matter, and instead wants to cover everything related to it in a single message, may produce a very long voice message, for example one longer than 60 s. In the prior art, a voice message is sent automatically once it reaches 60 s and recording cannot continue, which gives a poor experience to users who want to record longer voice messages. In addition, a recipient who receives a long voice message may not want to listen to all of it, which also harms the user experience.
  • the embodiments of the present application provide a voice message search method, device, computer equipment, and storage medium, which can search for a voice message and display the voice message search result.
  • an embodiment of the present application provides a voice message search method, which is applied to a terminal.
  • The method includes: segmenting the obtained complete voice message into multiple fragment voice messages, and sending the fragment voice messages and the text message corresponding to the complete voice message to the target terminal; saving the complete voice message and the text message corresponding to the complete voice message; if a first message search instruction is received, searching the saved text messages for a text message that matches the first message search instruction, and taking it as a first text message; and displaying the voice message search result corresponding to the first text message as a first search result, wherein the first search result includes the complete voice message corresponding to the first text message.
  • an embodiment of the present application provides a voice message search apparatus, and the apparatus includes a unit for executing the voice message search method described in the first aspect.
  • an embodiment of the present application provides a computer device, where the computer device includes a memory and a processor connected to the memory;
  • the memory is configured to store a computer program
  • the processor is configured to run the computer program stored in the memory to perform the voice message search method according to the first aspect.
  • an embodiment of the present application provides a computer-readable storage medium. The computer-readable storage medium stores a computer program that includes program instructions; when executed by a processor, the program instructions implement the voice message search method according to the first aspect.
  • In the embodiments of the present application, a voice message matching the message search instruction is obtained and the voice message search result is displayed, which makes it easier for the user to view the matching voice message and improves both the efficiency of querying voice messages and the user experience. In addition, the voice message is segmented when it is sent and the resulting fragment voice messages are sent to the target terminal, so that the user of the target terminal is not faced with a single overly long voice message, which further improves the user experience.
  • FIG. 1 is a schematic flowchart of a voice message search method according to an embodiment of the present application
  • FIG. 2 is a schematic diagram of a sub-flow of a voice message search method according to an embodiment of the present application
  • FIG. 3 is a schematic diagram of a sub-process of a voice message search method according to another embodiment of the present application.
  • FIG. 4 is another schematic flowchart of a voice message search method according to an embodiment of the present application.
  • FIG. 5 is a diagram illustrating an example of displaying a second voice message search result according to an embodiment of the present application.
  • FIG. 6 is a schematic block diagram of a voice message search apparatus according to an embodiment of the present application.
  • FIG. 7 is a schematic block diagram of a segment sending unit according to an embodiment of the present application.
  • FIG. 8 is a schematic block diagram of a segment sending unit according to another embodiment of the present application.
  • FIG. 9 is a schematic block diagram of a voice message search apparatus according to another embodiment of the present application.
  • FIG. 10 is a schematic block diagram of a computer device according to an embodiment of the present application.
  • Although the terms first, second, etc. may be used herein to describe various elements, these elements should not be limited by these terms. These terms are only used to distinguish these elements from each other.
  • For example, a first acquisition unit may be referred to as a second acquisition unit, and similarly, a second acquisition unit may be referred to as a first acquisition unit.
  • the first acquisition unit and the second acquisition unit are both acquisition units, but they are not the same acquisition unit.
  • the terminals described below include mobile phones, laptop computers, tablet computers, desktop computers, and other devices. It should be noted that instant messaging tools such as WeChat and QQ can be installed in the terminal.
  • the terminal can send and receive voice messages.
  • FIG. 1 is a schematic flowchart of a voice message search method according to an embodiment of the present application.
  • the method is applied to a terminal.
  • the terminal sends a voice message to a target terminal.
  • the method includes the following steps S101-S104.
  • S101: Segment the obtained complete voice message into multiple fragment voice messages, and send the fragment voice messages and a text message corresponding to the complete voice message to a target terminal.
  • Segmenting the obtained complete voice message can be understood in two ways: the voice message is segmented while it is being recorded, and the voice message obtained when recording finishes is regarded as the complete voice message; or the complete voice message is segmented after recording has finished.
  • the voice message formed after segmentation is called a fragment voice message.
  • A complete voice message consists of all of its fragment voice messages.
  • Sending all the fragment voice messages to the target terminal can be understood as sending them all together, or as sending each fragment voice message to the target terminal as soon as it is formed, until all fragment voice messages have been sent.
  • the information that needs to be sent to the target terminal also includes a text message corresponding to the complete voice message.
  • the target terminal may be understood as an end that receives all segment voice messages.
  • The text message corresponding to the complete voice message may be obtained by converting the voice into text while recording, so that when recording and conversion are complete, all the text obtained is the text message corresponding to the complete voice message; or by converting the recorded complete voice message into a text message after recording has finished. In both cases, the conversion is performed by a voice recognition algorithm.
  • The first message search instruction includes a first keyword. Searching the saved text messages for a text message that matches the first message search instruction includes: searching the saved text messages for a text message that matches the first keyword in the first message search instruction. The text message found is used as the first text message.
  • A first keyword may be input in the search box on the home page of an instant communication tool such as WeChat, and a first message search instruction is generated when the search button is clicked or when completion of the input is detected. Alternatively, the user may open a specific communication object such as a chat object, find a relevant button such as "Find chat history" in the corresponding interface, click it, and enter the first keyword; a first message search instruction is then generated when the search button is clicked or when completion of the input is detected. The communication object can be a single contact or a group.
  • the input method of the first keyword includes a text form and a voice form. The first keyword input in the voice form needs to be converted into the first keyword in the corresponding text form according to the voice recognition algorithm.
  • Searching for a text message that matches the first keyword works as follows: for example, if the keyword is "zoo", the saved text messages are searched, and a text message that includes information related to "zoo" is considered to match the first keyword and is taken as the first text message.
  • the search includes various ways of searching, such as fuzzy search, precise search, and the like.
  • In some embodiments, the first message search instruction may further include target time period information selected in an interface that offers at least two time periods; that is, the first message search instruction may further include time information. In some embodiments, the first message search instruction may further include target contact information selected in an interface that lists at least two contacts.
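The keyword search with optional time and contact filters can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the `SavedMessage` record, its field names, and the `search_messages` function are hypothetical, and only exact substring matching (the "precise search" case) is shown.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SavedMessage:
    text: str       # text converted from the complete voice message
    voice_id: str   # hypothetical handle to the stored complete voice message
    contact: str    # hypothetical contact identifier
    sent_at: float  # send time as a POSIX timestamp


def search_messages(
    saved: list[SavedMessage],
    keyword: str,
    contact: Optional[str] = None,
    time_range: Optional[tuple[float, float]] = None,
) -> list[SavedMessage]:
    """Return saved messages whose text contains the keyword.

    The contact and time_range filters are applied only when given.
    """
    hits = []
    for msg in saved:
        if keyword not in msg.text:
            continue
        if contact is not None and msg.contact != contact:
            continue
        if time_range is not None and not (time_range[0] <= msg.sent_at <= time_range[1]):
            continue
        hits.append(msg)
    return hits
```

Each hit carries the handle of its complete voice message, so the search result can present the voice message alongside the matched text.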
  • the display is performed according to the first preset format.
  • the first preset format includes: a complete voice message, and a text content corresponding to a preset number of words before and after a first keyword in the complete voice message.
  • the first preset format may further include: the sender information corresponding to the complete voice message, and the completion time of sending the complete voice message.
  • the first keyword can be highlighted, such as distinguishing colors or bolding
  • the sender information includes the sender's nickname and / or sender's avatar, etc.
  • the complete voice information includes the complete voice and / or the duration of the complete voice message;
  • The number of words includes the words of the keyword itself. The preset number of words may be set to a specific value, such as 16 words, or to another value according to other rules. If the total number of words in the text message corresponding to the voice message exceeds the preset number of words, the text beyond the preset number of words may be replaced by an ellipsis. For example, if the keyword is "eat" and the preset number of words is 16, the text message can be displayed as: ... where do you eat, send a positioning to ...
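The truncated display around the keyword can be sketched as below. The function name `keyword_snippet` is hypothetical, and the sketch assumes whitespace-separated words and a window roughly centered on the keyword; the source only states that the window counts the keyword itself and that the remaining text is replaced by an ellipsis.

```python
def keyword_snippet(text: str, keyword: str, max_words: int = 16) -> str:
    """Return at most max_words words around the first word containing keyword."""
    words = text.split()
    idx = next((i for i, w in enumerate(words) if keyword in w), None)
    if idx is None:
        return ""          # keyword not present
    if len(words) <= max_words:
        return text        # short enough to show in full
    half = (max_words - 1) // 2
    start = max(0, idx - half)
    end = min(len(words), start + max_words)
    start = max(0, end - max_words)  # re-anchor when the keyword is near the end
    body = " ".join(words[start:end])
    prefix = "... " if start > 0 else ""
    suffix = " ..." if end < len(words) else ""
    return prefix + body + suffix
```

For Chinese text, where the source counts characters rather than space-separated words, the same windowing would be applied to character positions.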
  • In the above method embodiment, a voice message matching the first message search instruction is obtained and the voice message search result is displayed, which makes it easier for the user to view the matching voice message, improves the efficiency of querying voice messages, and improves the user experience. By segmenting the voice message at the sender and sending the resulting fragment voice messages to the target terminal, the user of the target terminal is not faced with a single overly long voice message, which further improves the user experience.
  • step S101 includes steps S201-S203.
  • a segmentation point of the acquired voice message is located according to a preset segmentation condition.
  • Locating the segmentation point of the acquired voice message according to a preset segmentation condition includes: locating the segmentation point according to the time of the voice message, or locating the segmentation point according to both the time of the voice message and the position of a speaking pause.
  • Locating the segmentation point according to the time of the voice message includes: locating the segmentation point according to a first preset time. For example, when the first preset time is 60 s, 60 s is used as a segmentation point when the voice message reaches 60 s, and 120 s is used as a segmentation point when the voice message reaches 120 s. In other words, the voice message is segmented at intervals of the first preset time, such as every 60 s. This method of locating segmentation points is simple and can improve the efficiency of segmentation.
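A minimal sketch of time-based segmentation points, assuming the recording length is already known. The function name is hypothetical, and the sketch omits a point that would fall exactly at the end of the recording so that no empty trailing fragment is produced; the source does not specify that edge case.

```python
def fixed_interval_points(duration_s: float, interval_s: float = 60.0) -> list[float]:
    """Segmentation points at every multiple of interval_s inside the recording."""
    points = []
    k = 1
    while k * interval_s < duration_s:
        points.append(k * interval_s)
        k += 1
    return points
```

For a 150 s recording with the default 60 s interval, the points fall at 60 s and 120 s, giving three fragments.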
  • Locating the segmentation point according to the time of the voice message and the speaking pause position includes: judging whether the time of the voice message has reached the preset minimum segmentation time; if the preset minimum segmentation time has been reached and the preset maximum segmentation time has not, detecting a speaking pause position in the voice message; if a speaking pause position is detected, using the detected pause position as the segmentation point; and if no speaking pause position is detected and the time of the voice message reaches the preset maximum segmentation time, using the maximum segmentation time as the segmentation point.
  • the preset minimum segmentation time may be 30s, etc., and the preset maximum segmentation time may be 60s, etc. You can detect the speaking pause position according to the sound wave change corresponding to the voice message.
  • Locating the segmentation point can be understood as finding and saving the location of the segmentation point, such as the time of finding and saving the voice message corresponding to the segmentation point. This method of locating segmented points takes into account the pause time of speech and the time of voice messages, and uses the user experience as a starting point to improve the user experience.
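The pause-aware rule can be sketched as follows, assuming the pause positions have already been detected and are given as timestamps. The function name and the interpretation that the minimum and maximum times are measured from the previous segmentation point are assumptions; pause detection from the waveform is not shown.

```python
def locate_segmentation_points(
    duration_s: float,
    pauses_s: list[float],
    min_s: float = 30.0,
    max_s: float = 60.0,
) -> list[float]:
    """Pick the first pause between min_s and max_s after the previous point,
    falling back to the maximum segmentation time when no pause is found."""
    pauses = sorted(pauses_s)
    points = []
    start = 0.0
    while True:
        window = [
            p for p in pauses
            if start + min_s <= p < start + max_s and p < duration_s
        ]
        if window:
            point = window[0]       # earliest pause inside the window
        elif duration_s > start + max_s:
            point = start + max_s   # no pause found: cut at the maximum time
        else:
            break                   # the remaining audio is the last fragment
        points.append(point)
        start = point
    return points
```

With the default 30 s and 60 s limits, a 150 s recording with pauses at 10 s, 45 s, 70 s and 140 s is cut at 45 s, 105 s and 140 s: the pause at 10 s is too early, and no pause falls between 75 s and 105 s, so the maximum time is used there.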
  • The first preset time, the preset minimum segmentation time, and the preset maximum segmentation time may be preset by the system, or may be set according to the user's habits; that is, a duration set by the user is received and used as the new value of the corresponding duration.
  • the obtained voice message is taken as a complete voice message, and the complete voice message is converted into a text message by a voice recognition algorithm.
  • The end of recording is detected, for example, when a click on or release of the button related to the "long voice function" is detected. When the end of recording is detected, the recorded complete voice message is converted into a text message by a voice recognition algorithm.
  • the fragmented voice message is marked with a serial number according to the order of sending. If divided into 3 segments, the first segmented voice message is marked as 01, the second segmented voice message is marked as 02, and the third segmented voice message is marked as 03. Other marks can also be used for identification.
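Splitting at the located points and numbering the fragments can be sketched as below. The `Fragment` record and `split_by_points` function are hypothetical; fragments are represented only by their time bounds, and slicing the actual audio is left out.

```python
from dataclasses import dataclass


@dataclass
class Fragment:
    serial: str     # two-digit serial number in sending order, e.g. "01"
    start_s: float  # fragment start within the complete voice message
    end_s: float    # fragment end within the complete voice message


def split_by_points(duration_s: float, points_s: list[float]) -> list[Fragment]:
    """Turn segmentation points into consecutively numbered fragments."""
    bounds = [0.0, *sorted(points_s), duration_s]
    return [
        Fragment(f"{i:02d}", start, end)
        for i, (start, end) in enumerate(zip(bounds, bounds[1:]), start=1)
    ]
```

The serial number travels with each fragment so that the receiving terminal can restore the sending order.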
  • This embodiment locates the segmentation points during recording. After recording finishes, the complete voice message is converted into the corresponding text message, the complete voice message is divided into multiple fragments at the segmentation points, and the resulting fragment voice messages and the text message are sent to the target terminal. In this way, a long voice message is segmented before it is sent to the target terminal.
  • step S101 includes steps S301-S306.
  • The start of recording may be detected in more than one way. For example, a button related to the "long voice function" can be added to the instant communication tool, and the start of recording is detected when operation of that button is detected.
  • Detecting whether the currently generated voice message satisfies a preset segmentation condition includes: detecting the time of the currently generated voice message, and determining whether the voice message meets the preset segmentation condition according to the time of the voice message; or detecting The time of the voice message and the position of the pause in the voice message are determined according to the time of the voice message and the position of the pause in the voice message to determine whether the voice message satisfies a preset segmentation condition.
  • Detecting the time of the currently generated voice message and determining from it whether the voice message meets a preset segmentation condition includes: detecting whether the time of the currently generated voice message has reached a second preset time, and if so, determining that the voice message satisfies the preset segmentation condition. For example, if the second preset time is 60 s, the voice message satisfies the preset segmentation condition once 60 s have elapsed from the start of recording, and that voice message is sent to the target terminal as a fragment voice message. In other words, whenever the unsent voice message reaches the second preset time, it is sent as a fragment voice message.
  • Once the first 60 s of voice has been sent, it is no longer considered. Timing restarts from 61 s, and when the second preset time is reached again, the voice from 61 s to 120 s is treated as one fragment voice message. In other words, the recorded voice message is segmented at intervals of the second preset time, such as every 60 s.
  • the method for determining that a voice message meets a preset segmentation condition is simple and can improve the efficiency of voice message segmentation.
  • Detecting the time of the currently generated voice message and the speaking pause position in it, and determining from them whether the voice message satisfies a preset segmentation condition, includes: determining whether the time of the currently generated voice message has reached the preset minimum segmentation time; if it has reached the preset minimum segmentation time but not the preset maximum segmentation time, detecting a pause position in the voice message; if a pause position is detected, determining that the voice message meets the preset segmentation condition; and if no pause position is detected and the time of the voice message reaches the preset maximum segmentation time, determining that the voice message meets the preset segmentation condition.
  • If a speaking pause position is detected, the voice message is segmented at the detected pause position. If no speaking pause position is detected and the time of the voice message reaches the preset maximum segmentation time, the voice message is segmented at the preset maximum segmentation time. The resulting segment is sent as a fragment voice message, and fragment voice messages that have already been sent are not considered when determining whether the preset segmentation condition is met.
  • the pause position of speech can be detected according to a sound wave change corresponding to the voice message.
  • the second preset time, the preset minimum segmentation time, and the preset maximum segmentation time can be modified. For the manner of modification, please refer to the description of the corresponding section above.
  • S302: If the currently generated voice message meets the preset segmentation condition, send the currently generated voice message to the target terminal as a fragment voice message. If the currently generated voice message does not satisfy the preset segmentation condition, step S303 is performed.
  • When a fragment voice message is sent, it is marked with an identifier. A long recording may produce multiple fragment voice messages, so to make them easier to receive, the fragment voice messages are marked with serial numbers in the order in which they are sent. For example, if the message is divided into 3 segments, the first fragment voice message is marked 01, the second 02, and the third 03. Other marks can also be used.
  • step S304 if the recording is not over, the generated voice message in the next paragraph is used as the currently generated voice message, and then step S301 is triggered to be executed.
  • After the voice message has been segmented, if the recording is not over, the check for the preset segmentation condition applies only to the voice recorded after the previous segmentation point. In other words, the voice generated after the last segmentation point is taken as the currently generated voice message and is checked against the preset segmentation condition.
  • The end of recording is detected, for example, when a click on or release of the button related to the "long voice function" is detected. When the end of recording is detected, the voice that has been generated in this recording but not yet sent is sent to the target terminal as a fragment voice message, and the text message corresponding to the complete voice message of this recording is also sent to the target terminal.
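The segment-while-recording loop can be sketched as follows for the purely time-based condition. This is an illustration under stated assumptions: recording is modelled as a stream of chunk durations, `send` is a hypothetical callback that receives a serial number and a fragment length, and sending the text message at the end of recording is omitted.

```python
from typing import Callable, Iterable


def record_and_send(
    chunk_durations_s: Iterable[float],
    send: Callable[[str, float], None],
    preset_s: float = 60.0,
) -> None:
    """Send a fragment whenever the unsent audio reaches preset_s seconds,
    then send whatever is left when recording ends."""
    serial = 0
    pending_s = 0.0  # audio recorded since the last segmentation point
    for chunk_s in chunk_durations_s:
        pending_s += chunk_s
        while pending_s >= preset_s:
            serial += 1
            send(f"{serial:02d}", preset_s)
            pending_s -= preset_s    # already-sent audio is no longer considered
    if pending_s > 0:                # end of recording: flush the remainder
        serial += 1
        send(f"{serial:02d}", pending_s)
```

Only the audio after the previous segmentation point is tracked, which mirrors the rule that sent fragments are ignored when the condition is checked again.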
  • The text message corresponding to the complete voice message is obtained by converting the voice to text in real time once recording starts. Specifically, when the start of recording is detected, a voice-to-text interface is started, and the interface calls a voice recognition algorithm to convert the recorded voice into text while recording continues. Correspondingly, the acquired voice message is the voice message formed so far during recording.
  • In this embodiment, the voice is converted to text and segmented at the same time during recording, and the resulting fragment voice messages are sent to the target terminal. When the recording ends, the text message corresponding to the complete voice message is sent to the target terminal. Converting, segmenting, and sending concurrently can improve the efficiency of sending voice messages.
  • In an embodiment, before a fragment voice message is sent to the target terminal, the method further includes compressing the fragment voice message; sending the fragment voice message to the target terminal then includes sending the compressed fragment voice message to the target terminal.
  • a compression tool may be used for compression, such as an audio compression tool speex, and a specific compression ratio may be set to 1:15. The compression ratio of 1:15 is selected because in this ratio, the decompressed fragmented voice message does not affect the user experience, and does not affect the effect of converting the decompressed voice message into text.
  • the terminal compresses the fragmented voice message before sending it, improving the transmission rate and saving network bandwidth.
  • In an embodiment, before the voice message search result corresponding to the first text message is displayed as the first search result, the method further includes: detecting whether there are multiple first text messages; and if there are, sorting the voice message search results corresponding to the multiple first text messages according to a preset rule. Displaying the voice message search result corresponding to the first text message as the first search result then includes displaying the sorted voice message search results as the first search result.
  • The preset rules include sorting by the time at which the voice messages were sent, and/or by the degree of match between the text message corresponding to the voice message and the keyword, or by how likely the message is to have been forgotten, estimated from the human forgetting curve and the time of the voice message.
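One way to combine two of these rules is sketched below. The `SearchHit` record and `sort_hits` function are hypothetical, and the degree of match is assumed to be the number of keyword occurrences, with newer messages first as a tie-breaker; the source does not define the match score, and the forgetting-curve rule is not shown.

```python
from dataclasses import dataclass


@dataclass
class SearchHit:
    text: str       # text message corresponding to the voice message
    sent_at: float  # send time as a POSIX timestamp


def sort_hits(hits: list[SearchHit], keyword: str) -> list[SearchHit]:
    """Order by keyword occurrence count (descending), then by send time (newest first)."""
    return sorted(hits, key=lambda h: (-h.text.count(keyword), -h.sent_at))
```

Because `sorted` is stable and the key is a tuple, swapping the two tuple elements would instead prioritize recency over match degree.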
  • Displaying the voice message search result corresponding to the first text message as the first search result includes: displaying, according to the first preset format, the search results for text messages that correspond to voice messages, and displaying matching plain text messages according to another preset format.
  • The other preset format includes: the sender information corresponding to the plain text message, the plain text message itself, and the time when the plain text message was sent.
  • FIG. 4 is a schematic flowchart of a voice message search method according to an embodiment of the present application.
  • the method is applied to a terminal.
  • the terminal receives a voice message sent by a target terminal.
  • the target terminal in this embodiment may be the same target terminal as the target terminal shown in the embodiments of FIG. 1 to FIG. 3, or may be different target terminals.
  • This method includes the following steps S401-S404 in addition to the method described in the embodiment of FIGS. 1-3.
  • S401 Receive a multi-segment fragmented voice message and a text message corresponding to a complete voice message sent by a target terminal.
  • The terminal receives the fragment voice messages and the text message corresponding to the complete voice message. Because a complete voice message is split into multiple fragment voice messages, the fragments may not arrive in order when the network is unstable. Whether the received fragment voice messages have arrived in sequence can be determined from their identifiers, such as serial numbers.
  • After a fragment voice message is received, the terminal determines whether it has arrived in order. If it has not, the received fragment voice message is put in a cache. If it has, it is displayed in the terminal for the user to browse and listen to. When a fragment that was missing from the sequence arrives, that fragment and the fragment voice messages in the cache are displayed in the terminal in serial-number order. In other words, fragments can be received in any order, but the terminal always displays them in the order given by their serial numbers.
  • Otherwise, later fragment voice messages could be displayed before earlier ones, and a user who listened to the later part of the speech first would find it hard to follow.
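The cache-and-release behavior on the receiving terminal can be sketched as a small reorder buffer. The class name and its interface are hypothetical, and serial numbers are assumed to be consecutive integers starting at 1.

```python
class FragmentReorderBuffer:
    """Hold out-of-order fragments until every earlier serial number has arrived."""

    def __init__(self) -> None:
        self.next_serial = 1             # serial number expected next for display
        self.cache: dict[int, str] = {}  # fragments that arrived early

    def receive(self, serial: int, fragment: str) -> list[str]:
        """Store one fragment and return the fragments now ready to display, in order."""
        self.cache[serial] = fragment
        ready = []
        while self.next_serial in self.cache:
            ready.append(self.cache.pop(self.next_serial))
            self.next_serial += 1
        return ready
```

If fragment 02 arrives before 01, the first call returns nothing; when 01 arrives, both are released together in serial order.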
  • Receiving the voice message as multiple fragments also addresses the problem that users are unwilling to listen to long voice messages. For example, after listening to a voice message, a user may be unclear about one part of it. The user only wants to listen to the corresponding fragment again, not to start from the beginning every time; having to start from the beginning each time would harm the user experience. Receiving the message as multiple fragments therefore improves the user experience.
  • the text message corresponding to the complete voice message will correspond to multiple segment voice messages.
  • S403: If a second message search instruction is received, search the saved text messages for a text message that matches the second message search instruction, and take it as the second text message.
  • The second message search instruction includes a second keyword. Searching the saved text messages for a text message that matches the second message search instruction includes searching them for a text message that matches the second keyword. The text message found is taken as the second text message.
  • S404: Display the voice message search result corresponding to the second text message as the second search result, where the second search result includes all the fragment voice messages that correspond to the second text message.
  • Specifically, the result is displayed in a second preset format. The second preset format includes all the fragment voice messages that correspond to the second text message and the text message corresponding to the complete voice message. The text message is displayed so that the fragment voice message containing the second keyword can be located easily.
  • The second preset format may further include the sender information for the fragment voice messages and the time at which they were received and displayed.
  • The second keyword appears in the text message corresponding to the complete voice message and can be highlighted, for example in a different color or in bold.
  • The sender information includes the sender's nickname and/or the sender's avatar. The fragment voice information includes the fragment audio and/or the duration of the fragment voice message.
  • FIG. 5 shows an example of how the second search result is displayed. As shown in FIG. 5, the voice message search result matching the second keyword is displayed on the screen 11 of the terminal 10.
  • The second keyword 110 is "zoo", and the sender information includes the sender image 120 and the sender nickname 130.
  • The sender whose nickname is "xyzxyz" sent two fragment voice messages, which together make up the complete voice message.
  • The two fragment voice messages are shown with the fragment audio 160 and the fragment voice message duration 150.
  • The text content 140 corresponding to the complete voice message is also shown, with the keyword "zoo" displayed in bold.
  • The text message matching the second keyword may be displayed after all the fragment voice messages.
  • The time 170 at which the voice message was received and displayed is shown as 2018-01-01. In other embodiments, the time at which the voice message was sent can also be given down to the second.
  • In this method embodiment, after a fragment voice message is received, the method further includes: detecting whether the received fragment voice message is compressed and, if it is, decompressing it so that the terminal can play a better-quality fragment voice message, improving the user experience.
  • In some embodiments, before the voice message search result corresponding to the second text message is displayed as the second search result, the method further includes: detecting whether there are multiple second text messages and, if there are, sorting the voice message search results corresponding to those text messages according to a preset rule.
  • Displaying the voice message search result corresponding to the second text message as the second search result then includes displaying the sorted voice message search results as the second search result.
  • For sorting, the received fragment voice messages of one complete voice message are treated as a single voice message, and the time at which the first fragment was received is used as the time of that voice message.
  • The preset rule includes sorting by the chronological order in which the voice messages were received, and/or by the degree of match between the text message corresponding to the voice message and the keyword, or by the likelihood of forgetting that a human forgetting curve assigns to the time at which each voice message was sent.
  • In some embodiments, if the second text messages include both text messages corresponding to complete voice messages and plain text messages, displaying the voice message search result corresponding to the second text message as the second search result includes: displaying the voice message search results for the text messages that correspond to complete voice messages in the second preset format, and displaying the matching plain text messages in another preset format.
  • The other preset format includes the sender information for the plain text message, the plain text itself, and the time at which the plain text message was sent.
  • FIG. 6 is a schematic block diagram of a voice message search apparatus according to an embodiment of the present application.
  • The apparatus is configured in a terminal.
  • As shown in FIG. 6, the apparatus 60 includes a segment sending unit 601, a first saving unit 602, a first search unit 603, and a first display unit 604.
  • The segment sending unit 601 is configured to segment the acquired complete voice message into multiple fragment voice messages, and to send the fragment voice messages and the text message corresponding to the complete voice message to a target terminal.
  • The first saving unit 602 is configured to save the complete voice message and the text message corresponding to it. The terminal thus retains the complete voice message from this recording and its corresponding text message.
  • The first search unit 603 is configured to, if a first message search instruction is received, search the saved text messages for a text message that matches the first message search instruction and take it as the first text message.
  • The first display unit 604 is configured to display the voice message search result corresponding to the first text message as the first search result, where the first search result includes the complete voice message corresponding to the first text message.
  • In an embodiment, as shown in FIG. 7, the segment sending unit 601 includes a positioning unit 701, a first conversion unit 702, and a message segment sending unit 703.
  • The positioning unit 701 is configured to, if the start of recording is detected, locate the segmentation points of the acquired voice message according to a preset segmentation condition.
  • Segmentation points are located while recording, so the acquired voice message is the voice message formed so far during recording.
  • The positioning unit is configured to locate the segmentation points according to the duration of the voice message, or according to both the duration and the speech pause positions.
  • When the positioning unit uses both the duration and the speech pause positions, it includes a time judgment unit, a pause detection unit, and a positioning determination unit.
  • The time judgment unit is configured to determine whether the duration of the voice message has reached a preset minimum segment time.
  • The pause detection unit is configured to detect speech pause positions in the voice message if the preset minimum segment time has been reached and the preset maximum segment time has not.
  • The positioning determination unit is configured to place the segmentation point at the speech pause position if a speech pause position is detected.
  • The positioning determination unit is further configured to place the segmentation point at the maximum segment time if no speech pause position is detected and the duration of the voice message reaches the preset maximum segment time.
  • The first conversion unit 702 is configured to, if the end of recording is detected, take the acquired voice message as the complete voice message and convert it to a text message using a speech recognition algorithm.
  • The message segment sending unit 703 is configured to split the complete voice message at the segmentation points to form multiple fragment voice messages, and to send the fragment voice messages and the text message to a target terminal.
  • In an embodiment, as shown in FIG. 8, the segment sending unit 601 includes a segment detection unit 801, a message sending unit 802, an end detection unit 803, and a current voice determination unit 804.
  • The segment detection unit 801 is configured to, if the start of recording is detected, check whether the currently generated voice message meets a preset segmentation condition.
  • The segment detection unit 801 is configured to detect the duration of the currently generated voice message and determine from that duration whether the voice message meets the preset segmentation condition; or to detect both the duration and the speech pause positions in the voice message and determine from both whether the voice message meets the preset segmentation condition.
  • In an embodiment where the segment detection unit 801 uses the duration alone, it includes a time detection unit and a condition determination unit.
  • The time detection unit is configured to check whether the duration of the currently generated voice message has reached a second preset time.
  • The condition determination unit is configured to determine that the voice message meets the preset segmentation condition if the duration of the currently generated voice message has reached the second preset time.
  • In an embodiment where the segment detection unit 801 uses both the duration and the speech pause positions, it includes a time judgment unit, a pause detection unit, and a condition determination unit.
  • The time judgment unit is configured to determine whether the duration of the currently generated voice message has reached a preset minimum segment time.
  • The pause detection unit is configured to detect speech pause positions in the voice message if its duration has reached the preset minimum segment time and has not reached the preset maximum segment time.
  • The condition determination unit is configured to determine that the voice message meets the preset segmentation condition if a speech pause position is detected, and is further configured to determine that the voice message meets the preset segmentation condition if no speech pause position is detected and the duration of the voice message reaches the preset maximum segment time.
  • The message sending unit 802 is configured to send the currently generated voice message to the target terminal as a fragment voice message if it meets the preset segmentation condition. If it does not, the end detection unit 803 is triggered. Each fragment voice message is given an identifier when it is sent. A long voice message may yield several fragment voice messages after segmentation; to make reception easier, the fragments are labeled with serial numbers in the order in which they are sent.
  • The end detection unit 803 is configured to detect whether recording has ended. After the voice message has been segmented, if recording has not ended, the check of whether the voice message meets the preset segmentation condition applies to the voice message after the previous segmentation point. That is, the voice message generated after the previous segmentation point and not yet sent is the object to be segmented and is checked against the preset segmentation condition.
  • The current voice determination unit 804 is configured to, if recording has not ended, take the next stretch of generated voice message as the currently generated voice message, and then trigger the segment detection unit.
  • The message sending unit 802 is further configured to, if the end of recording is detected, take the voice message that has been generated and not yet sent as a fragment voice message, and send the text message corresponding to the complete voice message together with this last fragment voice message to the target terminal. The text message corresponding to the complete voice message is obtained by converting the voice message to text in real time from the start of recording.
  • In some embodiments, the segment sending unit further includes a compression unit that operates before the fragment voice message is sent to the target terminal.
  • The compression unit is configured to compress the fragment voice message, and the message sending unit is configured to send the compressed fragment voice message to the target terminal.
  • Compressing the fragment voice message before sending it raises the transmission rate and saves network bandwidth.
  • In some embodiments, the apparatus further includes a first message detection unit and a first sorting unit.
  • The first message detection unit is configured to detect whether there are multiple first text messages.
  • The first sorting unit is configured to sort the voice message search results corresponding to the multiple text messages according to a preset rule if there are multiple first text messages.
  • The first display unit is further configured to display the sorted voice message search results corresponding to the first text messages as the first search result.
  • In some embodiments, the first display unit is further configured to display the voice message search results for the text messages that correspond to complete voice messages in the first preset format, and to display the matching plain text messages in another preset format.
  • FIG. 9 is a schematic block diagram of a voice message search apparatus according to another embodiment of the present application.
  • The apparatus is configured in a terminal.
  • As shown in FIG. 9, in addition to the units included in the embodiments of FIGS. 6 to 8, the apparatus 90 includes a receiving unit 901, a second saving unit 902, a second search unit 903, and a second display unit 904.
  • The receiving unit 901 is configured to receive the fragment voice messages and the text message corresponding to the complete voice message sent by a target terminal.
  • The second saving unit 902 is configured to save the correspondence between the fragment voice messages and the text message corresponding to the complete voice message.
  • The second search unit 903 is configured to, if a second message search instruction is received, search the saved text messages for a text message that matches the second message search instruction and take it as the second text message.
  • The second display unit 904 is configured to display the voice message search result corresponding to the second text message as the second search result, where the second search result includes all the fragment voice messages that correspond to the second text message.
  • In some embodiments, the apparatus further includes a compression detection unit and a decompression unit.
  • The compression detection unit is configured to detect whether a received fragment voice message is compressed.
  • The decompression unit is configured to decompress the fragment voice message if it is compressed, so that the terminal can play a better-quality fragment voice message, improving the user experience.
  • In some embodiments, the apparatus further includes a second message detection unit and a second sorting unit.
  • The second message detection unit is configured to detect whether there are multiple second text messages.
  • The second sorting unit is configured to sort the voice message search results corresponding to the multiple text messages according to a preset rule if there are multiple second text messages.
  • The second display unit is further configured to display the sorted voice message search results corresponding to the second text messages as the second search result.
  • In some embodiments, the second display unit is further configured to display the voice message search results for the text messages that correspond to complete voice messages in the second preset format, and to display the matching plain text messages in another preset format.
  • The above apparatus may be implemented in the form of a computer program, and the computer program may run on a computer device as shown in FIG. 10.
  • FIG. 10 is a schematic block diagram of a computer device according to an embodiment of the present application.
  • The device 100 includes a processor 102, a memory, and a network interface 103 connected through a system bus 101.
  • The memory may include a non-volatile storage medium 104 and an internal memory 105.
  • The non-volatile storage medium 104 can store an operating system 1041 and a computer program 1042. When executed, the computer program 1042 can cause the processor 102 to perform a voice message search method.
  • The processor 102 provides computing and control capabilities and supports the operation of the entire device 100.
  • The internal memory 105 provides an environment for running the computer program stored in the non-volatile storage medium. When the computer program is executed by the processor 102, it can cause the processor 102 to perform a voice message search method.
  • The network interface 103 is used for network communication, such as receiving a message search instruction.
  • FIG. 10 is only a block diagram of the part of the structure related to the solution of the present application and does not limit the device 100 to which the solution is applied. A specific device 100 may include more or fewer components than shown in the figure, combine certain components, or arrange the components differently.
  • The processor 102 is configured to run the computer program stored in the memory to implement any embodiment of the foregoing voice message search method.
  • The processor 102 may be a central processing unit (CPU), or it may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on.
  • A general-purpose processor may be a microprocessor or any conventional processor.
  • In another embodiment of the present application, a computer-readable storage medium stores a computer program that includes program instructions, and the program instructions, when executed by a processor, implement any embodiment of the foregoing voice message search method.
  • The computer-readable storage medium may be an internal storage unit of the terminal according to any of the foregoing embodiments, such as a hard disk or memory of the terminal.
  • The computer-readable storage medium may also be an external storage device of the terminal, such as a plug-in hard disk, a Smart Media Card (SMC), or a Secure Digital (SD) card provided on the terminal.
  • The computer-readable storage medium may further include both an internal storage unit of the terminal and an external storage device.

Landscapes

  • Engineering & Computer Science (AREA)
  • Signal Processing (AREA)
  • Computer Networks & Wireless Communication (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Information Transfer Between Computers (AREA)
  • Telephonic Communication Services (AREA)

Abstract

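A voice message search method, apparatus, computer device, and readable storage medium. The method includes: segmenting an acquired complete voice message into multiple fragment voice messages, and sending the fragment voice messages and the text message corresponding to the complete voice message to a target terminal (S101); saving the complete voice message and the text message corresponding to it (S102); if a first message search instruction is received, searching the saved text messages for a text message that matches the first message search instruction and taking it as the first text message (S103); and displaying the voice message search result corresponding to the first text message as the first search result, where the first search result includes the complete voice message corresponding to the first text message (S104).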

Description

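Voice message search method, apparatus, computer device, and storage medium
This application claims priority to Chinese patent application No. 201810508827.1, filed with the Chinese Patent Office on May 24, 2018 and entitled "Voice message search method, apparatus, computer device, and storage medium", the entire contents of which are incorporated herein by reference.
Technical Field
This application relates to the field of data processing technology, and in particular to a voice message search method, apparatus, computer device, and storage medium.
Background
Instant messaging tools such as WeChat and QQ have become indispensable for communication at work and in daily life. When we use such tools, the chat content we see and hear leaves an impression, but that impression fades over time. To recall earlier chat content, we often use a search function to locate the chat record from that time. To help users find and locate historical messages, most existing messaging tools provide a history query function. However, these tools can only query and locate text message records. They ignore the user's need to query and locate voice message records, which makes finding a voice message extremely tedious and seriously harms the user experience. In addition, users run into the following situations when chatting by voice. Elderly people or children who are not familiar with the chat tool may send very long voice messages, for example longer than 1 minute. Or a user describing a matter may not want to send several voice messages and instead hopes to cover everything in one, which can also make the message very long, for example longer than 60 s. In the prior art, a voice message is sent automatically once it reaches 60 s and recording cannot continue, which gives a poor experience to users who want to record longer messages (over 60 s). Moreover, a recipient who receives a long voice message may not want to listen to all of it, which also harms the user experience.
Summary
Embodiments of this application provide a voice message search method, apparatus, computer device, and storage medium that can search voice messages and display the voice message search results.
In a first aspect, an embodiment of this application provides a voice message search method applied to a terminal. The method includes: segmenting an acquired complete voice message into multiple fragment voice messages, and sending the fragment voice messages and the text message corresponding to the complete voice message to a target terminal; saving the complete voice message and the text message corresponding to it; if a first message search instruction is received, searching the saved text messages for a text message that matches the first message search instruction and taking it as the first text message; and displaying the voice message search result corresponding to the first text message as the first search result, where the first search result includes the complete voice message corresponding to the first text message.
In a second aspect, an embodiment of this application provides a voice message search apparatus that includes units for performing the voice message search method of the first aspect.
In a third aspect, an embodiment of this application provides a computer device that includes a memory and a processor connected to the memory.
The memory is configured to store a computer program, and the processor is configured to run the computer program stored in the memory to perform the voice message search method of the first aspect.
In a fourth aspect, an embodiment of this application provides a computer-readable storage medium that stores a computer program. The computer program includes program instructions that, when executed by a processor, implement the voice message search method of the first aspect.
By searching voice messages, the embodiments of this application obtain the voice messages that match a message search instruction and display the search results. This makes it easy for the user to view matching voice messages, improves the efficiency of querying voice messages, and improves the user experience. By segmenting the voice message at sending time and sending the resulting fragment voice messages to the target terminal, the embodiments also spare the target terminal's user from having to listen to an overly long voice message, further improving the user experience.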
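Brief Description of the Drawings
FIG. 1 is a schematic flowchart of a voice message search method according to an embodiment of this application;
FIG. 2 is a schematic sub-flowchart of a voice message search method according to an embodiment of this application;
FIG. 3 is a schematic sub-flowchart of a voice message search method according to another embodiment of this application;
FIG. 4 is another schematic sub-flowchart of a voice message search method according to an embodiment of this application;
FIG. 5 is an example of how the second voice message search result is displayed according to an embodiment of this application;
FIG. 6 is a schematic block diagram of a voice message search apparatus according to an embodiment of this application;
FIG. 7 is a schematic block diagram of a segment sending unit according to an embodiment of this application;
FIG. 8 is a schematic block diagram of a segment sending unit according to another embodiment of this application;
FIG. 9 is a schematic block diagram of a voice message search apparatus according to another embodiment of this application;
FIG. 10 is a schematic block diagram of a computer device according to an embodiment of this application.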
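Detailed Description
In this application, it should be understood that although the terms first, second, and so on may be used to describe various elements, the elements should not be limited by these terms. The terms are only used to distinguish one element from another. For example, without departing from the scope of this application, a first acquisition unit could be called a second acquisition unit, and similarly a second acquisition unit could be called a first acquisition unit. Both are acquisition units, but they are not the same acquisition unit.
The terminals described below include devices such as mobile phones, laptop computers, tablet computers, and desktop computers. Note that the terminal has an instant messaging tool installed that can send voice messages, such as WeChat or QQ. The terminal can both send and receive voice messages.
FIG. 1 is a schematic flowchart of a voice message search method according to an embodiment of this application. The method is applied to a terminal; in this embodiment, the terminal sends voice messages to a target terminal. The method includes the following steps S101 to S104.
S101: Segment the acquired complete voice message into multiple fragment voice messages, and send the fragment voice messages and the text message corresponding to the complete voice message to the target terminal.
Segmenting the acquired complete voice message can mean segmenting the voice message while it is being recorded and, when recording finishes, taking the acquired voice message as the complete voice message. It can also mean segmenting the complete voice message after recording has finished. The voice messages formed by segmentation are called fragment voice messages, and the complete voice message consists of all of its fragment voice messages. Sending all fragment voice messages to the target terminal can mean sending them together after segmentation, or sending each fragment as soon as it is formed until all fragments have been sent. The text message corresponding to the complete voice message must also be sent to the target terminal. In this embodiment, the target terminal is the end that receives all the fragment voice messages. The text message corresponding to the complete voice message can be produced during recording, by converting the acquired speech to text as it is recorded and taking all the resulting text as the text message once recording and conversion are complete. It can also be produced after recording, by converting the recorded complete voice message to text. In either case, the conversion is performed by a speech recognition algorithm.
S102: Save the complete voice message and the text message corresponding to it. The terminal thus retains the complete voice message from this recording and its corresponding text message.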
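S103: If a first message search instruction is received, search the saved text messages for a text message that matches the first message search instruction, and take it as the first text message.
The first message search instruction includes a first keyword. Searching the saved text messages for a text message that matches the first message search instruction includes searching them for a text message that matches the first keyword; the text message found is taken as the first text message. Specifically, the first keyword can be entered in the search box on the home page of an instant messaging tool such as WeChat, and the first message search instruction is generated when the search button is tapped or the end of input is detected. Alternatively, the user can open a specific conversation, find a button such as "Search chat history" in the interface for that conversation, tap it, enter the first keyword, and then tap the search button (or have the end of input detected) to generate the first message search instruction. The conversation can be with a single contact or with a group. The first keyword can be entered as text or by voice; a keyword entered by voice must be converted to text by a speech recognition algorithm. The saved text messages are then searched for messages matching the first keyword. For example, if the keyword is "zoo" and a saved text message related to "zoo" is found, that text message is considered to match the first keyword and is taken as the first text message. The search can be of any kind, such as fuzzy search or exact search.
In some embodiments, the first message search instruction may further include information about a target time period selected from two time periods, that is, time information. In some embodiments, the first message search instruction may further include information about a target contact selected in an interface that lists at least two contacts, that is, target contact information.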
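The keyword match in S103, together with the optional time-period and contact filters, can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the application's implementation: the `SavedMessage` record, its field names, and the case-insensitive substring match are all hypothetical choices for the example, since the application leaves the search method open (fuzzy, exact, and so on).

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SavedMessage:
    """Hypothetical record of one saved complete voice message and its text."""
    voice_id: str    # key of the saved complete voice message
    text: str        # text obtained from speech recognition
    contact: str     # conversation partner (single contact or group)
    sent_at: float   # sending time, seconds since the epoch


def search_saved_messages(saved: List[SavedMessage], keyword: str,
                          contact: Optional[str] = None,
                          start: Optional[float] = None,
                          end: Optional[float] = None) -> List[SavedMessage]:
    """Return saved messages whose text contains the keyword.

    The optional arguments model the target-contact and time-period
    information that the search instruction may also carry.
    """
    needle = keyword.casefold()
    hits = []
    for msg in saved:
        if needle not in msg.text.casefold():
            continue
        if contact is not None and msg.contact != contact:
            continue
        if start is not None and msg.sent_at < start:
            continue
        if end is not None and msg.sent_at > end:
            continue
        hits.append(msg)
    return hits
```

Each hit keeps the `voice_id` of the complete voice message, so the display step can show the audio alongside the matching text.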
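S104: Display the voice message search result corresponding to the first text message as the first search result, where the first search result includes the complete voice message corresponding to the first text message.
Specifically, the result is displayed in a first preset format. The first preset format includes the complete voice message and a preset number of characters of text surrounding the first keyword in the complete voice message. It may further include the sender information for the complete voice message and the time at which the complete voice message finished sending. The first keyword can be highlighted, for example in a different color or in bold. The sender information includes the sender's nickname and/or avatar, and the complete voice information includes the complete audio and/or its duration. The preset number of characters includes the characters of the keyword itself; it can be set to a specific value such as 16 characters, or to another value according to other rules. If the text message corresponding to the voice message is longer than the preset number of characters, the text outside that window can be replaced with ellipses. For example, if the keyword is 吃饭 ("have a meal") and the preset number is 16, the text message can be displayed as: ...你在哪个地方吃饭,发个定位给... ("...where are you having your meal, send a location to...").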
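The preset-length text window around the keyword can be illustrated with a short Python sketch. The function name `keyword_snippet` is hypothetical, and roughly centering the window on the keyword is an assumption; the application only states that the window covers text before and after the keyword, counts the keyword itself, and replaces the remainder with ellipses.

```python
from typing import Optional


def keyword_snippet(text: str, keyword: str, preset_len: int = 16) -> Optional[str]:
    """Return a window of about preset_len characters around the first
    occurrence of keyword, with '...' standing in for omitted text.

    Returns None when the keyword does not occur in the text.
    """
    idx = text.find(keyword)
    if idx < 0:
        return None
    if len(text) <= preset_len:
        return text  # short messages are shown in full
    window = max(preset_len, len(keyword))  # the count includes the keyword
    spare = window - len(keyword)           # characters left for context
    start = max(idx - spare // 2, 0)
    end = min(start + window, len(text))
    start = max(end - window, 0)            # re-anchor when near the end
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return f"{prefix}{text[start:end]}{suffix}"
```

Because the window is measured in characters, the same function applies unchanged to Chinese text such as the example above.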
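By searching voice messages, this embodiment obtains the voice messages that match the first message search instruction and displays the search results. This makes it easy for the user to view matching voice messages, improves the efficiency of querying voice messages, and improves the user experience. By segmenting the voice message at the sender and sending the resulting fragment voice messages to the target terminal, it also spares the target terminal's user from having to listen to an overly long voice message, further improving the user experience.
In an embodiment, as shown in FIG. 2, step S101 includes steps S201 to S203.
S201: If the start of recording is detected, locate the segmentation points of the acquired voice message according to a preset segmentation condition.
In the instant messaging tool, the start of recording is detected when the record button is tapped or pressed and held. Alternatively, a button for a "long voice" function can be added to the tool, and the start of recording is detected when that button is tapped or pressed and held. Segmentation points are located while recording, so the acquired voice message is the voice message formed so far during recording. Locating the segmentation points according to the preset segmentation condition includes locating them according to the duration of the voice message, or according to both the duration and the speech pause positions.
Locating segmentation points by duration includes locating them according to a first preset time. For example, if the first preset time is 60 s, a segmentation point is placed at 60 s when the voice message reaches 60 s, and another at 120 s when it reaches 120 s. In other words, the voice message is segmented at intervals of the first preset time, such as every 60 s. This way of locating segmentation points is simple and makes segmentation efficient.
Locating segmentation points by duration and speech pause position includes: determining whether the duration of the voice message has reached a preset minimum segment time; if it has reached the minimum but not a preset maximum segment time, detecting speech pause positions in the voice message; if a speech pause position is detected, placing the segmentation point at that position; and if no speech pause position is detected and the duration reaches the preset maximum segment time, placing the segmentation point at the maximum segment time. The preset minimum segment time can be, for example, 30 s, and the preset maximum segment time 60 s. Speech pause positions can be detected from changes in the sound wave of the voice message: if a stretch with relatively high average amplitude is followed by a stretch with relatively low average amplitude, and the low-amplitude stretch lasts for a preset duration, the time in the voice message at which that preset duration is reached is taken as the speech pause position. Locating a segmentation point means finding and saving its position, for example the time in the voice message at which it falls. This approach takes both speech pauses and duration into account and is designed around the user experience.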
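The pause-based rule above can be sketched in Python as follows. This is a simplified illustration under several assumptions that the application does not spell out: amplitudes are given as one average value per fixed-length frame, a single threshold separates "high" from "low" amplitude, and the minimum and maximum segment times are measured from the previous segmentation point. The function names and parameters are hypothetical.

```python
from typing import List, Sequence


def detect_pauses(amplitudes: Sequence[float], frame_sec: float,
                  low_threshold: float, min_pause_sec: float) -> List[float]:
    """Return the times (s) at which a quiet run that follows louder speech
    first reaches min_pause_sec. amplitudes holds one average per frame."""
    pauses: List[float] = []
    quiet_run = 0.0
    seen_speech = False
    reported = False
    for i, amp in enumerate(amplitudes):
        if amp >= low_threshold:          # loud frame: speech is ongoing
            seen_speech = True
            quiet_run = 0.0
            reported = False
        elif seen_speech:                 # quiet frame after some speech
            quiet_run += frame_sec
            if not reported and quiet_run >= min_pause_sec:
                pauses.append((i + 1) * frame_sec)
                reported = True           # report each quiet run only once
    return pauses


def locate_segment_points(duration: float, pauses: Sequence[float],
                          min_seg: float = 30.0,
                          max_seg: float = 60.0) -> List[float]:
    """Cut at the first pause between min_seg and max_seg after the previous
    cut; if there is none, cut at max_seg once the recording is that long."""
    if not 0 < min_seg <= max_seg:
        raise ValueError("require 0 < min_seg <= max_seg")
    points: List[float] = []
    start = 0.0
    while True:
        earliest, latest = start + min_seg, start + max_seg
        candidates = [p for p in pauses if earliest <= p < latest and p < duration]
        if candidates:
            cut = min(candidates)
        elif duration > latest:
            cut = latest
        else:
            break                         # the remainder is the last fragment
        points.append(cut)
        start = cut
    return points
```

With `pauses=[]` the second function reduces to the fixed-interval rule described earlier, cutting every `max_seg` seconds.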
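In this embodiment, the first preset time, the preset minimum segment time, and the preset maximum segment time can be preset by the system, or set according to the user's habits, that is, taken from user settings. Once set, these values can be modified: the terminal can accept durations modified by the user, or, based on user feedback, accept more suitable durations set by the server as the new values.
S202: If the end of recording is detected, take the acquired voice message as the complete voice message and convert it to a text message using a speech recognition algorithm.
In the instant messaging tool, the end of recording is detected when the record button is tapped or released, or when the "long voice" button is tapped or released. When the end of recording is detected, the recorded complete voice message is converted to a text message by a speech recognition algorithm.
S203: Split the complete voice message at the segmentation points to form multiple fragment voice messages, and send the fragment voice messages and the text message to the target terminal.
To make reception easier, each fragment voice message is labeled with a serial number in sending order when it is sent. For example, with three fragments, the first fragment sent is labeled 01, the second 02, and the third 03. Other kinds of labels can also be used.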
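Step S203 and the serial-number labeling can be sketched as below. The `Fragment` record, the `message_id` and `total` fields, and the representation of audio as a list of samples are assumptions added for illustration; the application only requires that fragments carry an identifier, such as 01, 02, 03, reflecting the sending order.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Fragment:
    """Hypothetical fragment voice message as it would be sent."""
    message_id: str   # ties the fragment to its complete voice message
    seq: str          # serial number in sending order: "01", "02", ...
    total: int        # number of fragments in the complete voice message
    samples: list     # audio samples belonging to this fragment


def split_and_label(message_id: str, samples: Sequence,
                    cut_indices: Sequence[int]) -> List[Fragment]:
    """Split samples at the given sample indices and label each piece."""
    bounds = [0, *cut_indices, len(samples)]
    pieces = [list(samples[a:b]) for a, b in zip(bounds, bounds[1:]) if b > a]
    return [Fragment(message_id, f"{i:02d}", len(pieces), piece)
            for i, piece in enumerate(pieces, start=1)]
```

A segmentation point expressed in seconds would be converted to a sample index by multiplying by the sample rate before calling `split_and_label`.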
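This embodiment locates segmentation points during recording. After recording ends, it converts the complete voice message to the corresponding text message, splits the complete voice message at the segmentation points, and sends the resulting fragment voice messages and the text message to the target terminal. In this way a long voice message is segmented and sent to the target terminal.
In an embodiment, as shown in FIG. 3, step S101 includes steps S301-S306.
S301: If the start of recording is detected, check whether the currently generated voice message meets the preset segmentation condition.
In the instant messaging tool, the start of recording is detected when the record button is tapped or pressed and held. Alternatively, a button for a "long voice" function can be added to the tool, and the start of recording is detected when that button is tapped or pressed and held.
Checking whether the currently generated voice message meets the preset segmentation condition includes: detecting the duration of the currently generated voice message and determining from that duration whether the condition is met; or detecting both the duration and the speech pause positions in the voice message and determining from both whether the condition is met.
In an embodiment, determining from the duration alone whether the voice message meets the preset segmentation condition includes: checking whether the duration of the currently generated voice message has reached a second preset time and, if so, determining that the voice message meets the condition. For example, with a second preset time of 60 s, the voice message meets the condition once it reaches 60 s from the start of recording, and it is sent to the target terminal as a fragment voice message. In other words, the voice message that has reached the second preset time and has not yet been sent is sent as one fragment. After the first 60 s have been sent as a fragment, they are no longer considered when the condition is checked again: counting restarts at 61 s, and when the second preset time is reached again, the voice message from 61 s to 120 s becomes the next fragment. The recorded voice message is thus segmented at intervals of the second preset time, such as every 60 s. This method of determining that the condition is met is simple and makes segmentation efficient.
In an embodiment, determining from both the duration and the speech pause positions whether the voice message meets the preset segmentation condition includes: determining whether the duration of the currently generated voice message has reached the preset minimum segment time; if it has reached the minimum but not the preset maximum segment time, detecting speech pause positions in the voice message; if a speech pause position is detected, determining that the voice message meets the condition; and if no speech pause position is detected and the duration reaches the preset maximum segment time, determining that the voice message meets the condition. In other words, between the minimum and maximum segment times the voice message is segmented at a detected pause, and if no pause has been detected by the maximum segment time, it is segmented at the maximum segment time. The resulting fragment voice message is sent, and fragments already sent are no longer considered when the condition is checked again. Speech pause positions can be detected from changes in the sound wave of the voice message. In this embodiment, the second preset time, the preset minimum segment time, and the preset maximum segment time can be modified in the way described above for the corresponding values.
S302: If the currently generated voice message meets the preset segmentation condition, send it to the target terminal as a fragment voice message. If it does not, perform step S303.
Each fragment voice message is given an identifier when it is sent. A long voice message may yield several fragment voice messages after segmentation. To make reception easier, the fragments are labeled with serial numbers in sending order: for example, with three fragments, the first sent is labeled 01, the second 02, and the third 03. Other identifiers can also be used.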
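S303: Detect whether recording has ended.
S304: If recording has not ended, take the next stretch of generated voice message as the currently generated voice message, and then return to step S301.
After the voice message has been segmented, if recording has not ended, the check of whether the voice message meets the preset segmentation condition applies to the voice message after the previous segmentation point. That is, the voice message generated after the previous segmentation point is the object to be segmented: this next stretch is treated as the currently generated voice message and checked against the preset segmentation condition.
S305: If the end of recording is detected, take the voice message that has been generated and not yet sent as a fragment voice message, and send the text message corresponding to the complete voice message together with this last fragment voice message to the target terminal. The text message corresponding to the complete voice message is obtained by converting the voice message to text in real time from the start of recording.
In the instant messaging tool, the end of recording is detected when the record button is tapped or released, or when the "long voice" button is tapped or released. When the end of recording is detected, the voice message generated in this recording and not yet sent is sent to the target terminal as one fragment voice message, together with the text message corresponding to the complete voice message of this recording.
The text message corresponding to the complete voice message is obtained by real-time conversion of the voice message acquired after recording starts. Specifically, when the start of recording is detected, a speech-to-text interface is started; this interface calls a speech recognition algorithm to convert the recorded voice message to text while recording continues. Correspondingly, the acquired voice message is the voice message formed so far during recording.
In this embodiment, speech is converted to text and segmented while it is being recorded. The resulting fragment voice messages are sent to the target terminal as they are formed, and the text message corresponding to the voice message is sent once recording ends. Converting, segmenting, and sending during recording improves the efficiency of sending voice messages.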
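The loop formed by steps S301 to S305 can be sketched as a Python generator for the time-only condition. The sketch assumes the recorder delivers audio as a sequence of `(duration_sec, audio_bytes)` chunks and that the end of that sequence marks the end of recording; the function name and the chunk format are hypothetical, and real-time speech-to-text is left out.

```python
from typing import Iterable, Iterator, List, Tuple


def stream_fragments(chunks: Iterable[Tuple[float, bytes]],
                     preset_sec: float = 60.0) -> Iterator[Tuple[str, bytes]]:
    """Yield (serial_number, audio) fragments while recording is in progress.

    chunks yields (duration_sec, audio_bytes) pairs as they are recorded;
    when the iterable is exhausted, recording is considered to have ended.
    """
    seq = 0
    pending: List[bytes] = []   # audio generated but not yet sent
    elapsed = 0.0               # duration of the pending audio
    for duration, audio in chunks:
        pending.append(audio)
        elapsed += duration
        if elapsed >= preset_sec:            # S301/S302: condition met, send
            seq += 1
            yield f"{seq:02d}", b"".join(pending)
            pending, elapsed = [], 0.0       # S304: restart from this cut
    if pending:                              # S305: send the final remainder
        seq += 1
        yield f"{seq:02d}", b"".join(pending)
```

Because the generator yields each fragment as soon as it is complete, the caller can send it immediately instead of waiting for recording to end, which is the efficiency gain this embodiment describes.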
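In some embodiments, before the fragment voice message is sent to the target terminal, the method further includes compressing the fragment voice message, and sending the fragment voice message to the target terminal includes sending the compressed fragment voice message. Specifically, a compression tool can be used, such as the audio compression tool Speex, and the compression ratio can be set to 1:15. The ratio 1:15 is chosen because at that ratio the decompressed fragment voice message does not degrade the user experience and does not affect the quality of converting the decompressed voice message to text. Compressing fragment voice messages before sending raises the transmission rate and saves network bandwidth.
In some embodiments, before the voice message search result corresponding to the first text message is displayed as the first search result, the method further includes: detecting whether there are multiple first text messages and, if there are, sorting the voice message search results corresponding to those text messages according to a preset rule. Displaying the voice message search result corresponding to the first text message as the first search result then includes displaying the sorted voice message search results as the first search result. The preset rule includes sorting by the chronological order in which the voice messages were sent, and/or by the degree of match between the text message corresponding to the voice message and the keyword, or by the likelihood of forgetting that a human forgetting curve assigns to the time at which each voice message was sent.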
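The three preset sorting rules can be illustrated with the Python sketch below. Several details are assumptions, because the application does not define them: the match degree is taken to be the number of keyword occurrences, the forgetting curve is modeled with the commonly used exponential form R = exp(-t/S) with an arbitrary stability S of one day, newest messages come first under the time rule, and the most likely forgotten come first under the forgetting rule.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class SearchResult:
    """Hypothetical search hit: a voice message plus its recognized text."""
    voice_id: str
    text: str
    sent_at: float   # sending time, seconds since the epoch


def match_degree(text: str, keyword: str) -> int:
    """Assumed match measure: number of keyword occurrences in the text."""
    return text.count(keyword)


def forgetting_likelihood(sent_at: float, now: float,
                          stability_sec: float = 86400.0) -> float:
    """1 - exp(-age/S): older messages are more likely to be forgotten."""
    age = max(now - sent_at, 0.0)
    return 1.0 - math.exp(-age / stability_sec)


def sort_results(results: List[SearchResult], keyword: str,
                 rule: str, now: float) -> List[SearchResult]:
    """Sort search results by one of the preset rules."""
    if rule == "time":          # newest first (direction is an assumption)
        return sorted(results, key=lambda r: r.sent_at, reverse=True)
    if rule == "match":         # best match first, newer first on ties
        return sorted(results,
                      key=lambda r: (-match_degree(r.text, keyword), -r.sent_at))
    if rule == "forgetting":    # most likely forgotten first
        return sorted(results,
                      key=lambda r: forgetting_likelihood(r.sent_at, now),
                      reverse=True)
    raise ValueError(f"unknown rule: {rule}")
```

With a single stability value, the forgetting rule orders results purely by age; a per-message stability would be needed for it to differ from a plain oldest-first sort.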
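In some embodiments, if the first text messages include both text messages corresponding to complete voice messages and plain text messages, displaying the voice message search result corresponding to the first text message as the first search result includes: displaying the voice message search results for the text messages that correspond to complete voice messages in the first preset format, and displaying the matching plain text messages in another preset format. The other preset format includes the sender information for the plain text message, the plain text itself, and the time at which the plain text message was sent.
FIG. 4 is a schematic flowchart of a voice message search method according to an embodiment of this application. The method is applied to a terminal; in this embodiment, the terminal receives voice messages sent by a target terminal. The target terminal in this embodiment may or may not be the same as the target terminal in the embodiments of FIGS. 1 to 3. In addition to the methods of the embodiments of FIGS. 1 to 3, the method includes the following steps S401 to S404.
S401: Receive the fragment voice messages and the text message corresponding to the complete voice message sent by the target terminal.
The terminal receives the fragment voice messages produced by segmentation, together with the text message corresponding to the complete voice message. Because one complete voice message is split into several fragment voice messages, the fragments may arrive out of order when the network is unstable. Whether a received fragment has arrived in order can be determined from its identifier, such as a serial number. After a fragment voice message is received, the terminal determines whether it has arrived in order. If it has not, the fragment is placed in a cache. If it has, it is displayed on the terminal for the user to browse and play. Once a missing earlier fragment arrives, it and the cached fragments are displayed on the terminal in serial-number order. In other words, fragments may be received in any order, but the terminal always displays them in the order given by their serial numbers. This avoids the inconvenience of a later fragment being displayed as soon as it arrives while an earlier fragment is displayed afterwards; a user who listened to the later part first would not be able to make sense of it. In addition, because the terminal receives several fragment voice messages rather than one long one, users who are reluctant to listen to long voice messages are better served. For example, after listening to a voice message, a user may be unclear about one part of it. The user only wants to replay the corresponding fragment and does not want to start from the beginning each time, which would harm the user experience. Receiving multiple fragment voice messages on the terminal therefore improves the user experience.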
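The receive-side caching described above amounts to a small reorder buffer, sketched here in Python. The class name, the use of integer serial numbers starting at 1, and the decision to drop duplicates are assumptions for illustration.

```python
from typing import Dict, List, Tuple


class FragmentReorderBuffer:
    """Release fragments for display strictly in serial-number order."""

    def __init__(self) -> None:
        self._next_seq = 1                  # next serial number to display
        self._cache: Dict[int, bytes] = {}  # fragments that arrived early

    def receive(self, seq: int, audio: bytes) -> List[Tuple[int, bytes]]:
        """Accept one fragment and return the fragments that can now be
        displayed, in order. Returns an empty list while a gap remains."""
        if seq < self._next_seq or seq in self._cache:
            return []                       # duplicate: already handled
        self._cache[seq] = audio
        ready: List[Tuple[int, bytes]] = []
        while self._next_seq in self._cache:
            ready.append((self._next_seq, self._cache.pop(self._next_seq)))
            self._next_seq += 1
        return ready
```

For example, if fragment 02 arrives before fragment 01, `receive(2, ...)` returns an empty list and holds the fragment; the later `receive(1, ...)` call then returns both fragments in order.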
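S402: Save the correspondence between the fragment voice messages and the text message corresponding to the complete voice message.
The text message corresponding to the complete voice message corresponds to multiple fragment voice messages.
S403: If a second message search instruction is received, search the saved text messages for a text message that matches the second message search instruction, and take it as the second text message.
The second message search instruction includes a second keyword. Searching the saved text messages for a text message that matches the second message search instruction includes searching them for a text message that matches the second keyword. The text message found is taken as the second text message.
S404: Display the voice message search result corresponding to the second text message as the second search result, where the second search result includes all the fragment voice messages that correspond to the second text message.
Specifically, the result is displayed in a second preset format. The second preset format includes all the fragment voice messages that correspond to the second text message and the text message corresponding to the complete voice message. The text message is displayed so that the fragment voice message containing the second keyword can be located easily. The second preset format may further include the sender information for the fragment voice messages and the time at which they were received and displayed. The second keyword appears in the text message corresponding to the complete voice message and can be highlighted, for example in a different color or in bold. The sender information includes the sender's nickname and/or avatar, and the fragment voice information includes the fragment audio and/or the duration of the fragment voice message.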
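Steps S402 to S404 can be sketched in Python as a lookup over stored text-to-fragment correspondences. The `ReceivedVoiceMessage` record, the dictionary result layout, and the use of `**...**` markers to stand in for bold highlighting are all assumptions made for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ReceivedVoiceMessage:
    """Hypothetical stored correspondence between one text message and
    the fragment voice messages of the same complete voice message."""
    sender: str
    text: str
    # serial number -> fragment duration in seconds
    fragments: Dict[str, float] = field(default_factory=dict)


def highlight(text: str, keyword: str) -> str:
    """Mark the keyword; '**' stands in for bold or colored display."""
    return text.replace(keyword, f"**{keyword}**")


def second_search(store: List[ReceivedVoiceMessage],
                  keyword: str) -> List[dict]:
    """Return one result per matching text message, carrying all of its
    fragments in serial-number order plus the highlighted text."""
    results = []
    for msg in store:
        if keyword in msg.text:
            results.append({
                "sender": msg.sender,
                "fragments": sorted(msg.fragments.items()),
                "text": highlight(msg.text, keyword),
            })
    return results
```

Returning every fragment of a matching message, rather than only the fragment that contains the keyword, follows the second preset format; the highlighted text is what helps the user decide which fragment to replay.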
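FIG. 5 shows an example of how the second search result is displayed. As shown in FIG. 5, the voice message search result matching the second keyword is displayed on the screen 11 of the terminal 10. The second keyword 110 is "zoo", and the sender information includes the sender image 120 and the sender nickname 130. The sender whose nickname is "xyzxyz" sent two fragment voice messages, which together make up the complete voice message. The two fragment voice messages are shown with the fragment audio 160 and the fragment voice message duration 150. The text content 140 corresponding to the complete voice message is also shown, with the keyword "zoo" displayed in bold. The text message matching the second keyword may be displayed after all the fragment voice messages. The time 170 at which the voice message was received and displayed is shown as 2018-01-01; in other embodiments, the time at which the voice message was sent can also be given down to the second.
In this method embodiment, after a fragment voice message is received, the method further includes: detecting whether the received fragment voice message is compressed and, if it is, decompressing it so that the terminal can play a better-quality fragment voice message, improving the user experience.
In some embodiments, before the voice message search result corresponding to the second text message is displayed as the second search result, the method further includes: detecting whether there are multiple second text messages and, if there are, sorting the voice message search results corresponding to those text messages according to a preset rule. Displaying the voice message search result corresponding to the second text message as the second search result then includes displaying the sorted voice message search results as the second search result. For sorting, the received fragment voice messages of one complete voice message are treated as a single voice message, and the time at which the first fragment was received is used as the time of that voice message. The preset rule includes sorting by the chronological order in which the voice messages were received, and/or by the degree of match between the text message corresponding to the voice message and the keyword, or by the likelihood of forgetting that a human forgetting curve assigns to the time at which each voice message was sent.
In some embodiments, if the second text messages include both text messages corresponding to complete voice messages and plain text messages, displaying the voice message search result corresponding to the second text message as the second search result includes: displaying the voice message search results for the text messages that correspond to complete voice messages in the second preset format, and displaying the matching plain text messages in another preset format. The other preset format includes the sender information for the plain text message, the plain text itself, and the time at which the plain text message was sent.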
图6是本申请实施例提供的一种语音消息搜索装置的示意性框图。该装置被配置于一终端中。如图6所示,该装置60包括分段发送单元601、第一保存单元602、第一搜索单元603、第一显示单元604。
分段发送单元601,用于将获取的完整语音消息进行分段形成多段片段语音消息,并将所述多段片段语音消息和所述完整语音消息对应的文本消息发送到目标终端。
第一保存单元602,用于保存所述完整语音消息和所述完整语音消息对应的文本消息。在终端中,仍保存本次录音的完整语音消息和该完整语音消息对应的文本消息。
第一搜索单元603,用于若接收到第一消息搜索指令,从保存的文本消息中搜索与第一消息搜索指令匹配的文本消息作为第一文本消息。
第一显示单元604,用于将所述第一文本消息对应的语音消息搜索结果作为第一搜索结果进行显示,其中,所述第一搜索结果包括所述第一文本消息所对应的完整语音消息。
在一实施例中,如图7所示,分段发送单元601包括定位单元701、第一转换单元702、消息分段发送单元703。
定位单元701,用于若检测到开始录音,根据预设分段条件定位所获取到的语音消息的分段点。
可以理解地,边录音边定位分段点,对应地,所获取到的语音消息是边录音边形成的语音消息。定位单元,用于根据语音消息的时间定位所获取到的语音消息的分段点,或者用于根据语音消息的时间和说话停顿位置来定位所获取到的语音消息的分段点。
其中,若定位单元用于根据语音消息的时间和说话停顿位置来定位所获取到的语音消息的分段点,对应地,定位单元包括时间判断单元、停顿检测单元、定位确定单元。其中,时间判断单元,用于判断语音消息的时间是否达到预设最小分段时间。停顿检测单元,用于若达到预设最小分段时间且未达到预设最大分段时间,检测语音消息中的说话停顿位置。定位确定单元,用于若检测到 说话停顿位置,根据说话停顿位置定位所获取到的语音消息的分段点。定位确定单元,还用于若未检测到说话停顿位置且语音消息的时间达到预设最大分段时间,根据该最大分段时间定位所获取到的语音消息的分段点。
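The duration-plus-pause rule above can be sketched offline, given the total duration and a list of already detected pause times. This is a simplified sketch under stated assumptions: `locate_split_points` is a hypothetical name, pause detection itself is taken as given, and the first pause inside the window between the minimum and maximum segment duration is chosen, which mirrors splitting as soon as a pause is detected during recording.

```python
from typing import List


def locate_split_points(duration: float, pauses: List[float],
                        min_len: float, max_len: float) -> List[float]:
    """Return segmentation points (in seconds) for a recording of the given duration."""
    if not 0 < min_len <= max_len:
        raise ValueError("require 0 < min_len <= max_len")
    ordered = sorted(pauses)
    points: List[float] = []
    start = 0.0  # start of the segment currently being formed
    while True:
        lo, hi = start + min_len, start + max_len
        # Pauses that fall after the minimum and before the maximum segment duration.
        candidates = [p for p in ordered if lo <= p < hi and p < duration]
        if candidates:
            split = candidates[0]  # split at the first usable pause
        elif hi < duration:
            split = hi             # no pause found: force a split at the maximum duration
        else:
            break                  # the remaining audio forms the last segment
        points.append(split)
        start = split
    return points
```

For a 30-second recording with pauses at 3, 12, and 14 seconds, a minimum of 10 seconds, and a maximum of 15 seconds, the sketch splits at 12 seconds (first pause past the minimum) and at 27 seconds (forced by the maximum).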
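The first conversion unit 702 is configured to, if the end of recording is detected, take the acquired voice message as the complete voice message and convert the complete voice message into a text message by a speech recognition algorithm.
The message segment sending unit 703 is configured to divide the complete voice message into multiple parts at the segmentation points to form multiple voice message segments, and to send the multiple voice message segments and the text message to the target terminal.
In an embodiment, as shown in Figure 8, the segment sending unit 601 includes a segmentation detection unit 801, a message sending unit 802, an end detection unit 803, and a current voice determining unit 804.
The segmentation detection unit 801 is configured to, if the start of recording is detected, detect whether the currently generated voice message satisfies a preset segmentation condition.
The segmentation detection unit 801 is configured to detect the duration of the currently generated voice message and determine from that duration whether the voice message satisfies the preset segmentation condition; or to detect the duration of the voice message and the speech pause positions in it, and determine from both whether the voice message satisfies the preset segmentation condition.
In an embodiment, where the segmentation detection unit 801 determines from the duration of the currently generated voice message whether the preset segmentation condition is satisfied, the segmentation detection unit includes a time detection unit and a condition determining unit. The time detection unit is configured to detect whether the duration of the currently generated voice message has reached a second preset time. The condition determining unit is configured to determine that the voice message satisfies the preset segmentation condition if the duration of the currently generated voice message has reached the second preset time.
In an embodiment, where the segmentation detection unit 801 determines from the duration of the currently generated voice message and the speech pause positions in it whether the preset segmentation condition is satisfied, the segmentation detection unit includes a time judging unit, a pause detection unit, and a condition determining unit. The time judging unit is configured to judge whether the duration of the currently generated voice message has reached a preset minimum segment duration. The pause detection unit is configured to detect speech pause positions in the voice message if its duration has reached the preset minimum segment duration but has not reached a preset maximum segment duration. The condition determining unit is configured to determine that the voice message satisfies the preset segmentation condition if a speech pause position is detected, and also if no speech pause position is detected and the duration of the voice message reaches the preset maximum segment duration.
The message sending unit 802 is configured to, if the currently generated voice message satisfies the preset segmentation condition, send it to the target terminal as a voice message segment. If the currently generated voice message does not satisfy the preset segmentation condition, the end detection unit 803 is triggered. Each voice message segment is tagged with an identifier when it is sent. It can be understood that a long voice message may yield several voice message segments after segmentation. To make reception easier, the segments are labeled with sequence-number identifiers in the order in which they are sent.
The end detection unit 803 is configured to detect whether recording has ended. It can be understood that, once the voice message has been segmented and recording has not ended, the check against the preset segmentation condition applies to the audio after the previous segmentation point. That is, the voice message generated after the previous segmentation point and not yet sent is the object to be segmented next.
The current voice determining unit 804 is configured to, if recording has not ended, take the next portion of generated voice message as the currently generated voice message and then trigger the segmentation detection unit.
The message sending unit 802 is further configured to, if the end of recording is detected, take the currently generated but unsent voice message as a voice message segment, and send the text message corresponding to the complete voice message together with this last voice message segment to the target terminal, where the text message corresponding to the complete voice message is obtained by converting, in real time, the voice message acquired after recording starts.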
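The interplay of units 801 to 804 amounts to a loop that sends a segment as soon as the segmentation condition holds and flushes the remainder when recording ends. The sketch below simulates that loop over pre-recorded frames. The `Frame` tuple layout, the function name `stream_segments`, and the `send` callback are assumptions for illustration; real-time speech-to-text conversion and sending of the final text message are omitted.

```python
from typing import Callable, Iterable, Tuple

# (audio bytes, duration in seconds, True if the frame ends at a speech pause)
Frame = Tuple[bytes, float, bool]


def stream_segments(frames: Iterable[Frame], min_len: float, max_len: float,
                    send: Callable[[int, bytes], None]) -> int:
    """Send segments while 'recording'; return the number of segments sent."""
    seq = 0                # sequence-number identifier of the last segment sent
    current = bytearray()  # audio generated since the previous segmentation point
    elapsed = 0.0          # duration of that audio
    for audio, duration, at_pause in frames:
        current.extend(audio)
        elapsed += duration
        # Condition: a pause once the minimum duration is reached, or the maximum duration is hit.
        if (elapsed >= min_len and at_pause) or elapsed >= max_len:
            seq += 1
            send(seq, bytes(current))
            current.clear()
            elapsed = 0.0
    if current:  # recording ended: the unsent remainder becomes the last segment
        seq += 1
        send(seq, bytes(current))
    return seq
```

Because each segment leaves the sender as soon as it is complete, the receiver can start playing early segments before recording has finished, which is why the segments carry sequence numbers.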
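In some embodiments, the segment sending unit further includes a compression unit that operates before a voice message segment is sent to the target terminal. The compression unit is configured to compress the voice message segment, and the message sending unit is configured to send the compressed voice message segment to the target terminal. Compressing a segment before sending it reduces the amount of data transmitted, which speeds up transmission and saves network bandwidth.
In some embodiments, the apparatus further includes a first message detection unit and a first sorting unit. The first message detection unit is configured to detect whether there are multiple first text messages. The first sorting unit is configured to, if there are multiple first text messages, sort the voice message search results corresponding to the multiple text messages according to a preset rule. The first display unit is further configured to display the sorted voice message search results corresponding to the first text messages as the first search result.
In some embodiments, if the first text messages include both text messages corresponding to complete voice messages and plain text messages, the first display unit is further configured to display the voice message search results for the text messages corresponding to complete voice messages in a first preset format, and to display the plain text messages in another preset format.
Figure 9 is a schematic block diagram of a voice message search apparatus provided by an embodiment of this application. The apparatus is configured in a terminal. As shown in Figure 9, in addition to the units included in the embodiments of Figures 6 to 8, the apparatus 90 includes a receiving unit 901, a second storage unit 902, a second search unit 903, and a second display unit 904.
The receiving unit 901 is configured to receive the multiple voice message segments and the text message corresponding to the complete voice message sent by the target terminal.
The second storage unit 902 is configured to save the correspondence between the multiple voice message segments and the text message corresponding to the complete voice message.
The second search unit 903 is configured to, if a second message search instruction is received, search the saved text messages for a text message matching the second message search instruction, and take it as a second text message.
The second display unit 904 is configured to display the voice message search result corresponding to the second text message as a second search result, where the second search result includes all voice message segments that correspond to the second text message.
In some embodiments, the apparatus further includes a compression detection unit and a decompression unit. The compression detection unit is configured to detect whether a received voice message segment is compressed. The decompression unit is configured to, if the segment is compressed, decompress it so that the terminal can play a voice message segment of good quality, which improves the user experience.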
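The compress-before-send and detect-then-decompress steps can be sketched with Python's standard `zlib` module. The application does not say how a receiver recognizes a compressed segment or which codec is used, so the 4-byte `MAGIC` prefix, the function names, and the use of general-purpose zlib compression are all assumptions; audio would more typically be compressed with an audio codec.

```python
import zlib

MAGIC = b"CMP1"  # hypothetical marker that flags a compressed payload


def pack_segment(audio: bytes, compress: bool = True) -> bytes:
    """Sender side: optionally compress a segment and prefix it with the marker."""
    if compress:
        return MAGIC + zlib.compress(audio)
    return audio


def unpack_segment(payload: bytes) -> bytes:
    """Receiver side: decompress only if the payload carries the marker."""
    if payload.startswith(MAGIC):
        return zlib.decompress(payload[len(MAGIC):])
    return payload  # treated as uncompressed
```

One limitation of this sketch is that uncompressed audio that happens to begin with the marker bytes would be misdetected; an explicit flag field in the message header would avoid that.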
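In some embodiments, the apparatus further includes a second message detection unit and a second sorting unit. The second message detection unit is configured to detect whether there are multiple second text messages. The second sorting unit is configured to, if there are multiple second text messages, sort the voice message search results corresponding to the multiple text messages according to a preset rule. The second display unit is further configured to display the sorted voice message search results corresponding to the second text messages as the second search result.
In some embodiments, if the second text messages include both text messages corresponding to complete voice messages and plain text messages, the second display unit is further configured to display the voice message search results for the text messages corresponding to complete voice messages in the second preset format, and to display the plain text messages in another preset format.
For the implementation and the beneficial effects of the above apparatus embodiments, refer to the description of the corresponding method embodiments; they are not repeated here.
The above apparatus may be implemented in the form of a computer program, and the computer program can run on a computer device as shown in Figure 10.
Figure 10 is a schematic block diagram of a computer device provided by an embodiment of this application. The device 100 includes a processor 102, a memory, and a network interface 103 connected through a system bus 101, where the memory may include a non-volatile storage medium 104 and an internal memory 105.
The non-volatile storage medium 104 can store an operating system 1041 and a computer program 1042. When executed, the computer program 1042 causes the processor 102 to perform the voice message search method. The processor 102 provides computing and control capabilities and supports the operation of the entire device 100. The internal memory 105 provides the environment in which the computer program in the non-volatile storage medium runs; when the computer program is executed by the processor 102, it causes the processor 102 to perform the voice message search method. The network interface 103 is used for network communication, such as receiving a message search instruction. Those skilled in the art will understand that the structure shown in Figure 10 is only a block diagram of the part of the structure relevant to the solution of this application and does not limit the device 100 to which the solution is applied. A specific device 100 may include more or fewer components than shown, combine certain components, or arrange the components differently.
The processor 102 is configured to run the computer program stored in the memory to implement any embodiment of the foregoing voice message search method.
It should be understood that, in the embodiments of this application, the processor 102 may be a central processing unit (CPU), or another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, and so on. The general-purpose processor may be a microprocessor or any conventional processor.
Another embodiment of this application provides a computer-readable storage medium. The computer-readable storage medium stores a computer program that includes program instructions, and the program instructions, when executed by a processor, implement any embodiment of the foregoing voice message search method.
The computer-readable storage medium may be an internal storage unit of the terminal described in any of the foregoing embodiments, such as the terminal's hard disk or memory. It may also be an external storage device of the terminal, such as a plug-in hard disk, a Smart Media Card (SMC), or a Secure Digital (SD) card fitted to the terminal. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of the terminal.
In the several embodiments provided in this application, it should be understood that the disclosed apparatus, device, and method may be implemented in other ways. For example, the apparatus embodiments described above are only illustrative; the division into units is only a division by logical function, and other divisions are possible in an actual implementation. Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the apparatus, device, and units described above may be found in the corresponding processes of the foregoing method embodiments and are not repeated here. The above are only specific implementations of this application, but the scope of protection of this application is not limited to them. Any person skilled in the art can readily conceive of various equivalent modifications or replacements within the technical scope disclosed in this application, and these modifications or replacements shall fall within the scope of protection of this application. Therefore, the scope of protection of this application shall be subject to the scope of protection of the claims.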

Claims (20)

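  1. A voice message search method, characterized in that the method comprises:
    segmenting an acquired complete voice message to form multiple voice message segments, and sending the multiple voice message segments and the text message corresponding to the complete voice message to a target terminal;
    saving the complete voice message and the text message corresponding to the complete voice message;
    if a first message search instruction is received, searching the saved text messages for a text message matching the first message search instruction as a first text message;
    displaying the voice message search result corresponding to the first text message as a first search result, wherein the first search result comprises the complete voice message corresponding to the first text message.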
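  2. The method according to claim 1, characterized in that the method further comprises:
    receiving multiple voice message segments and the text message corresponding to a complete voice message sent by a target terminal;
    saving the correspondence between the multiple voice message segments and the text message corresponding to the complete voice message;
    if a second message search instruction is received, searching the saved text messages for a text message matching the second message search instruction as a second text message;
    displaying the voice message search result corresponding to the second text message as a second search result, wherein the second search result comprises the multiple voice message segments that correspond to the second text message.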
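  3. The method according to claim 1, characterized in that segmenting the acquired complete voice message to form multiple voice message segments, and sending the multiple voice message segments and the text message corresponding to the complete voice message to the target terminal, comprises:
    if the start of recording is detected, locating segmentation points of the acquired voice message according to a preset segmentation condition;
    if the end of recording is detected, taking the acquired voice message as the complete voice message, and converting the complete voice message into a text message by a speech recognition algorithm;
    dividing the complete voice message into multiple parts at the segmentation points to form multiple voice message segments, and sending the multiple voice message segments and the text message to the target terminal.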
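  4. The method according to claim 1, characterized in that segmenting the acquired complete voice message to form multiple voice message segments, and sending the multiple voice message segments and the text message corresponding to the complete voice message to the target terminal, comprises:
    if the start of recording is detected, detecting whether the currently generated voice message satisfies a preset segmentation condition;
    if the preset segmentation condition is satisfied, sending the currently generated voice message to the target terminal as a voice message segment;
    detecting whether the recording has ended;
    if it is detected that the recording has not ended, taking the next portion of generated voice message as the currently generated voice message, and triggering the step of detecting whether the currently generated voice message satisfies the preset segmentation condition;
    if it is detected that the recording has ended, taking the currently generated and unsent voice message as a voice message segment, and sending the text message corresponding to the complete voice message and the last voice message segment to the target terminal, wherein the text message corresponding to the complete voice message is obtained by real-time conversion of the voice message acquired after the recording starts.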
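  5. The method according to claim 4, characterized in that detecting whether the currently generated voice message satisfies the preset segmentation condition comprises:
    judging whether the duration of the currently generated voice message has reached a preset minimum segment duration;
    if the preset minimum segment duration has been reached and a preset maximum segment duration has not been reached, detecting speech pause positions in the voice message;
    if a speech pause position is detected, determining that the voice message satisfies the preset segmentation condition;
    if no speech pause position is detected and the duration of the voice message reaches the preset maximum segment duration, determining that the voice message satisfies the preset segmentation condition.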
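  6. The method according to claim 1, characterized in that, before a voice message segment is sent to the target terminal, the method further comprises:
    compressing the voice message segment;
    wherein sending the voice message segment to the target terminal comprises: sending the compressed voice message segment to the target terminal.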
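  7. The method according to claim 2, characterized in that, after the voice message segments sent by the target terminal are received, the method further comprises:
    detecting whether a received voice message segment is a compressed voice message segment;
    if it is a compressed voice message segment, decompressing the compressed voice message segment.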
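  8. A voice message search apparatus, characterized in that the apparatus comprises:
    a segment sending unit, configured to segment an acquired complete voice message to form multiple voice message segments, and to send the multiple voice message segments and the text message corresponding to the complete voice message to a target terminal;
    a first storage unit, configured to save the complete voice message and the text message corresponding to the complete voice message;
    a first search unit, configured to, if a first message search instruction is received, search the saved text messages for a text message matching the first message search instruction as a first text message;
    a first display unit, configured to display the voice message search result corresponding to the first text message as a first search result, wherein the first search result comprises the complete voice message corresponding to the first text message.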
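  9. The apparatus according to claim 8, characterized in that the apparatus further comprises:
    a receiving unit, configured to receive multiple voice message segments and the text message corresponding to a complete voice message sent by a target terminal;
    a second storage unit, configured to save the correspondence between the multiple voice message segments and the text message corresponding to the complete voice message;
    a second search unit, configured to, if a second message search instruction is received, search the saved text messages for a text message matching the second message search instruction as a second text message;
    a second display unit, configured to display the voice message search result corresponding to the second text message as a second search result, wherein the second search result comprises the multiple voice message segments that correspond to the second text message.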
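  10. The apparatus according to claim 8, characterized in that the segment sending unit comprises:
    a segmentation detection unit, configured to, if the start of recording is detected, detect whether the currently generated voice message satisfies a preset segmentation condition;
    a message sending unit, configured to, if the preset segmentation condition is satisfied, send the currently generated voice message to the target terminal as a voice message segment;
    an end detection unit, configured to detect whether the recording has ended;
    a current voice determining unit, configured to, if it is detected that the recording has not ended, take the next portion of generated voice message as the currently generated voice message and trigger the segmentation detection unit;
    wherein the message sending unit is further configured to, if it is detected that the recording has ended, take the currently generated and unsent voice message as a voice message segment, and send the text message corresponding to the complete voice message and the last voice message segment to the target terminal, wherein the text message corresponding to the complete voice message is obtained by real-time conversion of the voice message acquired after the recording starts.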
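  11. A computer device, characterized in that the computer device comprises a memory and a processor connected to the memory;
    the memory is configured to store a computer program; the processor is configured to run the computer program stored in the memory to perform the following steps:
    segmenting an acquired complete voice message to form multiple voice message segments, and sending the multiple voice message segments and the text message corresponding to the complete voice message to a target terminal;
    saving the complete voice message and the text message corresponding to the complete voice message;
    if a first message search instruction is received, searching the saved text messages for a text message matching the first message search instruction as a first text message;
    displaying the voice message search result corresponding to the first text message as a first search result, wherein the first search result comprises the complete voice message corresponding to the first text message.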
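  12. The computer device according to claim 11, characterized in that the processor further performs the following steps:
    receiving multiple voice message segments and the text message corresponding to a complete voice message sent by a target terminal;
    saving the correspondence between the multiple voice message segments and the text message corresponding to the complete voice message;
    if a second message search instruction is received, searching the saved text messages for a text message matching the second message search instruction as a second text message;
    displaying the voice message search result corresponding to the second text message as a second search result, wherein the second search result comprises the multiple voice message segments that correspond to the second text message.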
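  13. The computer device according to claim 11, characterized in that, when segmenting the acquired complete voice message to form multiple voice message segments and sending the multiple voice message segments and the text message corresponding to the complete voice message to the target terminal, the processor specifically performs the following steps:
    if the start of recording is detected, locating segmentation points of the acquired voice message according to a preset segmentation condition;
    if the end of recording is detected, taking the acquired voice message as the complete voice message, and converting the complete voice message into a text message by a speech recognition algorithm;
    dividing the complete voice message into multiple parts at the segmentation points to form multiple voice message segments, and sending the multiple voice message segments and the text message to the target terminal.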
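  14. The computer device according to claim 11, characterized in that, when segmenting the acquired complete voice message to form multiple voice message segments and sending the multiple voice message segments and the text message corresponding to the complete voice message to the target terminal, the processor specifically performs the following steps:
    if the start of recording is detected, detecting whether the currently generated voice message satisfies a preset segmentation condition;
    if the preset segmentation condition is satisfied, sending the currently generated voice message to the target terminal as a voice message segment;
    detecting whether the recording has ended;
    if it is detected that the recording has not ended, taking the next portion of generated voice message as the currently generated voice message, and triggering the step of detecting whether the currently generated voice message satisfies the preset segmentation condition;
    if it is detected that the recording has ended, taking the currently generated and unsent voice message as a voice message segment, and sending the text message corresponding to the complete voice message and the last voice message segment to the target terminal, wherein the text message corresponding to the complete voice message is obtained by real-time conversion of the voice message acquired after the recording starts.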
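  15. The computer device according to claim 14, characterized in that, when detecting whether the currently generated voice message satisfies the preset segmentation condition, the processor specifically performs the following steps:
    judging whether the duration of the currently generated voice message has reached a preset minimum segment duration;
    if the preset minimum segment duration has been reached and a preset maximum segment duration has not been reached, detecting speech pause positions in the voice message;
    if a speech pause position is detected, determining that the voice message satisfies the preset segmentation condition;
    if no speech pause position is detected and the duration of the voice message reaches the preset maximum segment duration, determining that the voice message satisfies the preset segmentation condition.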
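  16. The computer device according to claim 11, characterized in that, before a voice message segment is sent to the target terminal, the processor further performs the following steps:
    compressing the voice message segment;
    wherein sending the voice message segment to the target terminal comprises: sending the compressed voice message segment to the target terminal.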
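  17. The computer device according to claim 12, characterized in that, after the voice message segments sent by the target terminal are received, the processor further performs the following steps:
    detecting whether a received voice message segment is a compressed voice message segment;
    if it is a compressed voice message segment, decompressing the compressed voice message segment.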
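  18. A computer-readable storage medium, characterized in that the computer-readable storage medium stores a computer program, the computer program comprises program instructions, and the program instructions, when executed by a processor, implement the following steps:
    segmenting an acquired complete voice message to form multiple voice message segments, and sending the multiple voice message segments and the text message corresponding to the complete voice message to a target terminal;
    saving the complete voice message and the text message corresponding to the complete voice message;
    if a first message search instruction is received, searching the saved text messages for a text message matching the first message search instruction as a first text message;
    displaying the voice message search result corresponding to the first text message as a first search result, wherein the first search result comprises the complete voice message corresponding to the first text message.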
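  19. The computer-readable storage medium according to claim 18, characterized in that the program instructions, when executed by the processor, further implement the following steps:
    receiving multiple voice message segments and the text message corresponding to a complete voice message sent by a target terminal;
    saving the correspondence between the multiple voice message segments and the text message corresponding to the complete voice message;
    if a second message search instruction is received, searching the saved text messages for a text message matching the second message search instruction as a second text message;
    displaying the voice message search result corresponding to the second text message as a second search result, wherein the second search result comprises the multiple voice message segments that correspond to the second text message.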
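  20. The computer-readable storage medium according to claim 18, characterized in that, when the processor performs the segmenting of the acquired complete voice message to form multiple voice message segments and the sending of the multiple voice message segments and the text message corresponding to the complete voice message to the target terminal, the following steps are specifically implemented:
    if the start of recording is detected, detecting whether the currently generated voice message satisfies a preset segmentation condition;
    if the preset segmentation condition is satisfied, sending the currently generated voice message to the target terminal as a voice message segment;
    detecting whether the recording has ended;
    if it is detected that the recording has not ended, taking the next portion of generated voice message as the currently generated voice message, and triggering the step of detecting whether the currently generated voice message satisfies the preset segmentation condition;
    if it is detected that the recording has ended, taking the currently generated and unsent voice message as a voice message segment, and sending the text message corresponding to the complete voice message and the last voice message segment to the target terminal, wherein the text message corresponding to the complete voice message is obtained by real-time conversion of the voice message acquired after the recording starts.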
PCT/CN2018/101062 2018-05-24 2018-08-17 语音消息搜索方法、装置、计算机设备及存储介质 WO2019223134A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201810508827.1 2018-05-24
CN201810508827.1A CN108874904B (zh) 2018-05-24 2018-05-24 语音消息搜索方法、装置、计算机设备及存储介质

Publications (1)

Publication Number Publication Date
WO2019223134A1 true WO2019223134A1 (zh) 2019-11-28

Family

ID=64333808

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/101062 WO2019223134A1 (zh) 2018-05-24 2018-08-17 语音消息搜索方法、装置、计算机设备及存储介质

Country Status (2)

Country Link
CN (1) CN108874904B (zh)
WO (1) WO2019223134A1 (zh)

Families Citing this family (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109819116A (zh) * 2019-03-20 2019-05-28 初心娃科技有限公司 社交聊天的方法及装置
CN110287364B (zh) * 2019-06-28 2021-10-08 合肥讯飞读写科技有限公司 语音搜索方法、系统、设备及计算机可读存储介质
CN110379413B (zh) * 2019-06-28 2022-04-19 联想(北京)有限公司 一种语音处理方法、装置、设备及存储介质
CN112397102B (zh) * 2019-08-14 2022-07-08 腾讯科技(深圳)有限公司 音频处理方法、装置及终端
CN112069796B (zh) * 2020-09-03 2023-08-04 阳光保险集团股份有限公司 一种语音质检方法、装置,电子设备及存储介质
CN112287162A (zh) * 2020-10-27 2021-01-29 维沃移动通信有限公司 消息搜索方法、装置和电子设备
CN112769678A (zh) * 2021-01-07 2021-05-07 维沃移动通信有限公司 语音消息处理方法、装置和电子设备
CN117253485B (zh) * 2023-11-20 2024-03-08 翌东寰球(深圳)数字科技有限公司 一种数据处理方法、装置、设备及存储介质

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103379460A (zh) * 2012-04-20 2013-10-30 华为终端有限公司 一种语音消息处理方法及终端
CN103581395A (zh) * 2012-08-01 2014-02-12 联想(北京)有限公司 一种显示方法及电子设备
CN104714981A (zh) * 2013-12-17 2015-06-17 腾讯科技(深圳)有限公司 语音消息搜索方法、装置及系统
CN106559540A (zh) * 2015-09-30 2017-04-05 北京奇虎科技有限公司 语音数据处理方法及装置
CN107346318A (zh) * 2016-05-06 2017-11-14 腾讯科技(深圳)有限公司 提取语音内容的方法及装置

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7912699B1 (en) * 2004-08-23 2011-03-22 At&T Intellectual Property Ii, L.P. System and method of lattice-based search for spoken utterance retrieval
CN101382937B (zh) * 2008-07-01 2011-03-30 深圳先进技术研究院 基于语音识别的多媒体资源处理方法及其在线教学系统
CN104078044B (zh) * 2014-07-02 2016-03-30 努比亚技术有限公司 移动终端及其录音搜索的方法和装置
CN105302925A (zh) * 2015-12-10 2016-02-03 百度在线网络技术(北京)有限公司 推送语音搜索数据的方法和装置
CN107391741A (zh) * 2017-08-09 2017-11-24 广东小天才科技有限公司 语音片段的搜索方法、搜索装置及终端设备

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113299279A (zh) * 2021-05-18 2021-08-24 上海明略人工智能(集团)有限公司 用于关联语音数据和检索语音数据的方法、装置、电子设备和可读存储介质
CN114124875A (zh) * 2021-11-04 2022-03-01 维沃移动通信有限公司 语音消息处理方法、装置、电子设备及介质
CN114124875B (zh) * 2021-11-04 2023-12-19 维沃移动通信有限公司 语音消息处理方法、装置、电子设备及介质

Also Published As

Publication number Publication date
CN108874904A (zh) 2018-11-23
CN108874904B (zh) 2022-04-29

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18920153

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205N DATED 02.02.2021)

122 Ep: pct application non-entry in european phase

Ref document number: 18920153

Country of ref document: EP

Kind code of ref document: A1