TWI724507B

TWI724507B - Voice processing method and device

Info

Publication number: TWI724507B
Application number: TW108130240A
Authority: TW
Inventors: 柳林東
Original assignee: 開曼群島商創新先進技術有限公司
Priority date: 2018-11-22
Filing date: 2019-08-23
Publication date: 2021-04-11
Also published as: WO2020103562A1; TW202020652A; CN110018806A

Abstract

The invention provides a voice processing method and device. The number of times a user plays a voice message is used to judge how difficult it is to obtain the information in that message, and different voice playback strategies are proactively offered, improving the user's experience in voice communication scenarios.

Description

Voice processing method and device

This specification relates to the field of Internet technology, and in particular to a voice processing method and device.

With the development of Internet technology, traditional chat tools have gained voice communication functions. Besides typing and sending text messages, a user can record and send a voice message to chat with others. In the voice chat functions of the prior art, after receiving a voice message, the user may need to listen to it repeatedly to obtain the information it contains, because of factors such as a noisy environment or the other party speaking too fast. The user experience is poor, and there is currently no solution that optimizes or handles this scenario.

In view of the above technical problem, the present invention provides a voice processing method and device. The technical solutions are as follows.

According to a first aspect of the present invention, a voice processing method is provided. The method includes: after detecting that a single voice message has been played, determining the number of times the voice message has been played within a predetermined time, and judging whether that play count falls within a predetermined play count interval; and, if the play count falls within the predetermined play count interval, processing the voice message according to a predefined voice processing strategy.

According to a second aspect of the present invention, a voice processing device is provided. The device includes: a play count monitoring module, configured to, after detecting that a single voice message has been played, determine the number of times the voice message has been played within a predetermined time and judge whether that play count falls within a predetermined play count interval; and a voice information processing module, configured to process the voice message according to a predefined voice processing strategy when the play count falls within the predetermined play count interval.

According to a third aspect of the present invention, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements a voice playback method when executing the program. The method includes: after detecting that a single voice message has been played, determining the number of times the voice message has been played within a predetermined time, and judging whether that play count falls within a predetermined play count interval; and, if the play count falls within the predetermined play count interval, processing the voice message according to a predefined voice processing strategy.

The technical solution provided by the present invention is a voice processing method that uses the number of times a user plays a voice message to judge how difficult it is to obtain the information in that message, and proactively offers different voice playback strategies, improving the user's experience in voice communication scenarios.

It should be understood that the above general description and the following detailed description are only exemplary and explanatory and do not limit the present invention. In addition, no single embodiment of the present invention needs to achieve all of the above effects.

The exemplary embodiments are described in detail here, and examples of them are shown in the drawings. When the following description refers to the drawings, the same numbers in different drawings denote the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this specification. Rather, they are merely examples of devices and methods consistent with some aspects of this specification, as detailed in the appended claims.

The terms used in this specification are only for the purpose of describing specific embodiments and are not intended to limit this specification. The singular forms "a", "said" and "the" used in this specification and the appended claims are also intended to include plural forms, unless the context clearly indicates otherwise.
It should also be understood that the term "and/or" as used herein refers to and includes any or all possible combinations of one or more of the associated listed items.

It should be understood that although the terms first, second, third, etc. may be used in this specification to describe various kinds of information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of this specification, first information may also be called second information, and similarly, second information may also be called first information. Depending on the context, the word "if" as used herein can be interpreted as "at the time of", "when", or "in response to determining".

With the development of Internet technology, traditional chat tools have gained voice communication functions. Besides typing and sending text messages, a user can record and send a voice message to chat with others. In the voice chat functions of the prior art, after receiving a voice message, the user may need to listen to it repeatedly to obtain the information it contains, because of factors such as a noisy environment or the other party speaking too fast. The user experience is poor, and there is currently no solution that optimizes or handles this scenario.

In view of the above problem, the present invention provides a voice processing method and a voice processing device for executing the method. The voice processing method of this embodiment is described in detail below. As shown in FIG. 1, the method may include the following steps:

S101: After detecting that a single voice message has been played, determine the number of times the voice message has been played within a predetermined time.

S102: Judge whether the play count falls within a predetermined play count interval. If it does, perform step S103; if it does not, take no action.

The method provided in this embodiment applies to scenarios in which people communicate through voice messages. Specifically, a voice message here is not a voice call such as a phone call, but a recorded piece of audio. For example, when communicating over WeChat, a user can record a voice message and send it to a chosen contact, and can also receive and play a voice message recorded by a contact.

In some cases, the user may be unable to readily obtain the information contained in a voice message, for example because the contact who sent it speaks too fast, the volume is too low, the sender's environment is noisy, or the user's own environment is noisy. To hear the other party's voice message clearly, the user usually plays it several times.

In this embodiment, after the user plays a voice message, it is determined whether the number of times that message has been played within the predetermined time falls within a predetermined play count interval. The play count intervals are divided in advance and can be customized by the user. For example, 1 to 2 plays can be set as the first interval, 3 to 5 plays as the second interval, and more than 6 plays as the third interval. Further, a different processing strategy can be selected for each voice message according to the interval into which its play count falls.
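Steps S101 and S102 can be sketched in Python roughly as follows. This is only an illustrative sketch, not the implementation described by the applicant: the names `PlayCounter`, `record_play`, `find_interval`, and `INTERVALS` are hypothetical, the 120-second window is an example value, and the third interval is assumed to start at 6 plays so that no play count is left unassigned.

```python
import time
from collections import defaultdict, deque


class PlayCounter:
    """Counts plays of each voice message inside a sliding time window (S101)."""

    def __init__(self, window_seconds=120.0):
        # Assumed example value for the "predetermined time".
        self.window_seconds = window_seconds
        self._plays = defaultdict(deque)  # message id -> timestamps of plays

    def record_play(self, message_id, now=None):
        """Record one play and return the play count within the window."""
        now = time.monotonic() if now is None else now
        plays = self._plays[message_id]
        plays.append(now)
        # Forget plays that are older than the window.
        while plays and now - plays[0] > self.window_seconds:
            plays.popleft()
        return len(plays)


# (low, high) bounds, both inclusive; high=None means "no upper bound".
# Assumption: the third interval starts at 6 plays.
INTERVALS = [(1, 2), (3, 5), (6, None)]


def find_interval(count, intervals=INTERVALS):
    """Return the index of the interval containing count, or None (S102)."""
    for index, (low, high) in enumerate(intervals):
        if count >= low and (high is None or count <= high):
            return index
    return None


if __name__ == "__main__":
    counter = PlayCounter()
    for second in (0, 30, 60):
        count = counter.record_play("msg-1", now=second)
    print(count, find_interval(count))
```

Passing `now` explicitly keeps the sketch deterministic; a real client would more likely rely on the default clock and persist the counts alongside the message.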
Note that this embodiment monitors the number of times a single voice message has been played within a predetermined time, for example the number of times it has been played within 2 minutes. If a voice message is replayed intermittently over a span longer than the predetermined time, such as over several days, it is most likely not because the user cannot hear it clearly, and no voice processing strategy needs to be applied.

S103: Process the voice message according to a predefined voice processing strategy.

Specifically, voice processing strategies may include reducing the playback speed of the voice message, increasing its playback volume, converting it into text for display, and so on. Each strategy can be used alone, and in some cases more than one strategy can be applied to the same voice message.

Processing a voice message according to a predefined voice processing strategy can take several forms. A few common ones are given below; these examples do not limit this specification, and users can configure other forms for different application scenarios.

a) If the play count falls within a predetermined play count interval, the voice message is processed according to the voice processing strategy corresponding to that interval, where a corresponding strategy has been set for each play count interval. For example, as described above, 1 to 2 plays can be set as the first interval, 3 to 5 plays as the second interval, and more than 6 plays as the third interval.
When the play count of a single voice message falls within the first interval, no voice processing strategy is applied to it. When the play count falls within the second interval, the strategy "increase the volume proportionally" is applied. When the play count falls within the third interval, the strategies "increase the volume proportionally" and "reduce the playback speed proportionally" are applied together. The strategies set for different intervals may be different or the same, and the user can set the strategy corresponding to each interval.

b) If the play count falls within a predetermined play count interval, the voice quality problems of the voice message are detected, and a corresponding voice processing strategy is selected according to the detection result. For example, more than 3 plays can be set as the first interval; when the play count of a single voice message falls within it, the voice quality problems of that message are detected. Voice quality problems may include volume that is too low, speech that is too fast, background sound that is too noisy, and so on. A strategy matching the detected problem can then be applied, such as increasing the volume, slowing down playback, or performing noise reduction.

In some simpler and more common application scenarios, only one play count interval is set, together with the processing strategy for that interval. FIG. 2 shows a voice playback method provided in this embodiment, which may include the following steps:

S201: After detecting that a single voice message has been played, determine the number of times the voice message has been played within a predetermined time.

S202: Judge whether the play count is higher than a preset threshold. If it is, perform step S203; if it is not, take no action.

S203: Process the voice message according to a predefined voice processing strategy. Specifically, this may be reducing the playback speed of the voice message, increasing its playback volume, or converting it into text for display.

The predefined voice processing strategy is one that the user sets in advance and that is applied once the play count of a voice message exceeds the preset threshold. For example, whenever a single voice message is played more than 3 times within 2 minutes, its playback volume is increased. Alternatively, the first time a voice message's play count is detected to exceed the preset threshold, the different voice processing strategies can be presented to the user as options; after the user chooses one, that strategy is automatically applied to subsequent voice messages whose play count exceeds the preset threshold.

There are several feasible ways for the user to preset different voice processing strategies. A few common ones are given below; these examples do not limit this specification, and users can configure other ways for different application scenarios.
a) Settings based on the contact. The user can set one or more commonly used voice processing strategies for different contacts. For example, if contact A speaks fast, the strategy "slow down playback" is set for that contact; once the user's plays of a voice message from contact A reach the preset play count interval, playback of voice messages from contact A is automatically slowed down. If contact D has a strong dialect, the strategy "convert to text" is set for that contact; once the user's plays of a voice message from contact D reach the preset play count interval, voice messages from contact D are automatically converted to text for display.

b) Settings based on the user's own situation. For example, if the user is in a noisy environment, the strategy is set to "increase the volume"; if the user is in an environment where playing voice messages is inconvenient, the strategy can be set to "convert to text for display". Further, the play count interval can be set to 0: as above, when the user's environment makes playing voice messages inconvenient, there is no need to check the play count, and received voice messages are directly and automatically converted to text for display.

Further, when it is detected that the user has repeatedly played multiple voice messages within a period of time, a more intelligent way of processing can be provided. FIG. 3 shows a voice playback method provided in this specification, which may include the following steps:

S301: Determine the number of voice messages that have been processed by a voice processing strategy within a predetermined time.

S302: Judge whether that number is higher than a predetermined threshold. If it is, perform step S303; if it is not, take no action.

S303: Automatically use a predefined voice processing strategy to process subsequently received voice messages.

Specifically, if the number of voice messages processed by a voice processing strategy within the predetermined time is higher than the predetermined threshold, the user has been repeatedly playing multiple voice messages over a period of time. The "repeated playback" judgment step can then be dropped, and all subsequently received voice messages are processed with a voice processing strategy. Further, the strategy used most often within the predetermined time can be determined and automatically used to process subsequently received voice messages.

Further, when it is detected that the user has repeatedly played multiple voice messages within a period of time, it can additionally be judged whether a single contact is the cause of the repeated playback. FIG. 4 shows a voice playback method provided in this specification, which may include the following steps:

S401: Determine the contact for whom the number of voice messages processed by a voice processing strategy within a predetermined time is higher than a preset threshold.

S402: Use a predefined voice processing strategy to process subsequent voice messages from that contact.
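The escalation logic of steps S301 to S303 and S401 to S402 could be sketched as follows. This is a hedged illustration under stated assumptions rather than the applicant's implementation: the class `EscalationTracker`, its method names, the strategy labels, the 600-second window, and the threshold of 3 are all hypothetical, and events are assumed to be recorded in chronological order.

```python
from collections import Counter, deque


class EscalationTracker:
    """Tracks which voice messages were processed by a strategy recently."""

    def __init__(self, window_seconds=600.0, threshold=3):
        # Both values are assumed examples for the "predetermined time"
        # and the "predetermined threshold".
        self.window_seconds = window_seconds
        self.threshold = threshold
        self._events = deque()  # (timestamp, contact, strategy), oldest first

    def record_processed(self, contact, strategy, now):
        """Note that a message from contact was processed with strategy."""
        self._events.append((now, contact, strategy))
        self._expire(now)

    def _expire(self, now):
        # Drop events that fall outside the time window.
        while self._events and now - self._events[0][0] > self.window_seconds:
            self._events.popleft()

    def global_strategy(self, now):
        """S301-S303: most-used strategy to auto-apply, or None."""
        self._expire(now)
        if len(self._events) <= self.threshold:
            return None
        usage = Counter(strategy for _, _, strategy in self._events)
        return usage.most_common(1)[0][0]

    def contacts_over_threshold(self, now):
        """S401: contacts with more processed messages than the threshold."""
        self._expire(now)
        per_contact = Counter(contact for _, contact, _ in self._events)
        return {c for c, n in per_contact.items() if n > self.threshold}


if __name__ == "__main__":
    tracker = EscalationTracker()
    for second in (0, 10, 20, 30):
        tracker.record_processed("contact-A", "slow_down", now=second)
    print(tracker.global_strategy(now=40))
    print(tracker.contacts_over_threshold(now=40))
```

A client could consult `contacts_over_threshold` first and fall back to `global_strategy`, so that a single hard-to-understand contact does not change how every other contact's messages are played.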
Specifically, if the number of voice messages processed by a voice processing strategy within the predetermined time is higher than the predetermined threshold, the user has been repeatedly playing multiple voice messages over a period of time. If those voice messages come from the same contact, while the voice messages of other contacts have not been processed multiple times, it can be concluded that during this period the voice messages from that contact need intelligent follow-up processing.

Further, the strategy used most often on that contact's voice messages within the predetermined time can be determined and automatically used to process subsequently received voice messages from that contact. Alternatively, the voice quality problems of that contact's voice messages can be detected specifically, and a targeted strategy selected according to the detection result to process subsequently received voice messages from that contact. Alternatively, the user can be shown the available voice improvement options for that contact, and the selected strategy is used to process subsequently received voice messages from that contact.

Corresponding to the above method embodiments, the present invention also provides a voice processing device, applied to a client. As shown in FIG. 5, the device may include a play count monitoring module 510 and a voice information processing module 520.
Play count monitoring module 510: configured to, after detecting that a single voice message has been played, determine the number of times the voice message has been played within a predetermined time, and judge whether that play count falls within a predetermined play count interval.

Voice information processing module 520: configured to process the voice message according to a predefined voice processing strategy when the play count falls within the predetermined play count interval.

The present invention also provides a computer device, which includes at least a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor implements the aforementioned voice processing method when executing the program. The method includes: after detecting that a single voice message has been played, determining the number of times the voice message has been played within a predetermined time, and judging whether that play count falls within a predetermined play count interval; and, if the play count falls within the predetermined play count interval, processing the voice message according to a predefined voice processing strategy.

FIG. 6 shows a more specific schematic diagram of the hardware structure of a computing device provided by the present invention. The device may include a processor 1110, a memory 1120, an input/output interface 1130, a communication interface 1140, and a bus 1150. The processor 1110, the memory 1120, the input/output interface 1130, and the communication interface 1140 are communicatively connected to one another inside the device through the bus 1150.
The processor 1110 can be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an application-specific integrated circuit (ASIC), or one or more integrated circuits, and is used to execute the relevant programs to implement the technical solution provided by the present invention.

The memory 1120 can be implemented as ROM (Read Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1120 can store an operating system and other application programs. When the technical solution provided by the present invention is implemented through software or firmware, the relevant program code is stored in the memory 1120 and is called and executed by the processor 1110.

The input/output interface 1130 is used to connect input/output modules for information input and output. An input/output module can be configured in the device as a component (not shown in the figure) or can be attached externally to provide the corresponding functions. Input devices may include a keyboard, a mouse, a touch screen, a microphone, various sensors, and so on; output devices may include a display, a speaker, a vibrator, an indicator light, and so on.

The communication interface 1140 is used to connect a communication module (not shown in the figure) so that the device can communicate with other devices. The communication module can communicate by wired means (such as USB or a network cable) or by wireless means (such as a mobile network, Wi-Fi, or Bluetooth).

The bus 1150 includes a path that carries information between the components of the device (for example, the processor 1110, the memory 1120, the input/output interface 1130, and the communication interface 1140).
Note that although only the processor 1110, the memory 1120, the input/output interface 1130, the communication interface 1140, and the bus 1150 are shown for the above device, in a specific implementation the device may also include other components necessary for normal operation. In addition, those skilled in the art will understand that the above device may include only the components necessary to implement the solution of the present invention, and need not include all the components shown in the figure.

The present invention also provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the program implements the aforementioned voice processing method. The method includes: after detecting that a single voice message has been played, determining the number of times the voice message has been played within a predetermined time, and judging whether that play count falls within a predetermined play count interval; and, if the play count falls within the predetermined play count interval, processing the voice message according to a predefined voice processing strategy.

Computer-readable media include permanent and non-permanent, removable and non-removable media, which can store information by any method or technology. The information can be computer-readable instructions, data structures, program modules, or other data.
Examples of computer storage media include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, CD-ROM, digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape storage or other magnetic storage devices, or any other non-transmission media that can be used to store information accessible to computing devices. As defined herein, computer-readable media do not include transitory media, such as modulated data signals and carrier waves.
Since the device embodiment basically corresponds to the method embodiment, reference may be made to the description of the method embodiment for the relevant parts. The device embodiments described above are merely illustrative: the units described as separate elements may or may not be physically separate, and the elements displayed as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution in this specification. Those of ordinary skill in the art can understand and implement it without creative work.
From the description of the above embodiments, those skilled in the art can clearly understand that the present invention can be implemented by means of software plus a necessary general-purpose hardware platform.
Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes over the prior art, can be embodied in the form of a software product. The computer software product can be stored in a storage medium, such as ROM/RAM, a magnetic disk, or an optical disc, and includes a number of instructions that cause a computer device (which can be a personal computer, a server, a network device, etc.) to execute the methods described in the various embodiments, or in some parts of the embodiments, of the present invention.
The systems, devices, modules, or units explained in the above embodiments may be implemented by computer chips or entities, or by products with certain functions. A typical implementation device is a computer. The computer may take the form of a personal computer, a notebook computer, a cellular phone, a camera phone, a smart phone, a personal digital assistant, a media player, a navigation device, an email sending and receiving device, a game console, a tablet, a wearable device, or a combination of any of these devices.
The various embodiments in this specification are described in a progressive manner; the same or similar parts of the various embodiments can be referred to each other, and each embodiment focuses on its differences from the other embodiments. In particular, since the device embodiment is basically similar to the method embodiment, its description is relatively brief, and for related parts reference may be made to the description of the method embodiment. The device embodiments described above are only illustrative, and the modules described as separate elements may or may not be physically separate. When implementing the solution of the present invention, the functions of the modules may be implemented in one or more pieces of software and/or hardware.
Some or all of the modules can also be selected according to actual needs to achieve the objectives of the solutions of the embodiments. Those of ordinary skill in the art can understand and implement it without creative work.
The above are only specific embodiments of the present invention. It should be pointed out that those of ordinary skill in the art can make several improvements and modifications without departing from the principle of the present invention, and these improvements and modifications should also be regarded as falling within the protection scope of the present invention.

S101-S402: steps
510: play count monitoring module
520: voice information processing module
1110: processor
1120: memory
1130: input/output interface
1140: communication interface
1150: bus

In order to explain the technical solutions of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below are merely some of the embodiments recorded in the present invention, and those of ordinary skill in the art can obtain other drawings based on them.
Fig. 1 is a flowchart of a voice processing method shown in an exemplary embodiment of this specification;
Fig. 2 is another flowchart of a voice processing method shown in an exemplary embodiment of this specification;
Fig. 3 is a flowchart of a subsequent voice processing method shown in an exemplary embodiment of this specification;
Fig. 4 is another flowchart of a subsequent voice processing method shown in an exemplary embodiment of this specification;
Fig. 5 is a schematic diagram of a voice processing device shown in an exemplary embodiment of this specification;
Fig. 6 is a schematic structural diagram of a computer device shown in an exemplary embodiment of this specification.

Claims

A voice processing method, the method comprising: after detecting that a single voice message has been played, determining the number of times the voice information has been played within a predetermined time, and judging whether the number of times played is within a predetermined play count interval; and if the number of times played is within the predetermined play count interval, processing the voice information according to a predefined voice processing strategy; wherein, after the voice information is processed according to the predefined voice processing strategy, the method further comprises: determining a contact for whom the number of voice messages processed by the voice processing strategy within a predetermined period of time is higher than a preset threshold, and automatically using a predefined voice processing strategy to process subsequent voice information from that contact.
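The steps recited in this claim can be illustrated with a minimal Python sketch. It is not the patentee's implementation: the class names `PlayCounter` and `ContactTracker`, the sliding-window bookkeeping, and every numeric default (window lengths, interval bounds, threshold) are assumptions chosen only for illustration, since the claim leaves the "predetermined" values open.

```python
import time
from collections import defaultdict, deque


class PlayCounter:
    """Counts plays of each voice message inside a sliding time window."""

    def __init__(self, window_seconds=60.0, interval=(3, 5)):
        self.window_seconds = window_seconds  # assumed "predetermined time"
        self.interval = interval              # assumed inclusive play count interval
        self._plays = defaultdict(deque)      # message id -> timestamps of plays

    def record_play(self, message_id, now=None):
        """Record one playback and return the play count within the window."""
        now = time.monotonic() if now is None else now
        plays = self._plays[message_id]
        plays.append(now)
        # Drop plays that are older than the window.
        while plays and now - plays[0] > self.window_seconds:
            plays.popleft()
        return len(plays)

    def needs_processing(self, count):
        """True when the play count lies inside the configured interval."""
        low, high = self.interval
        return low <= count <= high


class ContactTracker:
    """Tracks how many messages from each contact needed processing."""

    def __init__(self, window_seconds=3600.0, threshold=3):
        self.window_seconds = window_seconds  # assumed "predetermined period"
        self.threshold = threshold            # assumed "preset threshold"
        self._processed = defaultdict(deque)  # contact id -> timestamps

    def record_processed(self, contact_id, now=None):
        now = time.monotonic() if now is None else now
        self._processed[contact_id].append(now)

    def should_auto_process(self, contact_id, now=None):
        """True when the contact's processed count exceeds the threshold."""
        now = time.monotonic() if now is None else now
        events = self._processed[contact_id]
        while events and now - events[0] > self.window_seconds:
            events.popleft()
        return len(events) > self.threshold
```

Both classes expect timestamps to be passed in non-decreasing order. A client would call `record_play` on each playback, apply a strategy when `needs_processing` returns true, then call `record_processed` for the sender so that later messages from that contact can be processed automatically.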

The method according to claim 1, wherein processing the voice information according to a predefined voice processing strategy if the number of times played is within the predetermined play count interval comprises: if the number of times played is within a predetermined play count interval, processing the voice information according to the voice processing strategy corresponding to that play count interval, wherein a corresponding voice processing strategy is set for each of the different play count intervals.
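One way to picture the per-interval strategies is a lookup table, sketched below in Python. The three strategies are the ones named in the claims; the specific interval bounds and the order in which strategies escalate are hypothetical, as are the names `Strategy`, `STRATEGY_BY_INTERVAL`, and `select_strategy`.

```python
from enum import Enum


class Strategy(Enum):
    SLOW_DOWN = "reduce playback speed"
    VOLUME_UP = "increase playback volume"
    TO_TEXT = "convert to text"


# Hypothetical mapping: each play count interval has its own strategy.
# The bounds below are illustrative and are not taken from the patent.
STRATEGY_BY_INTERVAL = [
    (range(2, 4), Strategy.SLOW_DOWN),   # 2 to 3 plays
    (range(4, 6), Strategy.VOLUME_UP),   # 4 to 5 plays
    (range(6, 11), Strategy.TO_TEXT),    # 6 to 10 plays
]


def select_strategy(play_count):
    """Return the strategy for the interval containing play_count, or None."""
    for interval, strategy in STRATEGY_BY_INTERVAL:
        if play_count in interval:
            return strategy
    return None
```

Keeping the mapping in data rather than in branching code makes it simple to change the intervals or add a strategy without touching the selection logic.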

The method according to claim 1, wherein processing the voice information according to a predefined voice processing strategy if the number of times played is within the predetermined play count interval comprises: if the number of times played is within the predetermined play count interval, detecting a voice quality problem of the voice information, and selecting the corresponding voice processing strategy according to the detection result to process the voice information.
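The quality-based selection can be sketched as follows, assuming two simple detectors: a root-mean-square level check for quiet audio and a speech-rate check for fast speech. The claim does not specify how quality problems are detected, so the `VoiceClip` fields, the thresholds `min_rms` and `max_rate`, and the fallback to text conversion are all assumptions.

```python
import math
from dataclasses import dataclass
from typing import Sequence


@dataclass
class VoiceClip:
    samples: Sequence[float]      # PCM samples normalised to [-1.0, 1.0]
    syllables_per_second: float   # assumed output of an upstream rate estimator


def rms(samples):
    """Root-mean-square level of the samples (0.0 for an empty clip)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def choose_strategy(clip, min_rms=0.05, max_rate=6.0):
    """Pick a strategy from the detected quality problem (thresholds assumed)."""
    if rms(clip.samples) < min_rms:
        return "increase_volume"   # the recording is too quiet
    if clip.syllables_per_second > max_rate:
        return "reduce_speed"      # the speaker talks too fast
    return "convert_to_text"       # no specific problem found: fall back to text
```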

The method according to claim 1, wherein the predefined voice processing strategy includes: reducing the playback speed of the voice information, increasing the playback volume of the voice information, or converting the voice information into text for display.

The method according to claim 1, wherein, after processing the voice information according to the predefined voice processing strategy, the method further includes: detecting the number of voice messages processed by the voice processing strategy within a predetermined time, and if the number is higher than a predetermined threshold, automatically using a predefined voice processing strategy to process subsequently received voice information.

The method according to claim 5, wherein automatically using a predefined voice processing strategy to process the subsequently received voice information includes: determining the voice processing strategy used most frequently within a predetermined time, and automatically using that most frequently used voice processing strategy to process subsequently received voice information.
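The automatic fallback described in the two claims above can be sketched with a small history of applied strategies. The class name `StrategyHistory`, the string strategy labels, and the default window and threshold are illustrative assumptions; only the rule itself (count above a threshold, then pick the most frequent strategy) comes from the claims.

```python
from collections import Counter, deque


class StrategyHistory:
    """Remembers which strategies were applied within a time window."""

    def __init__(self, window_seconds=3600.0, threshold=3):
        self.window_seconds = window_seconds  # assumed "predetermined time"
        self.threshold = threshold            # assumed "predetermined threshold"
        self._events = deque()                # (timestamp, strategy name)

    def record(self, strategy, now):
        """Record that a strategy was applied; timestamps must not decrease."""
        self._events.append((now, strategy))

    def default_strategy(self, now):
        """Return the most used strategy once usage exceeds the threshold."""
        # Forget applications that fall outside the window.
        while self._events and now - self._events[0][0] > self.window_seconds:
            self._events.popleft()
        if len(self._events) <= self.threshold:
            return None
        counts = Counter(name for _, name in self._events)
        return counts.most_common(1)[0][0]
```

When `default_strategy` returns a name, the client would apply that strategy to newly received messages without waiting for repeated playback.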

A voice processing device, the device comprising: a play count monitoring module, used to determine, after detecting that a single voice message has been played, the number of times the voice information has been played within a predetermined time, and to determine whether the number of times played is within a predetermined play count interval; and a voice information processing module, used to process the voice information according to a predefined voice processing strategy when the number of times played is within the predetermined play count interval; wherein, after the voice information is processed according to the predefined voice processing strategy, the processing further includes: determining a contact for whom the number of voice messages processed by the voice processing strategy within a predetermined period of time is higher than a preset threshold, and automatically using a predefined voice processing strategy to process subsequent voice information from that contact.

The device according to claim 7, wherein processing the voice information according to a predefined voice processing strategy if the number of times played is within the predetermined play count interval includes: if the number of times played is within a predetermined play count interval, processing the voice information according to the voice processing strategy corresponding to that play count interval, wherein a corresponding voice processing strategy is set for each of the different play count intervals.

The device according to claim 7, wherein processing the voice information according to a predefined voice processing strategy if the number of times played is within the predetermined play count interval includes: if the number of times played is within the predetermined play count interval, detecting a voice quality problem of the voice information, and selecting the corresponding voice processing strategy according to the detection result to process the voice information.

The device according to claim 7, wherein the predefined voice processing strategy includes: reducing the playback speed of the voice information, increasing the playback volume of the voice information, or converting the voice information into text for display.

The device according to claim 7, wherein, after processing the voice information according to the predefined voice processing strategy, the processing further includes: detecting the number of voice messages processed by the voice processing strategy within a predetermined time, and if the number is higher than a predetermined threshold, automatically using a predefined voice processing strategy to process subsequently received voice information.

The device according to claim 11, wherein automatically using a predefined voice processing strategy to process the subsequently received voice information includes: determining the voice processing strategy used most frequently within a predetermined time, and automatically using that most frequently used voice processing strategy to process subsequently received voice information.

A computer device comprising a memory, a processor, and a computer program that is stored in the memory and can run on the processor, wherein the processor implements the method according to claim 1 when executing the program.