TW202020652A

TW202020652A - Voice processing method and apparatus

Info

Publication number: TW202020652A
Application number: TW108130240A
Authority: TW
Inventors: 柳林東
Original assignee: 香港商阿里巴巴集團服務有限公司
Priority date: 2018-11-22
Filing date: 2019-08-23
Publication date: 2020-06-01
Also published as: WO2020103562A1; TWI724507B; CN110018806A

Abstract

Provided are a voice processing method and apparatus. How difficult it is to obtain the information in a voice message is determined based on the number of times a user plays the message, and different voice playback strategies are proactively provided accordingly, thereby improving the user experience in voice communication scenarios.

Description

Voice processing method and device

This specification relates to the field of Internet technology, and in particular to a voice processing method and device.

With the development of Internet technology, traditional chat tools have begun to support voice communication: in addition to typing and sending text messages, users can record and send a voice message to chat with others. With prior-art voice chat functions, a user who receives a voice message may need to listen to it repeatedly to obtain the information it contains, because of factors such as a noisy environment or the other party speaking too fast. This results in a poor user experience, and there is currently no solution that optimizes or handles this scenario.

To address the above technical problem, the present invention provides a voice processing method and device. The technical solutions are as follows.

According to a first aspect of the present invention, a voice processing method is provided, the method including: after detecting that a single voice message has been played, determining the number of times the voice message has been played within a predetermined time, and determining whether that play count falls within a predetermined play count interval; and if the play count falls within the predetermined play count interval, processing the voice message according to a predefined voice processing strategy.

According to a second aspect of the present invention, a voice processing device is provided, including:
a play count monitoring module, configured to, after detecting that a single voice message has been played, determine the number of times the voice message has been played within a predetermined time, and determine whether that play count falls within a predetermined play count interval; and
a voice information processing module, configured to process the voice message according to a predefined voice processing strategy when the play count falls within the predetermined play count interval.

According to a third aspect of the present invention, a computer device is provided, including a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements a voice playback method including: after detecting that a single voice message has been played, determining the number of times the voice message has been played within a predetermined time, and determining whether that play count falls within a predetermined play count interval; and if the play count falls within the predetermined play count interval, processing the voice message according to a predefined voice processing strategy.

The technical solution provided by the present invention determines how difficult it is to obtain the information in a voice message based on the number of times the user plays it, and proactively provides different voice playback strategies, thereby improving the user experience in voice communication scenarios.

It should be understood that the foregoing general description and the following detailed description are merely exemplary and explanatory and do not limit the present invention. In addition, no embodiment of the present invention is required to achieve all of the above-mentioned effects.

Exemplary embodiments will be described in detail here, examples of which are shown in the drawings. When the following description refers to the drawings, the same numbers in different drawings represent the same or similar elements unless otherwise indicated. The implementations described in the following exemplary embodiments do not represent all implementations consistent with this specification. Rather, they are merely examples of devices and methods consistent with some aspects of this specification as detailed in the appended claims.

The terminology used in this specification is for the purpose of describing particular embodiments only and is not intended to limit this specification. The singular forms "a", "said", and "the" used in this specification and the appended claims are also intended to include the plural forms, unless the context clearly indicates otherwise.
It should also be understood that the term "and/or" as used herein refers to and includes any or all possible combinations of one or more of the associated listed items. It should be understood that although the terms first, second, third, etc. may be used in this specification to describe various kinds of information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from each other. For example, without departing from the scope of this specification, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".

With the development of Internet technology, traditional chat tools have begun to support voice communication: in addition to typing and sending text messages, users can record and send a voice message to chat with others. With prior-art voice chat functions, a user who receives a voice message may need to listen to it repeatedly to obtain the information it contains, because of factors such as a noisy environment or the other party speaking too fast. This results in a poor user experience, and there is currently no solution that optimizes or handles this scenario.

In view of the above problems, the present invention provides a voice processing method and a voice processing device for executing the method. The voice processing method of this embodiment is described in detail below. Referring to FIG. 1, the method may include the following steps:

S101: after detecting that a single voice message has been played, determine the number of times the voice message has been played within a predetermined time.

S102: determine whether the play count falls within a predetermined play count interval. If it does, perform step S103; if it does not, take no action.

The method provided in this embodiment applies to scenarios in which people communicate through voice messages. Specifically, a voice message here is not a live voice call such as a phone call, but a recorded piece of audio. For example, when users communicate through WeChat, a user can record a voice message and send it to a designated contact, and can also receive and play a voice message recorded by a contact. In some cases, the user may not be able to fully obtain the information contained in every voice message, for example because the contact who sent it speaks too fast, the volume is too low, or the recording environment is noisy, or because the user is in a noisy environment. To make out the other party's message, the user usually plays it several times.

In this embodiment, after the user plays a voice message, it is determined whether the number of times the message has been played within the predetermined time falls within a predetermined play count interval. The play count intervals are divided in advance and can be customized by the user. For example, 1-2 plays can be set as the first interval, 3-5 plays as the second interval, and 6 or more plays as the third interval.
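The counting in S101 and the interval check in S102 can be sketched as follows. This is a minimal illustration under assumptions, not the patented implementation: the class name `PlayCountMonitor`, keeping a queue of play timestamps per message, the 120-second window (taken from the 2-minute example in this embodiment), and the interval table (taken from the 1-2 / 3-5 / 6-or-more example) are all hypothetical choices.

```python
from collections import defaultdict, deque
from typing import DefaultDict, Deque, Optional, Sequence, Tuple

# Hypothetical interval table: (label, lowest count, highest count or None if open-ended).
# The bounds follow the 1-2 / 3-5 / 6-or-more example and could be user-configurable.
Interval = Tuple[str, int, Optional[int]]
DEFAULT_INTERVALS: Tuple[Interval, ...] = (
    ("first", 1, 2),
    ("second", 3, 5),
    ("third", 6, None),
)


class PlayCountMonitor:
    """Counts plays of each voice message inside a sliding time window."""

    def __init__(self, window_seconds: float = 120.0,
                 intervals: Sequence[Interval] = DEFAULT_INTERVALS) -> None:
        self.window_seconds = window_seconds  # the "predetermined time"
        self.intervals = intervals
        self._plays: DefaultDict[str, Deque[float]] = defaultdict(deque)

    def record_play(self, message_id: str, now: float) -> int:
        """S101: register one play at time `now` and return the count in the window."""
        plays = self._plays[message_id]
        plays.append(now)
        # Discard plays older than the window, so a message replayed now and
        # then over several days is not treated as hard to understand.
        while now - plays[0] > self.window_seconds:
            plays.popleft()
        return len(plays)

    def interval_of(self, count: int) -> Optional[str]:
        """S102: label of the interval containing `count`, or None if none does."""
        for label, low, high in self.intervals:
            if count >= low and (high is None or count <= high):
                return label
        return None
```

For example, plays at t = 0, 10 and 20 seconds give a count of 3, which falls in the second interval; a further play at t = 200 seconds counts as 1, because the earlier plays have left the 2-minute window.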
Different processing strategies can then be selected for each voice message according to the interval into which its play count falls. Note that this embodiment monitors the number of times a single voice message has been played within a predetermined time, for example within 2 minutes. If a voice message is played intermittently over a period longer than the predetermined time, for example over several days, this is most likely not because the user cannot hear it clearly, and no voice processing strategy needs to be applied.

S103: process the voice message according to a predefined voice processing strategy.

Specifically, voice processing strategies may include reducing the playback speed of the voice message, increasing its playback volume, converting it into text for display, and so on. Each strategy can be used on its own, and in some cases more than one strategy can be applied to the same voice message. Processing a voice message according to a predefined voice processing strategy can be done in several ways. Some commonly used ones are listed below; these examples do not limit this specification, and users can configure other processing methods for different application scenarios.

a) If the play count falls within a predetermined play count interval, the voice message is processed according to the voice processing strategy corresponding to that interval, where a corresponding strategy is configured for each interval. For example, as described above, 1-2 plays can be set as the first interval, 3-5 plays as the second interval, and 6 or more plays as the third interval. When the play count of a single voice message falls within the first interval, no voice processing strategy is applied; when it falls within the second interval, the strategy "increase the volume proportionally" is applied; when it falls within the third interval, the strategies "increase the volume proportionally" and "reduce the playback speed proportionally" are applied together. The strategies configured for different intervals may be different or the same, and can be set by the user.

b) If the play count falls within a predetermined play count interval, the voice quality problems of the voice message are detected, and a corresponding voice processing strategy is selected according to the detection result. For example, more than 3 plays can be set as the first interval; when the play count of a single voice message falls within this interval, the voice quality problems of that message are detected. Voice quality problems may include the volume being too low, the speech being too fast, the background being too noisy, and so on. A corresponding strategy can then be applied to each detected problem, such as increasing the volume, slowing down playback, or performing noise reduction.
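Options a) and b) of S103 amount to two lookups: one from play count interval to a set of strategies, and one from detected quality problem to a remedy. The sketch below assumes hypothetical strategy names, interval labels that mirror the three-interval example (first: nothing, second: louder, third: louder and slower), and quality-problem labels; the quality detection itself (volume, speech rate, and noise analysis) is not implemented and is represented only by its result.

```python
from enum import Enum
from typing import Dict, FrozenSet, List, Optional


class Strategy(Enum):
    # Hypothetical names for the strategies listed in S103.
    RAISE_VOLUME = "raise_volume"
    SLOW_DOWN = "slow_down"
    TO_TEXT = "to_text"
    DENOISE = "denoise"


# Option a): each play count interval maps to a (possibly empty) set of
# strategies. Mirrors the example above; users could edit this table.
INTERVAL_STRATEGIES: Dict[str, FrozenSet[Strategy]] = {
    "first": frozenset(),
    "second": frozenset({Strategy.RAISE_VOLUME}),
    "third": frozenset({Strategy.RAISE_VOLUME, Strategy.SLOW_DOWN}),
}

# Option b): each detected quality problem maps to one remedy.
QUALITY_REMEDIES: Dict[str, Strategy] = {
    "volume_too_low": Strategy.RAISE_VOLUME,
    "speech_too_fast": Strategy.SLOW_DOWN,
    "background_too_noisy": Strategy.DENOISE,
}


def strategies_for_interval(interval: Optional[str]) -> FrozenSet[Strategy]:
    """Option a): strategies configured for an interval (none if unknown)."""
    if interval is None:
        return frozenset()
    return INTERVAL_STRATEGIES.get(interval, frozenset())


def strategies_for_quality(problems: List[str]) -> FrozenSet[Strategy]:
    """Option b): one remedy per detected problem.

    `problems` stands in for the output of a quality detector, which is
    not sketched here; unrecognized labels are ignored.
    """
    return frozenset(QUALITY_REMEDIES[p] for p in problems if p in QUALITY_REMEDIES)
```

Returning sets keeps the "more than one strategy for the same message" case natural: the caller simply applies every strategy in the set.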
In some relatively simple and common application scenarios, only one play count interval may be set, with a processing strategy configured for it. Referring to FIG. 2, a voice playback method provided by this embodiment may include the following steps:

S201: after detecting that a single voice message has been played, determine the number of times the voice message has been played within a predetermined time.

S202: determine whether the play count is higher than a preset threshold. If it is, perform step S203; if it is not, take no action.

S203: process the voice message according to a predefined voice processing strategy, for example by reducing its playback speed, increasing its playback volume, or converting it into text for display.

The predefined voice processing strategy is preset by the user and is applied once the play count of a voice message exceeds the preset threshold. For example, whenever a single voice message is played more than 3 times within 2 minutes, its playback volume is increased. Alternatively, the first time a voice message is detected whose play count exceeds the preset threshold, the different voice processing strategies can be presented to the user as options; once the user has made a choice, the selected strategy is automatically used for subsequent voice messages whose play count exceeds the preset threshold.

Users can preset different voice processing strategies in several feasible ways. Some commonly used ones are listed as a) and b); these examples do not limit this specification, and users can configure other approaches for different application scenarios.
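The single-threshold flow of FIG. 2 (S201-S203), including the variant in which the user is offered the strategies once and the choice is then reused, might look like the sketch below. It is illustrative only: the class name, the `ask_user` callback standing in for a UI prompt, and the option labels are hypothetical, and the default threshold of 3 comes from the "more than 3 times within 2 minutes" example. The play count is assumed to come from a windowed counter such as the one described for S101.

```python
from typing import Callable, List, Optional


class ThresholdPolicy:
    """S202/S203 with one preset threshold instead of several intervals."""

    def __init__(self, threshold: int = 3,
                 preset: Optional[str] = None,
                 ask_user: Optional[Callable[[List[str]], str]] = None,
                 options: Optional[List[str]] = None) -> None:
        self.threshold = threshold  # play count that must be exceeded
        self.chosen = preset        # strategy preset by the user, if any
        self.ask_user = ask_user    # hypothetical UI prompt callback
        self.options = options or ["slow_down", "raise_volume", "to_text"]

    def on_play(self, count_in_window: int) -> Optional[str]:
        """Return the strategy to apply, or None to take no action."""
        if count_in_window <= self.threshold:  # S202: not above the threshold
            return None
        if self.chosen is None and self.ask_user is not None:
            # First time the threshold is exceeded: offer the options once
            # and remember the answer for later voice messages.
            self.chosen = self.ask_user(self.options)
        # S203 (None if nothing was preset and no prompt is available).
        return self.chosen
```

Storing the first answer in `chosen` is what makes later messages over the threshold get the same treatment without asking again.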
a) Per-contact settings: the user can set one or more commonly used voice processing strategies for different contacts. For example, if contact A speaks fast, the strategy "slow down playback" is set for that contact, and when the user's plays of a voice message from contact A reach the preset play count interval, the playback speed of voice messages from contact A is automatically reduced. If contact D has a strong dialect, the strategy "convert to text" is set for that contact, and when the user's plays of a voice message from contact D reach the preset play count interval, voice messages from contact D are automatically converted into text for display.

b) Settings based on the user's own situation: for example, if the user is in a noisy environment, the strategy is set to "increase the volume"; if the user is in an environment where playing voice messages is inconvenient, the strategy can be set to "convert to text for display". Further, the play count interval can be set to 0: in the latter case, when it is inconvenient to play voice messages in the user's environment, there is no need to check how many times a message has been played, and received voice messages are directly converted into text for display.

Further, when it is detected that the user has repeatedly played multiple voice messages within a period of time, a more intelligent processing method can be provided. Referring to FIG. 3, a voice playback method provided by this specification may include the following steps:

S301: detect the number of voice messages that have been processed by a voice processing strategy within a predetermined time.

S302: determine whether the number of voice messages processed by a voice processing strategy is higher than a predetermined threshold.
If the number is higher than the predetermined threshold, perform step S303; if it is not, take no action.

S303: automatically use a predefined voice processing strategy to process subsequently received voice messages.

Specifically, if the number of voice messages processed by a voice processing strategy within the predetermined time is higher than the predetermined threshold, the user has been repeatedly playing multiple voice messages during that period. The "repeated playback" check can then be skipped, and all subsequently received voice messages are processed using a voice processing strategy. Further, the voice processing strategy used most often within the predetermined time can be determined and automatically used to process subsequently received voice messages.

Further, when it is detected that the user has repeatedly played multiple voice messages within a period of time, it can further be determined whether a single contact is the cause of the repeated playback. Referring to FIG. 4, a voice playback method provided by this specification may include the following steps:

S401: determine a contact for whom the number of voice messages processed by a voice processing strategy within a predetermined time is higher than a preset threshold.

S402: use a predefined voice processing strategy to process subsequent voice messages from that contact.

Specifically, if the number of voice messages processed by a voice processing strategy within the predetermined time is higher than the predetermined threshold, the user has been repeatedly playing multiple voice messages during that period.
If those voice messages all come from the same contact, while voice messages from other contacts have not been processed repeatedly, it can be determined that voice messages from that contact need intelligent follow-up processing during this period. Further, the voice processing strategy used most often on that contact's voice messages within the predetermined time can be determined and automatically used to process subsequently received voice messages from that contact. Alternatively, the voice quality problems of that contact's voice messages can be specifically detected, and a targeted strategy selected based on the detection result to process subsequently received voice messages from that contact. Alternatively, the user can be shown optional voice improvement options for that contact, and the selected strategy is used to process subsequently received voice messages from that contact.

Corresponding to the above method embodiments, the present invention further provides a voice processing device applied to a client. Referring to FIG. 5, the device may include a play count monitoring module 510 and a voice information processing module 520.

Play count monitoring module 510: configured to, after detecting that a single voice message has been played, determine the number of times the voice message has been played within a predetermined time, and determine whether that play count falls within a predetermined play count interval.

Voice information processing module 520: configured to process the voice message according to a predefined voice processing strategy when the play count falls within the predetermined play count interval.
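One way to wire modules 510 and 520 of FIG. 5 together, and to layer on the per-contact settings of option a) and the escalation of FIGS. 3 and 4, is sketched below. It is a simplified illustration under stated assumptions, not the claimed implementation: all class and method names are hypothetical, module 510 uses a single "more than N plays" interval, strategies are plain strings, and the 10-minute review window and escalation threshold are example values. FIG. 3 and FIG. 4 are described as alternatives; here the per-contact check is simply tried first.

```python
from collections import Counter, deque
from typing import Deque, Dict, List, Optional, Tuple


class PlayCountMonitoringModule:
    """Module 510: per-message play counts within a sliding time window."""

    def __init__(self, window: float = 120.0, threshold: int = 3) -> None:
        self.window = window        # the "predetermined time", in seconds
        self.threshold = threshold  # simplified: one interval, "more than threshold"
        self._plays: Dict[str, Deque[float]] = {}

    def in_interval(self, message_id: str, now: float) -> bool:
        plays = self._plays.setdefault(message_id, deque())
        plays.append(now)
        while now - plays[0] > self.window:  # forget plays outside the window
            plays.popleft()
        return len(plays) > self.threshold


class VoiceInfoProcessingModule:
    """Module 520: picks a strategy and remembers what it has processed."""

    def __init__(self, default: str = "raise_volume",
                 review_window: float = 600.0, escalate_above: int = 3) -> None:
        self.default = default
        self.per_contact: Dict[str, str] = {}  # option a): contact -> strategy
        self.review_window = review_window      # hypothetical S301/S401 period
        self.escalate_above = escalate_above    # hypothetical S302/S401 threshold
        self._log: Deque[Tuple[float, str, str]] = deque()  # (time, contact, strategy)

    def process(self, contact: str, now: float) -> str:
        strategy = self.per_contact.get(contact, self.default)
        self._log.append((now, contact, strategy))
        return strategy

    def _recent(self, now: float) -> List[Tuple[float, str, str]]:
        while self._log and now - self._log[0][0] > self.review_window:
            self._log.popleft()
        return list(self._log)

    def auto_strategy_global(self, now: float) -> Optional[str]:
        """S301-S303: too many processed messages overall -> most-used strategy."""
        used = [s for _, _, s in self._recent(now)]
        if len(used) > self.escalate_above:
            return Counter(used).most_common(1)[0][0]
        return None

    def auto_strategy_for(self, contact: str, now: float) -> Optional[str]:
        """S401-S402: too many processed messages from one contact."""
        used = [s for _, c, s in self._recent(now) if c == contact]
        if len(used) > self.escalate_above:
            return Counter(used).most_common(1)[0][0]
        return None


class VoiceProcessingDevice:
    """FIG. 5: module 510 decides whether to act, module 520 decides how."""

    def __init__(self, monitor: PlayCountMonitoringModule,
                 processor: VoiceInfoProcessingModule) -> None:
        self.monitor = monitor
        self.processor = processor

    def on_played(self, message_id: str, contact: str, now: float) -> Optional[str]:
        if self.monitor.in_interval(message_id, now):
            return self.processor.process(contact, now)
        return None  # play count outside the interval: take no action

    def on_received(self, contact: str, now: float) -> Optional[str]:
        """Strategy for a new message before any replay (FIG. 4 first, then FIG. 3)."""
        return (self.processor.auto_strategy_for(contact, now)
                or self.processor.auto_strategy_global(now))
```

Keeping module 510 unaware of strategies mirrors the split in FIG. 5: the counting rule and the processing policy can then be changed independently.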
The present invention further provides a computer device, which includes at least a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the aforementioned voice processing method. The method includes: after detecting that a single voice message has been played, determining the number of times the voice message has been played within a predetermined time, and determining whether that play count falls within a predetermined play count interval; and if it does, processing the voice message according to a predefined voice processing strategy.

FIG. 6 is a schematic diagram of a more specific hardware structure of a computing device provided by the present invention. The device may include a processor 1110, a memory 1120, an input/output interface 1130, a communication interface 1140, and a bus 1150. The processor 1110, the memory 1120, the input/output interface 1130, and the communication interface 1140 communicate with one another within the device through the bus 1150.

The processor 1110 may be implemented as a general-purpose CPU (Central Processing Unit), a microprocessor, an ASIC (Application Specific Integrated Circuit), or one or more integrated circuits, and is used to execute related programs to implement the technical solutions provided by the present invention.

The memory 1120 may be implemented as ROM (Read Only Memory), RAM (Random Access Memory), a static storage device, a dynamic storage device, or the like. The memory 1120 may store an operating system and other application programs.
When the technical solutions provided by the present invention are implemented in software or firmware, the related program code is stored in the memory 1120 and called and executed by the processor 1110.

The input/output interface 1130 is used to connect input/output modules to implement information input and output. The input/output modules may be configured in the device as components (not shown in the figure) or may be externally connected to the device to provide corresponding functions. Input devices may include a keyboard, a mouse, a touch screen, a microphone, and various sensors; output devices may include a display, a speaker, a vibrator, and indicator lights.

The communication interface 1140 is used to connect a communication module (not shown in the figure) to enable communication between this device and other devices. The communication module may communicate in a wired manner (for example, USB or a network cable) or in a wireless manner (for example, a mobile network, Wi-Fi, or Bluetooth).

The bus 1150 includes a path that transfers information between the components of the device (for example, the processor 1110, the memory 1120, the input/output interface 1130, and the communication interface 1140).

It should be noted that although the above device shows only the processor 1110, the memory 1120, the input/output interface 1130, the communication interface 1140, and the bus 1150, in a specific implementation the device may also include other components necessary for normal operation. In addition, those skilled in the art will understand that the device may include only the components necessary to implement the solutions of the present invention, and need not include all the components shown in the figure.

The present invention further provides a computer-readable storage medium on which a computer program is stored.
When the program is executed by a processor, the foregoing voice processing method is implemented. The method includes: after detecting that a single voice message has been played, determining the number of times the voice message has been played within a predetermined time, and determining whether that play count falls within a predetermined play count interval; and if it does, processing the voice message according to a predefined voice processing strategy.

Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media, such as modulated data signals and carrier waves.

Since the device embodiments basically correspond to the method embodiments, reference may be made to the relevant parts of the description of the method embodiments.
The device embodiments described above are merely illustrative. The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution in this specification, which those of ordinary skill in the art can understand and implement without creative effort. From the description of the above embodiments, those skilled in the art can clearly understand that the present invention may be implemented by software plus a necessary general-purpose hardware platform. Based on this understanding, the technical solution of the present invention, in essence, or the part of it that contributes to the prior art, may be embodied in the form of a software product. The computer software product may be stored in a storage medium, such as a ROM/RAM, a magnetic disk, or an optical disc, and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute the methods described in the embodiments, or in some parts of the embodiments, of the present invention. The systems, devices, modules, or units described in the above embodiments may be implemented by a computer chip or entity, or by a product having a certain function. A typical implementation device is a computer, which may take the form of a personal computer, a laptop computer, a cellular phone, a camera phone, a smartphone, a personal digital assistant, a media player, a navigation device, an email sending and receiving device, a game console, a tablet computer, a wearable device, or any combination of these devices. The embodiments in this specification are described in a progressive manner.
For the same or similar parts of the embodiments, reference may be made between them, and each embodiment focuses on its differences from the others. In particular, the device embodiment is basically similar to the method embodiment and is therefore described relatively briefly; for relevant parts, reference may be made to the description of the method embodiment. The device embodiments described above are merely illustrative. The modules described as separate components may or may not be physically separate, and when the solution of the present invention is implemented, the functions of the modules may be implemented in one or more pieces of software and/or hardware. Some or all of the modules may also be selected according to actual needs to achieve the purpose of the solution of this embodiment, which those of ordinary skill in the art can understand and implement without creative effort. The above is merely a specific embodiment of the present invention. It should be noted that those of ordinary skill in the art may make several improvements and refinements without departing from the principles of the present invention, and such improvements and refinements should also be regarded as falling within the protection scope of the present invention.

S101-S402: Steps; 510: Play count monitoring module; 520: Voice information processing module; 1110: Processor; 1120: Memory; 1130: Input/output interface; 1140: Communication interface; 1150: Bus

To describe the technical solutions in the present invention or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. Evidently, the drawings described below show only some of the embodiments recorded in the present invention, and those of ordinary skill in the art may derive other drawings from them. FIG. 1 is a flowchart of a voice processing method according to an exemplary embodiment of this specification; FIG. 2 is another flowchart of a voice processing method according to an exemplary embodiment of this specification; FIG. 3 is a flowchart of a subsequent voice processing method according to an exemplary embodiment of this specification; FIG. 4 is another flowchart of a subsequent voice processing method according to an exemplary embodiment of this specification; FIG. 5 is a schematic diagram of a voice processing device according to an exemplary embodiment of this specification; FIG. 6 is a schematic structural diagram of a computer device according to an exemplary embodiment of this specification.

Claims

A voice processing method, the method comprising: after detecting that a single voice message has been played, determining the number of times the voice information has been played within a predetermined time, and determining whether the play count falls within a predetermined play-count interval; and if the play count falls within the predetermined play-count interval, processing the voice information according to a predefined voice processing strategy.

The method according to claim 1, wherein processing the voice information according to a predefined voice processing strategy if the play count falls within a predetermined play-count interval comprises: if the play count falls within a predetermined play-count interval, processing the voice information according to the voice processing strategy corresponding to that play-count interval, wherein corresponding voice processing strategies are set for different play-count intervals.

The method according to claim 1, wherein processing the voice information according to a predefined voice processing strategy if the play count falls within a predetermined play-count interval comprises: if the play count falls within a predetermined play-count interval, detecting a voice quality problem of the voice information, and selecting a corresponding voice processing strategy according to the detection result to process the voice information.

The method according to claim 1, wherein the predefined voice processing strategy comprises: reducing the playback speed of the voice information, increasing the playback volume of the voice information, or converting the voice information into text for display.

The method according to claim 1, wherein after the voice information is processed according to the predefined voice processing strategy, the method further comprises: detecting the number of pieces of voice information processed by a voice processing strategy within a predetermined time, and if the number is higher than a predetermined threshold, automatically using a predefined voice processing strategy to process subsequently received voice information.

The method according to claim 5, wherein automatically using a predefined voice processing strategy to process subsequently received voice information comprises: determining the voice processing strategy used most frequently within a predetermined time, and automatically using that most frequently used voice processing strategy to process subsequently received voice information.

The method according to claim 1, wherein after the voice information is processed according to the predefined voice processing strategy, the method further comprises: determining a contact for whom the number of pieces of voice information processed by a voice processing strategy within a predetermined time is higher than a preset threshold, and automatically using a predefined voice processing strategy to process subsequent voice information from that contact.

A voice processing device, comprising: a play count monitoring module, configured to, after detecting that a single voice message has been played, determine the number of times the voice information has been played within a predetermined time, and determine whether the play count falls within a predetermined play-count interval; and a voice information processing module, configured to process the voice information according to a predefined voice processing strategy when the play count falls within the predetermined play-count interval.

The device according to claim 8, wherein processing the voice information according to a predefined voice processing strategy if the play count falls within a predetermined play-count interval comprises: if the play count falls within a predetermined play-count interval, processing the voice information according to the voice processing strategy corresponding to that play-count interval, wherein corresponding voice processing strategies are set for different play-count intervals.

The device according to claim 8, wherein processing the voice information according to a predefined voice processing strategy if the play count falls within a predetermined play-count interval comprises: if the play count falls within a predetermined play-count interval, detecting a voice quality problem of the voice information, and selecting a corresponding voice processing strategy according to the detection result to process the voice information.

The device according to claim 8, wherein the predefined voice processing strategy comprises: reducing the playback speed of the voice information, increasing the playback volume of the voice information, or converting the voice information into text for display.

The device according to claim 8, further configured to, after processing the voice information according to the predefined voice processing strategy: detect the number of pieces of voice information processed by a voice processing strategy within a predetermined time, and if the number is higher than a predetermined threshold, automatically use a predefined voice processing strategy to process subsequently received voice information.

The device according to claim 12, wherein automatically using a predefined voice processing strategy to process subsequently received voice information comprises: determining the voice processing strategy used most frequently within a predetermined time, and automatically using that most frequently used voice processing strategy to process subsequently received voice information.

The device according to claim 8, further configured to, after processing the voice information according to the predefined voice processing strategy: determine a contact for whom the number of pieces of voice information processed by a voice processing strategy within a predetermined time is higher than a preset threshold, and automatically process subsequent voice information from that contact using a predefined voice processing strategy.

A computer device, comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor implements the method according to claim 1 when executing the program.
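Claims 5 to 7 describe switching to automatic processing once many messages have needed a strategy. The following is a hypothetical Python sketch under assumed parameters, not the method of the specification: `AutoStrategyTracker` and its methods are illustrative names, and the one-hour window and threshold of 5 are assumptions, since the source leaves both unspecified. The global rule returns the most frequently used strategy, as in claim 6; reusing the same most-frequent rule for the per-contact case of claim 7 is also an assumption, because that claim only says "a predefined voice processing strategy".

```python
from collections import Counter, deque


class AutoStrategyTracker:
    """Hypothetical tracker for claims 5 to 7 (illustrative names and defaults)."""

    def __init__(self, window_seconds=3600.0, threshold=5):
        self.window = window_seconds   # assumed "predetermined time"
        self.threshold = threshold     # assumed "predetermined threshold"
        # One event per processed message: (timestamp, contact_id, strategy).
        # Assumes events are recorded with non-decreasing timestamps.
        self.events = deque()

    def _expire(self, now):
        # Discard events older than the time window.
        while self.events and now - self.events[0][0] > self.window:
            self.events.popleft()

    def record(self, now, contact_id, strategy):
        """Record that one message from contact_id was processed with strategy."""
        self.events.append((now, contact_id, strategy))
        self._expire(now)

    @staticmethod
    def _most_frequent(strategies, threshold):
        # Auto-apply only when the count is strictly higher than the threshold.
        if len(strategies) <= threshold:
            return None
        return Counter(strategies).most_common(1)[0][0]

    def global_default(self, now):
        """Claims 5-6: strategy to apply to all later messages, or None."""
        self._expire(now)
        return self._most_frequent([s for _, _, s in self.events], self.threshold)

    def contact_default(self, now, contact_id):
        """Claim 7: strategy to apply to one contact's later messages, or None."""
        self._expire(now)
        strategies = [s for _, c, s in self.events if c == contact_id]
        return self._most_frequent(strategies, self.threshold)
```

With a threshold of 2, two "slow_down" events from one contact plus one "to_text" event from another make three processed messages, so `global_default` returns "slow_down", while `contact_default` for the first contact still returns `None` because that contact has only two processed messages.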