TWI807504B

TWI807504B - Method, device and storage medium for audio processing of virtual meeting room

Info

Publication number: TWI807504B
Application number: TW110144724A
Authority: TW
Inventors: 王呈裕; 陳柏誠; 李育德
Original assignee: 新加坡商鴻運科股份有限公司
Priority date: 2021-11-30
Filing date: 2021-11-30
Publication date: 2023-07-01
Also published as: TW202325006A

Abstract

This application discloses an audio processing method, device and storage medium for a virtual meeting room, and relates to the technical field of virtual meeting rooms. The method includes: setting the number of grid vertices according to the seat distribution of the virtual meeting room; obtaining first voiceprint information of a speaker, the first voiceprint information including the frequency, amplitude and phase difference of a speech signal; adjusting the frequency or amplitude of the first voiceprint information according to the number of grid vertices to obtain second voiceprint information; and determining the speaker's seat in the virtual meeting room according to the second voiceprint information. This application can simulate the sound-source characteristics of the speaker and make the speaker's voice recognizable.

Description

Audio processing method, device and storage medium for a virtual meeting room

The present application relates to the technical field of virtual meeting rooms, and in particular to an audio processing method, device and storage medium for a virtual meeting room.

A virtual meeting room (VMR) is an efficient and convenient online meeting room. Using mobile terminals such as mobile phones and computers, users can quickly set up virtual meetings with other users, unconstrained by time and place, and experience immersive meeting communication. Current virtual meeting rooms enlarge the speaker's image but do not make the voices of different speakers distinguishable. When several speakers in a virtual meeting room talk at the same time, it is difficult for a user to tell what each speaker is saying.

This application provides an audio processing method, device and storage medium for a virtual meeting room, so as to improve the recognizability of speakers' voices.

A first aspect of the present application provides an audio processing method for a virtual meeting room, including: setting the number of grid vertices according to the seat distribution of the virtual meeting room; extracting first voiceprint information of a speaker, the first voiceprint information including the frequency, amplitude and phase difference of the voice signal; adjusting the frequency or amplitude of the first voiceprint information according to the number of grid vertices to obtain second voiceprint information; and determining the speaker's seat in the virtual meeting room according to the second voiceprint information.

In one implementation, setting the number of grid vertices according to the seat distribution of the virtual meeting room includes: setting a different number of grid vertices in the area covered by each seat, so as to establish a correspondence between seats and numbers of grid vertices.
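As a minimal sketch of this seat-to-vertex-count correspondence, the snippet below gives every seat a distinct number of grid vertices. The linear scheme (`base` plus `step` per seat) and the function name `assign_vertex_counts` are hypothetical; the application only requires that the counts differ between seats.

```python
def assign_vertex_counts(seats, base=4, step=2):
    """Map each seat to a distinct grid-vertex count.

    Hypothetical scheme: seats are taken in the given order and each seat
    gets `step` more vertices than the previous one, starting at `base`.
    """
    if step <= 0:
        raise ValueError("step must be positive so that counts stay distinct")
    return {seat: base + index * step for index, seat in enumerate(seats)}


counts = assign_vertex_counts(["seat-1", "seat-2", "seat-3"])
print(counts)  # {'seat-1': 4, 'seat-2': 6, 'seat-3': 8}
```

Any injective mapping would serve equally well; a linear one simply makes the later frequency and amplitude rules easy to derive from the count.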

In another implementation, adjusting the frequency or amplitude of the first voiceprint information according to the number of grid vertices includes: when the area covered by a first seat has more grid vertices than the area covered by a second seat, raising the frequency of the first voiceprint information from the first seat, or lowering the frequency of the first voiceprint information from the second seat, so that the frequency of the first voiceprint information from the first seat is higher than that from the second seat.
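One way to guarantee the required ordering, sketched here under stated assumptions, is to give each seat a target fundamental frequency that rises with its vertex count and to pitch-shift each speaker's voice by the ratio target / measured. This raises or lowers each seat's frequency as needed in a single step. The base frequency, the Hz-per-vertex slope and the function names are illustrative assumptions, not values from the application, and the pitch shifter itself is not shown.

```python
def frequency_targets(vertex_counts, base_hz=120.0, hz_per_vertex=10.0):
    """Target fundamental frequency per seat (assumed linear rule in the vertex count)."""
    v_min = min(vertex_counts.values())
    return {seat: base_hz + (v - v_min) * hz_per_vertex
            for seat, v in vertex_counts.items()}


def pitch_shift_ratio(measured_f0_hz, target_hz):
    """Factor by which a pitch shifter would scale the speaker's frequencies."""
    if measured_f0_hz <= 0:
        raise ValueError("measured fundamental frequency must be positive")
    return target_hz / measured_f0_hz


targets = frequency_targets({"first": 8, "second": 4})
# The seat with more grid vertices ends up with the higher frequency.
assert targets["first"] > targets["second"]
print(targets)                                     # {'first': 160.0, 'second': 120.0}
print(pitch_shift_ratio(200.0, targets["first"]))  # 0.8
```

Mapping every seat to a fixed target, rather than scaling by a constant factor, keeps the ordering intact even when the speaker at the second seat naturally has a higher-pitched voice.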

In another implementation, adjusting the frequency or amplitude of the first voiceprint information according to the number of grid vertices includes: when the area covered by a first seat has more grid vertices than the area covered by a second seat, increasing the amplitude of the first voiceprint information from the first seat, or decreasing the amplitude of the first voiceprint information from the second seat, so that the amplitude of the first voiceprint information from the first seat is greater than that from the second seat.
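The amplitude variant can be sketched the same way: normalize each speaker's signal to a seat-specific RMS level that increases with the vertex count, so that the seat with more vertices is always louder regardless of how loudly the person actually spoke. The dBFS levels, the 1 dB-per-vertex slope and the function names are assumptions for illustration only.

```python
import math


def rms(samples):
    """Root-mean-square level of a block of samples in the range [-1, 1]."""
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def seat_level_dbfs(vertex_count, v_min, base_dbfs=-26.0, db_per_vertex=1.0):
    """Target RMS level for a seat: more vertices -> louder (assumed linear rule)."""
    return base_dbfs + (vertex_count - v_min) * db_per_vertex


def apply_seat_level(samples, target_dbfs):
    """Scale samples so that their RMS level equals target_dbfs (no clipping guard)."""
    current = rms(samples)
    if current == 0:
        return list(samples)  # silence stays silence
    gain = (10 ** (target_dbfs / 20)) / current
    return [s * gain for s in samples]


voice = [0.5, -0.5] * 4
louder = apply_seat_level(voice, seat_level_dbfs(8, v_min=4))   # -22 dBFS
quieter = apply_seat_level(voice, seat_level_dbfs(4, v_min=4))  # -26 dBFS
assert rms(louder) > rms(quieter)
```

A production mixer would also limit peaks after applying gain; that step is omitted here to keep the sketch short.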

In another implementation, after the speaker's seat in the virtual meeting room is determined according to the second voiceprint information, the audio processing method further includes: capturing eye-movement direction information of a participant; determining the participant's concentration according to the eye-movement direction information, the concentration taking the value 0 or 1; and determining, according to the concentration, whether the participant is interested in the meeting topic.

In another implementation, determining the participant's concentration according to the eye-movement direction information includes: marking the concentration as 1 when the participant's eyes move toward the speaker, and marking it as 0 when the participant's eyes move away from the speaker.
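A minimal sketch of the 0/1 marking, assuming the eye tracker yields a 3-D gaze direction and the seat layout yields the direction from the participant to the speaker. "Toward the speaker" is interpreted here, as an assumption, as an angle of at most 15 degrees between the two vectors; the function name and threshold are hypothetical.

```python
import math


def concentration(gaze, to_speaker, max_angle_deg=15.0):
    """Return 1 if the gaze points toward the speaker, else 0.

    `gaze` and `to_speaker` are 3-D direction vectors (hypothetical inputs).
    """
    dot = sum(g * s for g, s in zip(gaze, to_speaker))
    norm = math.hypot(*gaze) * math.hypot(*to_speaker)
    if norm == 0:
        return 0  # no usable direction, treat as not concentrating
    cos_angle = max(-1.0, min(1.0, dot / norm))  # guard acos against rounding
    return 1 if math.degrees(math.acos(cos_angle)) <= max_angle_deg else 0


print(concentration((1, 0.1, 0), (1, 0, 0)))  # 1 (about 5.7 degrees off)
print(concentration((0, 1, 0), (1, 0, 0)))    # 0 (looking sideways)
```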

In another implementation, the audio processing method further includes: when there are multiple speakers, counting the participant's concentration values recorded while each speaker is speaking, and determining the participant's concentration on the meeting topic from these values.

In another implementation, determining whether the participant is interested in the meeting topic according to the concentration includes: determining that the participant is interested in the meeting topic when the concentration value is greater than or equal to a preset interest threshold, and determining that the participant is not interested in the meeting topic when the concentration value is less than the interest threshold.
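The counting and thresholding steps can be sketched as follows. Averaging all 0/1 marks across speakers is an assumed way of "counting the values", since the application does not fix a formula, and the 0.6 interest threshold is purely illustrative.

```python
def concentration_degree(marks_by_speaker):
    """Fraction of 1-marks over all samples taken while any speaker was talking.

    `marks_by_speaker` maps a speaker id to the 0/1 concentration marks
    recorded for one participant while that speaker had the floor.
    """
    marks = [m for samples in marks_by_speaker.values() for m in samples]
    if not marks:
        return 0.0
    return sum(marks) / len(marks)


def is_interested(degree, interest_threshold=0.6):
    """Interested when the degree reaches the preset threshold (value is illustrative)."""
    return degree >= interest_threshold


marks = {"host": [1, 1, 0, 1], "speaker-B": [0, 1]}
degree = concentration_degree(marks)  # 4 / 6
print(round(degree, 3), is_interested(degree))  # 0.667 True
```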

A second aspect of the present application provides an audio processing device, including a server, a master device and a slave device. The master device is used to initiate a virtual meeting; the server is used to construct a virtual meeting room according to instructions from the master device; and the slave device is used to enter the virtual meeting room through a link from the master device. The server includes a first processor and a first memory, and the first processor runs a computer program or code stored in the first memory to implement the audio processing method of the embodiments of the present application.

A third aspect of the present application provides a storage medium for storing a computer program or code which, when executed by a processor, implements the audio processing method of the embodiments of the present application.

In the embodiments of the present application, the number of grid vertices in the area covered by each seat in the virtual meeting room is associated with the first voiceprint information, and the frequency or amplitude of the first voiceprint information from different seats is adjusted according to the number of grid vertices to obtain more distinguishable second voiceprint information, thereby establishing a correspondence between each seat and its second voiceprint information. In this way, the speaker's seat in the virtual meeting room can be determined from the second voiceprint information. The embodiments can simulate the sound-source characteristics of each speaker and make speakers' voices recognizable, so that users can clearly tell what each speaker is saying.
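Because each seat's adjustment is fixed in advance, the seat can be recovered by matching the adjusted voiceprint against the per-seat values. A minimal sketch, assuming the second voiceprint information is reduced to one target fundamental frequency per seat (a hypothetical simplification) and matching by nearest value:

```python
def locate_seat(observed_f0_hz, seat_targets_hz):
    """Return the seat whose target frequency is closest to the observed one."""
    if not seat_targets_hz:
        raise ValueError("no seats configured")
    return min(seat_targets_hz,
               key=lambda seat: abs(seat_targets_hz[seat] - observed_f0_hz))


seat_targets = {"seat-1": 120.0, "seat-2": 140.0, "seat-3": 160.0}
print(locate_seat(157.5, seat_targets))  # seat-3
```

A real system would compare richer features (spectral envelope, amplitude, phase difference) rather than a single number; nearest-value matching is used here only to show the seat-to-voiceprint lookup.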

100: Audio processing device

200: Server

300: Electronic device

310: Master device

320: Slave device

210: First processor

220: First memory

311: Second processor

312: Second memory

313: First audio module

314: First display screen

315: First front camera

321: Third processor

322: Third memory

323: Second audio module

324: Second display screen

325: Second front camera

S101-S104, S201-S207, S301-S307, S401-S403: steps

FIG. 1 is a schematic structural diagram of an audio processing device according to an embodiment of the present application.

FIG. 2 is a flowchart of an audio processing method according to an embodiment of the present application.

FIG. 3a is a schematic structural diagram of a virtual meeting room according to an embodiment of the present application.

FIG. 3b is a schematic structural diagram of a virtual meeting room according to another embodiment of the present application.

FIG. 4 is a flowchart of an audio processing method according to another embodiment of the present application.

FIG. 5 is a schematic diagram of a virtual meeting room according to an embodiment of the present application.

FIG. 6 is a schematic diagram of a virtual meeting room according to another embodiment of the present application.

FIG. 7 is a flowchart of an audio processing method according to another embodiment of the present application.

FIG. 8 is a schematic diagram of first voiceprint information according to an embodiment of the present application.

FIG. 9 is a flowchart of an audio processing method according to another embodiment of the present application.

It should be noted that in the embodiments of the present application, "at least one" means one or more, and "a plurality of" means two or more. "And/or" describes an association between related objects and indicates that three relationships may exist; for example, A and/or B may mean: A alone, both A and B, or B alone, where A and B may be singular or plural. The terms "first", "second", "third", "fourth", etc. (if any) in the description, claims and drawings of this application are used to distinguish similar items, not to describe a specific order or sequence.

It should also be noted that the methods disclosed in the embodiments of the present application, or shown in the flowcharts, include one or more steps for implementing the method. Without departing from the scope of the claims, the execution order of multiple steps may be interchanged, and some steps may be omitted.

FIG. 1 is a schematic structural diagram of an audio processing device 100 according to an embodiment of the present application.

Referring to FIG. 1, the audio processing device 100 may include a server 200 and electronic devices 300. The electronic devices 300 include a master device 310 and slave devices 320. The master device 310 is the electronic device used by the meeting host, and the slave devices 320 are the electronic devices used by the other participants. The server 200 is communicatively connected to the master device 310 and the slave devices 320. The host initiates a virtual meeting through the master device 310, the server 200 builds a virtual meeting room according to instructions from the master device 310, the master device 310 sends a meeting link to the slave devices 320, and the other participants enter the virtual meeting room through the slave devices 320.

The communication connection may be wired or wireless. A wired connection uses a wired transmission medium such as optical fiber or twisted pair. A wireless connection uses a wireless transmission medium such as WiFi or a mobile communication network (e.g., 2G/3G/4G/5G).

In some embodiments, the audio processing device 100 may further include a 360-degree fisheye camera (not shown), i.e., a panoramic camera that can independently monitor a large area without blind spots. The 360-degree fisheye camera is communicatively connected to the server 200 and can be mounted above some workstations in an office, with the lens facing up or down, to capture those workstations. The server 200 maps the workstations captured by the camera into the virtual meeting room model, so that the people at the workstations appear to be in the virtual meeting room. When the captured picture is inverted, the server 200 inverts the picture displayed on the master device 310 and the slave devices 320 to correct its orientation.
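A minimal sketch of the orientation correction, assuming (this is not stated in the application) that the inversion to be undone is a 180-degree rotation and that a frame is a list of pixel rows. A 180-degree rotation reverses both the row order and the pixel order within each row; fisheye dewarping is a separate step not shown here.

```python
def correct_inverted_frame(frame):
    """Rotate a frame (list of pixel rows) by 180 degrees."""
    return [list(reversed(row)) for row in reversed(frame)]


frame = [[1, 2, 3],
         [4, 5, 6]]
print(correct_inverted_frame(frame))  # [[6, 5, 4], [3, 2, 1]]
```

Applying the function twice returns the original frame, which is a quick sanity check that it is a pure rotation.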

The server 200 may include a first processor 210 and a first memory 220. The first processor 210 may run a computer program or code stored in the first memory 220 to implement the audio processing method of some embodiments of the present application.

The first processor 210 may include one or more processing units. For example, the first processor 210 may include, but is not limited to, an application processor (AP), a modem processor, a graphics processing unit (GPU), an image signal processor (ISP), a controller, a video codec, a digital signal processor (DSP), a baseband processor and a neural-network processing unit (NPU). The different processing units may be separate devices or may be integrated into one or more processors.

The first processor 210 may also contain a memory for storing instructions and data. In some embodiments, this memory is a cache. It can hold instructions or data that the first processor 210 has just used or uses repeatedly; if the first processor 210 needs those instructions or data again, it can fetch them directly from this memory.

In some embodiments, the first processor 210 may include one or more interfaces, including but not limited to an inter-integrated circuit (I2C) interface, an inter-integrated circuit sound (I2S) interface, a pulse code modulation (PCM) interface, a universal asynchronous receiver/transmitter (UART) interface, a mobile industry processor interface (MIPI), a general-purpose input/output (GPIO) interface, a subscriber identity module (SIM) interface and a universal serial bus (USB) interface.

It can be understood that the interface connections between the modules shown in the embodiments of the present application are only illustrative and do not limit the structure of the server 200. In other embodiments of the present application, the server 200 may use interface connection methods different from those in the above embodiments, or a combination of several interface connection methods.

The first memory 220 may include an external memory interface and an internal memory. The external memory interface can be used to connect an external memory card, such as a Micro SD card, to expand the storage capacity of the server 200; the card communicates with the first processor 210 through the external memory interface to provide data storage. The internal memory can store computer-executable program code, which includes instructions, and may include a program storage area and a data storage area. The program storage area can store the operating system and the application programs required by at least one function (such as sound playback or image playback). The data storage area can store data created while the server 200 is in use (such as audio data and image data). In addition, the internal memory may include high-speed random access memory and may also include non-volatile memory, such as at least one magnetic disk storage device, a flash memory device or Universal Flash Storage (UFS). The first processor 210 executes the instructions stored in the internal memory and/or in the memory provided in the first processor 210 to perform the various functional applications and data processing of the server 200, for example the audio processing method of some embodiments of the present application.

In some embodiments, the server 200 may include multiple virtual machines (VMs) and supports high availability (HA) and auto scaling. High availability means redundant processing capacity: when a node is unavailable or cannot process a user's request, the request is promptly transferred to another available node. Auto scaling means that computing capacity (i.e., the number of instances) is adjusted automatically according to business demand and policies: when demand grows, instances of the specified type are added to guarantee computing capacity, and when demand falls, instances of the specified type are removed to save costs.
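The two behaviours can be sketched in a few lines. The function names, the per-instance capacity model and the instance limits are hypothetical; cloud providers expose their own scaling policies and APIs, which are not modelled here.

```python
import math


def route_request(nodes, is_healthy):
    """High availability: send the request to the first healthy node."""
    for node in nodes:
        if is_healthy.get(node, False):
            return node
    raise RuntimeError("no available node")


def desired_instances(load, capacity_per_instance, min_instances=1, max_instances=10):
    """Auto scaling: enough instances for the load, clamped to [min, max]."""
    needed = math.ceil(load / capacity_per_instance)
    return max(min_instances, min(max_instances, needed))


print(route_request(["vm-1", "vm-2"], {"vm-1": False, "vm-2": True}))  # vm-2
print(desired_instances(250, 100))  # 3
```

Clamping to a minimum keeps at least one instance warm when demand drops to zero, and the maximum caps cost when demand spikes.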

In some embodiments, the master device 310 may include a second processor 311, a second memory 312, a first audio module 313 and a first display screen 314. The second processor 311 is electrically connected to the other components above and to the first processor 210 of the server 200. The first audio module 313 performs analog-to-digital conversion, encoding and decoding of audio signals. The first display screen 314 displays the scene of the virtual meeting room and the avatars of some participants. The second processor 311 can run a computer program or code stored in the second memory 312 to implement the audio processing method of other embodiments of the present application.

In this embodiment, the master device 310 may include 3-degree-of-freedom (3DoF) virtual reality (VR) glasses or a head-mounted device (HMD) capable of tracking the direction of the user's eye movement.

It can be understood that the second processor 311 and the second memory 312 may be implemented as described above for the first processor 210 and the first memory 220, which is not repeated here.

In some embodiments, the first audio module 313 may be arranged in the second processor 311, or some of its functional modules may be arranged in the second processor 311. The master device 310 can implement audio functions, such as voice playback and recording, through the first audio module 313.

In other embodiments, the master device 310 may further include a first front camera 315 electrically connected to the second processor 311. The first front camera 315 captures the user's face and the direction of eye movement, enabling the server 200 to analyze how attentive the user is during the virtual meeting and how interested the user is in the meeting topics.

In this embodiment, the master device 310 may include a smartphone, a tablet computer, a personal computer (PC) or a personal digital assistant (PDA).

In some embodiments, the slave device 320 may include a third processor 321, a third memory 322, a second audio module 323, a second display screen 324 and a second front camera 325. The third processor 321 is electrically connected to the other components above, to the first processor 210 of the server 200 and to the second processor 311 of the master device 310.

It can be understood that, for the components and specific implementation of the slave device 320, reference may be made to the master device 310.

The structures illustrated in the embodiments of the present application do not specifically limit the server 200, the master device 310 or the slave device 320. In other embodiments of the present application, the server 200, the master device 310 or the slave device 320 may include more or fewer components than shown, combine or split some components, or arrange the components differently. The illustrated components may be implemented in hardware, software or a combination of software and hardware.

Referring to FIG. 2, the audio processing method of this embodiment is applied to the master device 310 and may include the following steps:

S101: In response to a first operation by the host, the master device 310 sends a request to establish a meeting to the server 200.

The first operation may include triggering a meeting-creation control in three-dimensional graphics software (such as Blender) on the master device 310.

In this embodiment, three-dimensional graphics software is installed on the master device 310. The host can trigger the meeting-creation control in this software, causing the master device 310 to send a meeting-establishment request to the server 200. The three-dimensional graphics software provides a full set of 3D creation tools, including modeling, UV mapping, texturing, rigging, skinning, animation, particle and other physics simulation, scripting, rendering, motion tracking, compositing and post-production.

S102: In response to a second operation by the host, the master device 310 selects a virtual meeting room model from the model library of the server 200.

The second operation may include triggering a control for selecting a virtual meeting room model in the three-dimensional graphics software of the master device 310.

Referring to FIG. 3a and FIG. 3b, the model library stores virtual meeting rooms of various shapes, such as cuboid and ring-shaped virtual meeting rooms. After receiving the meeting-establishment request from the master device 310, the server 200 allows the master device 310 to access its model library. The host can select a virtual meeting room model in the three-dimensional graphics software by triggering the model-selection control, so that the master device 310 selects that model from the model library of the server 200.

S103: In response to a third operation by the host, the master device 310 collects the host's first voiceprint information and sends it to the server 200.

The third operation may include triggering an audio-recording control in the three-dimensional graphics software of the master device 310.

In this embodiment, after the master device 310 has determined the virtual meeting room model, the host triggers the audio-recording control in the three-dimensional graphics software. The master device 310 records the host's voice signal through the first audio module 313, extracts the host's first voiceprint information from the voice signal, and sends it to the server 200. The first voiceprint information may include the frequency, amplitude and phase difference of the voice signal.
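The three voiceprint quantities can be illustrated on a pure tone with a plain discrete Fourier transform. This is a toy sketch, not the application's extraction method: it assumes two equally long microphone channels and takes the phase difference between them at the dominant frequency bin, an interpretation the application does not spell out. Real voiceprints would use pitch tracking and full spectra.

```python
import cmath
import math


def dft_bin(samples, k):
    """k-th bin of the discrete Fourier transform (plain O(N) sum)."""
    n = len(samples)
    return sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples))


def first_voiceprint(left, right, sample_rate):
    """Return (frequency_hz, amplitude, phase_difference_rad) of the dominant tone."""
    n = len(left)
    k = max(range(1, n // 2), key=lambda b: abs(dft_bin(left, b)))  # dominant bin
    x_left, x_right = dft_bin(left, k), dft_bin(right, k)
    frequency = k * sample_rate / n
    amplitude = 2 * abs(x_left) / n  # single-sided amplitude of a sinusoid
    phase_diff = cmath.phase(x_right) - cmath.phase(x_left)
    phase_diff = (phase_diff + math.pi) % (2 * math.pi) - math.pi  # wrap to [-pi, pi)
    return frequency, amplitude, phase_diff


fs, n = 800, 80
left = [0.5 * math.sin(2 * math.pi * 100 * t / fs) for t in range(n)]
right = [0.5 * math.sin(2 * math.pi * 100 * t / fs + 0.3) for t in range(n)]
freq, amp, dphi = first_voiceprint(left, right, fs)
print(round(freq, 3), round(amp, 3), round(dphi, 3))  # 100.0 0.5 0.3
```

The test tone is chosen to fall exactly on a DFT bin (100 Hz at 800 Hz sampling with 80 samples), so there is no spectral leakage and the recovered values are exact up to rounding.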

S104: In response to a fourth operation by the host, the master device 310 determines the participants' seats according to the virtual meeting room model and sends meeting links to the slave devices 320.

The fourth operation may include triggering an add-meeting-link control in the three-dimensional graphics software of the master device 310.

In this embodiment, after the master device 310 sends the host's first voiceprint information to the server 200, the host triggers the add-meeting-link control in the three-dimensional graphics software. The master device 310 determines the participants' seats according to the virtual meeting room model, with each seat in the virtual meeting room corresponding to a unique meeting link, and sends the meeting links to the slave devices 320.

For example, N seats may be set in the virtual meeting room, each corresponding to one meeting link, and the master device 310 may send the N meeting links to N slave devices 320 respectively. After entering the virtual meeting room through a meeting link, a participant can view the room and the other participants from the perspective of the corresponding seat and can speak during the meeting.

In some embodiments, multiple observer seats may also be set in the virtual meeting room, each likewise corresponding to a unique meeting link. The master device 310 may send the meeting links of M observer seats to M slave devices 320 respectively. A participant who enters the virtual meeting room through an observer-seat link can view the room and the other participants from that seat's perspective but cannot speak. M and N are both positive integers.
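A minimal sketch of issuing one unique link per seat, with N speaking seats and M observer seats that cannot speak. The URL layout, the `https://vmr.example.com/join` base and the `SeatLink` type are hypothetical; only the one-link-per-seat rule and the observer restriction come from the text.

```python
import secrets
from dataclasses import dataclass


@dataclass(frozen=True)
class SeatLink:
    seat_id: str
    url: str
    can_speak: bool  # observer seats cannot speak


def make_seat_links(meeting_id, n_seats, m_observers,
                    base_url="https://vmr.example.com/join"):
    """One unique link per seat: N speaking seats followed by M observer seats."""
    def new_link(seat_id, can_speak):
        token = secrets.token_urlsafe(8)  # random token, practically unique per seat
        return SeatLink(seat_id, f"{base_url}/{meeting_id}/{token}", can_speak)

    seats = [new_link(f"S{i + 1}", True) for i in range(n_seats)]
    observers = [new_link(f"O{j + 1}", False) for j in range(m_observers)]
    return seats + observers


links = make_seat_links("m-001", n_seats=3, m_observers=2)
print(len(links), sum(item.can_speak for item in links))  # 5 3
```

Using `secrets` rather than `random` makes the per-seat tokens hard to guess, so a participant cannot easily claim another seat by editing the link.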

於另一些實施例中，主設備310將虛擬會議室之會議連結發送至複數從設備320。參會者通過虛擬會議室之會議連結進入虛擬會議室後，可通過從設備320選擇座位。 In other embodiments, the master device 310 sends a single conference link for the whole virtual conference room to multiple slave devices 320. After entering the virtual conference room through this link, participants select their seats on their slave devices 320.

可參閱圖4，本實施例之音訊處理方法應用於從設備320，音訊處理方法可以包括以下步驟： Referring to FIG. 4, the audio processing method of this embodiment is applied to the slave device 320, and the audio processing method may include the following steps:

S201，從設備320接收來自於主設備310之會議連結。 S201, the slave device 320 receives a conference link from the master device 310.

於本實施例中，主設備310建立虛擬會議室後，將虛擬會議室之會議連結或虛擬會議室中座位之會議連結發送至參會者。參會者通過從設備320接收來自於主設備310之會議連結。 In this embodiment, after establishing the virtual conference room, the master device 310 sends the participants either the conference link of the virtual conference room or the conference links of individual seats in it. Each participant receives the conference link from the master device 310 on a slave device 320.

S202，回應於參會者之第一操作，從設備320根據會議連結進入虛擬會議室。 S202, in response to the first operation of the participant, the slave device 320 enters the virtual conference room according to the conference link.

其中，第一操作可以包括於從設備320上點擊會議連結，啟動流覽器應用(例如Chrome Browser)。 Here, the first operation may include clicking the conference link on the slave device 320 to launch a browser application (such as Chrome Browser).

可參閱圖5，從設備320接收到來自於主設備310之會議連結後，參會者於從設備320上點擊會議連結，啟動流覽器應用，通過流覽器應用進入虛擬會議室。 Referring to FIG. 5, after the slave device 320 receives the conference link from the master device 310, the participant clicks the link on the slave device 320, which launches the browser application and enters the virtual conference room through it.

S203，從設備320根據虛擬會議室中是否有座位來確定會議連結是否為預定座位之會議連結。若會議連結為預定座位之會議連結，則執行步驟S204。若會議連結不係預定座位之會議連結，則執行步驟S205。 S203, the slave device 320 determines whether the conference link is a reserved-seat link according to whether the participant has a seat in the virtual conference room. If the link is a reserved-seat link, step S204 is performed; otherwise, step S205 is performed.

於本實施例中，從設備320根據會議連結進入虛擬會議室後，從設備320根據虛擬會議室中是否有座位來確定會議連結是否為預定座位之會議連結。當參會者於虛擬會議室中有座位時，從設備320從預定座位之視角顯示虛擬會議室之場景和其他參會者。當參會者於虛擬會議室中沒有座位時，從設備320顯示整體虛擬會議室之場景。從設備320可以通過參會者進入虛擬會議室後之視角不同確定參會者於虛擬會議室中是否有座位，進而確定會議連結是否為預定座位之會議連結。 In this embodiment, after entering the virtual conference room through the conference link, the slave device 320 determines whether the link is a reserved-seat link according to whether the participant has a seat in the room. If the participant has a seat, the slave device 320 displays the virtual conference room and the other participants from the perspective of the reserved seat; if not, it displays an overall view of the room. The slave device 320 can therefore tell from the viewing angle presented on entry whether the participant has a seat, and hence whether the link is a reserved-seat link.

S204，回應於參會者之第二操作，從設備320收集參會者之第一聲紋資訊，並將第一聲紋資訊傳送至伺服器200。 S204, in response to the participant's second operation, the slave device 320 collects the participant's first voiceprint information and sends it to the server 200.

其中，第二操作可以包括於從設備320之流覽器應用中觸發錄製音訊之控制項。 Here, the second operation may include triggering the record-audio control in the browser application of the slave device 320.

於本實施例中，從設備320確定會議連結為預定座位之會議連結後，參會者於從設備320之流覽器應用中觸發錄製音訊之控制項，從設備320通過第二音訊模組323錄製參會者之語音訊號，從語音訊號中提取參會者之第一聲紋資訊，並將第一聲紋資訊傳送至伺服器200。 In this embodiment, after the slave device 320 determines that the conference link is a reserved-seat link, the participant triggers the record-audio control in the browser application of the slave device 320. The slave device 320 records the participant's voice signal through the second audio module 323, extracts the participant's first voiceprint information from it, and sends the first voiceprint information to the server 200.

S205，回應於參會者之第三操作，從設備320確定座位。 S205, in response to the participant's third operation, the slave device 320 determines a seat.

其中，第三操作可以包括於從設備320之流覽器應用中觸發選擇座位之控制項。 Here, the third operation may include triggering the seat-selection control in the browser application of the slave device 320.

於本實施例中，從設備320確定會議連結不係預定座位之會議連結後，參會者於從設備320之流覽器應用中觸發選擇座位之控制項，從設備320選擇座位，並讀取該座位之選定資訊。其中，座位之選定資訊包括座位已被選定或座位未被選定。 In this embodiment, after the slave device 320 determines that the conference link is not a reserved-seat link, the participant triggers the seat-selection control in the browser application of the slave device 320; the slave device 320 then selects the chosen seat and reads its selection status, which indicates either that the seat has already been selected or that it has not.

S206，從設備320確定座位是否已被其他參會者選定。若座位已被其他參會者選定，則返回執行步驟S205。若座位未被其他參會者選定，則返回執行步驟S204。 S206, the slave device 320 determines whether the seat has already been selected by another participant. If it has, the process returns to step S205; if not, it proceeds to step S204.

於本實施例中，當參會者通過從設備320選擇座位後，從設備320可以讀取該座位之選定資訊，以確定該座位是否已被其他參會者選定。當參會者選擇之座位未被其他參會者選定時，從設備320可從選定座位之視角顯示虛擬會議室之場景和其他參會者。當參會者選擇之座位已被其他參會者選定時，從設備320提示參會者重選座位。 In this embodiment, after a participant selects a seat through the slave device 320, the slave device 320 can read the selection information of the seat to determine whether the seat has been selected by other participants. When the seat selected by the participant is not selected by other participants, the slave device 320 can display the scene of the virtual conference room and other participants from the perspective of the selected seat. When the seat selected by the participant has been selected by other participants, the slave device 320 prompts the participant to reselect the seat.
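The branching of steps S203 to S206 can be summarized in a short sketch. It assumes, hypothetically, that the seat selection status is available as a shared set of taken seat IDs and that the participant's choice arrives through a callback; the `resolve_seat` name and the `max_attempts` limit are illustrative additions, not part of the source.

```python
from typing import Callable, Optional, Set


def resolve_seat(
    reserved_seat: Optional[str],
    taken: Set[str],
    choose_seat: Callable[[], str],
    max_attempts: int = 10,
) -> str:
    """Return the participant's seat following steps S203 to S206 (sketch)."""
    # S203: a seat-specific link already carries a reserved seat, so go to S204.
    if reserved_seat is not None:
        taken.add(reserved_seat)
        return reserved_seat
    for _ in range(max_attempts):
        seat = choose_seat()  # S205: participant triggers the seat-selection control
        if seat not in taken:  # S206: not yet selected by another participant
            taken.add(seat)
            return seat  # continue with S204 (voiceprint capture)
        # Seat already taken: prompt the participant and return to S205.
        print(f"Seat {seat} is already taken, please choose another.")
    raise RuntimeError("no free seat selected within the attempt limit")


taken = {"seat-1"}
picks = iter(["seat-1", "seat-2"])
print(resolve_seat(None, taken, lambda: next(picks)))  # prompts once, then prints seat-2
```

A real deployment would keep the taken-seat state on the server 200 so that concurrent selections by different slave devices cannot both succeed.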

S207，從設備320顯示虛擬會議室之座點陣圖。 S207, the slave device 320 displays the seat map (seat dot matrix) of the virtual conference room.

於本實施例中，當從設備320將第一聲紋資訊傳送至伺服器200後，顯示虛擬會議室之座點陣圖。可參閱圖6，環形虛擬會議室中有6個座位，6個座位環繞形成虛擬會議室之座點陣圖，呈現真實虛擬會議室之效果。 In this embodiment, after sending the first voiceprint information to the server 200, the slave device 320 displays the seat map of the virtual conference room. Referring to FIG. 6, the circular virtual conference room has six seats arranged in a ring that forms the seat map, giving the virtual conference room a realistic appearance.
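A ring seat map like the one in FIG. 6 can be generated by spacing seats evenly around a circle. This sketch assumes 2D floor-plan coordinates centred on the room; the function name and the `radius` parameter are hypothetical.

```python
import math
from typing import List, Tuple


def ring_seat_positions(n_seats: int, radius: float = 1.0) -> List[Tuple[float, float]]:
    """Place n_seats evenly around a circle centred on the origin, as (x, y) pairs."""
    step = 2.0 * math.pi / n_seats
    return [(radius * math.cos(i * step), radius * math.sin(i * step)) for i in range(n_seats)]


for i, (x, y) in enumerate(ring_seat_positions(6, radius=2.0)):
    print(f"seat {i + 1}: ({x:+.2f}, {y:+.2f})")
```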

可參閱圖7，本實施例之音訊處理方法應用於伺服器200，音訊處理方法可以包括以下步驟： Referring to FIG. 7, the audio processing method of this embodiment is applied to the server 200, and the audio processing method may include the following steps:

S301，伺服器200接收來自於主設備310之建立會議之請求。 S301, the server 200 receives a request from the master device 310 to establish a meeting.

於本實施例中，主持人可以於三維圖形圖像軟體中觸發建立會議之控制項，使得主設備310發送建立會議之請求至伺服器200。伺服器200接收來自於主設備310之建立會議之請求。 In this embodiment, the host can trigger the create-meeting control in the three-dimensional graphics software, causing the master device 310 to send a meeting establishment request to the server 200, which receives the request from the master device 310.

S302，伺服器200根據建立會議之請求向主設備310開放模型庫之存取權限。 S302, in response to the meeting establishment request, the server 200 grants the master device 310 access to the model library.

於本實施例中，伺服器200接收到來自於主設備310之建立會議之請求後，向主設備310開放模型庫之存取權限，允許主設備310訪問伺服器200之模型庫並從模型庫中調用虛擬會議室模型。 In this embodiment, after receiving the meeting establishment request from the master device 310, the server 200 grants the master device 310 access to its model library, allowing the master device 310 to browse the model library of the server 200 and load a virtual conference room model from it.

S303，伺服器200根據主設備310所選定之虛擬會議室模型建立虛擬會議室。 S303, the server 200 creates the virtual conference room according to the virtual conference room model selected by the master device 310.

於本實施例中，主持人可以於三維圖形圖像軟體中選擇虛擬會議室模型，觸發選擇虛擬會議室模型之控制項，使得主設備310從伺服器200之模型庫中選擇虛擬會議室模型。伺服器200根據主設備310所選定之虛擬會議室模型建立虛擬會議室。 In this embodiment, the host can choose a virtual conference room model in the three-dimensional graphics software by triggering the model-selection control, so that the master device 310 selects that model from the model library of the server 200. The server 200 then creates the virtual conference room from the model selected by the master device 310.

於一些實施例中，伺服器200可以根據預設之虛擬會議室比例建立虛擬會議室模型，並通過UV映射之工具，使得主設備310或從設備320可以顯示虛擬會議室模型之動態畫面。 In some embodiments, the server 200 can build the virtual conference room model at a preset scale and, by means of a UV mapping tool, enable the master device 310 or the slave device 320 to display a dynamic view of the model.

於另一些實施例中，伺服器200可以根據預存之虛擬會議室圖片，從虛擬會議室圖片中提取紋理特徵，並通過貼圖之工具，將紋理特徵添加到預設之基本模型中，使得主設備310或從設備320可以顯示虛擬會議室模型之靜態畫面。 In some other embodiments, the server 200 can extract texture features from a pre-stored picture of the virtual conference room and, by means of a texture-mapping tool, apply them to a preset base model, so that the master device 310 or the slave device 320 can display a static view of the model.

S304，伺服器200根據虛擬會議室之座位分佈設置網格頂點之數目。 S304, the server 200 sets the number of grid vertices according to the seat distribution of the virtual conference room.

其中，網格(Mesh)係三維圖形圖像軟體構圖之基本單元，虛擬會議室由複數網格拼接構成。一個網格包括4個頂點(Vertex)。虛擬會議室中一個座位所覆蓋區域包含之網格頂點之數目越多，該區域之網格頂點之密度也就越大。伺服器200根據虛擬會議室之座位分佈設置網格頂點之數目，於各個座位所覆蓋區域設置不同數目之網格頂點，即各個座位所覆蓋區域之網格頂點之密度不同，使得座位與網格頂點之數目或密度形成一一對應之關係。 Here, a mesh (grid) is the basic building unit in the three-dimensional graphics software, and the virtual conference room is assembled from multiple meshes. Each grid has four vertices. The more grid vertices the area covered by a seat contains, the higher the grid-vertex density of that area. The server 200 sets the number of grid vertices according to the seat distribution of the virtual conference room, placing a different number of grid vertices in the area covered by each seat; that is, the vertex density differs from seat to seat, so that seats and grid-vertex counts (or densities) are in one-to-one correspondence.
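One simple way to give every seat area a different vertex count is to subdivide each seat's patch of four-vertex quads at a different level. The source only requires the counts (or densities) to differ; the square-patch layout, the subdivision levels and the helper names below are assumptions made for illustration.

```python
from typing import Dict, List


def quad_grid_vertex_count(subdivisions: int) -> int:
    """Unique vertices of an s x s patch of 4-vertex quads (shared corners counted once)."""
    return (subdivisions + 1) ** 2


def assign_vertex_counts(seat_ids: List[str], base_subdivisions: int = 4) -> Dict[str, int]:
    """Give every seat area its own subdivision level, hence a distinct vertex count."""
    return {seat: quad_grid_vertex_count(base_subdivisions + i) for i, seat in enumerate(seat_ids)}


counts = assign_vertex_counts([f"seat-{i}" for i in range(1, 7)])
print(counts)  # {'seat-1': 25, 'seat-2': 36, ..., 'seat-6': 100}
# Distinct counts make the mapping invertible: vertex count -> seat.
seat_by_count = {n: seat for seat, n in counts.items()}
```

Because the counts strictly increase with the subdivision level, the seat-to-count mapping is one-to-one, which is the property S304 relies on.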

S305，伺服器200接收來自於主設備310或從設備320之第一聲紋資訊。 S305, the server 200 receives the first voiceprint information from the master device 310 or the slave device 320.

其中，第一聲紋資訊可以包括語音訊號之頻率、振幅及相位差。 Here, the first voiceprint information may include the frequency, amplitude and phase difference of the voice signal.

於本實施例中，當主持人發言時，主持人於主設備310之三維圖形圖像軟體中觸發錄製音訊之控制項，主設備310通過第一音訊模組313錄製主持人之語音訊號，從語音訊號中提取主持人之第一聲紋資訊，並將第一聲紋資訊傳送至伺服器200。伺服器200可以接收來自於主設備310之第一聲紋資訊。 In this embodiment, when the host speaks, the host triggers the record-audio control in the three-dimensional graphics software of the master device 310. The master device 310 records the host's voice signal through the first audio module 313, extracts the host's first voiceprint information from it, and sends the first voiceprint information to the server 200, which receives it.

當參會者發言時，參會者於從設備320之流覽器應用中觸發錄製音訊之控制項，從設備320通過第二音訊模組323錄製參會者之語音訊號，從語音訊號中提取參會者之第一聲紋資訊，並將第一聲紋資訊傳送至伺服器200。伺服器200可以接收來自於從設備320之第一聲紋資訊。 When a participant speaks, the participant triggers the record-audio control in the browser application of the slave device 320. The slave device 320 records the participant's voice signal through the second audio module 323, extracts the participant's first voiceprint information from it, and sends the first voiceprint information to the server 200, which receives it.
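The source does not say how frequency, amplitude and phase difference are extracted from the voice signal. The sketch below shows one plausible reading using only the standard library: it takes the strongest DFT component of a short frame as the frequency and amplitude, and assumes (hypothetically) that "phase difference" means the phase offset between two microphone channels at that frequency. Real speech would need framing, windowing and an FFT; the naive per-bin DFT here is for clarity only.

```python
import cmath
import math
from typing import List, Tuple


def dft_bin(x: List[float], k: int) -> complex:
    """Single DFT coefficient X[k] of a real signal (naive O(n) sum)."""
    n = len(x)
    return sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))


def extract_voiceprint(left: List[float], right: List[float], sample_rate: float) -> Tuple[float, float, float]:
    """Return (frequency in Hz, amplitude, phase difference in rad) of the dominant tone.

    The phase difference is taken between two microphone channels (an assumption).
    """
    n = len(left)
    # Search bins 1..n/2-1 (skipping DC) for the strongest component of the left channel.
    magnitudes = [abs(dft_bin(left, k)) for k in range(1, n // 2)]
    k = magnitudes.index(max(magnitudes)) + 1
    xl, xr = dft_bin(left, k), dft_bin(right, k)
    frequency = k * sample_rate / n
    amplitude = 2.0 * abs(xl) / n  # exact for a tone that falls on a DFT bin
    phase_diff = cmath.phase(xl * xr.conjugate())  # positive: right channel lags
    return frequency, amplitude, phase_diff


sr, n = 4000, 400
left = [0.5 * math.sin(2 * math.pi * 440 * t / sr) for t in range(n)]
right = [0.5 * math.sin(2 * math.pi * 440 * t / sr - 0.3) for t in range(n)]
print(extract_voiceprint(left, right, sr))  # approximately (440.0, 0.5, 0.3)
```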

S306，伺服器200根據網格頂點之數目調整第一聲紋資訊之頻率或振幅，得到第二聲紋資訊。 S306, the server 200 adjusts the frequency or amplitude of the first voiceprint information according to the number of grid vertices to obtain the second voiceprint information.

於本實施例中，虛擬會議室中每個座位具對應之網格頂點之數目。伺服器200根據網格頂點之數目調整第一聲紋資訊之頻率或振幅。例如，網格頂點數目越多或密度越大之座位對應之第一聲紋資訊之頻率越高或振幅越大。當第一座位所覆蓋區域之網格頂點之數目n₁與第二座位所覆蓋區域之網格頂點之數目n₂滿足：n₁>n₂時，伺服器200調整來自於第一座位之第一聲紋資訊或來自於第二座位之第一聲紋資訊，使得來自於第一座位之第一聲紋資訊和來自於第二座位之第一聲紋資訊滿足：f₁>f₂或a₁>a₂，其中，f₁表示來自於第一座位之第一聲紋資訊之頻率，f₂表示來自於第二座位之第一聲紋資訊之頻率，a₁表示來自於第一座位之第一聲紋資訊之振幅，a₂表示來自於第二座位之第一聲紋資訊之振幅。 In this embodiment, each seat in the virtual conference room has a corresponding number of grid vertices, and the server 200 adjusts the frequency or amplitude of the first voiceprint information according to that number: a seat whose area has more grid vertices (a higher vertex density) is given a higher frequency or a larger amplitude. When the number of grid vertices n₁ in the area covered by the first seat and the number n₂ in the area covered by the second seat satisfy n₁ > n₂, the server 200 adjusts the first voiceprint information from the first seat or that from the second seat so that they satisfy f₁ > f₂ or a₁ > a₂, where f₁ and f₂ are the frequencies of the first voiceprint information from the first and second seats respectively, and a₁ and a₂ are the corresponding amplitudes.

舉例而言，可參閱圖8，伺服器200預先設置每個座位之網格頂點之數目，當伺服器200接收到6個參會者之第一聲紋資訊時，可以對6段第一聲紋資訊進行處理，根據每個座位之網格頂點之數目或密度調整相應之第一聲紋資訊之頻率或振幅，得到6段第二聲紋資訊，以提高聲紋資訊之可辨識性。 For example, referring to FIG. 8, the server 200 presets the number of grid vertices for each seat. When the server 200 receives the first voiceprint information of 6 participants, it can process the 6 pieces of first voiceprint information, adjust the frequency or amplitude of the corresponding first voiceprint information according to the number or density of the grid vertices of each seat, and obtain 6 pieces of second voiceprint information, so as to improve the recognizability of the voiceprint information.
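One way to enforce the rule that n₁ > n₂ implies f₁ > f₂ (or a₁ > a₂) across all seats is sketched below. The source does not say by how much a value is changed, so this sketch assumes a hypothetical `min_gap` and only ever raises values (claim 2 also allows lowering the second seat's value instead). Each voiceprint is reduced to a (frequency, amplitude, phase difference) triple, and the vertex counts are assumed to be distinct.

```python
from dataclasses import dataclass, replace
from typing import Dict, Optional


@dataclass(frozen=True)
class Voiceprint:
    frequency: float   # Hz
    amplitude: float
    phase_diff: float  # rad


def adjust_by_vertex_count(
    prints: Dict[str, Voiceprint],
    vertex_counts: Dict[str, int],
    field: str = "frequency",
    min_gap: float = 10.0,
) -> Dict[str, Voiceprint]:
    """Raise values so that more grid vertices -> strictly higher frequency (or amplitude)."""
    if field not in ("frequency", "amplitude"):
        raise ValueError("field must be 'frequency' or 'amplitude'")
    adjusted: Dict[str, Voiceprint] = {}
    previous: Optional[float] = None
    # Walk seats from fewest to most grid vertices.
    for seat in sorted(prints, key=lambda s: vertex_counts[s]):
        value = getattr(prints[seat], field)
        if previous is not None:
            value = max(value, previous + min_gap)  # only ever raises the value
        adjusted[seat] = replace(prints[seat], **{field: value})
        previous = value
    return adjusted


first = {
    "seat-1": Voiceprint(300.0, 0.40, 0.1),
    "seat-2": Voiceprint(200.0, 0.60, 0.2),
    "seat-3": Voiceprint(500.0, 0.30, 0.0),
}
counts = {"seat-1": 25, "seat-2": 36, "seat-3": 49}
second = adjust_by_vertex_count(first, counts)
print({s: v.frequency for s, v in second.items()})
# {'seat-1': 300.0, 'seat-2': 310.0, 'seat-3': 500.0}
```

Only the chosen field changes; the phase difference and the other field are copied through unchanged by `dataclasses.replace`.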

S307，伺服器200根據第二聲紋資訊確定發言者於虛擬會議室中的座位。 S307, the server 200 determines the speaker's seat in the virtual conference room according to the second voiceprint information.

於本實施例中，當伺服器200擷取第一聲紋資訊後，無法確定第一聲紋資訊之來源。伺服器200將虛擬會議室中的每個座位所覆蓋區域之網格頂點之數目與第一聲紋資訊建立對應關係，網格頂點之數目越多之區域對應之第一聲紋資訊之頻率或振幅越高。伺服器200根據網格頂點之數目調整來自於不同座位之第一聲紋資訊之頻率或振幅，得到更具辨識性之第二聲紋資訊。由於每個座位上之第二聲紋資訊之頻率或振幅不同，使得第二聲紋資訊與每個座位具一一對應之關係，伺服器200由此可以根據第二聲紋資訊確定發言者於虛擬會議室中的座位。 In this embodiment, the first voiceprint information captured by the server 200 does not by itself reveal its source. The server 200 therefore associates the number of grid vertices in the area covered by each seat with the first voiceprint information: the more grid vertices an area has, the higher the frequency or amplitude assigned to the corresponding first voiceprint information. By adjusting the frequency or amplitude of the first voiceprint information from different seats according to the number of grid vertices, the server 200 obtains more distinguishable second voiceprint information. Because the second voiceprint information from each seat has a different frequency or amplitude, it corresponds one-to-one with the seats, so the server 200 can determine the speaker's seat in the virtual conference room from the second voiceprint information.
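Since each seat ends up with a distinct adjusted value, locating the speaker reduces to a table lookup. The sketch below assumes, hypothetically, that the server keeps a seat-to-adjusted-frequency table and matches an observed frequency to the nearest entry within a `tolerance`; matching on frequency alone and the function name are illustrative choices, not the applicant's method.

```python
from typing import Dict, Optional


def identify_seat(observed: float, seat_values: Dict[str, float], tolerance: float) -> Optional[str]:
    """Return the seat whose adjusted frequency is closest to `observed`, or None if none is close."""
    if not seat_values:
        return None
    best = min(seat_values, key=lambda seat: abs(seat_values[seat] - observed))
    return best if abs(seat_values[best] - observed) <= tolerance else None


table = {"seat-1": 300.0, "seat-2": 310.0, "seat-3": 500.0}
print(identify_seat(309.0, table, tolerance=4.0))  # seat-2
print(identify_seat(400.0, table, tolerance=4.0))  # None
```

The tolerance must be smaller than half the smallest gap between adjusted values, otherwise neighbouring seats become ambiguous.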

可參閱圖9，本實施例之音訊處理方法應用於伺服器200，音訊處理方法可以包括以下步驟： Referring to FIG. 9, the audio processing method of this embodiment is applied to the server 200, and the audio processing method may include the following steps:

S401，伺服器200控制從設備320採集參會者之眼球運動方向資訊。 S401, the server 200 controls the slave device 320 to collect eye movement direction information of the participants.

於本實施例中，當一個參會者正於發言時，伺服器200識別出該參會者之聲紋資訊後，控制其他從設備320採集其他參會者之眼球運動方向資訊。 In this embodiment, while a participant is speaking, once the server 200 has recognized that participant's voiceprint information, it controls the other slave devices 320 to collect the eye movement direction information of the other participants.

S402，伺服器200根據參會者之眼球運動方向資訊確定參會者之專心度。 S402, the server 200 determines the participant's concentration according to the participant's eyeball movement direction information.

其中，專心度係指參會者對發言者講話內容之專心程度或對會議議題之專心程度。專心度越高表示參會者對會議議題越有興趣。當一個發言者正於發言時，伺服器200接收到其他參會者之眼球運動方向資訊後，可以根據其他參會者之眼球運動方向資訊確定其他參會者之專心度。 Here, concentration refers to how closely a participant attends to the speaker's remarks or to the meeting topic; a higher concentration indicates greater interest in the topic. While a speaker is speaking, the server 200 receives the eye movement direction information of the other participants and determines each one's concentration from it.

舉例而言，當一個發言者正於發言時，如果一個參會者之眼球運動方向朝向該發言者，則表示該參會者當前係專心之，可將該參會者之專心度標記為1。如果該參會者之眼球運動方向遠離該發言者，則表示該參會者當前不專心，可將該參會者之專心度標記為0。於整場會議之10輪發言中，如果一個參會者專心度為1之次數為6輪，專心度為0之次數為4輪，可認為該參會者對會議議題之專心度為6/10=0.6。 For example, when a speaker is speaking, if a participant's eyeball moves towards the speaker, it means that the participant is currently concentrating, and the concentration of the participant can be marked as 1. If the participant's eye movement direction is far away from the speaker, it means that the participant is currently not paying attention, and the concentration of the participant can be marked as 0. In the 10 rounds of speeches in the whole meeting, if the number of times a participant's concentration is 1 is 6 rounds, and the number of times of concentration is 0 is 4 rounds, it can be considered that the participant's concentration on the conference topic is 6/10=0.6.
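The per-round marking and the 6/10 = 0.6 calculation can be sketched as follows. The source does not define when an eye movement direction counts as "toward" the speaker; this sketch assumes 2D seat positions (such as the ring layout of FIG. 6), a gaze direction vector, and a hypothetical 15-degree cone.

```python
import math
from typing import List, Tuple

Vec = Tuple[float, float]


def gaze_mark(listener: Vec, gaze_dir: Vec, speaker: Vec, max_angle_deg: float = 15.0) -> int:
    """Return 1 if the gaze points toward the speaker (within max_angle_deg), else 0."""
    to_speaker = (speaker[0] - listener[0], speaker[1] - listener[1])
    norm = math.hypot(*gaze_dir) * math.hypot(*to_speaker)
    if norm == 0.0:
        return 0
    cos_angle = (gaze_dir[0] * to_speaker[0] + gaze_dir[1] * to_speaker[1]) / norm
    angle = math.degrees(math.acos(max(-1.0, min(1.0, cos_angle))))
    return 1 if angle <= max_angle_deg else 0


def concentration(marks: List[int]) -> float:
    """Fraction of speaking rounds in which the participant was marked 1."""
    if not marks:
        raise ValueError("no rounds recorded")
    return sum(marks) / len(marks)


print(gaze_mark((0.0, 0.0), (1.0, 0.05), (2.0, 0.0)))  # 1: about 2.9 degrees off
print(gaze_mark((0.0, 0.0), (-1.0, 0.0), (2.0, 0.0)))  # 0: looking away
print(concentration([1, 1, 0, 1, 0, 1, 1, 0, 0, 1]))  # 0.6
```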

S403，伺服器200根據參會者之專心度確定參會者對會議議題是否有興趣。 S403, the server 200 determines whether the participant is interested in the conference topic according to the concentration of the participant.

於本實施例中，伺服器200統計參會者對會議議題之專心度，可將專心度與預設之興趣閾值進行比較，如果專心度大於或等於興趣閾值，則說明參會者對會議議題有興趣。如果專心度小於興趣閾值，則說明參會者對會議議題沒有興趣。 In this embodiment, the server 200 computes each participant's concentration on the meeting topic and can compare it with a preset interest threshold. If the concentration is greater than or equal to the threshold, the participant is considered interested in the topic; if it is below the threshold, the participant is considered not interested.

舉例而言，預設之興趣閾值為0.6，於整場會議中，如果一個參會者對會議議題之專心度為0.5，由於該參會者對會議議題之專心度小於興趣閾值，則說明該參會者對會議議題沒有興趣。如果一個參會者對會議議題之專心度為0.7，由於該參會者對會議議題之專心度大於興趣閾值，則說明該參會者對會議議題有興趣。 For example, the preset interest threshold is 0.6. In the whole meeting, if a participant's concentration on the conference topic is 0.5, since the participant's concentration on the conference topic is less than the interest threshold, it means that the participant is not interested in the conference topic. If a participant's concentration on the conference topic is 0.7, since the participant's concentration on the conference topic is greater than the interest threshold, it means that the participant is interested in the conference topic.
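The threshold rule reduces to a single comparison, with a concentration exactly equal to the threshold counting as interested. The function name and the dictionary of per-participant scores are illustrative assumptions; the default of 0.6 follows the example above.

```python
from typing import Dict


def classify_interest(scores: Dict[str, float], threshold: float = 0.6) -> Dict[str, bool]:
    """Map each participant to True (interested) if concentration >= threshold, else False."""
    return {participant: score >= threshold for participant, score in scores.items()}


print(classify_interest({"p1": 0.5, "p2": 0.7, "p3": 0.6}))
# {'p1': False, 'p2': True, 'p3': True}
```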

本申請實施例還提供一種存儲介質，用於存儲電腦程式或代碼，當所述電腦程式或代碼被處理器執行時，實現本申請實施例之音訊處理方法。 The embodiment of the present application also provides a storage medium for storing computer programs or codes, and when the computer programs or codes are executed by a processor, the audio processing method of the embodiments of the present application is implemented.

存儲介質包括於用於存儲資訊(諸如電腦可讀指令、資料結構、程式模組或其它資料)之任何方法或技術中實施之易失性和非易失性、可移除和不可移除介質。存儲介質包括，但不限於，隨機存取記憶體(Random Access Memory,RAM)、唯讀記憶體(Read-Only Memory,ROM)、帶電可擦可程式設計唯讀記憶體(Electrically Erasable Programmable Read-Only Memory,EEPROM)、快閃記憶體或其它記憶體、唯讀光碟(Compact Disc Read-Only Memory,CD-ROM)、數位通用光碟(Digital Versatile Disc,DVD)或其它光碟存儲、磁盒、磁帶、磁片存儲或其它磁存儲裝置、或者可以用於存儲期望之資訊並且可以被電腦訪問之任何其它之介質。 Storage media include volatile and nonvolatile, removable and non-removable media implemented in any method or technology for storing information such as computer-readable instructions, data structures, program modules or other data. Storage media include, but are not limited to, random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technologies, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical disc storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other medium that can be used to store the desired information and can be accessed by a computer.

上面結合附圖對本申請實施例作了詳細說明，但本申請不限於上述實施例，於所屬技術領域普通具通常技藝者所具備之知識範圍內，還可於不脫離本申請宗旨之前提下做出各種變化。 The embodiments of the present application have been described in detail above with reference to the accompanying drawings, but the present application is not limited to these embodiments; various changes can be made within the knowledge of a person of ordinary skill in the art without departing from the spirit of the present application.

S301-S307:步驟 S301-S307: Steps

Claims

An audio processing method for a virtual meeting room, comprising: setting the number of grid vertices according to the seat distribution of the virtual meeting room, specifically including setting a different number of grid vertices in the area covered by each seat, so as to establish a correspondence between the seats and the numbers of grid vertices; extracting first voiceprint information of a speaker, the first voiceprint information including the frequency, amplitude and phase difference of a voice signal; adjusting the frequency or amplitude of the first voiceprint information according to the number of grid vertices to obtain second voiceprint information; and determining the seat of the speaker in the virtual meeting room according to the second voiceprint information.

The audio processing method according to claim 1, wherein said adjusting the frequency or amplitude of the first voiceprint information according to the number of grid vertices includes: when the number of grid vertices in the area covered by the first seat is greater than the number of grid vertices in the area covered by the second seat, increasing the frequency of the first voiceprint information from the first seat, or lowering the frequency of the first voiceprint information from the second seat, so that the frequency of the first voiceprint information from the first seat is greater than the frequency of the first voiceprint information from the second seat.

The audio processing method according to claim 1, wherein the adjusting the frequency or amplitude of the first voiceprint information according to the number of grid vertices includes: when the number of grid vertices in the area covered by the first seat is greater than the number of grid vertices in the area covered by the second seat, increasing the amplitude of the first voiceprint information from the first seat, or reducing the amplitude of the first voiceprint information from the second seat, so that the amplitude of the first voiceprint information from the first seat is greater than the amplitude of the first voiceprint information from the second seat.

The audio processing method according to claim 1, wherein, after determining the seat of the speaker in the virtual conference room according to the second voiceprint information, the method further includes: extracting eye movement direction information of a participant; determining the participant's concentration according to the eye movement direction information, the value of the concentration being 0 or 1; and determining whether the participant is interested in the conference topic according to the concentration.

The audio processing method according to claim 4, wherein said determining the concentration of the participant according to the eye movement direction information includes: marking the concentration as 1 when the eye movement direction of the participant is toward the speaker; marking the concentration as 0 when the eye movement direction of the participant is away from the speaker.

The audio processing method as described in claim 5, wherein the method further includes: when there are multiple speakers, counting the value of the concentration of the participants when each speaker speaks; and determining the concentration of the participants on the conference topic according to the value of the concentration.

The audio processing method as described in claim 4, wherein the determining whether the participant is interested in the conference topic according to the degree of concentration includes: determining that the participant is interested in the conference topic when the value of the concentration degree is greater than or equal to a preset interest threshold; determining that the participant is not interested in the conference topic when the value of the concentration degree is smaller than the interest threshold.

An audio processing device, comprising a server, a master device and a slave device, wherein the master device is used to initiate a virtual meeting, the server is used to construct a virtual conference room, and the slave device is used to enter the virtual conference room according to a link from the master device; the server comprises a first processor and a first memory, characterized in that the first processor runs a computer program or code stored in the first memory to implement the audio processing method according to any one of claims 1 to 7.

A storage medium for storing a computer program or code, characterized in that, when the computer program or code is executed by a processor, the audio processing method according to any one of claims 1 to 7 is implemented.