WO2018010129A1 - Conference record generation method and apparatus based on a conference call - Google Patents

Conference record generation method and apparatus based on a conference call

Info

Publication number
WO2018010129A1
Authority
WO
WIPO (PCT)
Prior art keywords
conference
content
voice
voice content
text
Prior art date
Application number
PCT/CN2016/089950
Other languages
English (en)
French (fr)
Inventor
张立新
Original Assignee
深圳市沃特沃德股份有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市沃特沃德股份有限公司
Priority to PCT/CN2016/089950
Publication of WO2018010129A1

Links

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04M TELEPHONIC COMMUNICATION
    • H04M3/00 Automatic or semi-automatic exchanges
    • H04M3/42 Systems providing special services or facilities to subscribers
    • H04M3/56 Arrangements for connecting several subscribers to a common circuit, i.e. affording conference facilities

Definitions

  • The present invention relates to the field of teleconferencing technologies, and in particular to a method and apparatus for generating a conference record based on a conference call.
  • Current teleconferencing systems only provide audio or video recording. The conference record has to be written up manually and sent to the participants as a document after the meeting, which is cumbersome and inefficient.
  • A primary object of the present invention is to provide a method and apparatus for generating a conference record based on a conference call, aiming to solve the technical problem that compiling conference records for conference calls is inefficient.
  • The present invention provides a conference record generation method based on a conference call, the method comprising the following steps:
  • The step of acquiring the voice content collected by each conference terminal includes:
  • Saving the voice content in segments according to the conference terminal from which it originates, and adding identification information to each segment of voice content, where the identification information includes at least the device identification code of the conference terminal corresponding to the voice content.
  • The step of saving the voice content in segments according to the source conference terminal includes: saving the voice content continuously collected by one conference terminal at a time as one segment of voice content.
  • Alternatively, the step of saving the voice content in segments according to the source conference terminal includes: performing intelligent sentence segmentation on the voice content continuously collected by one conference terminal at a time, and saving each sentence as one segment of voice content.
  • The identification information further includes a sentence number of the voice content.
  • The device identification code of the conference terminal is a unique identification code of the conference terminal or a sequence code assigned in the order in which the conference terminals join the conference.
  • The step of converting the voice content into text content includes:
  • Converting each segment of voice content into a segment of text content, and adding to each segment of text content identification information that matches the identification information of the corresponding voice content.
  • the method further includes:
  • the method further includes:
  • the step of converting each piece of voice content into a piece of text content separately includes
  • The method further includes: when a voice playback instruction for the text content is received, acquiring the corresponding voice content according to the link relationship and playing it.
  • The present invention also provides a conference record generation apparatus based on a conference call, the apparatus including:
  • A voice content acquisition module, configured to acquire the voice content collected by each conference terminal;
  • A voice recognition module, configured to convert the voice content into text content;
  • A conference record generation module, configured to generate a conference record from the text content, and to store the conference record and/or send it to a specified address.
  • The voice content acquisition module includes a receiving unit and a segmentation unit, where:
  • The receiving unit is configured to collect voice content through each conference terminal and receive the voice content sent by each conference terminal;
  • The segmentation unit is configured to save the voice content in segments according to the source conference terminal, and add identification information to each segment of voice content, where the identification information includes at least the device identification code of the conference terminal corresponding to the voice content.
  • The segmentation unit is configured to: save the voice content continuously collected by one conference terminal at a time as one segment of voice content.
  • Alternatively, the segmentation unit is configured to: perform intelligent sentence segmentation on the voice content continuously collected by one conference terminal at a time, and save each sentence as one segment of voice content.
  • The voice recognition module is configured to: convert each segment of voice content into a segment of text content, and add to each segment of text content identification information that matches the identification information of the corresponding voice content.
  • The conference record generation module includes an editing unit, configured to: edit the text content when an editing instruction for a segment of text content is received.
  • The conference record generation module includes a translation unit, configured to: translate the text content when a translation instruction for a segment of text content is received.
  • The conference record generation module further includes a voice playback unit, and the voice recognition module is further configured to: establish a link relationship between at least one segment of text content and its corresponding voice content;
  • The voice playback unit is configured to: when a voice playback instruction for the text content is received, acquire the corresponding voice content according to the link relationship and play it.
  • The conference record generation method based on a conference call provided by the present invention automatically converts the voice content recorded by each conference terminal into text content by using voice recognition technology, and generates a conference record from the text content.
  • This achieves automatic generation of the conference record of a conference call, removes the cumbersome process of compiling the record manually, improves efficiency, and makes the conference call system more intelligent.
  • Because the voice content is saved in segments and the text content is recorded in segments, the speaker of each segment can be clearly identified in the conference record, which makes the record clearer.
  • The playback and editing functions let the user check and revise the conference record, making it more accurate.
  • With the translation function, the conference record can be translated into the required language, meeting the needs of international conference calls.
  • FIG. 1 is a block diagram of an optional teleconferencing system for implementing embodiments of the present invention.
  • FIG. 2 is a schematic structural diagram of a typical conference call system for implementing various embodiments of the present invention.
  • FIG. 3 is a schematic structural diagram of a typical video conference terminal in the conference call system of FIG. 2.
  • FIG. 4 is a flowchart of a first embodiment of the conference record generation method based on a conference call according to the present invention.
  • FIG. 5 is a flowchart of a second embodiment of the conference record generation method based on a conference call according to the present invention.
  • FIG. 6 is a block diagram of an embodiment of the conference record generation apparatus based on a conference call according to the present invention.
  • FIG. 7 is a block diagram of an optional voice content acquisition module in the conference record generation apparatus of FIG. 6.
  • FIG. 8 is a block diagram of an optional conference record generation module in the conference record generation apparatus.
  • The terms "terminal" and "terminal device" used herein cover both devices that have only a wireless signal receiver with no transmitting capability, and devices that have both receiving and transmitting hardware.
  • Such a device may comprise: a cellular or other communication device with a single-line or multi-line display, or without a multi-line display; a PCS (Personal Communications Service) device, which may combine voice, data processing, fax and/or data communication capabilities; a PDA (Personal Digital Assistant), which may include a radio frequency receiver, pager, Internet/intranet access, web browser, notepad, calendar and/or GPS (Global Positioning System) receiver; and a conventional laptop and/or palmtop computer or other device that has and/or includes a radio frequency receiver.
  • A "terminal" or "terminal device" as used herein may be portable, transportable, installed in a vehicle (air, sea and/or land), or adapted and/or configured to operate locally and/or to operate in a distributed fashion at any other location on Earth and/or in space.
  • The "terminal" and "terminal device" used herein may also be a communication terminal, an Internet terminal, or a music/video playback terminal, for example a PDA, a MID (Mobile Internet Device) and/or a mobile phone with music/video playback functions, or a device such as a smart TV or set-top box.
  • The method and apparatus for generating a conference record based on a conference call are mainly applied to conference calls.
  • The conference call here should be understood in a broad sense, covering both pure voice conferences and video conferences.
  • FIG. 1 is a block diagram of an optional teleconferencing system for implementing various embodiments of the present invention; the system includes a server 10 and conference terminals 20.
  • There are at least two conference terminals 20, which may be any terminal devices that join the conference, such as mobile terminals (mobile phones, tablets), computer terminals (personal computers, notebook computers), and video conference terminals dedicated to conference calls.
  • The server 10 is the device that implements the conference record generation method of the embodiments of the present invention. It is usually a cloud server dedicated to hosting the conference call; alternatively, one of the terminal devices joining the conference may be designated as the server 10 that generates the conference record.
  • FIG. 2 is a schematic structural diagram of a typical teleconferencing system.
  • The conference call system includes a cloud server 11 and six conference terminals, each with a wired or wireless connection to the cloud server 11: a video conference terminal 21 at main conference site A, a video conference terminal 22 at conference site B, a video conference terminal 23 at conference site C, a video conference terminal 24 at conference hall D, a smartphone 25 carried by a business traveler, and a laptop computer 26 carried by a business traveler.
  • Smartphones and laptops can take part in the conference call by loading teleconferencing client software.
  • FIG. 2 is only an optional embodiment and does not limit the present invention.
  • FIG. 3 is a schematic structural diagram of a typical video conference terminal.
  • The video conference terminal includes at least one host 210.
  • The core component of the host 210 is preferably a high-performance 4G smartphone chip.
  • The host 210 has a high-definition rotating camera (preferably more than 5 megapixels), a high-sensitivity omnidirectional microphone, a speaker, and an LCD with a capacitive touch screen, so it can serve small meetings of about 10 people in a single venue without external equipment.
  • The HDMI or VGA interface of the host 210 can connect an external HD LCD TV or projector 211. An external wireless microphone 212 and amplifier 213, a USB HD camera 214 (for recording the whole venue), and a Bluetooth keyboard and mouse 215 (for remote control of the host and text editing) can also be added.
  • The host 210 of the video conference terminal can access the Internet and connect to the cloud server through wired broadband, a WIFI router, or an LTE 4G network.
  • A first embodiment of the conference record generation method based on a conference call according to the present invention includes the following steps:
  • The server collects voice content through each conference terminal, receives the voice content sent by each conference terminal, and saves it. The voice content can be saved in a specified audio format, such as MP3, WMA, or WAV.
  • The conference terminal collects the voice content through a sound collection device (such as a microphone).
  • The conference terminal can send the collected voice content to the server in real time or at fixed intervals, or it can send the continuously collected voice content to the server each time a participant on the conference terminal side finishes speaking.
  • After receiving the voice content sent by the conference terminal, the server saves the voice content.
  • The server may continuously receive the voice content sent by each conference terminal during the conference call, and save all of the received voice content as one recording file after the conference call ends.
  • The server may also save the voice content in segments according to the source conference terminal, and add identification information to each segment to tell the segments apart. That is, the voice content recorded in one conference call is divided into at least two segments, each segment is saved as a recording file, and one conference call produces at least two recording files.
  • The identification information of the voice content includes at least the device identification code of the conference terminal corresponding to the voice content. The device identification code may be a unique identification code of the conference terminal or a sequence code assigned in the order in which the conference terminals join the conference.
  • A unique identification code is a code that uniquely identifies the terminal, such as a media access control (MAC) address, a device serial number (such as an IMEI, MEID or ESN code), or a SIM serial number.
  • A sequence code is the number that the system assigns to each conference terminal, in login order, as the conference terminals successively log in to the conference call system and join the conference. Further, the identification information of each segment of voice content may also include time information.
  • In one approach, the server saves the voice content continuously collected by one conference terminal at a time as one segment of voice content. That is, as the participants on the side of each conference terminal take turns speaking, each turn by a participant on one conference terminal side is saved as one recording file. Therefore, if the participants at the conference terminals take turns to speak N times in a conference call, the voice content recorded in the call is divided into N segments and saved as N recording files.
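The turn-based saving described above can be sketched roughly as follows. This is only an illustration under assumptions not in the source: audio is assumed to arrive as `(terminal_id, audio_bytes)` chunks in arrival order, and the function name `merge_turns` is hypothetical.

```python
from itertools import groupby


def merge_turns(chunks):
    """Merge consecutive audio chunks from the same terminal into one segment.

    `chunks` is a list of (terminal_id, audio_bytes) pairs in arrival order
    (an assumed input shape). Each run of chunks from one terminal is treated
    as one speaking turn and becomes one saved segment.
    """
    segments = []
    for terminal_id, group in groupby(chunks, key=lambda chunk: chunk[0]):
        audio = b"".join(data for _, data in group)
        segments.append((terminal_id, audio))
    return segments


# Three speaking turns (A, then B, then A again) yield three segments.
chunks = [("A", b"a1"), ("A", b"a2"), ("B", b"b1"), ("A", b"a3")]
segments = merge_turns(chunks)
```

With N speaking turns in the input, the sketch returns N segments, matching the N recording files mentioned in the text.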
  • In another approach, the server performs intelligent sentence segmentation on the voice content continuously collected by one conference terminal at a time, and saves each sentence as one segment of voice content. That is, as the participants on the side of each conference terminal take turns speaking, each turn by a participant on one conference terminal side is divided into several sentences, and each sentence is saved as a recording file.
  • In this case, the identification information of each segment of voice content may also include the sentence number of the segment, that is, which sentence the segment is.
  • The server may segment sentences according to a preset silence interval length (for example, 1 second or 1.5 seconds): each time a silence in the voice content reaches the preset silence interval length, a sentence break is made and the speech of that sentence is saved as one segment of voice content. If sentence numbers are needed, the sentence number is incremented by one at each sentence break and used as the sentence number of that segment.
  • The server may also break sentences at fixed intervals, or use other prior-art methods, which are not described further here.
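A minimal sketch of sentence breaking by silence interval is given below. The source only specifies the preset silence interval length; everything else here is an assumption made for illustration: the audio is represented as a list of per-frame energy values, a frame counts as silent when its energy is at or below `silence_threshold`, and the function name `split_on_silence` is hypothetical.

```python
def split_on_silence(frames, frame_sec=0.1, silence_threshold=0.01,
                     min_silence_sec=1.0):
    """Return (start, end) frame index pairs, end exclusive, one per sentence.

    A sentence break is made each time the run of silent frames reaches
    `min_silence_sec`. Trailing silence is not included in the sentence.
    """
    min_silent_frames = max(1, round(min_silence_sec / frame_sec))
    sentences = []
    start = None        # first voiced frame of the current sentence
    last_voiced = None  # most recent voiced frame
    silent_run = 0
    for i, energy in enumerate(frames):
        if energy > silence_threshold:
            if start is None:
                start = i
            last_voiced = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silent_frames:
                sentences.append((start, last_voiced + 1))
                start = None
                silent_run = 0
    if start is not None:
        # Close the final sentence if the audio ends without a long silence.
        sentences.append((start, last_voiced + 1))
    return sentences


# 0.3 s of speech, 1.0 s of silence, then speech with a short 0.3 s pause.
frames = [0.5] * 3 + [0.0] * 10 + [0.4] * 2 + [0.0] * 3 + [0.6]
sentence_ranges = split_on_silence(frames)
# Sentence numbers start at 1 and increase by one at each break.
numbered = list(enumerate(sentence_ranges, start=1))
```

The short pause inside the second stretch of speech is below the preset interval, so it does not cause a break.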
  • The server converts the voice content into text content by using voice recognition technology.
  • When the voice content is saved in segments, the server converts each segment of voice content into a segment of text content, and adds to each segment of text content identification information that matches the identification information of the corresponding voice content, so that the segments can be told apart.
  • "Matching" here means identical, at least partially identical, or corresponding. For example, each segment of text content is given identification information identical to that of the corresponding voice content; the identification information includes at least the device identification code of the corresponding conference terminal, and may also include time information or a sentence number.
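As a rough sketch of how identification information could be carried over from voice segments to text segments, consider the following. The source does not name a recognition engine, so the recognizer is passed in as a hypothetical callable `recognize`; the class and field names are likewise assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class VoiceSegment:
    segment_id: str  # identification information (format is an assumption)
    audio: bytes


@dataclass
class TextSegment:
    segment_id: str  # copied from the voice segment so the two stay matched
    text: str


def transcribe_segments(segments: List[VoiceSegment],
                        recognize: Callable[[bytes], str]) -> List[TextSegment]:
    """Convert each voice segment to text, keeping its identification info."""
    return [TextSegment(s.segment_id, recognize(s.audio)) for s in segments]


# Stand-in recognizer for demonstration only; a real system would call a
# speech recognition engine here.
def fake_recognize(audio: bytes) -> str:
    return audio.decode("utf-8")


texts = transcribe_segments(
    [VoiceSegment("A1", b"hello"), VoiceSegment("B1", b"hi")], fake_recognize
)
```

Here the text segment simply reuses the voice segment's identifier, which is the "identical" case of matching described above.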
  • After the conference call ends, the server generates a text document from the converted text content; this text document is the conference record.
  • Alternatively, the server may generate a text document from the converted text content during the conference call, and then append subsequently converted text content to the document.
  • When there are multiple segments of text content, they are first sorted in a certain order, and then the conference record is generated.
  • The segments of text content can be sorted along the time axis (for example, by the order in which the text content was generated, by the time information in the identification information, or by sentence number).
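The sorting and record generation described above might look like the following sketch. The field names, the use of seconds since the start of the call as time information, and the `[A1] text` line layout are all assumptions for illustration; the source does not prescribe a record format.

```python
from dataclasses import dataclass


@dataclass
class RecordEntry:
    terminal: str      # code of the conference terminal the speech came from
    sentence_no: int   # sentence number within that terminal's speech
    timestamp: float   # seconds since the start of the call (assumed unit)
    text: str


def build_record(entries):
    """Sort text segments along the time axis and render a plain-text record."""
    ordered = sorted(
        entries, key=lambda e: (e.timestamp, e.terminal, e.sentence_no)
    )
    return "\n".join(f"[{e.terminal}{e.sentence_no}] {e.text}" for e in ordered)


record = build_record([
    RecordEntry("B", 1, 5.0, "Agreed."),
    RecordEntry("A", 1, 0.0, "Let us begin."),
    RecordEntry("A", 2, 2.5, "First item."),
])
```

Sorting by timestamp first keeps the record in speaking order even when segments from different terminals are converted out of order.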
  • When an editing instruction for a segment of text content is received, the server edits the text content, for example by modifying, deleting, or adding text.
  • The editing instruction may be a preset touch operation, key operation, mid-air gesture, voice command, or the like.
  • For example, an "Edit" icon is displayed.
  • The server receives the editing instruction and enters the editing state.
  • The editing state is exited.
  • The server may further establish a link relationship between at least one segment of text content and its corresponding voice content.
  • When a voice playback instruction for the text content is received, the server acquires the corresponding voice content according to the link relationship and plays it.
  • The voice playback instruction may be a preset touch operation, key operation, mid-air gesture, voice command, or the like. For example, a "voice playback" icon is displayed at each segment of text content; when the user touches the icon, the server receives the voice playback instruction, finds the corresponding voice content according to the link relationship, and plays it.
  • The editing instruction can be triggered to edit the text content of that segment.
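One simple way to picture the link relationship between text content and voice content is a lookup table keyed by the shared identification information. The sketch below assumes that each voice segment is stored as a file and that a segment identifier such as "A1" exists; the class name `PlaybackIndex` and the file paths are hypothetical.

```python
class PlaybackIndex:
    """Maps a text segment's identifier to the recording it was converted from."""

    def __init__(self):
        self._links = {}

    def link(self, segment_id, audio_path):
        """Record the link between a text segment and its voice content."""
        self._links[segment_id] = audio_path

    def audio_for(self, segment_id):
        """Return the linked recording path, or None if no link exists."""
        return self._links.get(segment_id)


index = PlaybackIndex()
index.link("A1", "recordings/A1.wav")
index.link("B1", "recordings/B1.wav")

# On a voice playback instruction for segment "A1", look up what to play.
path_to_play = index.audio_for("A1")
```

Actual playback is left out; the sketch only shows how the link lets a playback instruction on a text segment resolve to the right recording.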
  • When a translation instruction for a segment of text content or for the entire text content is received, the server translates the text content from one language into another, for example from Chinese into English, Japanese, French or other languages, or from English, Japanese, French or other languages into Chinese or another language.
  • The translation instruction may be a preset touch operation, key operation, mid-air gesture, voice command, or the like.
  • For example, a "translation" icon is displayed at each segment of text content; when the user touches the icon, the server receives the translation instruction, translates the text content, and displays the translation near the original text for reference. The translation can be specially marked to distinguish it from the original.
  • The server may store the conference record in a designated location and/or send it to a specified address.
  • The specified address may be a specified device, mailbox, contact, etc.; for example, the conference record is sent to the mailboxes of designated participants.
  • The conference record may also be encrypted to ensure data security.
  • For example, the conference record document is compressed and encrypted, and the decompression password is a designated password or a password known to or agreed on by the participants.
  • The conference record generation method of this embodiment automatically converts the voice content recorded by each conference terminal into text content by using voice recognition technology and generates a conference record from the text content. This achieves automatic generation of the conference record of the conference call, eliminates the cumbersome process of compiling the record manually, improves efficiency, and makes the teleconferencing system more intelligent.
  • Because the voice content is saved in segments and the text content is recorded in segments, the speaker of each segment can be clearly identified in the conference record, which makes the record clearer.
  • The user can check and revise the conference record, making it more accurate.
  • The conference record can be translated into the required language, meeting the needs of international conference calls.
  • A second embodiment of the conference record generation method based on a conference call according to the present invention includes the following steps:
  • The first conference terminal logs in to the conference call system on the server, submits a conference call application, and obtains a conference name and a conference access password.
  • The first conference terminal is the conference initiator. It logs in to the server's conference call system with a registered account, enters conference information such as the conference name and conference time, and submits the application. After receiving the application, the server returns the conference access password to the first conference terminal. The conference name can also be generated automatically by the server.
  • The first conference terminal and the second conference terminal log in to the conference call system on the server and join the conference call using the conference name and the conference access password.
  • The second conference terminal is a conference invitee; there may be one or more. At the appointed conference time, the first conference terminal and the second conference terminal log in to the server's conference call system and join the conference call using the conference name and the conference access password.
  • The server numbers each conference terminal according to its login order.
  • The numbers may be, for example, uppercase letters, lowercase letters, Arabic numerals, or Roman numerals.
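Numbering terminals by login order with uppercase letters could be sketched as below. The function name `assign_terminal_codes`, the use of arbitrary strings as terminal identifiers, and the 26-terminal limit are assumptions of this sketch, not part of the source.

```python
import string


def assign_terminal_codes(login_order):
    """Assign 'A', 'B', 'C', ... to terminals in the order they first log in.

    `login_order` is a sequence of terminal identifiers (for example MAC
    addresses) in login order. A terminal that logs in again keeps its code.
    """
    codes = {}
    for terminal_id in login_order:
        if terminal_id in codes:
            continue
        if len(codes) >= len(string.ascii_uppercase):
            raise ValueError("this sketch only supports up to 26 terminals")
        codes[terminal_id] = string.ascii_uppercase[len(codes)]
    return codes


codes = assign_terminal_codes(["mac-1", "mac-2", "mac-1", "mac-3"])
```

The same approach works with lowercase letters or numerals by swapping the alphabet used for the codes.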
  • The first conference terminal selects a conference recording mode and starts the conference call. It is determined whether the smart record mode is selected: if so, step S26 is performed; if not, step S25 is performed.
  • The user can select a conference recording mode as required. The smart record mode is the mode in which the system automatically generates the conference record as described in the present invention.
  • If the smart record mode is not selected, for example the normal mode is selected, the user does not want the system to generate the conference record automatically and instead creates it manually as in the prior art.
  • Each conference terminal records audio and video and automatically saves the recordings to the cloud server or to a specified address on a local storage device.
  • In this case the system does not generate the conference record automatically; each conference terminal records audio and video and automatically saves the recordings to the cloud server or to a specified address on a local storage device.
  • The server starts a speech recognition program, a text editing program, and a translation program.
  • The server starts the speech recognition program, the text editing program, and the translation program in order to generate the conference record automatically.
  • The server may display the scenes of the main and branch conference sites while recording audio only, without recording video, and may automatically set an appropriate recording voice sensitivity for each conference terminal according to the ambient noise level on that terminal's side. An appropriate recording voice sensitivity ensures that recording is neither triggered falsely nor misses speech.
  • The server may also set the silence interval length between sentences, which is used for intelligent sentence segmentation of the voice content.
  • The silence interval length may be set between 1 second and 1.5 seconds.
  • A suitable silence interval length makes sentence breaks and lookups easier.
  • Each conference terminal collects voice content and sends it to the server.
  • When a conference terminal detects that a participant is speaking, it collects the voice content through a sound collection device such as a microphone.
  • The conference terminal can send the collected voice content to the server in real time or at fixed intervals, or it can send the continuously collected voice content to the server each time a participant on the conference terminal side finishes speaking.
  • The server receives the voice content sent by each conference terminal, performs intelligent sentence segmentation on the voice content continuously collected by one conference terminal at a time, saves each sentence as one segment of voice content, and adds identification information to each segment.
  • The server segments the voice content continuously collected by one conference terminal at a time according to the preset silence interval length: each time a silence in the voice content reaches the preset length, a sentence break is made and the speech of that sentence is saved as one segment of voice content. Identification information is added to each segment; it includes at least the number of the conference terminal from which the voice content originates and the sentence number of the segment.
  • The identification information has two parts: the first part uses an uppercase letter to indicate which conference terminal the sound comes from (that is, its login sequence number), and the second part uses a number to indicate which sentence it is. This makes it easy to find out which party is speaking.
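The two-part identification information can be illustrated with a small encode/decode pair. The concrete string format (letters immediately followed by digits, such as "B12") and the function names are assumptions of this sketch; the source only states that the first part is an uppercase letter and the second part is a number.

```python
import re

# Uppercase terminal code followed by a decimal sentence number (assumed format).
_ID_PATTERN = re.compile(r"([A-Z]+)(\d+)")


def make_segment_id(terminal_code, sentence_no):
    """Build an identifier such as 'B12' from terminal code and sentence number."""
    return f"{terminal_code}{sentence_no}"


def parse_segment_id(segment_id):
    """Split an identifier back into (terminal_code, sentence_no)."""
    match = _ID_PATTERN.fullmatch(segment_id)
    if match is None:
        raise ValueError(f"not a valid segment id: {segment_id!r}")
    return match.group(1), int(match.group(2))


segment_id = make_segment_id("B", 12)
terminal_code, sentence_no = parse_segment_id(segment_id)
```

Parsing the identifier back into its parts is what lets a reader of the record tell at a glance which terminal spoke and which sentence it was.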
  • The server converts each segment of voice content into a segment of text content by using the voice recognition program, adds to each segment of text content identification information identical to that of the corresponding voice content, and establishes a link relationship between each segment of text content and its corresponding voice content.
  • The server generates a conference record from the text content and provides voice playback, editing, and translation functions for each segment of text content.
  • The server first sorts the segments of text content in a certain order and then generates the conference record.
  • The segments of text content can be sorted along the time axis (for example, by the order in which the text content was generated, by the time information in the identification information, or by sentence number).
  • Voice playback, editing and translation functions are also provided for each segment of text content in the conference record.
  • The server displays "Voice Playback", "Edit", and "Translate" icons after each segment of text content or after its identification information.
  • When the user clicks the "Voice Playback" icon, the server receives the voice playback instruction, activates the voice playback function, acquires the voice content corresponding to that segment of text according to the link relationship, and plays it.
  • When the user clicks the "Edit" icon, the server receives the editing instruction, starts the editing function, and edits the text content through the text editing program.
  • When the user clicks the "Translate" icon (for example "Chinese-English translation"), the server receives the translation instruction, activates the translation function, translates the text through the translation program, and displays the translation near the original text for reference. The translation can also be specially marked to distinguish it from the original.
  • The server compresses and encrypts the conference record document (for example, with the conference access password as the decompression password) and sends it to the participants' designated mailboxes.
  • The conference record generation method of this embodiment provides a smart record mode for the user during the conference call.
  • In this mode, the conference record is generated automatically. Every sentence in the record is marked with the identity of the speaker, so the record is clear. Each sentence can be played back, edited and translated, so the user can check, revise and translate the record, which improves its accuracy and meets the needs of international conference calls.
  • the device is applied to the aforementioned conference call system, particularly a server in a conference call system.
  • The server may be a cloud server dedicated to hosting conference calls, or one of the terminal devices that has joined the conference call and is designated as the server, such as a mobile terminal (a mobile phone or tablet), a computer terminal (a personal computer or notebook computer), or a video conference terminal dedicated to conference calls.
  • the apparatus includes a voice content acquisition module 101, a voice recognition module 102, and a conference record generation module 103, which are sequentially connected, wherein
  • The voice content acquisition module 101 is configured to acquire the voice content collected by each conference terminal.
  • After the conference call begins, the voice content acquisition module 101 collects voice content through each conference terminal, receives the voice content sent by each conference terminal, and saves it, optionally in a specified audio format such as MP3, WMA, or WAV.
  • Optionally, the voice content acquisition module 101 may continuously receive the voice content sent by each conference terminal during the conference call and, after the conference call ends, save all the received voice content as one recording file.
  • Optionally, the voice content acquisition module 101 may also save the voice content in segments according to the conference terminal from which it originates, and add identification information to each segment to distinguish it. That is, the voice content recorded in one conference call is divided into at least two segments, each segment is saved as one recording file, and one conference call generates at least two recording files.
  • the voice content acquisition module 101 includes a receiving unit 111 and a segmentation unit 112, in which:
  • the receiving unit 111 is configured to collect voice content through each conference terminal, and receive voice content sent by each conference terminal;
  • The segmentation unit 112 is configured to save the voice content in segments according to the conference terminal from which it originates, and to add identification information to each segment of voice content, the identification information including at least the device identification code of the conference terminal corresponding to the voice content.
  • In some embodiments, the segmentation unit 112 saves the voice content continuously collected by one conference terminal at one time as one segment of voice content. That is, as the participants at the various conference terminals take turns speaking, the voice content of one turn by a participant at one conference terminal is saved as one recording file. Thus, if the participants at the conference terminals take turns speaking N times in a conference call, the voice content recorded in that call is divided into N segments and saved as N recording files.
  • In other embodiments, the segmentation unit 112 performs intelligent sentence segmentation on the voice content continuously collected by one conference terminal at one time, and saves each sentence as one segment of voice content. That is, the voice content of one turn by a participant at one conference terminal is divided into several sentences, and each sentence is saved as one recording file.
  • In this case, the identification information of each segment of voice content may also include a sentence sequence number, which indicates which sentence the segment is.
  • The segmentation unit 112 may perform intelligent sentence segmentation according to a preset silence interval length (for example, 1 second or 1.5 seconds): whenever the silence in the voice content reaches the preset silence interval length, a sentence break is made and the sentence is saved as one segment of voice content. If sentence sequence numbers are required, the sequence number is incremented by one at each sentence break and used as the sentence sequence number of that segment. Alternatively, the segmentation unit 112 may make a sentence break at fixed time intervals, or use other prior-art segmentation methods, which are not enumerated here.
  • Speech recognition module 102 for converting voice content into text content.
  • the voice recognition module 102 converts voice content into text content by using voice recognition technology.
  • Optionally, when the voice content is saved in segments, the voice recognition module 102 converts each segment of voice content into a segment of text content, and adds to each segment of text content identification information that matches the identification information of the corresponding voice content, so that the segments can be told apart.
  • "Matching" here means identical, at least partially identical, or corresponding. For example, identification information identical to that of the corresponding voice content is added to each segment of text content; it includes at least the device identification code of the corresponding conference terminal and may also include time information or a sentence sequence number.
  • the voice recognition module 102 may further establish a link relationship between the at least one piece of text content and the corresponding voice content, so as to facilitate subsequent playback confirmation of the voice content.
  • the conference record generation module 103 configured to generate a conference record according to the text content, and store the conference record and/or send the conference record to the specified address.
  • After the conference call ends, the conference record generation module 103 generates a text document from the converted text content; this text document is the conference record. Alternatively, the conference record generation module 103 may, during the conference call, first generate a text document from the text content already converted and then successively append subsequently converted text content to it.
  • the conference record generating module 103 first sorts the plurality of pieces of text content in a certain order, and then generates a meeting record.
  • For example, the multiple segments of text content can be sorted along the time axis (for example, by the order in which the text content was generated, or by the time information or sentence sequence number in the identification information).
  • Further, the conference record generation module 103 includes an editing unit 131, configured to: when an editing instruction for a segment of text content or for the entire text content is received, edit the text content, for example by modifying, deleting, or adding text.
  • The editing instruction may be a preset touch operation, key operation, mid-air gesture, voice command, or the like.
  • an "Edit" icon is displayed at each text content, and when the user touches the "Edit" icon, the editing unit 131 receives the editing command, enters the editing state, and exits the editing state when the editing is completed.
  • Further, the conference record generation module 103 also includes a voice playback unit 132, configured to: when a voice playback instruction for a segment of text content is received, obtain the corresponding voice content according to the link relationship and play it.
  • The voice playback instruction may be a preset touch operation, key operation, mid-air gesture, voice command, or the like.
  • For example, a "Voice Playback" icon is displayed at each segment of text content; when the user touches the icon, the voice playback unit 132 receives the voice playback instruction, finds the corresponding voice content according to the link relationship, and plays it.
  • When the user finds an error in the text content, an editing instruction can be triggered to edit that segment of text content.
  • Further, the conference record generation module 103 also includes a translation unit 133, configured to: when a translation instruction for a segment of text content or for the entire text content is received, translate the text content from one language into another, for example from Chinese into English, Japanese, French, or other languages, from English, Japanese, French, or other languages into Chinese, or between other languages.
  • The translation instruction may be a preset touch operation, key operation, mid-air gesture, voice command, or the like.
  • For example, a "Translate" icon is displayed at each segment of text content; when the user touches the icon, the translation unit 133 receives the translation instruction, starts translating the text content, and displays the translation near the original text content for reference. The translation may be specially marked to distinguish it from the original.
  • the conference record generation module 103 may store the conference record in a designated location, and/or send the conference record to the designated address.
  • the specified address may be a specified device, a designated mailbox, a designated contact, etc., for example, a meeting record is sent to a designated participant's mailbox.
  • the conference record generation module 103 may also encrypt the conference record to ensure data security.
  • the conference record document is compressed and encrypted, and the decompression password is a designated password or a password known or agreed by each participant.
  • The conference-call-based conference record generation apparatus of this embodiment uses voice recognition technology to automatically convert the voice content recorded by each conference terminal into text content and generates a conference record from the text content. This realizes automatic generation of the conference record of a conference call, eliminates the cumbersome process of compiling the record by hand, improves operating efficiency, and makes the conference call system more intelligent.
  • In addition, because the voice content is saved in segments and the text content is recorded in segments, the speaker of each passage can be clearly distinguished in the conference record, which makes the record clearer.
  • By providing voice playback and editing functions, the apparatus lets the user check and correct the conference record in real time, which makes the record more accurate.
  • By providing a translation function, the apparatus allows the content of the conference record to be translated into the required language, which meets the needs of international conference calls.
  • It should be noted that the conference record generation apparatus provided by the above embodiment belongs to the same concept as the conference record generation method embodiments; for its specific implementation, see the method embodiments.
  • The technical features of the method embodiments apply correspondingly to the apparatus embodiment and are not repeated here.
  • the present invention includes apparatus related to performing one or more of the operations described herein.
  • These devices may be specially designed and manufactured for the required purposes, or may also include known devices in a general purpose computer.
  • These devices have computer programs stored therein that are selectively activated or reconfigured.
  • Such computer programs may be stored in a device-readable (e.g., computer-readable) medium or in any type of medium suitable for storing electronic instructions and coupled to a bus, including but not limited to any type of disk (including floppy disks, hard disks, optical discs, CD-ROMs, and magneto-optical disks), ROM (Read-Only Memory), RAM (Random Access Memory), EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory, magnetic cards, or optical cards.
  • That is, a readable medium includes any medium that stores or transmits information in a form readable by a device (e.g., a computer).
  • Those skilled in the art will appreciate that each block of these structural diagrams and/or block diagrams and/or flow diagrams, and combinations of blocks in them, can be implemented with computer program instructions.
  • Those skilled in the art will appreciate that these computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing equipment, so that the schemes specified in the block or blocks of the structural diagrams and/or block diagrams and/or flow diagrams disclosed by the invention are executed by that processor.

Abstract

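The present invention discloses a conference record generation method and apparatus based on a conference call. The method includes the steps of: acquiring the voice content collected by each conference terminal; converting the voice content into text content; and generating a conference record from the text content, and storing the conference record and/or sending the conference record to a designated address. The conference record generation method provided by the embodiments of the present invention uses speech recognition technology to automatically convert the voice content recorded by each conference terminal into text content and generates a conference record from the text content. This realizes automatic generation of the conference record of a conference call, eliminates the cumbersome process of compiling the record by hand, improves operating efficiency, and makes the conference call system more intelligent.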

Description

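Conference record generation method and apparatus based on a conference call
Technical Field
[0001] The present invention relates to the field of conference call technology, and in particular to a conference record generation method and apparatus based on a conference call.
Background Art
[0002] To improve communication efficiency and reduce communication costs, conference calls have been adopted by more and more enterprises in recent years. A conference call in the broad sense includes both pure voice conferences and video conferences. A pure voice conference has the advantages of simple terminals and low cost and can be held without relying on the Internet; its disadvantage is that it cannot provide face-to-face communication. With the spread of the Internet and faster, cheaper network access, various forms of video conferencing have begun to emerge, making remote face-to-face communication possible.
[0003] However, current conference call systems provide only audio or video recording. The conference record still has to be taken down manually and, after the conference ends, compiled into a conference record document and sent to the participants on all sides, which is cumbersome and inefficient.
Technical Problem
[0004] The main object of the present invention is to provide a conference record generation method and apparatus based on a conference call, aiming to solve the technical problem that compiling conference records for a conference call is inefficient.
Solution to Problem
Technical Solution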
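[0005] To achieve the above object, the present invention proposes a conference record generation method based on a conference call, the method including the steps of:
[0006] acquiring the voice content collected by each conference terminal;
[0007] converting the voice content into text content;
[0008] generating a conference record from the text content, and storing the conference record and/or sending the conference record to a designated address.
[0009] Further, the step of acquiring the voice content collected by each conference terminal includes:
[0010] collecting voice content through each conference terminal, and receiving the voice content sent by each conference terminal;
[0011] saving the voice content in segments according to the conference terminal from which the voice content originates, and adding identification information to each segment of voice content, the identification information including at least the device identification code of the conference terminal corresponding to the voice content.
[0012] Further, the step of saving the voice content in segments according to the conference terminal from which the voice content originates includes: saving the voice content continuously collected by one conference terminal at one time as one segment of voice content.
[0013] Further, the step of saving the voice content in segments according to the conference terminal from which the voice content originates includes: performing intelligent sentence segmentation on the voice content continuously collected by one conference terminal at one time, and saving each sentence of voice content as one segment of voice content.
[0014] Further, the identification information also includes the sentence sequence number of the voice content.
[0015] Further, the device identification code of the conference terminal is the unique identification code of the conference terminal or a sequence code reflecting the order in which the conference terminal joined the conference.
[0016] Further, the step of converting the voice content into text content includes:
[0017] converting each segment of voice content into a segment of text content, and adding to each segment of text content identification information that matches the identification information of the corresponding voice content.
[0018] Further, after the step of generating a conference record from the text content, the method also includes:
[0019] when an editing instruction for a segment of text content is received, editing the text content.
[0020] Further, after the step of generating a conference record from the text content, the method also includes:
[0021] when a translation instruction for a segment of text content is received, translating the text content.
[0022] Further, after the step of converting each segment of voice content into a segment of text content, the method also includes: establishing a link relationship between at least one segment of text content and the voice content corresponding to it;
[0023] after the step of generating a conference record from the text content, the method also includes: when a voice playback instruction for the text content is received, obtaining the corresponding voice content according to the link relationship and playing it.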
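[0024] The present invention also proposes a conference record generation apparatus based on a conference call, the apparatus including:
[0025] a voice content acquisition module, configured to acquire the voice content collected by each conference terminal;
[0026] a speech recognition module, configured to convert the voice content into text content;
[0027] a conference record generation module, configured to generate a conference record from the text content, and to store the conference record and/or send the conference record to a designated address.
[0028] Further, the voice content acquisition module includes a receiving unit and a segmentation unit, wherein:
[0029] the receiving unit is configured to collect voice content through each conference terminal and to receive the voice content sent by each conference terminal;
[0030] the segmentation unit is configured to save the voice content in segments according to the conference terminal from which the voice content originates, and to add identification information to each segment of voice content, the identification information including at least the device identification code of the conference terminal corresponding to the voice content.
[0031] Further, the segmentation unit is configured to: save the voice content continuously collected by one conference terminal at one time as one segment of voice content.
[0032] Further, the segmentation unit is configured to: perform intelligent sentence segmentation on the voice content continuously collected by one conference terminal at one time, and save each sentence of voice content as one segment of voice content.
[0033] Further, the speech recognition module is configured to: convert each segment of voice content into a segment of text content, and add to each segment of text content identification information that matches the identification information of the corresponding voice content.
[0034] Further, the conference record generation module includes an editing unit, the editing unit being configured to: when an editing instruction for a segment of text content is received, edit the text content.
[0035] Further, the conference record generation module includes a translation unit, the translation unit being configured to: when a translation instruction for a segment of text content is received, translate the text content.
[0036] Further, the conference record generation module also includes a voice playback unit, and the speech recognition module is also configured to: establish a link relationship between at least one segment of text content and the voice content corresponding to it;
[0037] the voice playback unit is configured to: when a voice playback instruction for the text content is received, obtain the corresponding voice content according to the link relationship and play it.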
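Advantageous Effects of Invention
Advantageous Effects
[0038] The conference record generation method based on a conference call provided by the embodiments of the present invention uses speech recognition technology to automatically convert the voice content recorded by each conference terminal into text content and generates a conference record from the text content. This realizes automatic generation of the conference record of a conference call, eliminates the cumbersome process of compiling the record by hand, improves operating efficiency, and makes the conference call system more intelligent.
[0039] In addition, because the voice content is saved in segments and the text content is recorded in segments, the speaker of each passage can be clearly distinguished in the conference record, which makes the record clearer. Moreover, the voice playback and editing functions let the user check and correct the conference record in real time, which makes the record more accurate; the translation function allows the content of the conference record to be translated into the required language, which meets the needs of international conference calls.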
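Brief Description of Drawings
Description of Drawings
[0040] FIG. 1 is a module diagram of an optional conference call system for implementing the embodiments of the present invention;
[0041] FIG. 2 is a structural diagram of a typical conference call system for implementing the embodiments of the present invention;
[0042] FIG. 3 is a structural diagram of a typical video conference terminal in the conference call system of FIG. 2;
[0043] FIG. 4 is a flowchart of a first embodiment of the conference record generation method based on a conference call according to the present invention;
[0044] FIG. 5 is a flowchart of a second embodiment of the conference record generation method based on a conference call according to the present invention;
[0045] FIG. 6 is a module diagram of an embodiment of the conference record generation apparatus based on a conference call according to the present invention;
[0046] FIG. 7 is a module diagram of an optional voice content acquisition module in the conference record generation apparatus of FIG. 6;
[0047] FIG. 8 is a module diagram of an optional conference record generation module in the conference record generation apparatus of FIG. 6.
[0048] The realization of the object, the functional features, and the advantages of the present invention will be further described with reference to the accompanying drawings in connection with the embodiments.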
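Embodiments of the Invention
[0049] It should be understood that the specific embodiments described here are only used to explain the present invention and are not intended to limit it.
[0050] Embodiments of the present invention are described in detail below, and examples of the embodiments are shown in the accompanying drawings, in which the same or similar reference numerals throughout denote the same or similar elements or elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, are only used to explain the present invention, and cannot be construed as limiting it.
[0051] Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "said", and "the" used here may also include the plural. It should be further understood that the word "include" used in the specification of the present invention means that the stated features, integers, steps, operations, elements, and/or components are present, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof. It should be understood that when an element is said to be "connected" or "coupled" to another element, it may be directly connected or coupled to the other element, or intermediate elements may be present. In addition, "connected" or "coupled" as used here may include wireless connection or wireless coupling. The phrase "and/or" as used here includes all or any unit and all combinations of one or more of the associated listed items.
[0052] Those skilled in the art will understand that, unless otherwise defined, all terms used here (including technical and scientific terms) have the same meaning as commonly understood by a person of ordinary skill in the art to which the present invention belongs. It should also be understood that terms such as those defined in general dictionaries should be understood to have a meaning consistent with their meaning in the context of the prior art and, unless specifically defined as here, will not be interpreted in an idealized or overly formal sense.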
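[0053] Those skilled in the art will understand that "terminal" and "terminal device" as used here include both devices that have only a wireless signal receiver without transmitting capability and devices that have receiving and transmitting hardware capable of two-way communication over a two-way communication link. Such devices may include: cellular or other communication devices with a single-line display, a multi-line display, or no multi-line display; PCS (Personal Communications Service) devices, which may combine voice, data processing, fax, and/or data communication capabilities; PDAs (Personal Digital Assistants), which may include a radio frequency receiver, pager, Internet/intranet access, web browser, notepad, calendar, and/or GPS (Global Positioning System) receiver; and conventional laptop and/or palmtop computers or other devices that have and/or include a radio frequency receiver. The "terminal" and "terminal device" used here may be portable, transportable, installed in a vehicle (air, sea, and/or land), or suited and/or configured to operate locally and/or in distributed form at any other location on Earth and/or in space. The "terminal" and "terminal device" used here may also be a communication terminal, an Internet access terminal, or a music/video playback terminal, for example a PDA, a MID (Mobile Internet Device), and/or a mobile phone with music/video playback functions, or a device such as a smart TV or set-top box.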
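[0054] The conference record generation method and apparatus of the embodiments of the present invention are mainly applied to conference calls. "Conference call" here should be understood broadly and includes both pure voice conferences and video conferences.
[0055] FIG. 1 is a module diagram of an optional conference call system for implementing the embodiments of the present invention. The conference call system includes a server 10 and conference terminals 20. There are at least two conference terminals 20, which may be any of the terminal devices that join the conference, such as mobile terminals (mobile phones, tablets), computer terminals (personal computers, notebook computers), and video conference terminals dedicated to conference calls. The server 10 is the device that implements the conference record generation method of the embodiments of the present invention. It is usually a cloud server dedicated to hosting conference calls, but it may also be one of the terminal devices that has joined the conference and is designated as the server 10 that generates the conference record.
[0056] FIG. 2 is a structural diagram of a typical conference call system. The conference call system includes a cloud server 11 and six conference terminals, each with a wired or wireless connection to the cloud server 11: a video conference terminal 21 at main venue A, a video conference terminal 22 at branch venue B, a video conference terminal 23 at branch venue C, a video conference terminal 24 at branch venue D, a smartphone 25 carried by a traveling employee, and a notebook computer 26 carried by a traveling employee. The smartphone and the notebook computer can provide the conference call function by running conference call client software. Those skilled in the art will understand that the video conference system shown in FIG. 2 is only an optional embodiment, and the present invention places no limitation on it.
[0057] FIG. 3 is a structural diagram of a typical video conference terminal. The video conference terminal includes at least a host 210, whose core component is preferably a high-performance 4G smartphone chip. The host 210 has a built-in high-definition rotating camera (preferably 5 megapixels or more), a high-sensitivity omnidirectional microphone, a built-in speaker, and an LCD with a capacitive touch screen, so it can serve a small meeting of about 10 people in a single venue without any external equipment. For a large meeting, the HDMI or VGA interface of the host 210 can be connected to a high-definition LCD TV or projector 211, and an external wireless microphone 212 and amplified speaker system 213, a USB high-definition camera 214 (for recording the whole venue), and a Bluetooth keyboard and mouse 215 (for remote control of the host and text editing) can be added. The host 210 of the video conference terminal can access the Internet through wired broadband, a WIFI router, or an LTE 4G network and establish a connection with the cloud server. Those skilled in the art will understand that the video conference terminal shown in FIG. 3 is only an optional embodiment, and the present invention places no limitation on it.
[0058] Based on the above conference call system, the embodiments of the conference record generation method and apparatus of the present invention are now proposed.
[0059] Referring to FIG. 4, a first embodiment of the conference record generation method based on a conference call according to the present invention is proposed. The method includes the following steps: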
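[0060] S11. Acquire the voice content collected by each conference terminal.
[0061] After the conference call begins, the server collects voice content through each conference terminal, receives the voice content sent by each conference terminal, and saves it, optionally in a specified audio format such as MP3, WMA, or WAV.
[0062] Specifically, when a participant at a conference terminal begins to speak, the conference terminal collects the voice content through a sound collection device (such as a microphone). The conference terminal may send the collected voice content to the server in real time or at regular intervals; alternatively, the conference terminal sends the voice content continuously collected during one turn only after the participant at that terminal has finished speaking. After receiving the voice content sent by a conference terminal, the server saves it.
[0063] Optionally, the server may continuously receive the voice content sent by each conference terminal during the conference call and, after the conference call ends, save all the received voice content as one recording file.
[0064] Optionally, the server may also save the voice content in segments according to the conference terminal from which it originates, and add identification information to each segment to distinguish it. That is, the voice content recorded in one conference call is divided into at least two segments, each segment is saved as one recording file, and one conference call generates at least two recording files.
[0065] The identification information of the voice content includes at least the device identification code of the conference terminal corresponding to the voice content. The device identification code may be the unique identification code of the conference terminal or a sequence code reflecting the order in which the conference terminal joined the conference. A unique identification code is a code that uniquely identifies the terminal, such as a Media Access Control (MAC) address, a device serial number (such as an IMEI, MEID, or ESN), or a SIM card serial number (SIM Serial Number). A sequence code is the number that the system assigns to each conference terminal, in login order, as the terminals successively log in to the conference call system and join the conference. Further, the identification information of each segment of voice content may also include the time at which it was recorded.
[0066] In some embodiments, the server saves the voice content continuously collected by one conference terminal at one time as one segment of voice content. That is, as the participants at the various conference terminals take turns speaking, the voice content of one turn by a participant at one conference terminal is saved as one recording file. Thus, if the participants at the conference terminals take turns speaking N times in a conference call, the voice content recorded in that call is divided into N segments and saved as N recording files.
[0067] In other embodiments, the server performs intelligent sentence segmentation on the voice content continuously collected by one conference terminal at one time and saves each sentence as one segment of voice content. That is, the voice content of one turn by a participant at one conference terminal is divided into several sentences, and each sentence is saved as one recording file. In this case, the identification information of each segment of voice content may also include a sentence sequence number, which indicates which sentence the segment is.
[0068] The server may perform intelligent sentence segmentation according to a preset silence interval length (for example, 1 second or 1.5 seconds): whenever the silence in the voice content reaches the preset silence interval length, a sentence break is made and the sentence is saved as one segment of voice content. If sentence sequence numbers are required, the sequence number is incremented by one at each sentence break and used as the sentence sequence number of that segment. Alternatively, the server may make a sentence break at fixed time intervals, or use other prior-art segmentation methods, which are not enumerated here.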
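The silence-based sentence break described in [0068] can be illustrated with a minimal Python sketch. It is not the patent's implementation: it assumes that an upstream voice activity detector has already labeled each fixed-length audio frame as speech or silence, and the function name `split_on_silence` and the parameters `frame_ms` and `min_silence_ms` are hypothetical.

```python
from typing import List, Tuple


def split_on_silence(is_speech: List[bool],
                     frame_ms: int = 100,
                     min_silence_ms: int = 1000) -> List[Tuple[int, int, int]]:
    """Return (sentence_number, first_frame, last_frame) for each sentence.

    is_speech holds one flag per audio frame (assumed to come from a
    voice activity detector, which is outside this sketch).
    """
    min_silent_frames = max(1, min_silence_ms // frame_ms)
    sentences: List[Tuple[int, int, int]] = []
    start = None        # first speech frame of the sentence being built
    last_speech = None  # most recent speech frame seen
    silent_run = 0      # consecutive silent frames since last speech

    for i, speech in enumerate(is_speech):
        if speech:
            if start is None:
                start = i
            last_speech = i
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_silent_frames:
                # Silence reached the preset interval: close the sentence
                # and increment the sentence sequence number by one.
                sentences.append((len(sentences) + 1, start, last_speech))
                start = None
                silent_run = 0

    if start is not None:  # speech ran to the end of the turn
        sentences.append((len(sentences) + 1, start, last_speech))
    return sentences
```

With 100 ms frames and a 1 second threshold, a pause of ten silent frames ends a sentence, while a shorter pause stays inside it.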
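[0069] S12. Convert the voice content into text content.
[0070] Specifically, the server uses speech recognition technology to convert the voice content into text content.
[0071] Optionally, when the voice content is saved in segments in step S11, the server converts each segment of voice content into a segment of text content, and adds to each segment of text content identification information that matches the identification information of the corresponding voice content, so that the segments can be told apart.
[0072] "Matching" here means identical, at least partially identical, or corresponding. For example, identification information identical to that of the corresponding voice content is added to each segment of text content; it includes at least the device identification code of the corresponding conference terminal and may also include time information or a sentence sequence number.
[0073] S13. Generate a conference record from the text content.
[0074] Specifically, after the conference call ends, the server generates a text document from the converted text content; this text document is the conference record. Alternatively, the server may, during the conference call, first generate a text document from the text content already converted and then successively append subsequently converted text content to it.
[0075] Optionally, when there are multiple segments of text content, the segments are first sorted in a certain order and the conference record is then generated. For example, the segments can be sorted along the time axis (for example, by the order in which the text content was generated, or by the time information or sentence sequence number in the identification information).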
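As a rough illustration of the ordering step in [0075], the following Python sketch sorts text segments by a timestamp and renders them as a plain-text record. The `TextSegment` fields, the tie-breaking rule, and the `[A1] text` line format are assumptions made for the example; the patent does not prescribe a data layout.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class TextSegment:
    terminal_code: str  # which conference terminal the speech came from
    sentence_no: int    # sentence sequence number within that terminal
    timestamp: float    # seconds since the call started (assumed field)
    text: str           # recognized text


def build_record(segments: List[TextSegment]) -> str:
    """Sort segments along the time axis and render one line per segment."""
    ordered = sorted(
        segments,
        # Ties on time fall back to terminal code, then sentence number.
        key=lambda s: (s.timestamp, s.terminal_code, s.sentence_no),
    )
    return "\n".join(
        f"[{s.terminal_code}{s.sentence_no}] {s.text}" for s in ordered
    )
```

Sorting at render time means segments can arrive from the recognizer in any order, which fits the variant in which text is appended to the document while the call is still in progress.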
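[0076] Further, when an editing instruction for a segment of text content or for the entire text content is received, the server edits the text content, for example by modifying, deleting, or adding text. The editing instruction may be a preset touch operation, key operation, mid-air gesture, voice command, or the like. For example, an "Edit" icon is displayed at each segment of text content; when the user touches the "Edit" icon, the server receives the editing instruction and enters the editing state, and it exits the editing state when editing is complete.
[0077] Further, in step S12 the server may also establish a link relationship between at least one segment of text content and the voice content corresponding to it. In step S13, when a voice playback instruction for that segment of text content is received, the server obtains the corresponding voice content according to the link relationship and plays it. The voice playback instruction may be a preset touch operation, key operation, mid-air gesture, voice command, or the like. For example, a "Voice Playback" icon is displayed at each segment of text content; when the user touches the "Voice Playback" icon, the server receives the voice playback instruction, finds the corresponding voice content according to the link relationship, and plays it. When the user finds an error in the text content, an editing instruction can be triggered to edit that segment of text content.
[0078] Further, when a translation instruction for a segment of text content or for the entire text content is received, the server translates the text content from one language into another, for example from Chinese into English, Japanese, French, or other languages, from English, Japanese, French, or other languages into Chinese, or between other languages. The translation instruction may be a preset touch operation, key operation, mid-air gesture, voice command, or the like. For example, a "Translate" icon is displayed at each segment of text content; when the user touches the "Translate" icon, the server receives the translation instruction, starts translating the text content, and displays the translation near the original text content for reference. The translation may be specially marked to distinguish it from the original.
[0079] S14. Store the conference record and/or send the conference record to a designated address.
[0080] After the conference call ends, the server may store the conference record at a designated location and/or send it to a designated address. The designated address may be a designated device, a designated mailbox, a designated contact, or the like; for example, the conference record is sent to the mailboxes of designated participants.
[0081] Further, before the conference record is stored or sent, it may be encrypted to ensure data security. For example, the conference record document is compressed and encrypted, and the decompression password is a designated password or a password known to or agreed by the participants.
[0082] The conference record generation method based on a conference call of this embodiment uses speech recognition technology to automatically convert the voice content recorded by each conference terminal into text content and generates a conference record from the text content. This realizes automatic generation of the conference record of a conference call, eliminates the cumbersome process of compiling the record by hand, improves operating efficiency, and makes the conference call system more intelligent.
[0083] In addition, because the voice content is saved in segments and the text content is recorded in segments, the speaker of each passage can be clearly distinguished in the conference record, which makes the record clearer. Moreover, the voice playback and editing functions let the user check and correct the conference record in real time, which makes the record more accurate; the translation function allows the content of the conference record to be translated into the required language, which meets the needs of international conference calls.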
[0084]
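[0085] Referring to FIG. 5, a second embodiment of the conference record generation method based on a conference call according to the present invention is proposed. The method includes the following steps:
[0086] S21. The first conference terminal logs in to the conference call system of the server, submits a conference call application, and obtains a conference name and a conference access password.
[0087] The first conference terminal is the conference initiator. It logs in to the conference call system of the server with a registered account to apply to convene a conference call, enters conference information such as the conference name and conference time, and submits the application. After receiving the application, the server returns a conference access password to the first conference terminal. The conference name may also be generated automatically by the server.
[0088] S22. The first conference terminal and the second conference terminal log in to the conference call system of the server and join the conference call using the conference name and the conference access password.
[0089] The second conference terminal is a conference invitee; there may be one second conference terminal or at least two. At the agreed conference time, the first conference terminal and the second conference terminal log in to the conference call system of the server and join the conference call using the conference name and the conference access password.
[0090] S23. The server numbers the conference terminals in the order in which they log in.
[0091] To distinguish the sources of the subsequent recordings, the server numbers the conference terminals in the order in which they log in, using, for example, uppercase letters, lowercase letters, Arabic numerals, or Roman numerals.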
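The login-order numbering of step S23 could look like the following Python sketch, which uses uppercase letters as the code alphabet. The function name `assign_join_codes`, the device identifiers in the example, and the 26-terminal limit are illustrative assumptions only.

```python
import string
from typing import Dict, List


def assign_join_codes(login_order: List[str]) -> Dict[str, str]:
    """Map each device identifier to 'A', 'B', 'C', ... in login order.

    A terminal that logs in again keeps the code it was given first.
    """
    codes: Dict[str, str] = {}
    for device_id in login_order:
        if device_id in codes:
            continue  # repeat login: keep the original code
        index = len(codes)
        if index >= len(string.ascii_uppercase):
            # This sketch only covers single-letter codes.
            raise ValueError("more than 26 terminals are not supported")
        codes[device_id] = string.ascii_uppercase[index]
    return codes
```

For example, `assign_join_codes(["mac-1", "imei-2", "mac-1", "mac-3"])` gives `mac-1` the code `A`, `imei-2` the code `B`, and `mac-3` the code `C`.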
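[0092] S24. The first conference terminal selects a conference recording mode, and the conference call begins. It is determined whether the intelligent recording mode has been selected: if it has, step S26 is executed; if it has not, step S25 is executed.
[0093] In this embodiment, the user can select the conference recording mode as needed. The intelligent recording mode is the mode in which the system of the present invention automatically generates the conference record. If the intelligent recording mode is not selected, for example when the normal mode is selected, the user does not want the system to generate the conference record automatically and instead produces it manually as in the prior art.
[0094] S25. Each conference terminal records audio and video and automatically saves the recording to a designated address on the cloud server or on a local storage device.
[0095] When the first conference terminal has not selected the intelligent recording mode, the system, as in the prior art, does not generate the conference record automatically; each conference terminal records audio and video and automatically saves the recording to a designated address on the cloud server or on a local storage device.
[0096] S26. The server starts the speech recognition program, the text editing program, and the translation program.
[0097] When the first conference terminal has selected the intelligent recording mode, the server starts the speech recognition program, the text editing program, and the translation program in order to generate the conference record automatically.
[0098] Optionally, when the intelligent recording mode is selected, the server may display the scenes of the main and branch venues but record only audio, not video. The server may also automatically set a suitable voice-activated recording sensitivity for each conference terminal according to the level of ambient noise at that terminal. A suitable sensitivity ensures that recording is neither triggered falsely nor misses any speech.
[0099] Optionally, the server may also set the length of the silence interval between sentences so that the voice content can be intelligently segmented into sentences; for example, the silence interval length can be set to 1 to 1.5 seconds. A suitable silence interval length makes sentence segmentation and lookup easier.
[0100] S27. Each conference terminal collects voice content and sends it to the server.
[0101] When a conference terminal detects that a participant is speaking, it collects the voice content through a sound collection device (such as a microphone). The conference terminal may send the collected voice content to the server in real time or at regular intervals; alternatively, the conference terminal sends the voice content continuously collected during one turn only after the participant at that terminal has finished speaking.
[0102] S28. The server receives the voice content sent by each conference terminal, performs intelligent sentence segmentation on the voice content continuously collected by one conference terminal at one time, saves each sentence as one segment of voice content, and adds identification information to each segment of voice content.
[0103] Specifically, the server performs intelligent sentence segmentation on the voice content continuously collected by one conference terminal at one time according to the preset silence interval length: whenever the silence in the voice content reaches the preset silence interval length, a sentence break is made and the sentence is saved as one segment of voice content. Identification information is added to each segment of voice content; it includes at least the number of the conference terminal from which the segment originates and the sentence sequence number of the segment. For example, the identification information has two parts: the first part is an uppercase letter indicating which conference terminal the sound came from (that is, the login-order number), and the second part is a number indicating which sentence it is. This makes it easy to look up which party is speaking.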
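The two-part identification information in the example of [0103] (an uppercase terminal letter followed by a sentence number) can be modeled with a small value type. This is a sketch under assumptions: the class name `SegmentId` and the concatenated label form such as `B12` are illustrative, since the patent does not fix a concrete encoding.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class SegmentId:
    terminal_code: str  # login-order code of the source terminal, e.g. "B"
    sentence_no: int    # which sentence this segment is, starting at 1

    def label(self) -> str:
        """Render the identifier, e.g. terminal B, sentence 12 -> 'B12'."""
        return f"{self.terminal_code}{self.sentence_no}"

    @staticmethod
    def parse(label: str) -> "SegmentId":
        """Recover the terminal code and sentence number from a label."""
        match = re.fullmatch(r"([A-Z]+)(\d+)", label)
        if match is None:
            raise ValueError(f"not a segment label: {label!r}")
        return SegmentId(match.group(1), int(match.group(2)))
```

Because the same label is attached to a voice segment and to its text segment, parsing the label on a line of the record is enough to tell which terminal was speaking.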
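[0104] S29. The server uses the speech recognition program to convert each segment of voice content into a segment of text content, adds to each segment of text content identification information identical to that of the corresponding voice content, and establishes a link relationship between each segment of text content and the voice content corresponding to it.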
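One simple way to picture the link relationship of step S29 is a table keyed by the shared identification label, where each entry holds both the editable text and a reference to the recording. The sketch below assumes that the link is a file path and that the labels look like `A1`; the class and method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class RecordEntry:
    label: str       # identification info shared by the text and the audio
    text: str        # recognized text, which the user may edit later
    audio_path: str  # hypothetical location of the linked recording file


class ConferenceRecord:
    """Keeps each text segment linked to its source voice segment."""

    def __init__(self) -> None:
        self._entries: Dict[str, RecordEntry] = {}

    def add(self, label: str, text: str, audio_path: str) -> None:
        self._entries[label] = RecordEntry(label, text, audio_path)

    def audio_for(self, label: str) -> Optional[str]:
        """Follow the link for playback; None if the label is unknown."""
        entry = self._entries.get(label)
        return entry.audio_path if entry else None

    def edit(self, label: str, new_text: str) -> None:
        """Correct the text while leaving the audio link untouched."""
        self._entries[label].text = new_text

    def text_for(self, label: str) -> str:
        return self._entries[label].text
```

Keeping the link separate from the text is what allows the check-then-correct workflow: the user plays back the original audio for a segment and edits the text without breaking the association.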
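[0105] S30. The server generates a conference record from the text content and provides voice playback, editing, and translation functions for each segment of text content.
[0106] Specifically, the server first sorts the multiple segments of text content in a certain order and then generates the conference record. For example, the segments can be sorted along the time axis (for example, by the order in which the text content was generated, or by the time information or sentence sequence number in the identification information).
[0107] Voice playback, editing, and translation functions are also provided for each segment of text content in the conference record. For example, the server displays "Voice Playback", "Edit", and "Translate" icons after each segment of text content or after its identification information. When the user clicks the "Voice Playback" icon, the server receives the voice playback instruction, activates the voice playback function, obtains the voice content corresponding to that segment of text according to the link relationship, and plays it. When the user clicks the "Edit" icon, the server receives the editing instruction, starts the editing function, and edits that segment of text content through the text editing program. When the user clicks the "Translate" icon (for example, "Chinese-English translation"), the server receives the translation instruction, activates the translation function, translates the segment of text through the translation program, and displays the translation near the original text content for reference; the translation can also be specially marked to distinguish it from the original.
[0108] S31. After the conference call ends, the server encrypts the conference record and sends it to the designated address.
[0109] For example, the server compresses and encrypts the conference record document (for example, with the conference access password as the decompression password) and sends it to the designated mailboxes of the participants.
[0110] The conference record generation method based on a conference call of this embodiment provides the user with an intelligent recording mode during the conference call; when the user selects the intelligent recording mode, the conference record is generated automatically. Every sentence in the conference record is labeled with the identity of the speaker, so the record is clear. Every sentence in the record can be played back, edited, and translated, so that the user can check, correct, and translate the record in real time, which improves the accuracy of the conference record and meets the needs of international conference calls.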
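[0111] Referring to FIG. 6, an embodiment of the conference record generation apparatus based on a conference call according to the present invention is proposed. The apparatus is applied to the aforementioned conference call system, in particular to the server in the conference call system. The server may be a cloud server dedicated to hosting conference calls, or one of the terminal devices that has joined the conference call and is designated as the server, such as a mobile terminal (a mobile phone or tablet), a computer terminal (a personal computer or notebook computer), or a video conference terminal dedicated to conference calls. The apparatus includes a voice content acquisition module 101, a speech recognition module 102, and a conference record generation module 103, connected in sequence, wherein:
[0112] The voice content acquisition module 101 is configured to acquire the voice content collected by each conference terminal.
[0113] Specifically, after the conference call begins, the voice content acquisition module 101 collects voice content through each conference terminal, receives the voice content sent by each conference terminal, and saves it, optionally in a specified audio format such as MP3, WMA, or WAV.
[0114] Optionally, the voice content acquisition module 101 may continuously receive the voice content sent by each conference terminal during the conference call and, after the conference call ends, save all the received voice content as one recording file.
[0115] Optionally, the voice content acquisition module 101 may also save the voice content in segments according to the conference terminal from which it originates, and add identification information to each segment to distinguish it. That is, the voice content recorded in one conference call is divided into at least two segments, each segment is saved as one recording file, and one conference call generates at least two recording files.
[0116] In this case, as shown in FIG. 7, the voice content acquisition module 101 includes a receiving unit 111 and a segmentation unit 112, wherein:
[0117] the receiving unit 111 is configured to collect voice content through each conference terminal and to receive the voice content sent by each conference terminal;
[0118] the segmentation unit 112 is configured to save the voice content in segments according to the conference terminal from which it originates, and to add identification information to each segment of voice content, the identification information including at least the device identification code of the conference terminal corresponding to the voice content.
[0119] In some embodiments, the segmentation unit 112 saves the voice content continuously collected by one conference terminal at one time as one segment of voice content. That is, as the participants at the various conference terminals take turns speaking, the voice content of one turn by a participant at one conference terminal is saved as one recording file. Thus, if the participants at the conference terminals take turns speaking N times in a conference call, the voice content recorded in that call is divided into N segments and saved as N recording files.
[0120] In other embodiments, the segmentation unit 112 performs intelligent sentence segmentation on the voice content continuously collected by one conference terminal at one time and saves each sentence as one segment of voice content. That is, the voice content of one turn by a participant at one conference terminal is divided into several sentences, and each sentence is saved as one recording file. In this case, the identification information of each segment of voice content may also include a sentence sequence number, which indicates which sentence the segment is.
[0121] The segmentation unit 112 may perform intelligent sentence segmentation according to a preset silence interval length (for example, 1 second or 1.5 seconds): whenever the silence in the voice content reaches the preset silence interval length, a sentence break is made and the sentence is saved as one segment of voice content. If sentence sequence numbers are required, the sequence number is incremented by one at each sentence break and used as the sentence sequence number of that segment. Alternatively, the segmentation unit 112 may make a sentence break at fixed time intervals, or use other prior-art segmentation methods, which are not enumerated here.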
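[0122] The speech recognition module 102 is configured to convert the voice content into text content.
[0123] Specifically, the speech recognition module 102 uses speech recognition technology to convert the voice content into text content.
[0124] Optionally, when the voice content is saved in segments, the speech recognition module 102 converts each segment of voice content into a segment of text content, and adds to each segment of text content identification information that matches the identification information of the corresponding voice content, so that the segments can be told apart.
[0125] "Matching" here means identical, at least partially identical, or corresponding. For example, identification information identical to that of the corresponding voice content is added to each segment of text content; it includes at least the device identification code of the corresponding conference terminal and may also include time information or a sentence sequence number.
[0126] Further, the speech recognition module 102 may also establish a link relationship between at least one segment of text content and the voice content corresponding to it, to facilitate later playback and confirmation of the voice content.
[0127] The conference record generation module 103 is configured to generate a conference record from the text content, and to store the conference record and/or send the conference record to a designated address.
[0128] Specifically, after the conference call ends, the conference record generation module 103 generates a text document from the converted text content; this text document is the conference record. Alternatively, the conference record generation module 103 may, during the conference call, first generate a text document from the text content already converted and then successively append subsequently converted text content to it.
[0129] Optionally, when there are multiple segments of text content, the conference record generation module 103 first sorts the segments in a certain order and then generates the conference record. For example, the segments can be sorted along the time axis (for example, by the order in which the text content was generated, or by the time information or sentence sequence number in the identification information).
[0130] Further, as shown in FIG. 8, the conference record generation module 103 includes an editing unit 131, configured to: when an editing instruction for a segment of text content or for the entire text content is received, edit the text content, for example by modifying, deleting, or adding text. The editing instruction may be a preset touch operation, key operation, mid-air gesture, voice command, or the like. For example, an "Edit" icon is displayed at each segment of text content; when the user touches the "Edit" icon, the editing unit 131 receives the editing instruction and enters the editing state, and it exits the editing state when editing is complete.
[0131] Further, as shown in FIG. 8, the conference record generation module 103 also includes a voice playback unit 132, configured to: when a voice playback instruction for a segment of text content is received, obtain the corresponding voice content according to the link relationship and play it. The voice playback instruction may be a preset touch operation, key operation, mid-air gesture, voice command, or the like. For example, a "Voice Playback" icon is displayed at each segment of text content; when the user touches the "Voice Playback" icon, the voice playback unit 132 receives the voice playback instruction, finds the corresponding voice content according to the link relationship, and plays it. When the user finds an error in the text content, an editing instruction can be triggered to edit that segment of text content.
[0132] Further, as shown in FIG. 8, the conference record generation module 103 also includes a translation unit 133, configured to: when a translation instruction for a segment of text content or for the entire text content is received, translate the text content from one language into another, for example from Chinese into English, Japanese, French, or other languages, from English, Japanese, French, or other languages into Chinese, or between other languages. The translation instruction may be a preset touch operation, key operation, mid-air gesture, voice command, or the like. For example, a "Translate" icon is displayed at each segment of text content; when the user touches the "Translate" icon, the translation unit 133 receives the translation instruction, starts translating the text content, and displays the translation near the original text content for reference. The translation may be specially marked to distinguish it from the original.
[0133] After the conference call ends, the conference record generation module 103 may store the conference record at a designated location and/or send it to a designated address. The designated address may be a designated device, a designated mailbox, a designated contact, or the like; for example, the conference record is sent to the mailboxes of designated participants.
[0134] Further, before the conference record is stored or sent, the conference record generation module 103 may also encrypt it to ensure data security. For example, the conference record document is compressed and encrypted, and the decompression password is a designated password or a password known to or agreed by the participants.
[0135] The conference record generation apparatus based on a conference call of this embodiment uses speech recognition technology to automatically convert the voice content recorded by each conference terminal into text content and generates a conference record from the text content. This realizes automatic generation of the conference record of a conference call, eliminates the cumbersome process of compiling the record by hand, improves operating efficiency, and makes the conference call system more intelligent.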
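To show how the three modules chain together (acquisition, recognition, record generation), here is a deliberately small Python sketch. The speech recognizer is passed in as a plain callable because the patent does not name a recognition engine; `generate_conference_record`, the `Recognizer` alias, and the bracketed line format are all assumptions for illustration.

```python
from typing import Callable, List, Tuple

# A recognizer turns raw audio bytes into text. Any real speech
# recognition engine would sit behind this hypothetical interface.
Recognizer = Callable[[bytes], str]


def generate_conference_record(
    voice_segments: List[Tuple[str, bytes]],
    recognize: Recognizer,
) -> str:
    """Convert labeled voice segments to text and assemble the record.

    voice_segments pairs each identification label (e.g. "A1") with
    the audio of that segment, in the order it should appear.
    """
    lines = []
    for label, audio in voice_segments:
        text = recognize(audio)           # speech recognition module
        lines.append(f"[{label}] {text}")  # text keeps the same label
    return "\n".join(lines)               # record generation module
```

Injecting the recognizer keeps the sketch testable with a stand-in function and leaves the choice of engine open.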
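[0136] In addition, because the voice content is saved in segments and the text content is recorded in segments, the speaker of each passage can be clearly distinguished in the conference record, which makes the record clearer. Moreover, the voice playback and editing functions let the user check and correct the conference record in real time, which makes the record more accurate; the translation function allows the content of the conference record to be translated into the required language, which meets the needs of international conference calls.
[0137] It should be noted that the conference record generation apparatus provided by the above embodiment belongs to the same concept as the conference record generation method embodiments; for its specific implementation, see the method embodiments. The technical features of the method embodiments apply correspondingly to the apparatus embodiment and are not repeated here.
[0138] Those skilled in the art will understand that the present invention includes devices for performing one or more of the operations described in this application. These devices may be specially designed and manufactured for the required purposes, or may include known devices in general-purpose computers. These devices have computer programs stored in them that are selectively activated or reconfigured. Such computer programs may be stored in a device-readable (e.g., computer-readable) medium or in any type of medium suitable for storing electronic instructions and coupled to a bus, including but not limited to any type of disk (including floppy disks, hard disks, optical discs, CD-ROMs, and magneto-optical disks), ROM (Read-Only Memory), RAM (Random Access Memory), EPROM (Erasable Programmable Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), flash memory, magnetic cards, or optical cards. That is, a readable medium includes any medium that stores or transmits information in a form readable by a device (e.g., a computer).
[0139] Those skilled in the art will understand that each block of these structural diagrams and/or block diagrams and/or flow diagrams, and combinations of blocks in them, can be implemented with computer program instructions. Those skilled in the art will understand that these computer program instructions can be provided to a processor of a general-purpose computer, a special-purpose computer, or other programmable data processing equipment, so that the schemes specified in the block or blocks of the structural diagrams and/or block diagrams and/or flow diagrams disclosed by the present invention are executed by that processor.
[0140] Those skilled in the art will understand that the steps, measures, and schemes in the various operations, methods, and flows discussed in the present invention may be alternated, changed, combined, or deleted. Further, other steps, measures, and schemes in the various operations, methods, and flows discussed in the present invention may also be alternated, changed, rearranged, decomposed, combined, or deleted. Further, steps, measures, and schemes in the prior art that correspond to those in the various operations, methods, and flows disclosed in the present invention may also be alternated, changed, rearranged, decomposed, combined, or deleted.
[0141] The preferred embodiments of the present invention have been described above with reference to the accompanying drawings, and the scope of rights of the present invention is not thereby limited. Those skilled in the art can implement the present invention in many variant ways without departing from its scope and essence; for example, a feature of one embodiment can be used in another embodiment to obtain yet another embodiment. Any modification, equivalent replacement, or improvement made within the technical concept of the present invention shall fall within the scope of rights of the present invention.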

Claims

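[Claim 1] A conference record generation method based on a conference call, characterized by including the steps of: acquiring the voice content collected by each conference terminal; converting the voice content into text content; and generating a conference record from the text content, and storing the conference record and/or sending the conference record to a designated address.
[Claim 2] The conference record generation method based on a conference call according to claim 1, characterized in that the step of acquiring the voice content collected by each conference terminal includes: collecting voice content through each conference terminal, and receiving the voice content sent by each conference terminal; and saving the voice content in segments according to the conference terminal from which the voice content originates, and adding identification information to each segment of voice content, the identification information including at least the device identification code of the conference terminal corresponding to the voice content.
[Claim 3] The conference record generation method based on a conference call according to claim 2, characterized in that the step of saving the voice content in segments according to the conference terminal from which the voice content originates includes: saving the voice content continuously collected by one conference terminal at one time as one segment of voice content.
[Claim 4] The conference record generation method based on a conference call according to claim 2, characterized in that the step of saving the voice content in segments according to the conference terminal from which the voice content originates includes: performing intelligent sentence segmentation on the voice content continuously collected by one conference terminal at one time, and saving each sentence of voice content as one segment of voice content.
[Claim 5] The conference record generation method based on a conference call according to claim 4, characterized in that the identification information also includes the sentence sequence number of the voice content.
[Claim 6] The conference record generation method based on a conference call according to claim 2, characterized in that the device identification code of the conference terminal is the unique identification code of the conference terminal or a sequence code reflecting the order in which the conference terminal joined the conference.
[Claim 7] The conference record generation method based on a conference call according to any one of claims 2-6, characterized in that the step of converting the voice content into text content includes: converting each segment of voice content into a segment of text content, and adding to each segment of text content identification information that matches the identification information of the corresponding voice content.
[Claim 8] The conference record generation method based on a conference call according to claim 7, characterized in that, after the step of generating a conference record from the text content, the method also includes: when an editing instruction for a segment of text content is received, editing the text content.
[Claim 9] The conference record generation method based on a conference call according to claim 7, characterized in that, after the step of generating a conference record from the text content, the method also includes: when a translation instruction for a segment of text content is received, translating the text content.
[Claim 10] The conference record generation method based on a conference call according to claim 8, characterized in that, after the step of converting each segment of voice content into a segment of text content, the method also includes: establishing a link relationship between at least one segment of text content and the voice content corresponding to it; and after the step of generating a conference record from the text content, the method also includes: when a voice playback instruction for the text content is received, obtaining the corresponding voice content according to the link relationship and playing it.
[Claim 11] A conference record generation apparatus based on a conference call, characterized by including: a voice content acquisition module, configured to acquire the voice content collected by each conference terminal; a speech recognition module, configured to convert the voice content into text content; and a conference record generation module, configured to generate a conference record from the text content, and to store the conference record and/or send the conference record to a designated address.
[Claim 12] The conference record generation apparatus based on a conference call according to claim 11, characterized in that the voice content acquisition module includes a receiving unit and a segmentation unit, wherein: the receiving unit is configured to collect voice content through each conference terminal and to receive the voice content sent by each conference terminal; and the segmentation unit is configured to save the voice content in segments according to the conference terminal from which the voice content originates, and to add identification information to each segment of voice content, the identification information including at least the device identification code of the conference terminal corresponding to the voice content.
[Claim 13] The conference record generation apparatus based on a conference call according to claim 12, characterized in that the segmentation unit is configured to: save the voice content continuously collected by one conference terminal at one time as one segment of voice content.
[Claim 14] The conference record generation apparatus based on a conference call according to claim 12, characterized in that the segmentation unit is configured to: perform intelligent sentence segmentation on the voice content continuously collected by one conference terminal at one time, and save each sentence of voice content as one segment of voice content.
[Claim 15] The conference record generation apparatus based on a conference call according to any one of claims 11-14, characterized in that the speech recognition module is configured to: convert each segment of voice content into a segment of text content, and add to each segment of text content identification information that matches the identification information of the corresponding voice content.
[Claim 16] The conference record generation apparatus based on a conference call according to claim 15, characterized in that the conference record generation module includes an editing unit, the editing unit being configured to: when an editing instruction for a segment of text content is received, edit the text content.
[Claim 17] The conference record generation apparatus based on a conference call according to claim 15, characterized in that the conference record generation module includes a translation unit, the translation unit being configured to: when a translation instruction for a segment of text content is received, translate the text content.
[Claim 18] The conference record generation apparatus based on a conference call according to claim 16, characterized in that the conference record generation module also includes a voice playback unit, and the speech recognition module is also configured to: establish a link relationship between at least one segment of text content and the voice content corresponding to it; the voice playback unit being configured to: when a voice playback instruction for the text content is received, obtain the corresponding voice content according to the link relationship and play it.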
PCT/CN2016/089950 2016-07-13 2016-07-13 基于电话会议的会议记录生成方法和装置 WO2018010129A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/089950 WO2018010129A1 (zh) 2016-07-13 2016-07-13 基于电话会议的会议记录生成方法和装置

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/089950 WO2018010129A1 (zh) 2016-07-13 2016-07-13 基于电话会议的会议记录生成方法和装置

Publications (1)

Publication Number Publication Date
WO2018010129A1 (zh) 2018-01-18

Family

ID=60952298

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2016/089950 WO2018010129A1 (zh) 2016-07-13 2016-07-13 Conference record generation method and apparatus based on a conference call

Country Status (1)

Country Link
WO (1) WO2018010129A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11445146B2 (en) * 2019-12-30 2022-09-13 Yealink (Xiamen) Network Technology Co., Ltd. Video conference terminal and video conference system

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020038212A1 (en) * 2000-09-25 2002-03-28 Prosodie Telephony system with subtitles and/or translation
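CN101068271A (zh) * 2007-06-26 2007-11-07 华为技术有限公司 Telephone summary generation system, communication terminal, media server and method
CN201243302Y (zh) * 2008-05-21 2009-05-20 北京帮助在线信息技术有限公司 Device for many-to-many conference recording using speech recognition technology
CN101582951A (zh) * 2008-05-14 2009-11-18 北京帮助在线信息技术有限公司 Method and device for implementing conference recording using speech recognition technology
CN101587496A (zh) * 2008-05-21 2009-11-25 北京帮助在线信息技术有限公司 Method and device for implementing conference recording performed manually or automatically by a system
CN102436812A (zh) * 2011-11-01 2012-05-02 展讯通信(上海)有限公司 Conference recording device and method for recording a conference using the device
CN105745921A (zh) * 2016-01-19 2016-07-06 王晓光 Conference recording method and system for video network conferences
CN106057193A (zh) * 2016-07-13 2016-10-26 深圳市沃特沃德股份有限公司 Conference record generation method and apparatus based on a conference call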

Similar Documents

Publication Publication Date Title
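WO2014173370A1 (zh) Method and apparatus for extracting meeting minutes
CN105556955B (zh) Video call apparatus and video call processing method
CN107609045B (zh) Conference record generation apparatus and method thereof
CN103327181B (zh) Voice chat method capable of improving the efficiency with which users obtain voice information
US8125508B2 (en) Sharing participant information in a videoconference
US8351581B2 (en) Systems and methods for intelligent call transcription
US20050261890A1 (en) Method and apparatus for providing language translation
US9742849B2 (en) Methods and systems for establishing collaborative communications between devices using ambient audio
JP5739009B2 (ja) System and method for providing conference information
US8120638B2 (en) Speech to text conversion in a videoconference
CN106057193A (zh) Conference record generation method and apparatus based on a conference call
WO2020073633A1 (zh) Conference speaker, and conference recording method, device, system and computer storage medium
US20040267387A1 (en) System and method for capturing media
US20170359393A1 (en) System and Method for Building Contextual Highlights for Conferencing Systems
US10423382B2 (en) Teleconference recording management system
JP6987124B2 (ja) Interpretation device and method (device and method of translating a language)
JP2006190296A (ja) Context extraction in a multimedia communication system, and information providing apparatus and method using the same
US20140162612A1 (en) Method of recording call logs and device thereof
CN113138743A (zh) Keyword group detection using audio watermarks
CN102932543A (zh) Method for configuring a multimedia capture device using multimedia
WO2019029073A1 (zh) Screen transmission method and apparatus, electronic device, and computer-readable storage medium
CN101232542A (zh) Method for implementing a voice memo function on a mobile terminal, and mobile terminal using the same
US8988484B2 (en) Video processing apparatus and control method thereof
CN105282621A (zh) Method and apparatus for implementing a voice message visualization service
WO2018010129A1 (zh) Conference record generation method and apparatus based on a conference call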

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application

Ref document number: 16908457

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 EP: PCT application non-entry in European phase

Ref document number: 16908457

Country of ref document: EP

Kind code of ref document: A1