CN108335697A - Minutes method, apparatus, equipment and computer-readable medium - Google Patents

Minutes method, apparatus, equipment and computer-readable medium

Info

Publication number
CN108335697A
Authority
CN
China
Prior art keywords
voice
minutes
conference
carrying
conference voice
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810085820.3A
Other languages
Chinese (zh)
Inventor
耿雷
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Baidu Netcom Science and Technology Co Ltd
Original Assignee
Beijing Baidu Netcom Science and Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Baidu Netcom Science and Technology Co Ltd filed Critical Beijing Baidu Netcom Science and Technology Co Ltd
Priority to CN201810085820.3A priority Critical patent/CN108335697A/en
Publication of CN108335697A publication Critical patent/CN108335697A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems
    • G10L15/20 Speech recognition techniques specially adapted for robustness in adverse environments, e.g. in noise, of stress induced speech
    • G10L21/00 Processing of the speech or voice signal to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208 Noise filtering
    • G10L21/0216 Noise filtering characterised by the method used for estimating noise
    • G10L2021/02082 Noise filtering the noise being echo, reverberation of the speech
    • G10L2021/02161 Number of inputs available containing the signal or the noise to be suppressed
    • G10L2021/02166 Microphone arrays; Beamforming

Abstract

The present invention proposes a meeting minutes method, comprising: preprocessing conference voice; performing sentence segmentation on the conference voice and uploading it to a voice server for text conversion; and receiving the text information produced by the voice server's voice conversion. By recording meeting minutes through speech-to-text conversion, embodiments of the present invention can improve working efficiency and save resources. Further, embodiments of the present invention also apply processing such as noise reduction to the conference voice, which can improve the accuracy of speech recognition.

Description

Minutes method, apparatus, equipment and computer-readable medium
Technical field
The present invention relates to the technical field of meeting minutes, and in particular to a speech-to-text meeting minutes method, apparatus, equipment and computer-readable medium.
Background
With the rapid development of business, conference systems are now very widely used. During a meeting it is often necessary to produce a meeting summary, so that the decisions and consensus reached are recorded in a document.
In this scenario, a meeting summary usually requires manual note-taking or an audio recording that is then transcribed by hand. This approach greatly reduces working efficiency in the information age and wastes a large amount of resources, which is detrimental to environmental protection and a green office.
Summary of the invention
Embodiments of the present invention provide a meeting minutes method, apparatus, equipment and computer-readable medium, to solve or alleviate the above technical problems in the prior art.
In a first aspect, an embodiment of the present invention provides a meeting minutes method, comprising:
preprocessing conference voice;
performing sentence segmentation on the conference voice and uploading it to a voice server for text conversion;
receiving the text information produced by the voice server's voice conversion.
With reference to the first aspect, in a first implementation of the first aspect, preprocessing the conference voice includes:
performing echo cancellation;
performing beamforming on the conference voice;
performing noise reduction on the conference voice;
performing enhancement and amplification on the conference voice.
With reference to the first implementation of the first aspect, in a second implementation of the first aspect, the step of performing beamforming on the conference voice includes:
receiving conference voice information through a microphone array;
computing a weighted sum of the voice information received by the microphone array, using the weights of the different directions.
With reference to the first implementation of the first aspect, in a third implementation of the first aspect, performing noise reduction on the conference voice includes:
performing noise suppression according to the frequency, signal strength and duration of the sound signal, cancelling the noise with a sound that has the same frequency and amplitude as the noise but the opposite phase;
performing dereverberation according to the spatial characteristics of the sound field.
With reference to the first aspect, in a fourth implementation of the first aspect, the following step is executed when a trigger signal is received: performing sentence segmentation on the conference voice and uploading it to a voice server for text conversion.
With reference to the fourth implementation of the first aspect, in a fifth implementation of the first aspect, the trigger signal is a keyword voice trigger signal or a button trigger signal.
In a second aspect, an embodiment of the present invention provides a meeting minutes apparatus, comprising:
a preprocessing module, for preprocessing conference voice;
a voice conversion module, for performing sentence segmentation on the conference voice and uploading it to a voice server for text conversion;
a text receiving module, for receiving the text information produced by the voice server's voice conversion.
With reference to the second aspect, in a first implementation of the second aspect, the preprocessing module includes:
an echo cancellation unit, for performing echo cancellation;
a beamforming unit, for performing beamforming on the conference voice;
a noise reduction unit, for performing noise reduction on the conference voice;
an enhancement and amplification unit, for performing enhancement and amplification on the conference voice.
With reference to the second aspect, in a second implementation of the second aspect, the beamforming unit includes:
a voice information receiving subunit, for receiving conference voice information through a microphone array;
a weighted summation subunit, for computing a weighted sum of the voice information received by the microphone array, using the weights of the different directions.
With reference to the second aspect, in a third implementation of the second aspect, the noise reduction unit includes:
a noise suppression subunit, for performing noise suppression according to the frequency, signal strength and duration of the sound signal, cancelling the noise with a sound that has the same frequency and amplitude as the noise but the opposite phase;
a dereverberation subunit, for performing dereverberation according to the spatial characteristics of the sound field.
With reference to the third implementation of the second aspect, in a fourth implementation of the second aspect, the apparatus further includes a trigger module, for invoking the voice conversion module when a trigger signal is received.
With reference to the fourth implementation of the second aspect, in a fifth implementation of the second aspect, the trigger signal is a keyword voice trigger signal or a button trigger signal.
The functions of the apparatus may be implemented by hardware, or by hardware executing corresponding software. The hardware or software includes one or more modules corresponding to the above functions.
In one possible design, the structure of the meeting minutes apparatus includes a processor and a memory. The memory stores a program that supports the apparatus in executing the meeting minutes method of the first aspect, and the processor is configured to execute the program stored in the memory. The apparatus may further include a communication interface, for communication between the apparatus and other equipment or a communication network.
In a third aspect, an embodiment of the present invention provides a computer-readable medium for storing the computer software instructions used by the meeting minutes apparatus, including a program for executing the meeting minutes method of the first aspect.
One of the above technical solutions has the following advantages or beneficial effects: by recording meeting minutes through speech-to-text conversion, embodiments of the present invention can improve working efficiency and save resources. Further, embodiments of the present invention also apply processing such as noise reduction to the conference voice, which can improve the accuracy of speech recognition.
The above summary is for the purpose of description only and is not intended to be limiting in any way. In addition to the illustrative aspects, embodiments and features described above, further aspects, embodiments and features of the present invention will be readily apparent by reference to the drawings and the following detailed description.
Description of the drawings
In the drawings, unless otherwise specified, the same reference numerals denote the same or similar components or elements throughout the several figures. The drawings are not necessarily drawn to scale. It should be understood that the drawings depict only some embodiments disclosed according to the present invention and should not be regarded as limiting its scope.
Fig. 1 is a flowchart of the steps of the meeting minutes method of embodiment one;
Fig. 2 is a flowchart of the specific sub-steps of step S110 of embodiment one;
Fig. 3 is a flowchart of the steps of the meeting minutes method of embodiment two;
Fig. 4 is a connection block diagram of the meeting minutes apparatus of embodiment three;
Fig. 5 is a connection block diagram of the meeting minutes apparatus of embodiment four;
Fig. 6 is a connection block diagram of the meeting minutes equipment of embodiment five.
Detailed description
In the following, only certain exemplary embodiments are briefly described. As those skilled in the art will recognize, the described embodiments may be modified in various ways without departing from the spirit or scope of the present invention. Accordingly, the drawings and description are to be regarded as illustrative in nature and not restrictive.
Embodiments of the present invention aim to solve the technical problem of low efficiency in the prior art, where meeting minutes have to be recorded by hand. Embodiments of the present invention mainly provide a speech-to-text meeting minutes method and apparatus, and the technical solution is described in detail through the following embodiments.
Embodiment one
Fig. 1 is a flowchart of the steps of the meeting minutes method of embodiment one of the present invention. Embodiment one provides a meeting minutes method that includes the following steps:
S110: Preprocess the conference voice.
In a typical meeting, the surrounding environment produces various kinds of noise, so the voice information captured in the meeting needs to be preprocessed to improve the accuracy of speech recognition. As shown in Fig. 2, in one embodiment step S110 includes:
S111: Perform echo cancellation.
Acoustic echo often occurs in conference systems, and especially in teleconferencing systems: the sound played by the loudspeaker is picked up by the microphone and sent back to the far end, so that the far-end talker hears his or her own voice. Echo cancellation is therefore needed. For example, an existing echo cancellation method may be used; alternatively, the magnitude of the echo signal may be estimated and the estimate then subtracted from the received signal to cancel the echo.
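The "estimate the echo, then subtract it" idea can be sketched with an adaptive filter. The patent does not name an estimation algorithm; the normalized least-mean-squares (NLMS) update below, the function name `nlms_echo_cancel` and the parameters `taps`, `mu` and `eps` are all assumptions made for illustration only.

```python
def nlms_echo_cancel(far_end, mic, taps=8, mu=0.5, eps=1e-8):
    """Estimate the echo of `far_end` contained in `mic` and subtract it.

    far_end: samples sent to the loudspeaker (the echo reference).
    mic:     samples picked up by the microphone (near-end speech + echo).
    Returns the microphone signal with the estimated echo removed.
    """
    weights = [0.0] * taps   # adaptive estimate of the loudspeaker-to-mic path
    history = [0.0] * taps   # most recent far-end samples, newest first
    residual = []
    for x, d in zip(far_end, mic):
        history = [x] + history[:-1]
        echo_estimate = sum(w * h for w, h in zip(weights, history))
        error = d - echo_estimate            # subtract the estimate from the received signal
        norm = sum(h * h for h in history) + eps
        # NLMS update: move the weights so the next estimate is closer to the echo
        weights = [w + mu * error * h / norm for w, h in zip(weights, history)]
        residual.append(error)
    return residual
```

In this sketch the filter length `taps` must cover the delay of the echo path; a real room would need a far longer filter than the default shown here.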
S112: Perform beamforming on the conference voice.
During beamforming, conference voice information is first received through a microphone array. A weighted sum of the voice information received by the array is then computed, using the weights of the different directions. In this embodiment, multiple microphones capture the user's voice from different directions and the direction of the sound source is determined. Each direction is assigned its own weight, and the weighted sum is computed with those weights. For example, the sound-source direction is given a larger weight than the other directions, which enhances the voice information input by the user and weakens the influence of other sounds.
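A minimal sketch of the weighted summation follows. It assumes the per-direction signals are already time-aligned, and it picks the source direction simply as the loudest channel; the helper names `direction_weights` and `weighted_sum_beamform` and the `emphasis` factor are hypothetical and do not come from the patent.

```python
def direction_weights(energies, emphasis=4.0):
    """Give the loudest direction (taken as the sound source) a larger weight."""
    loudest = max(range(len(energies)), key=lambda i: energies[i])
    return [emphasis if i == loudest else 1.0 for i in range(len(energies))]


def weighted_sum_beamform(channels, weights):
    """Combine one signal per direction into a single normalized weighted sum."""
    if len(channels) != len(weights):
        raise ValueError("one weight per channel is required")
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    length = min(len(ch) for ch in channels)
    return [
        sum(w * ch[n] for w, ch in zip(weights, channels)) / total
        for n in range(length)
    ]
```

A practical beamformer would also delay each channel so that sound from the chosen direction adds coherently; that alignment step is left out here.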
S113: Perform noise reduction on the conference voice.
During noise reduction, noise suppression is first performed according to the frequency, signal strength and duration of the sound signal, and dereverberation is then performed according to the spatial characteristics of the sound field. For noise suppression, for example, the noise can be cancelled with a sound that has the same frequency and amplitude as the noise but the opposite phase. For dereverberation, existing dereverberation audio plug-ins may be used, or a microphone array may be used to remove the reverberation.
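The anti-phase cancellation can be illustrated with a single sinusoidal noise component. This sketch assumes the frequency, amplitude and phase of the noise are already known exactly, which in practice would have to be estimated; the function names `sine` and `antiphase_cancel` are illustrative only.

```python
import math


def sine(freq_hz, amplitude, phase, sample_rate, n):
    """Generate n samples of a sinusoid."""
    return [
        amplitude * math.sin(2 * math.pi * freq_hz * i / sample_rate + phase)
        for i in range(n)
    ]


def antiphase_cancel(mixed, noise_freq_hz, noise_amplitude, noise_phase, sample_rate):
    """Add a tone with the noise's frequency and amplitude but opposite phase."""
    anti_noise = sine(noise_freq_hz, noise_amplitude,
                      noise_phase + math.pi, sample_rate, len(mixed))
    return [m + a for m, a in zip(mixed, anti_noise)]
```

Because sin(x + π) = −sin(x), the added tone is the exact negative of the noise component, so the two sum to zero and only the remaining signal is left.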
S114: Perform enhancement and amplification on the conference voice.
In this embodiment, the voice may be amplified using, for example, AGC (automatic gain control).
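One simple way to realize AGC is to steer the gain frame by frame toward a target level. The frame-based design below, the function name `agc_frames`, and all of its parameters (`target_rms`, `max_gain`, `smoothing`, `silence_floor`) are assumptions for illustration; the patent only names AGC as one option.

```python
import math


def agc_frames(samples, frame_len=160, target_rms=0.1, max_gain=10.0,
               smoothing=0.5, silence_floor=1e-6):
    """Scale each frame so that its RMS level drifts toward target_rms."""
    gain = 1.0
    out = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(x * x for x in frame) / len(frame))
        if rms > silence_floor:
            # Gain that would hit the target exactly, capped to limit noise boost
            desired = min(max_gain, target_rms / rms)
            # Smooth the change so the gain does not jump between frames
            gain = smoothing * gain + (1.0 - smoothing) * desired
        # Silent frames keep the previous gain instead of being boosted
        out.extend(gain * x for x in frame)
    return out
```

Quiet speakers are amplified and loud speakers attenuated, so the recognizer receives a more uniform level.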
S120: Perform sentence segmentation on the conference voice and upload it to a voice server for text conversion.
After voice preprocessing is complete, the conference voice needs to be segmented into sentences to make conversion to text and recording easier. For example, the voice may be segmented according to the pauses in speech, which prevents the converted text from coming out as one long passage with no punctuation. In addition, a conversion model may be trained in advance and the text conversion performed by that model.
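Pause-based segmentation can be sketched with a frame-energy test: a sentence ends once the signal has stayed quiet for long enough. The thresholds, the 20 ms frame size and the function name `split_on_pauses` are hypothetical choices, not values given in the patent.

```python
def split_on_pauses(samples, sample_rate, frame_ms=20,
                    energy_threshold=1e-4, min_pause_ms=300):
    """Return (start, end) sample indices of segments separated by long pauses."""
    frame_len = max(1, sample_rate * frame_ms // 1000)
    min_pause_frames = max(1, min_pause_ms // frame_ms)
    segments = []
    start = None             # sample index where the current sentence began
    last_voiced_end = None   # end of the most recent voiced frame
    silent_run = 0           # consecutive silent frames seen inside a sentence
    for pos in range(0, len(samples), frame_len):
        frame = samples[pos:pos + frame_len]
        energy = sum(x * x for x in frame) / len(frame)
        if energy >= energy_threshold:
            if start is None:
                start = pos
            last_voiced_end = pos + len(frame)
            silent_run = 0
        elif start is not None:
            silent_run += 1
            if silent_run >= min_pause_frames:
                # The pause is long enough: close the sentence at the last voiced frame
                segments.append((start, last_voiced_end))
                start = None
    if start is not None:
        segments.append((start, last_voiced_end))
    return segments
```

Each returned range can then be uploaded as one utterance, so the server's output already arrives split into sentence-sized pieces.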
S130: Receive the text information produced by the voice server's voice conversion.
After voice conversion is complete, the converted text information needs to be sent to the user's terminal device, for example the user's mobile phone, computer or conference equipment.
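Steps S110 to S130 chain together as a small pipeline. In the sketch below the three stages are passed in as callables, so no particular preprocessing chain or voice-server API is implied; `record_minutes`, `preprocess`, `segment` and `transcribe` are hypothetical names, and `transcribe` stands in for the upload to the voice server and the returned text.

```python
def record_minutes(raw_audio, preprocess, segment, transcribe):
    """Run S110, S120 and S130 in order and return the transcribed sentences."""
    cleaned = preprocess(raw_audio)             # S110: preprocessing
    minutes = []
    for start, end in segment(cleaned):         # S120: sentence segmentation
        text = transcribe(cleaned[start:end])   # S120: upload segment for text conversion
        if text:                                # S130: receive the converted text
            minutes.append(text)
    return minutes
```

Keeping the stages injectable makes it straightforward to test the flow with stand-in functions before a real voice server is available.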
Embodiment two
Embodiment two differs from embodiment one in that it adds a trigger step. The specific scheme is as follows:
As shown in Fig. 3, embodiment two provides a meeting minutes method that includes the following steps:
S210: Preprocess the conference voice.
S220: When a trigger signal is received, execute step S230.
Adding a trigger step lets the user start and stop the voice conversion function at any time. The trigger can take either of two forms. With a button trigger, a trigger button is provided and the user activates the voice conversion function manually. With a keyword trigger, the user activates the voice conversion function by speaking a particular keyword.
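The two trigger forms can be modelled as a small state holder that gates step S230. The class name `ConversionTrigger` and the keyword phrases are hypothetical; the patent does not specify which keywords are used or whether a button press toggles the state.

```python
class ConversionTrigger:
    """Tracks whether speech-to-text conversion is currently switched on."""

    def __init__(self, start_keyword="start minutes", stop_keyword="stop minutes"):
        self.start_keyword = start_keyword
        self.stop_keyword = stop_keyword
        self.active = False

    def on_button(self):
        """Button trigger: each press toggles conversion on or off."""
        self.active = not self.active
        return self.active

    def on_keyword(self, recognized_text):
        """Keyword trigger: a spoken keyword switches conversion on or off."""
        text = recognized_text.lower()
        if self.stop_keyword in text:
            self.active = False
        elif self.start_keyword in text:
            self.active = True
        return self.active
```

A caller would check `trigger.active` before segmenting and uploading audio, so speech outside the recorded part of the meeting never leaves the device.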
S230: Perform sentence segmentation on the conference voice and upload it to a voice server for text conversion.
S240: Receive the text information produced by the voice server's voice conversion.
Steps S210, S230 and S240 of embodiment two are the same as the corresponding steps of embodiment one and are not described again.
Embodiment three
Embodiment three corresponds to embodiment one and provides a meeting minutes apparatus. Fig. 4 is a connection block diagram of the meeting minutes apparatus of embodiment three.
The meeting minutes apparatus of embodiment three includes:
A preprocessing module 110, for preprocessing conference voice. The preprocessing module 110 includes an echo cancellation unit 111, a beamforming unit 112, a noise reduction unit 113 and an enhancement and amplification unit 114.
The echo cancellation unit 111, for performing echo cancellation.
The beamforming unit 112, for performing beamforming on the conference voice. The beamforming unit 112 includes:
A voice information receiving subunit 112a, for receiving conference voice information through a microphone array.
A weighted summation subunit 112b, for computing a weighted sum of the voice information received by the microphone array, using the weights of the different directions.
The noise reduction unit 113, for performing noise reduction on the conference voice. The noise reduction unit 113 includes:
A noise suppression subunit 113a, for performing noise suppression according to the frequency, signal strength and duration of the sound signal.
A dereverberation subunit 113b, for performing dereverberation according to the spatial characteristics of the sound field.
The enhancement and amplification unit 114, for performing enhancement and amplification on the conference voice.
A voice conversion module 120, for performing sentence segmentation on the conference voice and uploading it to a voice server for text conversion.
A text receiving module 130, for receiving the text information produced by the voice server's voice conversion.
Embodiment three works on the same principle as embodiment one, which is not described again.
Embodiment four
Embodiment four corresponds to embodiment two and provides a meeting minutes apparatus, as follows:
Fig. 5 is a connection block diagram of the meeting minutes apparatus of embodiment four. Embodiment four of the present invention provides a meeting minutes apparatus that includes:
A preprocessing module 210, for preprocessing conference voice.
A trigger module 220, for invoking the voice conversion module when a trigger signal is received.
A voice conversion module 230, for performing sentence segmentation on the conference voice and uploading it to a voice server for text conversion.
A text receiving module 240, for receiving the text information produced by the voice server's voice conversion.
The application and principle of embodiment four are the same as those of embodiment two and are not described again.
Embodiment five
Embodiment five of the present invention provides meeting minutes equipment. As shown in Fig. 6, the equipment includes a memory 310 and a processor 320, and the memory 310 stores a computer program that can run on the processor 320. When the processor 320 executes the computer program, it implements the meeting minutes method of the above embodiments. There may be one or more memories 310 and one or more processors 320.
The equipment further includes:
A communication interface 330, for communicating with external devices and exchanging data.
The memory 310 may include high-speed RAM, and may also include non-volatile memory, for example at least one magnetic disk storage device.
If the memory 310, the processor 320 and the communication interface 330 are implemented independently, they can be connected to one another and communicate with one another through a bus. The bus may be an Industry Standard Architecture (ISA) bus, a Peripheral Component Interconnect (PCI) bus, an Extended Industry Standard Architecture (EISA) bus, or the like. The bus can be divided into an address bus, a data bus, a control bus and so on. For ease of illustration, Fig. 6 shows it as a single thick line, but this does not mean that there is only one bus or only one type of bus.
Optionally, in a specific implementation, if the memory 310, the processor 320 and the communication interface 330 are integrated on a single chip, they can communicate with one another through an internal interface.
In this specification, a description that refers to the terms "one embodiment", "some embodiments", "example", "specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. Moreover, the specific features, structures, materials or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. In addition, provided they do not conflict, those skilled in the art may combine the different embodiments or examples described in this specification, and the features of those different embodiments or examples.
In addition, the terms "first" and "second" are used for descriptive purposes only and are not to be understood as indicating or implying relative importance, or as implicitly indicating the number of the technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include at least one such feature. In the description of the present invention, "plurality" means two or more, unless otherwise clearly and specifically limited.
Any process or method description in a flowchart, or otherwise described herein, may be understood as representing a module, segment or portion of code that includes one or more executable instructions for implementing specific logical functions or steps of the process. The scope of the preferred embodiments of the present invention includes other implementations in which functions may be executed out of the order shown or discussed, including substantially simultaneously or in the reverse order, depending on the functions involved. This should be understood by those skilled in the art to which the embodiments of the present invention belong.
The logic and/or steps represented in a flowchart or otherwise described herein, for example an ordered list of executable instructions for implementing logical functions, may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus or device (such as a computer-based system, a system including a processor, or another system that can fetch instructions from an instruction execution system, apparatus or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate or transport a program for use by, or in connection with, an instruction execution system, apparatus or device.
The computer-readable medium described in the embodiments of the present invention may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. More specific examples (a non-exhaustive list) of computer-readable storage media include at least the following: an electrical connection (electronic device) with one or more wires, a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). A computer-readable storage medium may even be paper or another suitable medium on which the program is printed, because the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting or otherwise suitably processing it if necessary, and then stored in a computer memory.
In embodiments of the present invention, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, in which computer-readable program code is carried. Such a propagated data signal may take many forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of these. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate or transport a program for use by, or in connection with, an instruction execution system, input method or device. The program code contained on a computer-readable medium may be transmitted over any suitable medium, including but not limited to wireless, wire, optical cable, radio frequency (RF), or any suitable combination of these.
It should be understood that each part of the present invention may be implemented in hardware, software, firmware or a combination thereof. In the above embodiments, multiple steps or methods may be implemented in software or firmware that is stored in memory and executed by a suitable instruction execution system. If implemented in hardware, as in another embodiment, they may be implemented by any one or a combination of the following techniques, which are well known in the art: a discrete logic circuit with logic gates for implementing logic functions on data signals, an application-specific integrated circuit with suitable combinational logic gates, a programmable gate array (PGA), a field-programmable gate array (FPGA), and so on.
Those of ordinary skill in the art will understand that all or part of the steps of the methods of the above embodiments may be completed by a program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, performs one of the steps of the method embodiments or a combination thereof.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software function module. If the integrated module is implemented as a software function module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. The storage medium may be a read-only memory, a magnetic disk, an optical disc, or the like.
In summary, by recording meeting minutes through speech-to-text conversion, embodiments of the present invention can improve working efficiency and save resources. Further, embodiments of the present invention also apply processing such as noise reduction to the conference voice, which can improve the accuracy of speech recognition.
The above are only specific embodiments of the present invention, but the scope of protection of the present invention is not limited to them. Any variation or replacement that a person skilled in the art can readily conceive within the technical scope disclosed by the present invention shall fall within the scope of protection of the present invention. The scope of protection of the present invention shall therefore be determined by the scope of protection of the claims.

Claims (14)

1. a kind of minutes method, which is characterized in that including:
Conference voice is pre-processed;
Sentence segmentation is carried out to conference voice, and is uploaded to voice server and carries out text conversion;
It receives and carries out the text information after voice conversion by voice server.
2. minutes method according to claim 1, which is characterized in that it is described to conference voice carry out pretreatment include:
Carry out echo cancellation process;
Beam forming processing is carried out to conference voice;
Noise reduction process is carried out to conference voice;
Enhancing enhanced processing is carried out to conference voice.
3. minutes method according to claim 2, which is characterized in that described to carry out beam forming processing to conference voice The step of include:
Conference voice information is received by microphone array;
The voice messaging received to microphone array is weighted summation according to the weight of different direction.
4. The method according to claim 2, characterized in that performing noise reduction processing on the conference voice comprises:
performing noise suppression according to the voice signal frequency, signal strength and signal duration, wherein the noise is cancelled by a sound having the same frequency and amplitude as the noise but the opposite phase;
performing dereverberation processing according to the spatial characteristics of the sound field.
5. The method according to claim 1, characterized in that, when a trigger signal is received, the following step is executed: performing sentence segmentation on the conference voice, and uploading it to the voice server for text conversion.
6. The method according to claim 5, characterized in that the trigger signal is a keyword voice trigger signal or a button trigger signal.
7. A device for recording meeting minutes, characterized by comprising:
a preprocessing module, configured to preprocess conference voice;
a voice conversion module, configured to perform sentence segmentation on the conference voice and upload it to a voice server for text conversion;
a text receiving module, configured to receive the text information obtained after voice conversion by the voice server.
8. The device according to claim 7, characterized in that the preprocessing module comprises:
an echo cancellation unit, configured to perform echo cancellation processing;
a beamforming unit, configured to perform beamforming processing on the conference voice;
a noise reduction unit, configured to perform noise reduction processing on the conference voice;
an enhancement and amplification unit, configured to perform enhancement and amplification processing on the conference voice.
9. The device according to claim 8, characterized in that the beamforming unit comprises:
a voice information receiving subunit, configured to receive conference voice information through a microphone array;
a weighted summation subunit, configured to perform a weighted summation of the voice information received by the microphone array according to the weights of the different directions.
10. The device according to claim 8, characterized in that the noise reduction unit comprises:
a noise suppression subunit, configured to perform noise suppression according to the voice signal frequency, signal strength and signal duration, wherein the noise is cancelled by a sound having the same frequency and amplitude as the noise but the opposite phase;
a dereverberation subunit, configured to perform dereverberation processing according to the spatial characteristics of the sound field.
11. The device according to claim 7, characterized by further comprising a trigger module, configured to invoke the voice conversion module when a trigger signal is received.
12. The device according to claim 11, characterized in that the trigger signal is a keyword trigger signal or a button trigger signal.
13. Equipment for recording meeting minutes, characterized in that the equipment comprises:
one or more processors;
a storage device, configured to store one or more programs;
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1-6.
14. A computer-readable medium storing a computer program, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1-6.
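The weighted summation recited in the beamforming claims can be illustrated with a minimal sketch. It assumes each microphone channel is a list of samples, that one weight per channel encodes the preferred direction, and that optional non-negative integer sample delays align the channels toward that direction. The function name `weighted_sum_beamform` and the `delays` parameter are hypothetical and are not taken from the patent text.

```python
from typing import List, Optional, Sequence


def weighted_sum_beamform(
    channels: Sequence[Sequence[float]],
    weights: Sequence[float],
    delays: Optional[Sequence[int]] = None,
) -> List[float]:
    """Combine microphone channels into one signal by weighted summation.

    channels: one sample sequence per microphone in the array.
    weights:  one weight per microphone (assumed to favour one direction).
    delays:   optional non-negative integer sample offsets, one per channel,
              used to time-align the channels before summing (assumption).
    """
    if len(channels) != len(weights):
        raise ValueError("need exactly one weight per channel")
    if delays is None:
        delays = [0] * len(channels)
    if len(delays) != len(channels):
        raise ValueError("need exactly one delay per channel")

    # Only sum over the span where every delayed channel still has samples.
    usable = min(len(ch) - d for ch, d in zip(channels, delays))
    return [
        sum(w * ch[t + d] for ch, w, d in zip(channels, weights, delays))
        for t in range(max(usable, 0))
    ]
```

With equal weights and delays that line up the wanted talker across channels, the talker's signal adds coherently while sound from other directions tends to average out. A practical array would derive the weights and delays from its geometry.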

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810085820.3A CN108335697A (en) 2018-01-29 2018-01-29 Minutes method, apparatus, equipment and computer-readable medium

Publications (1)

Publication Number Publication Date
CN108335697A true CN108335697A (en) 2018-07-27

Family

ID=62926101

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810085820.3A Pending CN108335697A (en) 2018-01-29 2018-01-29 Minutes method, apparatus, equipment and computer-readable medium

Country Status (1)

Country Link
CN (1) CN108335697A (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101034158A (en) * 2007-02-25 2007-09-12 四川川大智胜软件股份有限公司 Low altitude target monitoring method based on microphones array network
CN201294570Y (en) * 2008-10-31 2009-08-19 比亚迪股份有限公司 Echo elimination device for teleconference system
US20090248411A1 (en) * 2008-03-28 2009-10-01 Alon Konchitsky Front-End Noise Reduction for Speech Recognition Engine
CN102306496A (en) * 2011-09-05 2012-01-04 歌尔声学股份有限公司 Noise elimination method, device and system of multi-microphone array
CN103257787A (en) * 2013-05-16 2013-08-21 北京小米科技有限责任公司 Method and device for starting voice assistant application
US8983844B1 (en) * 2012-07-31 2015-03-17 Amazon Technologies, Inc. Transmission of noise parameters for improving automatic speech recognition
CN105512112A (en) * 2015-12-01 2016-04-20 百度在线网络技术(北京)有限公司 Translation providing method and device
WO2016146316A1 (en) * 2015-03-18 2016-09-22 Qualcomm Technologies International, Ltd. Structure for multi-microphone speech enhancement system
CN106057193A (en) * 2016-07-13 2016-10-26 深圳市沃特沃德股份有限公司 Conference record generation method based on telephone conference and device
CN106600212A (en) * 2016-11-24 2017-04-26 南京九致信息科技有限公司 Conference record system and method for automatically generating conference record
US9659576B1 (en) * 2016-06-13 2017-05-23 Biamp Systems Corporation Beam forming and acoustic echo cancellation with mutual adaptation control
CN107274901A (en) * 2017-08-10 2017-10-20 湖州金软电子科技有限公司 A kind of far field voice interaction device
CN107507623A (en) * 2017-10-09 2017-12-22 维拓智能科技(深圳)有限公司 Self-service terminal based on Microphone Array Speech interaction

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
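Liu Congfeng: "Beamforming", Robust Adaptive Beamforming Algorithms (《稳健自适应波束形成算法》) *
Zhang Liyan et al.: "A microphone array speech enhancement method suitable for reverberant environments", Signal Processing (《信号处理》) *
Li Zhoufu: "Microphone array measurement of noise", Wind Tunnel Test Handbook (《风洞试验手册》) *
Guo Wei et al.: "Signal enhancement method for embedded speech recognition in reverberant environments", Application Research of Computers (《计算机应用研究》) *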

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2020073633A1 (en) * 2018-10-12 2020-04-16 深圳海翼智新科技有限公司 Conference loudspeaker box, conference recording method, device and system, and computer storage medium
CN109803059A (en) * 2018-12-17 2019-05-24 百度在线网络技术(北京)有限公司 Audio-frequency processing method and device
CN109361527A (en) * 2018-12-28 2019-02-19 苏州思必驰信息科技有限公司 Voice conferencing recording method and system
CN109361527B (en) * 2018-12-28 2021-02-05 苏州思必驰信息科技有限公司 Voice conference recording method and system
CN110097891A (en) * 2019-04-22 2019-08-06 广州视源电子科技股份有限公司 A kind of microphone signal processing method, device, equipment and storage medium
CN110263313A (en) * 2019-06-19 2019-09-20 安徽声讯信息技术有限公司 A kind of man-machine coordination edit methods for meeting shorthand
CN110263313B (en) * 2019-06-19 2021-08-24 安徽声讯信息技术有限公司 Man-machine collaborative editing method for conference shorthand
CN110232925A (en) * 2019-06-28 2019-09-13 百度在线网络技术(北京)有限公司 Generate the method, apparatus and conference terminal of minutes
CN110335612A (en) * 2019-07-11 2019-10-15 招商局金融科技有限公司 Minutes generation method, device and storage medium based on speech recognition
CN112634879A (en) * 2020-12-18 2021-04-09 建信金融科技有限责任公司 Voice conference management method, device, equipment and medium
CN112750452A (en) * 2020-12-29 2021-05-04 北京字节跳动网络技术有限公司 Voice processing method, device and system, intelligent terminal and electronic equipment
WO2022142984A1 (en) * 2020-12-29 2022-07-07 北京字节跳动网络技术有限公司 Voice processing method, apparatus and system, smart terminal and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180727
