CN109754808A - Method, apparatus, computer device and storage medium for converting speech to text - Google Patents
- Publication number
- CN109754808A (CN201811526588.9A)
- Authority
- CN
- China
- Prior art keywords
- voice messaging
- voice
- text
- decibel value
- preset
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Machine Translation (AREA)
- Telephonic Communication Services (AREA)
Abstract
Embodiments of the invention disclose a method, an apparatus, a computer device and a storage medium for converting speech to text. The method includes the following steps: obtaining voice information to be processed; segmenting the voice information according to a preset sentence-break rule; and converting the segmented voice information into text. Because the voice information is segmented by the preset sentence-break rule before it is converted, the resulting text is itself segmented, which improves its readability and avoids unnecessary misreading or ambiguity.
Description
Technical field
Embodiments of the present invention relate to the financial field, and in particular to a method, an apparatus, a computer device and a storage medium for converting speech to text.
Background
Speech recognition is a rapidly developing application with broad use in industry, household appliances, communications, automotive electronics, medical care, home services, consumer electronics and other fields. The fields involved in speech recognition technology include signal processing, pattern recognition, probability theory and information theory, the mechanisms of sound production and hearing, artificial intelligence, and so on.
In the prior art, speech can be converted into text by speech recognition. During recognition, however, when a user speaks continuously, or in scenes where several people talk, the text obtained from the speech contains no sentence breaks and does not distinguish the speakers, so the converted text is ambiguous or open to misunderstanding.
Summary of the invention
Embodiments of the present invention provide a method, an apparatus, a computer device and a storage medium for converting speech to text.
To solve the above technical problem, one technical solution adopted by the embodiments of the invention is to provide a speech-to-text conversion method that includes the following steps:
Obtaining voice information to be processed;
Segmenting the voice information according to a preset sentence-break rule;
Converting the segmented voice information into text.
Optionally, segmenting the voice information according to the preset sentence-break rule includes:
Detecting the decibel value in the voice information;
When the decibel value in the voice information is less than a preset decibel value, taking the position at which the decibel value is less than the preset decibel value as a first segmentation point of the voice information;
Segmenting the voice information according to the first segmentation point.
Optionally, taking the position at which the decibel value is less than the preset decibel value as the first segmentation point of the voice information includes:
Determining the voice duration for which the decibel value in the voice information is less than the preset decibel value;
When the voice duration is greater than a preset duration, obtaining the voice segment whose decibel value is less than the preset decibel value;
Taking any moment in the voice segment as the first segmentation point.
Optionally, segmenting the voice information according to the preset sentence-break rule includes:
Judging whether a timbre change occurs in the voice information;
When a timbre change occurs in the voice information, taking the position of the timbre change as a second segmentation point of the voice information;
Segmenting the voice information according to the second segmentation point.
Optionally, converting the segmented voice information into text includes:
Applying a timbre mark to the segmented voice information that has the same timbre;
Converting the marked, segmented speech into text by preset speech conversion software;
Applying role marks to the converted text according to the timbre marks.
Optionally, converting the segmented voice information into text includes:
Converting the segmented speech into target text by preset speech conversion software;
Obtaining a tone keyword in the target text;
Looking up, in a preset information table, the punctuation mark that has a mapping relationship with the tone keyword, and adding the punctuation mark after the target text.
Optionally, obtaining the voice information to be processed includes:
Collecting the voice information of a user;
Performing noise reduction on the voice information with preset processing software.
To solve the above technical problem, embodiments of the present invention further provide a speech-to-text conversion apparatus, including:
An obtaining module, configured to obtain voice information to be processed;
A processing module, configured to segment the voice information according to a preset sentence-break rule;
An execution module, configured to convert the segmented voice information into text.
Optionally, the processing module includes:
A first processing submodule, configured to detect the decibel value in the voice information;
A second processing submodule, configured to, when the decibel value in the voice information is less than a preset decibel value, take the position at which the decibel value is less than the preset decibel value as a first segmentation point of the voice information;
A first execution submodule, configured to segment the voice information according to the first segmentation point.
Optionally, the second processing submodule includes:
A third processing submodule, configured to determine the voice duration for which the decibel value in the voice information is less than the preset decibel value;
A first obtaining submodule, configured to, when the voice duration is greater than a preset duration, obtain the voice segment whose decibel value is less than the preset decibel value;
A second execution submodule, configured to take any moment in the voice segment as the first segmentation point.
Optionally, the processing module includes:
A fourth processing submodule, configured to judge whether a timbre change occurs in the voice information;
A fifth processing submodule, configured to, when a timbre change occurs in the voice information, take the position of the timbre change as a second segmentation point of the voice information;
A third execution submodule, configured to segment the voice information according to the second segmentation point.
Optionally, the execution module includes:
A sixth processing submodule, configured to apply a timbre mark to the segmented voice information that has the same timbre;
A seventh processing submodule, configured to convert the marked, segmented speech into text by preset speech conversion software;
A fourth execution submodule, configured to apply role marks to the converted text according to the timbre marks.
Optionally, the execution module includes:
An eighth processing submodule, configured to convert the segmented speech into target text by preset speech conversion software;
A second obtaining submodule, configured to obtain a tone keyword in the target text;
A fifth execution submodule, configured to look up, in a preset information table, the punctuation mark that has a mapping relationship with the tone keyword, and to add the punctuation mark after the target text.
Optionally, the obtaining module includes:
A third obtaining submodule, configured to collect the voice information of a user;
A ninth processing submodule, configured to perform noise reduction on the voice information with preset processing software.
To solve the above technical problem, embodiments of the present invention further provide a computer device including a memory and a processor. The memory stores computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the speech-to-text conversion method described above.
To solve the above technical problem, embodiments of the present invention further provide a storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the speech-to-text conversion method described above.
The beneficial effect of the embodiments of the present invention is as follows: the voice information is segmented by a preset sentence-break rule and the segmented voice information is converted into text, so the text itself is segmented, which improves its readability and avoids unnecessary misreading or ambiguity.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings needed in the description of the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the invention; those skilled in the art can obtain other drawings from them without creative effort.
Fig. 1 is a basic flow diagram of the speech-to-text conversion method of an embodiment of the present invention;
Fig. 2 is a basic flow diagram of the method, provided by an embodiment of the present invention, for segmenting voice information according to a preset sentence-break rule;
Fig. 3 is a basic flow diagram of the method, provided by an embodiment of the present invention, for taking the position at which the decibel value is less than the preset decibel value as the first segmentation point of the voice information;
Fig. 4 is a basic flow diagram of a method, provided by an embodiment of the present invention, for segmenting voice information according to a preset sentence-break rule;
Fig. 5 is a basic flow diagram of a method, provided by an embodiment of the present invention, for converting segmented voice information into text;
Fig. 6 is a basic flow diagram of a method, provided by an embodiment of the present invention, for converting segmented voice information into text;
Fig. 7 is a basic structural block diagram of the speech-to-text conversion apparatus of an embodiment of the present invention;
Fig. 8 is a basic structural block diagram of the computer device of an embodiment of the present invention.
Detailed description
To enable those skilled in the art to better understand the solution of the present invention, the technical solutions in the embodiments of the invention are described clearly and completely below with reference to the drawings in the embodiments.
Some of the processes described in the specification, the claims and the above drawings contain multiple operations that appear in a particular order, but it should be clearly understood that these operations may be executed out of the order in which they appear herein, or executed in parallel. Operation numbers such as 101 and 102 merely distinguish the operations from one another and do not in themselves represent any execution order. In addition, these processes may include more or fewer operations, and the operations may be executed sequentially or in parallel. Note that descriptions such as "first" and "second" herein distinguish different messages, devices, modules and so on; they do not represent a sequence, nor do they require "first" and "second" to be of different types.
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. Obviously, the described embodiments are only some of the embodiments of the invention, not all of them. All other embodiments obtained by those skilled in the art on the basis of the embodiments of the invention without creative effort fall within the protection scope of the present invention.
Embodiment
Those skilled in the art will understand that "terminal" and "terminal device" as used herein include both devices that have only a wireless signal receiver without transmitting capability and devices that have receiving and transmitting hardware capable of two-way communication over a bidirectional communication link. Such devices may include: cellular or other communication devices with a single-line display or a multi-line display, or without a multi-line display; PCS (Personal Communications Service) devices, which may combine voice, data processing, fax and/or communication capabilities; PDAs (Personal Digital Assistant), which may include a radio frequency receiver, a pager, Internet/intranet access, a web browser, a notepad, a calendar and/or a GPS (Global Positioning System) receiver; and conventional laptop and/or palmtop computers or other devices that have and/or include a radio frequency receiver. A "terminal" or "terminal device" as used herein may be portable, transportable, installed in a vehicle (air, sea and/or land), or adapted and/or configured to run locally and/or, in distributed form, at any other location on the earth and/or in space. A "terminal" or "terminal device" as used herein may also be a communication terminal, an Internet terminal or a music/video playback terminal, for example a PDA, an MID (Mobile Internet Device) and/or a mobile phone with a music/video playback function, or a device such as a smart television or a set-top box.
The client terminal in this embodiment is the terminal described above.
Specifically, referring to Fig. 1, Fig. 1 is a basic flow diagram of the speech-to-text conversion method of this embodiment.
As shown in Fig. 1, the speech-to-text conversion method includes the following steps:
S1100, obtaining voice information to be processed;
The voice information to be processed is the voice information that needs to be converted into text. To improve the accuracy of the conversion, the voice information to be processed has usually undergone noise reduction. Specifically, obtaining the voice information to be processed includes collecting the voice information of a user and performing noise reduction on it with preset processing software.
When the voice information of a user is collected, it may be recorded through the voice recording module built into the terminal, or obtained by downloading or by receiving voice information sent by other terminals. Noise reduction may be performed with preset audio processing software, for example Adobe Audition CS6 or VinylStudio.
S1200, segmenting the voice information according to a preset sentence-break rule;
The preset sentence-break rule is a rule, set in advance, for segmenting voice information. For example, the voice information is divided into several segments at the positions of pauses, or, when the voices of several roles appear in the voice information, the voice information is segmented according to timbre.
S1300, converting the segmented voice information into text.
In the embodiment of the present invention, the segmented voice information may be converted into text by text conversion software built into the terminal, for example SwiftScribe.
The above speech-to-text conversion method segments the voice information by a preset sentence-break rule and converts the segmented voice information into text, so the text itself is segmented, which improves its readability and avoids unnecessary misreading or ambiguity.
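The flow of steps S1100 to S1300 can be sketched in Python as follows. This is a minimal illustration rather than the patented implementation: the function name `convert_speech_to_text` and the callables `segment` and `transcribe` are hypothetical placeholders for the sentence-break rule and the speech conversion software, and the recording is assumed to be already acquired and denoised (S1100).

```python
from typing import Callable, List, Sequence

Samples = Sequence[float]  # assumed representation: mono audio samples


def convert_speech_to_text(
    audio: Samples,
    segment: Callable[[Samples], List[Samples]],
    transcribe: Callable[[Samples], str],
) -> List[str]:
    """Segment first, then transcribe each segment in its original order."""
    pieces = segment(audio)                 # S1200: apply the sentence-break rule
    return [transcribe(p) for p in pieces]  # S1300: one piece of text per segment


if __name__ == "__main__":
    # Stand-in callables, purely for illustration.
    halves = lambda a: [a[: len(a) // 2], a[len(a) // 2:]]
    fake_asr = lambda p: f"<{len(p)} samples>"
    print(convert_speech_to_text([0.0] * 10, halves, fake_asr))
```

Passing the segmentation rule and the recognizer as parameters mirrors the text, which leaves both "preset" and names concrete software only as examples.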
In practical applications, when a user inputs voice information through a terminal, pauses occur according to the person's speaking habits. Therefore, to break the voice information into sentences according to natural speaking habits, an embodiment of the present invention provides a method for segmenting voice information according to a preset sentence-break rule. Fig. 2 is a basic flow diagram of this method.
Specifically, as shown in Fig. 2, step S1200 includes the following steps:
S1211, detecting the decibel value in the voice information;
S1212, when the decibel value in the voice information is less than a preset decibel value, taking the position at which the decibel value is less than the preset decibel value as a first segmentation point of the voice information;
In the embodiment of the present invention, the terminal detects the decibel value of the voice information with preset decibel detection software, for example Sound meter 2.0 or Digital Sound Meter.
The preset decibel value is set in advance. When a pause in the user's speech occurs in the voice information, the decibel value at that point is low. Because noise exists in the environment, the preset decibel value is set higher than the decibel value of the environmental noise and lower than the level of normal speech.
An embodiment of the present invention further provides a method for taking the position at which the decibel value is less than the preset decibel value as the first segmentation point of the voice information. Fig. 3 is a basic flow diagram of this method.
Specifically, as shown in Fig. 3, step S1212 includes the following steps:
S12121, determining the voice duration for which the decibel value in the voice information is less than the preset decibel value;
In practical applications, there are time intervals between words when a person speaks. In general, only the pause after a complete sentence has been expressed counts as a segment boundary in the embodiment of the present invention. Therefore, in determining the first segmentation point, when the decibel value at some moment in the voice information is less than the preset decibel value, and the decibel values throughout the voice segment starting at that moment are all less than the preset decibel value, it is judged whether the duration of that voice segment is greater than the preset duration. The voice duration in the embodiment of the present invention is the duration of the voice segment whose decibel value is less than the preset decibel value.
S12122, when the voice duration is greater than the preset duration, obtaining the voice segment whose decibel value is less than the preset decibel value;
In the embodiment of the present invention, when the voice segment whose decibel value is less than the preset decibel value is obtained, it is only necessary to obtain the time at which that voice segment occurs in the voice information.
S12123, taking any moment in the voice segment as the first segmentation point.
The above is illustrated with an example. The decibel detection software detects that the decibel value at 2 s in the voice information is lower than the preset decibel value. Taking 2 s as the starting point, the decibel values of the voice segment from 2 s to 3 s are all below the preset decibel value, and the 1 s duration of that segment is judged to be greater than the preset duration of 0.5 s. Therefore any moment within the 2-3 s voice segment, such as 2.5 s or 3 s, can serve as the first segmentation point.
In this way, the first segmentation point can be determined accurately, and sentences are not broken at arbitrary positions.
S1213, segmenting the voice information according to the first segmentation point.
In the embodiment of the present invention, the voice information is divided into several voice segments at the first segmentation point, that is, at a point at any moment within the voice segment whose decibel values are all less than the preset decibel value. Note that after segmentation the voice segments are ordered according to their original positions in the voice information, so as to keep continuity.
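Steps S12121 to S12123 can be sketched as a scan over per-frame decibel values. The sketch below is an assumption-laden illustration: the per-frame levels are taken as already measured, the function name `first_segmentation_points` and its parameters are hypothetical, and the midpoint of each quiet run is used as the "any moment" that the text allows.

```python
from typing import List, Sequence


def first_segmentation_points(
    frame_db: Sequence[float],  # assumed input: one decibel value per frame
    frame_sec: float,           # duration of one frame, in seconds
    preset_db: float,           # the preset decibel value
    preset_sec: float,          # the preset duration
) -> List[float]:
    """Return one cut time (in seconds) for each sufficiently long quiet run."""
    points: List[float] = []
    run_start = None
    # A trailing sentinel above any threshold closes a quiet run at the end.
    for i, level in enumerate(list(frame_db) + [float("inf")]):
        if level < preset_db:
            if run_start is None:
                run_start = i  # a quiet run begins here
        elif run_start is not None:
            duration = (i - run_start) * frame_sec
            if duration > preset_sec:
                # Any moment in the run qualifies; the midpoint is one choice.
                points.append((run_start + i) / 2 * frame_sec)
            run_start = None
    return points


if __name__ == "__main__":
    # Speech for 2 s, a 1 s pause, then speech again, in 0.1 s frames.
    levels = [60.0] * 20 + [20.0] * 10 + [60.0] * 20
    print(first_segmentation_points(levels, 0.1, 40.0, 0.5))
```

With the values from the example above (a pause from 2 s to 3 s and a preset duration of 0.5 s), the midpoint choice places the cut near 2.5 s, while short gaps between words are skipped.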
In practical applications, voice information often records several people talking, for example in interviews or meeting minutes. For this case, to improve the text information obtained after conversion, an embodiment of the present invention provides a method for segmenting voice information according to a preset sentence-break rule. Fig. 4 is a basic flow diagram of this method.
Specifically, as shown in Fig. 4, step S1220 includes the following steps:
S1221, judging whether a timbre change occurs in the voice information;
In the embodiment of the present invention, built-in timbre detection software, for example Polyphone, may be used to detect whether a timbre change occurs in the voice information.
S1222, when a timbre change occurs in the voice information, taking the position of the timbre change as a second segmentation point of the voice information;
S1223, segmenting the voice information according to the second segmentation point.
In the embodiment of the present invention, when a timbre change occurs in the voice information, the time point of the timbre change is extracted and taken as the second segmentation point of the voice information.
It should be understood that, in practical applications, the situations of the embodiment shown in Fig. 2 and of this embodiment may both occur in the same voice information, that is, first segmentation points and second segmentation points may exist at the same time. In this case, the voice information is segmented according to both the first and the second segmentation points, and the resulting voice segments are ordered according to their sequence in the voice information, so that the segmented text does not become disordered.
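When both kinds of segmentation point exist, combining them amounts to merging two lists of cut times and slicing the recording in time order. The following sketch assumes the cut times are already known (timbre-change detection itself is not shown); `split_at_points` and its parameters are hypothetical names.

```python
from typing import List, Sequence


def split_at_points(
    samples: Sequence[float],
    sample_rate: int,
    first_points: Sequence[float],   # pause-based cut times, in seconds
    second_points: Sequence[float],  # timbre-change cut times, in seconds
) -> List[List[float]]:
    """Cut the recording at every segmentation point, keeping original order."""
    cuts = sorted(set(first_points) | set(second_points))
    # Convert times to sample indices, clamped to the recording length.
    indices = [min(len(samples), max(0, round(t * sample_rate))) for t in cuts]
    bounds = [0] + indices + [len(samples)]
    segments = []
    for start, end in zip(bounds, bounds[1:]):
        if end > start:  # skip empty pieces caused by coinciding cuts
            segments.append(list(samples[start:end]))
    return segments


if __name__ == "__main__":
    audio = list(range(10))  # toy "recording" sampled at 1 Hz
    print(split_at_points(audio, 1, [3.0], [7.0, 3.0]))
```

Because the cut times are sorted before slicing, the segments come out in their original sequence, which is the ordering requirement stated above.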
When several roles (that is, timbres) appear in the voice information, the voice information is divided into several segments according to the second segmentation points and the segments are converted into text. Because the reader does not know which role is speaking, it is easy to confuse what each role said. To solve this problem, an embodiment of the present invention provides a method for converting the segmented voice information into text. Fig. 5 is a basic flow diagram of this method.
Specifically, as shown in Fig. 5, step S1300 includes the following steps:
S1311, applying a timbre mark to the segmented voice information that has the same timbre;
When the voice information includes several roles, that is, several different timbres, the timbres are marked. For example, if the voice information includes two roles A and B, A is marked with a and B is marked with b, so that the roles are distinguished.
S1312, converting the marked, segmented speech into text by preset speech conversion software;
S1313, applying role marks to the converted text according to the timbre marks.
In the embodiment of the present invention, each segment of speech is converted into text in sequence by the built-in speech conversion software, and the role is marked at the beginning of each paragraph of text, so that the reader can see clearly which role spoke each paragraph, which improves the readability of the text.
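Steps S1311 to S1313 can be sketched as assigning one mark per distinct timbre and prefixing each converted paragraph with it. In this sketch the timbre identity of each segment and its transcript are assumed to be given; `label_roles` is a hypothetical helper, and the lowercase-letter marks follow the a/b example above.

```python
from string import ascii_lowercase
from typing import Dict, Hashable, List, Tuple


def label_roles(transcribed: List[Tuple[Hashable, str]]) -> List[str]:
    """Prefix each paragraph with the mark of the role that spoke it.

    `transcribed` holds (timbre, text) pairs in their original order.
    """
    marks: Dict[Hashable, str] = {}
    paragraphs: List[str] = []
    for timbre, text in transcribed:
        if timbre not in marks:
            n = len(marks)
            # a, b, c, ... in order of first appearance; a generic name after z.
            marks[timbre] = ascii_lowercase[n] if n < 26 else f"role{n + 1}"
        paragraphs.append(f"{marks[timbre]}: {text}")
    return paragraphs


if __name__ == "__main__":
    for line in label_roles([("A", "Hello."), ("B", "Hi there."), ("A", "Goodbye.")]):
        print(line)
```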
In practical applications, to increase the readability of the converted text and give the reader a good reading experience, an embodiment of the present invention provides another method for converting the segmented voice information into text. Fig. 6 is a basic flow diagram of this method.
Specifically, as shown in Fig. 6, step S1300 includes the following steps:
S1321, converting the segmented speech into target text by preset speech conversion software;
The speech conversion software includes software such as SwiftScribe and IBM ViaVoice.
S1322, obtaining a tone keyword in the target text;
In the embodiment of the present invention, the terminal is preset with a tone vocabulary database. To obtain the tone keyword of the target text, the terminal compares the words in the tone vocabulary database with the words in the target text. When the target text contains a word identical to one in the tone vocabulary database, that word is extracted and taken as the tone keyword of the target text.
S1323, looking up, in a preset information table, the punctuation mark that has a mapping relationship with the tone keyword, and adding the punctuation mark after the target text.
The information table records the correspondence between tone words and punctuation marks. For example, "what" in a sentence usually indicates a question, so "?" should be used at the end of the sentence; an exclamatory particle at the end of a sentence usually expresses an exclamation, so "!" should be used at the end of the sentence. Thus, in the information table, the tone word "what" has a mapping relationship with "?", and the exclamatory particle has a mapping relationship with "!".
In practical applications, the mood expressed by a tone word differs with context. For example, the exclamatory particle may also indicate a question. To add punctuation marks more accurately, the structure of the sentence and the semantics of the context may also be analyzed to determine the punctuation mark at the end of the sentence; details are not repeated here.
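The table lookup of steps S1322 and S1323 can be sketched with a dictionary that maps tone keywords to punctuation marks. The table contents here are illustrative assumptions: "what" comes from the example above, while "ah" is a hypothetical stand-in for the exclamatory particle. Whitespace tokenization is also an assumption made for English text, and `add_end_punctuation` is a hypothetical name.

```python
from typing import Dict

# Illustrative information table: tone keyword -> punctuation mark.
# "ah" is a hypothetical stand-in for an exclamatory particle.
PUNCTUATION_TABLE: Dict[str, str] = {"what": "?", "ah": "!"}


def add_end_punctuation(text: str, table: Dict[str, str] = PUNCTUATION_TABLE) -> str:
    """Append the punctuation mark mapped to a tone keyword found in the text."""
    stripped = text.rstrip()
    if not stripped or stripped[-1] in ".?!":
        return stripped  # nothing to do: empty or already punctuated
    # Scan from the end so that the keyword nearest the sentence end wins.
    for word in reversed(stripped.lower().split()):
        mark = table.get(word.strip(",;:"))
        if mark:
            return stripped + mark
    return stripped  # no tone keyword: leave the text unchanged


if __name__ == "__main__":
    for sentence in ["what did you say", "that is great ah", "see you tomorrow"]:
        print(add_end_punctuation(sentence))
```

This keyword-only lookup has the limitation noted above: it ignores context, so a fuller approach would also analyze sentence structure and semantics.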
To solve the above technical problem, embodiments of the present invention further provide a speech-to-text conversion apparatus. Referring to Fig. 7, Fig. 7 is a basic structural block diagram of the speech-to-text conversion apparatus of this embodiment.
As shown in Fig. 7, a speech-to-text conversion apparatus includes an obtaining module 2100, a processing module 2200 and an execution module 2300. The obtaining module 2100 is configured to obtain voice information to be processed; the processing module 2200 is configured to segment the voice information according to a preset sentence-break rule; and the execution module 2300 is configured to convert the segmented voice information into text.
The speech-to-text conversion apparatus segments the voice information by a preset sentence-break rule and converts the segmented voice information into text, so the text itself is segmented, which improves its readability and avoids unnecessary misreading or ambiguity.
In some embodiments, the processing module includes: the first processing submodule, for detecting the voice messaging
In decibel value;Second processing submodule will be described for when the decibel value in the voice messaging is less than default decibel value
Decibel value is less than first waypoint of the position of the default decibel value as the voice messaging;First implementation sub-module is used
The voice messaging is segmented according to first waypoint.
In some embodiments, the second processing submodule includes: third processing submodule, the predicate for judging
Decibel value is less than the voice duration of the default decibel value in message breath;First acquisition submodule, for working as the voice duration
When greater than preset duration, the sound bite that decibel value is less than the default decibel value is obtained;Second implementation sub-module is used for institute
Any time in sound bite is stated as first waypoint.
In some embodiments, the processing module comprises: a fourth processing submodule, configured to determine whether a timbre change occurs in the voice information; a fifth processing submodule, configured to, when a timbre change occurs in the voice information, take the position of the timbre change as a second segmentation point of the voice information; and a third execution submodule, configured to segment the voice information according to the second segmentation point.
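One way to picture timbre-change detection is to compare a feature vector for each frame with the one before it. The patent does not specify how timbre is represented or compared, so the sketch below assumes that some per-frame timbre feature (for example a speaker embedding) is already available and uses cosine distance with a hypothetical threshold.

```python
import math
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; returns 1.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / norm if norm else 1.0


def second_segmentation_points(features: Sequence[Sequence[float]],
                               threshold: float) -> List[int]:
    """Frame indices where the timbre feature differs sharply from the previous frame."""
    return [
        i for i in range(1, len(features))
        if cosine_distance(features[i - 1], features[i]) > threshold
    ]


# Three frames from one voice followed by two frames from a different voice.
features = [[1.0, 0.1], [1.0, 0.0], [0.9, 0.1], [0.1, 1.0], [0.0, 1.0]]
change_points = second_segmentation_points(features, threshold=0.5)
```

In this toy input the only large jump is between frames 2 and 3, so frame 3 becomes the second segmentation point.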
In some embodiments, the execution module comprises: a sixth processing submodule, configured to apply a timbre tag to segmented voice information having the same timbre; a seventh processing submodule, configured to convert the tagged, segmented voice into text by means of preset speech software; and a fourth execution submodule, configured to label the converted text with roles according to the timbre tags.
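The tagging and role-labeling steps can be sketched as below. The sketch assumes each segment comes with a timbre feature vector and an already converted text, and it groups segments greedily by cosine similarity to the first segment seen for each voice; the grouping method, the threshold and the "Role N" label format are assumptions, not details given in the patent.

```python
import math
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def assign_timbre_tags(embeddings: Sequence[Sequence[float]],
                       min_similarity: float) -> List[int]:
    """Give segments with similar timbre the same integer tag (greedy grouping)."""
    representatives: List[Sequence[float]] = []  # first embedding seen per voice
    tags = []
    for emb in embeddings:
        for tag, rep in enumerate(representatives):
            if cosine_similarity(emb, rep) >= min_similarity:
                tags.append(tag)
                break
        else:
            representatives.append(emb)
            tags.append(len(representatives) - 1)
    return tags


def label_roles(texts: Sequence[str], tags: Sequence[int]) -> List[str]:
    """Prefix each converted text with a role derived from its timbre tag."""
    return [f"Role {tag + 1}: {text}" for text, tag in zip(texts, tags)]


embeddings = [[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]]
texts = ["Hello.", "Hi there.", "How are you?"]
tags = assign_timbre_tags(embeddings, min_similarity=0.8)
labeled = label_roles(texts, tags)
```

The first and third segments share a tag and therefore a role, so the output reads as a two-speaker dialogue.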
In some embodiments, the execution module comprises: an eighth processing submodule, configured to convert the segmented voice into target text by means of preset voice conversion software; a second acquisition submodule, configured to acquire a tone keyword in the target text; and a fifth execution submodule, configured to look up, in a preset information table, the punctuation mark that has a mapping relationship with the tone keyword, and to append the punctuation mark to the target text.
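The lookup in the preset information table can be sketched as a dictionary from tone keywords to punctuation marks. The table entries below (Chinese sentence-final particles) are hypothetical examples, and checking only the end of the target text is an assumption; when no keyword matches, this sketch leaves the text unchanged.

```python
from typing import Mapping

# Hypothetical preset information table: tone keyword -> punctuation mark.
PUNCTUATION_TABLE = {
    "吗": "？",  # question particle
    "呢": "？",  # question particle
    "啊": "！",  # exclamation particle
    "吧": "。",  # suggestion particle
}


def add_punctuation(target_text: str,
                    table: Mapping[str, str] = PUNCTUATION_TABLE) -> str:
    """Append the punctuation mark mapped to the tone keyword that ends the text."""
    text = target_text.strip()
    # Try longer keywords first so a multi-character keyword wins over its suffix.
    for keyword in sorted(table, key=len, reverse=True):
        if text.endswith(keyword):
            return text + table[keyword]
    return text  # no tone keyword found: leave the text as it is


question = add_punctuation("你吃饭了吗")
plain = add_punctuation("今天开会")
```

Because the mapping is plain data, the table can be extended or replaced for another language without changing the lookup code.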
In some embodiments, the acquisition module comprises: a third acquisition submodule, configured to collect the user's voice information; and a ninth processing submodule, configured to perform noise reduction on the voice information using preset processing software.
To solve the above technical problem, an embodiment of the present invention further provides a computer device. Refer to Fig. 8, which is a block diagram of the basic structure of the computer device of this embodiment.
Fig. 8 is a schematic diagram of the internal structure of the computer device. As shown in Fig. 8, the computer device includes a processor, a non-volatile storage medium, a memory and a network interface connected through a system bus. The non-volatile storage medium of the computer device stores an operating system, a database and computer-readable instructions. The database may store control information sequences, and the computer-readable instructions, when executed by the processor, cause the processor to implement a speech-to-text conversion method. The processor of the computer device provides computing and control capabilities and supports the operation of the entire computer device. The memory of the computer device may store computer-readable instructions that, when executed by the processor, cause the processor to perform a speech-to-text conversion method. The network interface of the computer device is used to connect to and communicate with a terminal. Those skilled in the art will understand that the structure shown in Fig. 8 is only a block diagram of the part of the structure relevant to the solution of the present application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown in the figure, combine certain components, or have a different arrangement of components.
In this embodiment, the processor executes the specific functions of the acquisition module 2100, the processing module 2200 and the execution module 2300 in Fig. 7, and the memory stores the program code and the various data needed to execute these modules. The network interface is used for data transmission to and from a user terminal or server. The memory in this embodiment stores the program code and data needed to execute all submodules of the speech-to-text conversion method, and the server can call its program code and data to perform the functions of all submodules.
The computer device segments the voice information according to the preset sentence-breaking rule and converts the segmented voice information into text. Segmenting the text in this way makes it more readable and avoids unnecessary misreading or ambiguity.
The present invention further provides a storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the speech-to-text conversion method of any of the above embodiments.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The computer program can be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc or a read-only memory (ROM), or a random access memory (RAM), etc.
It should be understood that although the steps in the flowcharts of the drawings are shown in the sequence indicated by the arrows, these steps are not necessarily executed in that sequence. Unless expressly stated otherwise herein, there is no strict restriction on the order in which these steps are executed, and they may be executed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages. These sub-steps or stages are not necessarily completed at the same moment but may be executed at different times, and they are not necessarily executed one after another but may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
The above are only some embodiments of the present invention. It should be noted that those of ordinary skill in the art may make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the present invention.
Claims (10)
1. A method for converting speech into text, characterized by comprising the following steps:
acquiring voice information to be processed;
segmenting the voice information according to a preset sentence-breaking rule; and
converting the segmented voice information into text.
2. The method for converting speech into text according to claim 1, characterized in that segmenting the voice information according to the preset sentence-breaking rule comprises:
detecting decibel values in the voice information;
when a decibel value in the voice information is less than a preset decibel value, taking the position where the decibel value is less than the preset decibel value as a first segmentation point of the voice information; and
segmenting the voice information according to the first segmentation point.
3. The method for converting speech into text according to claim 2, characterized in that taking the position where the decibel value is less than the preset decibel value as the first segmentation point of the voice information comprises:
determining the duration for which the decibel value in the voice information is less than the preset decibel value;
when the duration is greater than a preset duration, acquiring the voice fragment whose decibel value is less than the preset decibel value; and
taking any moment within the voice fragment as the first segmentation point.
4. The method for converting speech into text according to claim 1, characterized in that segmenting the voice information according to the preset sentence-breaking rule comprises:
determining whether a timbre change occurs in the voice information;
when a timbre change occurs in the voice information, taking the position of the timbre change as a second segmentation point of the voice information; and
segmenting the voice information according to the second segmentation point.
5. The method for converting speech into text according to claim 4, characterized in that converting the segmented voice information into text comprises:
applying a timbre tag to segmented voice information having the same timbre;
converting the tagged, segmented voice into text by means of preset speech software; and
labeling the converted text with roles according to the timbre tags.
6. The method for converting speech into text according to claim 1, characterized in that converting the segmented voice information into text comprises:
converting the segmented voice into target text by means of preset speech software;
acquiring a tone keyword in the target text; and
looking up, in a preset information table, the punctuation mark that has a mapping relationship with the tone keyword, and appending the punctuation mark to the target text.
7. The method for converting speech into text according to claim 1, characterized in that acquiring the voice information to be processed comprises:
collecting the voice information of a user; and
performing noise reduction on the voice information using preset processing software.
8. A speech-to-text conversion device, characterized by comprising:
an acquisition module, configured to acquire voice information to be processed;
a processing module, configured to segment the voice information according to a preset sentence-breaking rule; and
an execution module, configured to convert the segmented voice information into text.
9. A computer device, comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the method for converting speech into text according to any one of claims 1 to 7.
10. A storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the method for converting speech into text according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811526588.9A CN109754808B (en) | 2018-12-13 | 2018-12-13 | Method, device, computer equipment and storage medium for converting voice into text |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109754808A (en) | 2019-05-14 |
CN109754808B (en) | 2024-02-13 |
Family
ID=66403800
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811526588.9A Active CN109754808B (en) | 2018-12-13 | 2018-12-13 | Method, device, computer equipment and storage medium for converting voice into text |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109754808B (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110335612A (en) * | 2019-07-11 | 2019-10-15 | 招商局金融科技有限公司 | Minutes generation method, device and storage medium based on speech recognition |
CN110827825A (en) * | 2019-11-11 | 2020-02-21 | 广州国音智能科技有限公司 | Punctuation prediction method, system, terminal and storage medium for speech recognition text |
CN112151042A (en) * | 2019-06-27 | 2020-12-29 | 中国电信股份有限公司 | Voiceprint recognition method, device and system and computer readable storage medium |
CN113408996A (en) * | 2020-03-16 | 2021-09-17 | 上海博泰悦臻网络技术服务有限公司 | Schedule management method, schedule management device and computer readable storage medium |
Citations (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN1770260A (en) * | 2004-11-01 | 2006-05-10 | 英业达股份有限公司 | Speech waveform processing system and method |
CN101178790A (en) * | 2006-11-10 | 2008-05-14 | 胡鹏 | Method for realizing synergic listen and type recording method by intelligent virtual punctuate |
CN102522084A (en) * | 2011-12-22 | 2012-06-27 | 广东威创视讯科技股份有限公司 | Method and system for converting voice data into text files |
CN102903361A (en) * | 2012-10-15 | 2013-01-30 | Itp创新科技有限公司 | Instant call translation system and instant call translation method |
CN104050160A (en) * | 2014-03-12 | 2014-09-17 | 北京紫冬锐意语音科技有限公司 | Machine and human translation combined spoken language translation method and device |
CN104142915A (en) * | 2013-05-24 | 2014-11-12 | 腾讯科技(深圳)有限公司 | Punctuation adding method and system |
CN105609107A (en) * | 2015-12-23 | 2016-05-25 | 北京奇虎科技有限公司 | Text processing method and device based on voice identification |
CN106504746A (en) * | 2016-10-28 | 2017-03-15 | 普强信息技术(北京)有限公司 | A kind of method for extracting structuring traffic information from speech data |
CN106656767A (en) * | 2017-01-09 | 2017-05-10 | 武汉斗鱼网络科技有限公司 | Method and system for increasing new anchor retention |
CN106971723A (en) * | 2017-03-29 | 2017-07-21 | 北京搜狗科技发展有限公司 | Method of speech processing and device, the device for speech processes |
CN108141498A (en) * | 2015-11-25 | 2018-06-08 | 华为技术有限公司 | A kind of interpretation method and terminal |
CN108132995A (en) * | 2017-12-20 | 2018-06-08 | 北京百度网讯科技有限公司 | For handling the method and apparatus of audio-frequency information |
CN108447486A (en) * | 2018-02-28 | 2018-08-24 | 科大讯飞股份有限公司 | A kind of voice translation method and device |
CN108831481A (en) * | 2018-08-01 | 2018-11-16 | 平安科技(深圳)有限公司 | Symbol adding method, device, computer equipment and storage medium in speech recognition |
Also Published As
Publication number | Publication date |
---|---|
CN109754808B (en) | 2024-02-13 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109754808A (en) | Method, apparatus, computer equipment and the storage medium of voice conversion text | |
US20240021202A1 (en) | Method and apparatus for recognizing voice, electronic device and medium | |
US20190096402A1 (en) | Method and apparatus for extracting information | |
CN111625635A (en) | Question-answer processing method, language model training method, device, equipment and storage medium | |
CN112634876B (en) | Speech recognition method, device, storage medium and electronic equipment | |
CN111951780B (en) | Multitasking model training method for speech synthesis and related equipment | |
CN103871401A (en) | Method for voice recognition and electronic equipment | |
CN108768824B (en) | Information processing method and device | |
CN111401071A (en) | Model training method and device, computer equipment and readable storage medium | |
CN111312231A (en) | Audio detection method and device, electronic equipment and readable storage medium | |
CN110347866B (en) | Information processing method, information processing device, storage medium and electronic equipment | |
CN112906381B (en) | Dialog attribution identification method and device, readable medium and electronic equipment | |
CN111767740A (en) | Sound effect adding method and device, storage medium and electronic equipment | |
CN112765460A (en) | Conference information query method, device, storage medium, terminal device and server | |
CN112668333A (en) | Named entity recognition method and device, and computer-readable storage medium | |
CN111667810A (en) | Method and device for acquiring polyphone corpus, readable medium and electronic equipment | |
CN111738791B (en) | Text processing method, device, equipment and storage medium | |
CN110245334B (en) | Method and device for outputting information | |
CN110232920B (en) | Voice processing method and device | |
CN111555960A (en) | Method for generating information | |
CN111444321B (en) | Question answering method, device, electronic equipment and storage medium | |
CN114242047A (en) | Voice processing method and device, electronic equipment and storage medium | |
CN109768913A (en) | Information processing method, device, computer equipment and storage medium | |
CN115116427B (en) | Labeling method, voice synthesis method, training method and training device | |
CN116629236A (en) | Backlog extraction method, device, equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||