CN109754808A - Method, apparatus, computer device and storage medium for converting speech to text - Google Patents


Info

Publication number
CN109754808A
Authority
CN
China
Legal status
Granted
Application number
CN201811526588.9A
Other languages
Chinese (zh)
Other versions
CN109754808B (en)
Inventor
胡大兵
Current Assignee
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201811526588.9A
Publication of CN109754808A
Application granted
Publication of CN109754808B
Current legal status: Active


Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Machine Translation (AREA)
  • Telephonic Communication Services (AREA)

Abstract

Embodiments of the invention disclose a method, apparatus, computer device and storage medium for converting speech to text. The method includes the following steps: obtaining voice information to be processed; segmenting the voice information according to a preset sentence-breaking rule; and converting the segmented voice information into text. Because the voice information is segmented by the preset sentence-breaking rule before it is converted, the resulting text is divided into segments, which increases its readability and avoids unnecessary misreading or ambiguity.

Description

Method, apparatus, computer device and storage medium for converting speech to text
Technical field
Embodiments of the present invention relate to the financial field, and in particular to a method, apparatus, computer device and storage medium for converting speech to text.
Background art
Speech recognition is a rapidly developing application with broad application scenarios in industry, household appliances, communications, automotive electronics, healthcare, home services, consumer electronics and other fields. The fields involved in speech recognition technology include signal processing, pattern recognition, probability theory and information theory, the mechanisms of speech production and hearing, artificial intelligence, and so on.
In the prior art, speech can be converted into text by speech recognition. However, when a user speaks continuously, or in scenarios where several people are talking, the text produced from the speech is not broken into sentences and the speakers cannot be distinguished, so the converted text is ambiguous or easily misunderstood.
Summary of the invention
Embodiments of the present invention provide a method, apparatus, computer device and storage medium for converting speech to text.
To solve the above technical problem, one technical solution adopted by the embodiments of the present invention is to provide a method for converting speech to text, comprising the following steps:
Obtaining voice information to be processed;
Segmenting the voice information according to a preset sentence-breaking rule;
Converting the segmented voice information into text.
Optionally, segmenting the voice information according to the preset sentence-breaking rule comprises:
Detecting the decibel value in the voice information;
When the decibel value in the voice information is less than a preset decibel value, taking the position at which the decibel value is less than the preset decibel value as a first segmentation point of the voice information;
Segmenting the voice information according to the first segmentation point.
Optionally, taking the position at which the decibel value is less than the preset decibel value as the first segmentation point of the voice information comprises:
Determining the duration of the speech in the voice information whose decibel value is less than the preset decibel value;
When that duration is greater than a preset duration, obtaining the speech fragment whose decibel value is less than the preset decibel value;
Taking any moment within the speech fragment as the first segmentation point.
Optionally, segmenting the voice information according to the preset sentence-breaking rule comprises:
Determining whether a timbre change occurs in the voice information;
When a timbre change occurs in the voice information, taking the position of the timbre change as a second segmentation point of the voice information;
Segmenting the voice information according to the second segmentation point.
Optionally, converting the segmented voice information into text comprises:
Applying timbre labels to the segments of voice information that have the same timbre;
Converting the labeled speech segments into text by preset speech conversion software;
Applying role labels to the converted text according to the timbre labels.
Optionally, converting the segmented voice information into text comprises:
Converting the segmented speech into target text by preset speech conversion software;
Obtaining a modal keyword in the target text;
Looking up, in a preset information table, the punctuation mark that has a mapping relationship with the modal keyword, and appending the punctuation mark to the target text.
Optionally, obtaining the voice information to be processed comprises:
Collecting voice information of a user;
Performing noise reduction on the voice information by preset processing software.
To solve the above technical problem, an embodiment of the present invention further provides an apparatus for converting speech to text, comprising:
An acquisition module, configured to obtain voice information to be processed;
A processing module, configured to segment the voice information according to a preset sentence-breaking rule;
An execution module, configured to convert the segmented voice information into text.
Optionally, the processing module comprises:
A first processing submodule, configured to detect the decibel value in the voice information;
A second processing submodule, configured to, when the decibel value in the voice information is less than a preset decibel value, take the position at which the decibel value is less than the preset decibel value as a first segmentation point of the voice information;
A first execution submodule, configured to segment the voice information according to the first segmentation point.
Optionally, the second processing submodule comprises:
A third processing submodule, configured to determine the duration of the speech in the voice information whose decibel value is less than the preset decibel value;
A first acquisition submodule, configured to obtain, when that duration is greater than a preset duration, the speech fragment whose decibel value is less than the preset decibel value;
A second execution submodule, configured to take any moment within the speech fragment as the first segmentation point.
Optionally, the processing module comprises:
A fourth processing submodule, configured to determine whether a timbre change occurs in the voice information;
A fifth processing submodule, configured to take, when a timbre change occurs in the voice information, the position of the timbre change as a second segmentation point of the voice information;
A third execution submodule, configured to segment the voice information according to the second segmentation point.
Optionally, the execution module comprises:
A sixth processing submodule, configured to apply timbre labels to the segments of voice information that have the same timbre;
A seventh processing submodule, configured to convert the labeled speech segments into text by preset speech conversion software;
A fourth execution submodule, configured to apply role labels to the converted text according to the timbre labels.
Optionally, the execution module comprises:
An eighth processing submodule, configured to convert the segmented speech into target text by preset speech conversion software;
A second acquisition submodule, configured to obtain a modal keyword in the target text;
A fifth execution submodule, configured to look up, in a preset information table, the punctuation mark that has a mapping relationship with the modal keyword, and append the punctuation mark to the target text.
Optionally, the acquisition module comprises:
A third acquisition submodule, configured to collect voice information of a user;
A ninth processing submodule, configured to perform noise reduction on the voice information by preset processing software.
To solve the above technical problem, an embodiment of the present invention further provides a computer device comprising a memory and a processor. The memory stores computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the method for converting speech to text described above.
To solve the above technical problem, an embodiment of the present invention further provides a storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the method for converting speech to text described above.
The beneficial effect of the embodiments of the present invention is as follows: the voice information is segmented by a preset sentence-breaking rule and the segmented voice information is converted into text. Segmenting the text in this way increases its readability and avoids unnecessary misreading or ambiguity.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the drawings required for describing the embodiments are briefly introduced below. Obviously, the drawings described below show only some embodiments of the present invention; a person skilled in the art may derive other drawings from them without creative effort.
Fig. 1 is a basic flow diagram of the method for converting speech to text according to an embodiment of the present invention;
Fig. 2 is a basic flow diagram of a method, provided by an embodiment of the present invention, for segmenting voice information according to a preset sentence-breaking rule;
Fig. 3 is a basic flow diagram of a method, provided by an embodiment of the present invention, for taking the position at which the decibel value is less than the preset decibel value as the first segmentation point of the voice information;
Fig. 4 is a basic flow diagram of a method, provided by an embodiment of the present invention, for segmenting voice information according to a preset sentence-breaking rule;
Fig. 5 is a basic flow diagram of a method, provided by an embodiment of the present invention, for converting segmented voice information into text;
Fig. 6 is a basic flow diagram of a method, provided by an embodiment of the present invention, for converting segmented voice information into text;
Fig. 7 is a basic structural block diagram of the apparatus for converting speech to text according to an embodiment of the present invention;
Fig. 8 is a basic structural block diagram of the computer device according to an embodiment of the present invention.
Detailed description of the embodiments
To enable those skilled in the art to better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings.
Some of the processes described in the specification, the claims and the above drawings contain multiple operations that appear in a particular order. It should be clearly understood, however, that these operations may be executed out of the order in which they appear herein, or in parallel. Operation numbers such as 101 and 102 merely distinguish different operations and do not themselves imply any execution order. In addition, these processes may include more or fewer operations, which may be executed sequentially or in parallel. Note also that descriptions such as "first" and "second" herein are used to distinguish different messages, devices, modules and so on; they do not indicate a sequence, nor do they require "first" and "second" to be of different types.
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
Embodiment
Those skilled in the art will understand that the terms "terminal" and "terminal device" as used herein include both devices that have only a wireless signal receiver without transmitting capability and devices with receiving and transmitting hardware capable of two-way communication over a bidirectional communication link. Such devices may include: cellular or other communication devices, with a single-line display, a multi-line display, or no multi-line display; PCS (Personal Communications Service) devices, which may combine voice, data processing, fax and/or data communication capabilities; PDAs (Personal Digital Assistants), which may include a radio frequency receiver, pager, Internet/intranet access, web browser, notepad, calendar and/or GPS (Global Positioning System) receiver; and conventional laptop and/or palmtop computers or other devices that have and/or include a radio frequency receiver. A "terminal" or "terminal device" as used herein may be portable, transportable, installed in a vehicle (air, sea and/or land), or suitable for and/or configured to run locally and/or in distributed form at any location on Earth and/or in space. A "terminal" or "terminal device" may also be a communication terminal, an Internet terminal or a music/video playback terminal, for example a PDA, an MID (Mobile Internet Device) and/or a mobile phone with music/video playback functions, or a device such as a smart TV or set-top box.
The client terminal in this embodiment is the terminal described above.
Specifically, referring to Fig. 1, Fig. 1 is a basic flow diagram of the method for converting speech to text in this embodiment.
As shown in Fig. 1, the method for converting speech to text comprises the following steps:
S1100: Obtain the voice information to be processed;
The voice information to be processed is voice information that needs to be converted into text. To improve the accuracy of the conversion, it is generally voice information that has undergone noise reduction. Specifically, obtaining the voice information to be processed comprises: collecting voice information of a user, and performing noise reduction on it with preset processing software.
The user's voice information may be input through the terminal's built-in voice recording module, or it may be obtained by downloading it or by receiving it from another terminal. Noise reduction may be performed with preset audio processing software, for example Adobe Audition CS6 or VinylStudio.
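As a rough illustration of the noise-reduction step, the sketch below applies a simple noise gate to samples normalized to [-1.0, 1.0]. This is an assumption for illustration only: the embodiment relies on preset audio software such as Adobe Audition CS6 or VinylStudio, which use more sophisticated (for example spectral) methods, and the function name `noise_gate` and its `threshold` parameter are hypothetical.

```python
from typing import List, Sequence


def noise_gate(samples: Sequence[float], threshold: float) -> List[float]:
    """Zero out samples whose magnitude is below `threshold`.

    Hypothetical stand-in for the "preset processing software": quiet
    background hiss is silenced, while louder speech samples pass unchanged.
    """
    return [s if abs(s) >= threshold else 0.0 for s in samples]


if __name__ == "__main__":
    # Small values (treated as noise) are zeroed; speech-level values are kept.
    print(noise_gate([0.01, 0.5, -0.02, -0.6], threshold=0.05))
    # prints [0.0, 0.5, 0.0, -0.6]
```

A gate like this only suppresses low-level noise between words; it does not remove noise that overlaps speech, which is why dedicated software is preferred in practice.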
S1200: Segment the voice information according to a preset sentence-breaking rule;
The preset sentence-breaking rule is a predefined rule for segmenting voice information. For example, the voice information may be divided into multiple segments at the positions where pauses occur, and when the voices of several roles appear in the voice information, it may be segmented according to timbre.
S1300: Convert the segmented voice information into text.
In this embodiment of the present invention, the segmented voice information may be converted into text by text conversion software built into the terminal, for example SwiftScribe.
By segmenting the voice information according to a preset sentence-breaking rule and converting the segmented voice information into text, the above method increases the readability of the text and avoids unnecessary misreading or ambiguity.
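The three steps S1100 to S1300 can be sketched as a small pipeline in which each stage is pluggable. This is an illustrative sketch, not the patented implementation: the names `speech_to_text`, `prepare`, `segment` and `transcribe` are hypothetical, and `transcribe` stands in for whatever speech-to-text software (such as SwiftScribe) the terminal provides.

```python
from typing import Callable, List, Sequence

Audio = Sequence[float]


def speech_to_text(
    audio: Audio,
    prepare: Callable[[Audio], Audio],
    segment: Callable[[Audio], List[Audio]],
    transcribe: Callable[[Audio], str],
) -> List[str]:
    """Run S1100 -> S1200 -> S1300 and return one text per speech segment."""
    prepared = prepare(audio)                 # S1100: obtain voice information (e.g. noise reduction)
    segments = segment(prepared)              # S1200: apply the preset sentence-breaking rule
    return [transcribe(s) for s in segments]  # S1300: convert each segment, keeping order


if __name__ == "__main__":
    # Toy stand-ins: identity preparation, a fixed split, and "transcription" by joining values.
    texts = speech_to_text(
        [1, 2, 0, 0, 3],
        prepare=lambda a: a,
        segment=lambda a: [a[:2], a[4:]],
        transcribe=lambda s: ",".join(str(x) for x in s),
    )
    print(texts)
```

Keeping the stages as separate callables mirrors the module split described for the apparatus and makes it easy to swap in a different segmentation rule.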
In practice, when a user inputs voice information through a terminal, pauses occur according to the user's speaking habits. To break the voice information into sentences according to these natural speaking habits, an embodiment of the present invention provides a method for segmenting voice information according to a preset sentence-breaking rule, as shown in Fig. 2, which is a basic flow diagram of that method.
Specifically, as shown in Fig. 2, step S1200 comprises the following steps:
S1211: Detect the decibel value in the voice information;
S1212: When the decibel value in the voice information is less than a preset decibel value, take the position at which the decibel value is less than the preset decibel value as a first segmentation point of the voice information;
In this embodiment, the terminal detects the decibel value of the voice information with preset decibel detection software, for example Sound Meter 2.0 or Digital Sound Meter.
The preset decibel value is set in advance. When the user pauses while speaking, the decibel value at that point is low. Because the environment contains noise, the preset decibel value is set higher than the ambient noise level and lower than the level of normal speech.
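The decibel check of S1211 and S1212 can be sketched by computing the RMS level of short frames and comparing it with the preset decibel value. The sketch assumes samples normalized to [-1.0, 1.0] and measures level in dBFS (decibels relative to full scale); the embodiment instead mentions sound-meter software, so the reference level, the frame length and the function names `frame_db` and `quiet_frames` are assumptions.

```python
import math
from typing import List, Sequence


def frame_db(samples: Sequence[float], frame_len: int) -> List[float]:
    """Return the RMS level of each complete, non-overlapping frame in dBFS.

    A trailing partial frame is ignored, and an all-zero frame maps to -inf.
    """
    levels: List[float] = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        rms = math.sqrt(sum(s * s for s in frame) / frame_len)
        levels.append(20 * math.log10(rms) if rms > 0 else float("-inf"))
    return levels


def quiet_frames(levels: Sequence[float], preset_db: float) -> List[int]:
    """Indices of frames whose level is below the preset decibel value (S1212)."""
    return [i for i, level in enumerate(levels) if level < preset_db]


if __name__ == "__main__":
    # A loud frame (about -6 dBFS) followed by a near-silent frame (about -60 dBFS).
    levels = frame_db([0.5] * 4 + [0.001] * 4, frame_len=4)
    print(levels, quiet_frames(levels, preset_db=-40.0))
```

Choosing `preset_db` between the measured noise floor and typical speech level corresponds to the rule in the paragraph above.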
An embodiment of the present invention further provides a method for taking the position at which the decibel value is less than the preset decibel value as the first segmentation point of the voice information, as shown in Fig. 3, which is a basic flow diagram of that method.
Specifically, as shown in Fig. 3, step S1212 comprises the following steps:
S12121: Determine the duration of the speech in the voice information whose decibel value is less than the preset decibel value;
In practice, people leave short intervals between words when they speak, whereas in this embodiment only the pause after a complete sentence is treated as a segment boundary. Therefore, when determining the first segmentation point, if the decibel value at some moment in the voice information is less than the preset decibel value, the method takes the speech fragment that starts at that moment and throughout which the decibel value stays below the preset decibel value, and determines whether the duration of that fragment is greater than a preset duration. The duration in this embodiment is the duration of this below-threshold speech fragment.
S12122: When the duration is greater than the preset duration, obtain the speech fragment whose decibel value is less than the preset decibel value;
In this embodiment, when obtaining the speech fragment whose decibel value is less than the preset decibel value, it is sufficient to obtain the time span of that fragment within the voice information.
S12123: Take any moment within the speech fragment as the first segmentation point.
The above is illustrated by an example. Suppose the decibel detection software finds that the decibel value of the voice information falls below the preset decibel value at 2 s. Starting from 2 s, the decibel value of the 2-3 s speech fragment stays below the preset decibel value, and the fragment's duration of 1 s is greater than the preset duration of 0.5 s. Therefore any moment within the 2-3 s fragment, such as 2.5 s or 3 s, can be used as the first segmentation point.
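The duration check of S12121 to S12123 and the 2-3 s example above can be reproduced with a short sketch over per-frame decibel values. The embodiment allows any moment inside a qualifying quiet fragment; this sketch picks the midpoint, which is an assumption, as are the function name and the 0.5 s frame length used in the usage example.

```python
from typing import List, Optional, Sequence


def first_segmentation_points(
    levels: Sequence[float],
    frame_dur: float,
    preset_db: float,
    preset_dur: float,
) -> List[float]:
    """Return one cut time (seconds) inside each quiet fragment longer than preset_dur.

    levels: per-frame decibel values; frame_dur: frame length in seconds.
    """
    points: List[float] = []
    run_start: Optional[int] = None
    # A loud sentinel frame closes a quiet fragment that reaches the end of the audio.
    for i, level in enumerate(list(levels) + [float("inf")]):
        if level < preset_db:
            if run_start is None:
                run_start = i  # a quiet fragment starts at this frame
        elif run_start is not None:
            start, end = run_start * frame_dur, i * frame_dur
            if end - start > preset_dur:  # S12122: only pauses longer than the preset duration
                points.append((start + end) / 2)  # S12123: any moment works; use the midpoint
            run_start = None
    return points


if __name__ == "__main__":
    # 0.5 s frames over 4 s of audio; the frames covering 2-3 s are below -40 dB.
    levels = [-10.0] * 4 + [-60.0] * 2 + [-10.0] * 2
    print(first_segmentation_points(levels, frame_dur=0.5, preset_db=-40.0, preset_dur=0.5))
    # prints [2.5]
```

A 0.5 s gap between words would not produce a point here, because the comparison is strictly greater than the preset duration.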
In this way, the first segmentation point can be determined accurately, which avoids breaking sentences arbitrarily.
S1213: Segment the voice information according to the first segmentation point.
In this embodiment, the voice information is divided into multiple speech fragments at the first segmentation points, that is, at points taken at an arbitrary moment within the speech fragments whose decibel values are all below the preset decibel value. Note that after segmentation the fragments are ordered by their original positions in the voice information to preserve continuity.
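Step S1213, including the requirement that fragments keep their original order, might look like this when cut times are given in seconds. The function name and the sample-list representation of the voice information are assumptions for illustration.

```python
from typing import List, Sequence


def split_at_points(
    samples: Sequence[float], sample_rate: int, points: Sequence[float]
) -> List[Sequence[float]]:
    """Split samples at the given cut times (seconds); fragments stay in original order."""
    # Convert times to sample indices, drop cuts at or beyond the edges, and sort
    # so that fragments come out in the order they occur in the recording.
    indices = (round(t * sample_rate) for t in points)
    cuts = sorted({i for i in indices if 0 < i < len(samples)})
    bounds = [0] + cuts + [len(samples)]
    return [samples[a:b] for a, b in zip(bounds, bounds[1:])]


if __name__ == "__main__":
    # 10 samples at 2 samples per second, cut at 2.5 s (sample index 5).
    print(split_at_points(list(range(10)), sample_rate=2, points=[2.5]))
```

Sorting the cut indices is what guarantees the ordering property, even if the points arrive unsorted.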
In practice, voice information often contains conversations among several people, for example interviews or meeting minutes. In this case, to improve the converted text, an embodiment of the present invention provides a method for segmenting voice information according to a preset sentence-breaking rule, as shown in Fig. 4, which is a basic flow diagram of that method.
Specifically, as shown in Fig. 4, step S1220 comprises the following steps:
S1221: Determine whether a timbre change occurs in the voice information;
In this embodiment, built-in timbre detection software, for example Polyphone, may be used to detect whether the timbre in the voice information changes.
S1222: When a timbre change occurs in the voice information, take the position of the timbre change as a second segmentation point of the voice information;
S1223: Segment the voice information according to the second segmentation point.
In this embodiment, when a timbre change occurs in the voice information, the time point of the change is extracted and used as the second segmentation point of the voice information.
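One way to approximate S1221 and S1222 is to compare timbre feature vectors of consecutive frames and report a change when they differ by more than a threshold. The embodiment delegates this to timbre detection software; the per-frame feature vectors (which in practice might come from spectral features or a speaker-embedding model), the cosine-distance measure, the threshold and the function names below are all assumptions.

```python
import math
from typing import List, Sequence


def cosine_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity; an all-zero vector (silence) is treated as 'no change'."""
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    if na == 0 or nb == 0:
        return 0.0
    return 1.0 - sum(x * y for x, y in zip(a, b)) / (na * nb)


def second_segmentation_points(
    features: Sequence[Sequence[float]], frame_dur: float, threshold: float
) -> List[float]:
    """Return start times (seconds) of frames whose timbre differs from the previous frame."""
    return [
        i * frame_dur
        for i in range(1, len(features))
        if cosine_distance(features[i - 1], features[i]) > threshold
    ]


if __name__ == "__main__":
    # Two frames of one voice, then two frames of a clearly different voice.
    features = [[1.0, 0.0], [1.0, 0.1], [0.0, 1.0], [0.0, 1.0]]
    print(second_segmentation_points(features, frame_dur=0.5, threshold=0.5))
```

Frame-to-frame comparison is the simplest choice; comparing longer windows would be less sensitive to momentary fluctuations within one speaker's voice.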
It should be understood that in practice the same voice information may involve both the situation of the embodiment shown in Fig. 2 and that of this embodiment, that is, it may contain both first and second segmentation points. In that case, the voice information is segmented according to both the first and the second segmentation points, and the resulting fragments are ordered according to their sequence in the voice information, so that the segmented text does not become disordered.
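Merging the two kinds of segmentation points in time order before cutting could be done as below. The function name is an assumption, and the `min_gap` parameter, which drops a pause point and a timbre-change point that fall at almost the same moment, is a hypothetical refinement not described in the embodiment.

```python
from typing import List, Sequence


def merge_points(
    first_points: Sequence[float], second_points: Sequence[float], min_gap: float = 0.0
) -> List[float]:
    """Combine first and second segmentation points in time order.

    A point no more than `min_gap` seconds after the previously kept point is
    dropped, so identical points are always deduplicated.
    """
    merged: List[float] = []
    for t in sorted(list(first_points) + list(second_points)):
        if not merged or t - merged[-1] > min_gap:
            merged.append(t)
    return merged


if __name__ == "__main__":
    # A pause at 7.0 s and a speaker change at 7.05 s collapse into one cut.
    print(merge_points([2.5, 7.0], [1.0, 7.05], min_gap=0.1))
```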
When several roles (that is, timbres) appear in the voice information, the voice information is divided into multiple segments at the second segmentation points. However, once those segments are converted into text, readers who do not know which role is speaking can easily confuse what each role said. To solve this problem, an embodiment of the present invention provides a method for converting the segmented voice information into text, as shown in Fig. 5, which is a basic flow diagram of that method.
Specifically, as shown in Fig. 5, step S1300 comprises the following steps:
S1311: Apply timbre labels to the segments of voice information that have the same timbre;
When the voice information contains several roles, that is, different timbres, the timbres are labeled. For example, if the voice information contains two roles A and B, A is marked with a and B with b, which distinguishes the roles.
S1312: Convert the labeled speech segments into text by preset speech conversion software;
S1313: Apply role labels to the converted text according to the timbre labels.
In this embodiment, each speech segment is converted into text in sequence by built-in speech conversion software, and the role is marked at the beginning of each text paragraph, so that the reader can see clearly which role said each paragraph. This improves the readability of the text.
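The labeling flow of S1311 to S1313 can be sketched as follows, assuming each segment already has a timbre feature vector and a transcript produced by the speech conversion software. The greedy grouping rule, the cosine-distance threshold and the function names are assumptions; the labels a, b, ... follow the example in the text.

```python
import math
import string
from typing import List, Sequence, Tuple


def _cos_dist(a: Sequence[float], b: Sequence[float]) -> float:
    """1 - cosine similarity of two non-zero feature vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return 1.0 - dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def label_timbres(features: Sequence[Sequence[float]], threshold: float) -> List[str]:
    """S1311: give segments with the same timbre the same label ('a', 'b', ...).

    Each segment joins the first known role whose reference vector is within
    `threshold`; otherwise it starts a new role. Supports up to 26 roles.
    """
    refs: List[Tuple[str, Sequence[float]]] = []  # (label, reference vector) per role
    labels: List[str] = []
    for f in features:
        for label, ref in refs:
            if _cos_dist(f, ref) <= threshold:
                labels.append(label)
                break
        else:  # no existing role matched: register a new one
            label = string.ascii_lowercase[len(refs)]
            refs.append((label, f))
            labels.append(label)
    return labels


def add_role_marks(texts: Sequence[str], labels: Sequence[str]) -> List[str]:
    """S1313: prefix each transcribed segment with its role label, in segment order."""
    return [f"{label}: {text}" for label, text in zip(labels, texts)]


if __name__ == "__main__":
    labels = label_timbres([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]], threshold=0.2)
    for line in add_role_marks(["hello", "hi", "bye"], labels):
        print(line)
```

The greedy rule keeps the first segment of each speaker as that speaker's reference; a real system would more likely cluster all segments jointly.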
In practice, to increase the readability of the converted text and give readers a good reading experience, an embodiment of the present invention provides another method for converting the segmented voice information into text, as shown in Fig. 6, which is a basic flow diagram of that method.
Specifically, as shown in Fig. 6, step S1300 comprises the following steps:
S1321: Convert the segmented speech into target text by preset speech conversion software;
The speech conversion software includes, for example, SwiftScribe and IBM ViaVoice.
S1322: Obtain a modal keyword in the target text;
In this embodiment, the terminal is preset with a modal-word vocabulary database. To obtain the modal keyword of the target text, the terminal compares the words in the database with the words in the target text; when the target text contains a word that is also in the database, that word is extracted and used as the modal keyword of the target text.
S1323: Look up, in a preset information table, the punctuation mark that has a mapping relationship with the modal keyword, and append the punctuation mark to the target text.
The information table records the correspondence between modal words and punctuation marks. For example, "what" appearing in a sentence usually indicates a question, so "?" should be used at the end of the sentence; when an exclamatory modal particle appears at the end of a sentence, it usually expresses emotion, so "!" should be used at the end of the sentence. Accordingly, in the information table the modal word "what" maps to "?", and the exclamatory particle maps to "!".
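A minimal sketch of S1322 and S1323: scan the target text for modal keywords and append the mapped punctuation mark. The table contents are illustrative assumptions (only the "what" to "?" pairing comes from the text; "wow" to "!" is a hypothetical English stand-in for an exclamatory particle), as are the default full stop and the rule that the keyword nearest the end of the sentence wins. The sketch also assumes whitespace-separated words; Chinese text would need word segmentation or substring matching first.

```python
from typing import Dict

# Hypothetical information table: modal keyword -> punctuation mark.
PUNCTUATION_TABLE: Dict[str, str] = {"what": "?", "wow": "!"}


def punctuate(target_text: str, table: Dict[str, str] = PUNCTUATION_TABLE, default: str = ".") -> str:
    """Append the punctuation mapped to the last modal keyword found in the sentence."""
    sentence = target_text.strip()
    # Scan from the end so that a sentence-final particle takes precedence.
    for word in reversed(sentence.lower().split()):
        if word in table:
            return sentence + table[word]
    return sentence + default


if __name__ == "__main__":
    for text in ["What time is it", "that is great wow", "see you tomorrow"]:
        print(punctuate(text))
```

As the next paragraph notes, a keyword lookup alone cannot resolve particles whose mood depends on context; this sketch makes no attempt to do so.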
In practice, the same modal word can express different moods in different contexts; for example, the exclamatory particle above may also indicate a question. To add punctuation more accurately, the structure and contextual semantics of the sentence may also be analyzed to determine the punctuation mark at the end of the sentence; this is not described in detail here.
The embodiment of the present invention also provides a kind of voice conversion text device to solve above-mentioned technical problem.Referring specifically to figure 7, Fig. 7 convert text device basic structure block diagram for the present embodiment voice.
As shown in fig. 7, a kind of voice converts text device, comprising: obtain module 2100, processing module 2200 and execute mould Block 2300.Wherein, module 2100 is obtained, for obtaining voice messaging to be processed;Processing module 2200, for according to preset The voice messaging is segmented by punctuate rule;Execution module 2300, for the voice messaging after segmentation to be converted to text.
Voice converts text device and is segmented voice messaging by preset punctuate rule, and according to the language after segmentation Message breath is converted to text, can increase the readability of text by being segmented to text, avoid the occurrence of unnecessary misread Or ambiguity.
In some embodiments, the processing module includes: the first processing submodule, for detecting the voice messaging In decibel value;Second processing submodule will be described for when the decibel value in the voice messaging is less than default decibel value Decibel value is less than first waypoint of the position of the default decibel value as the voice messaging;First implementation sub-module is used The voice messaging is segmented according to first waypoint.
In some embodiments, the second processing submodule includes: third processing submodule, the predicate for judging Decibel value is less than the voice duration of the default decibel value in message breath;First acquisition submodule, for working as the voice duration When greater than preset duration, the sound bite that decibel value is less than the default decibel value is obtained;Second implementation sub-module is used for institute Any time in sound bite is stated as first waypoint.
In some embodiments, the processing module includes: fourth process submodule, for judging the voice messaging In whether occur tone color variation;5th processing submodule, for when tone color variation occurs in the voice messaging, tone color to be become Second waypoint of the position of change as the voice messaging;Third implementation sub-module, for according to second waypoint pair The voice messaging is segmented.
In some embodiments, the execution module includes: the 6th processing submodule, for having phase to after segmentation Voice messaging with tone color carries out tone color label;7th processing submodule, for that will be marked by preset speech software The voice being segmented afterwards is converted to text;4th implementation sub-module, for according to the tone color label to the text after conversion into Row role label.
In some embodiments, the execution module includes: the 8th processing submodule, for being turned by preset voice It changes software and the voice after segmentation is converted into target text;Second acquisition submodule, for obtaining the language in the target text Air to close keyword;5th implementation sub-module has mapping relations with the tone keyword for searching in preset information table Punctuation mark, and after the punctuation mark is added to the target text.
In some embodiments, the acquisition module includes: a third acquisition submodule, configured to collect the voice information of a user; and a ninth processing submodule, configured to perform noise reduction on the voice information by means of preset processing software.
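The patent leaves the noise-reduction software unspecified. As a deliberately simple stand-in (not the patent's method), the sketch below implements a frame-based noise gate: it estimates the noise level from the first few frames, which are assumed to contain only background noise, and silences every frame that is not sufficiently louder than that estimate. Practical systems usually rely on spectral methods instead; `noise_gate`, `noise_frames` and `factor` are hypothetical names.

```python
import math
from typing import List, Sequence


def _rms(frame: Sequence[float]) -> float:
    return math.sqrt(sum(x * x for x in frame) / len(frame)) if frame else 0.0


def noise_gate(samples: Sequence[float], frame_len: int,
               noise_frames: int = 5, factor: float = 2.0) -> List[float]:
    """Zero out frames whose RMS is at most `factor` times the estimated noise RMS."""
    frames = [list(samples[i:i + frame_len]) for i in range(0, len(samples), frame_len)]
    if not frames:
        return []
    # Assumption: the recording starts with background noise only.
    profile = frames[:noise_frames]
    noise_rms = sum(_rms(f) for f in profile) / len(profile)
    out: List[float] = []
    for f in frames:
        if _rms(f) <= noise_rms * factor:
            out.extend([0.0] * len(f))  # treat as noise: silence this frame
        else:
            out.extend(f)               # keep speech frames unchanged
    return out


# Usage: two low-level noise frames followed by two speech frames.
audio = [0.01, -0.01] * 4 + [0.5, -0.5] * 4
cleaned = noise_gate(audio, frame_len=4, noise_frames=2)
```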
To solve the above technical problem, an embodiment of the present invention further provides a computer device. Refer to Fig. 8, which is a block diagram of the basic structure of the computer device of this embodiment.
Fig. 8 is a schematic diagram of the internal structure of the computer device. As shown in Fig. 8, the computer device includes a processor, a non-volatile storage medium, a memory and a network interface connected through a system bus. The non-volatile storage medium of the computer device stores an operating system, a database and computer-readable instructions; the database may store sequences of control information, and when the computer-readable instructions are executed by the processor, the processor implements a voice-to-text conversion method. The processor of the computer device provides computing and control capabilities and supports the operation of the entire computer device. The memory of the computer device may also store computer-readable instructions which, when executed by the processor, cause the processor to perform a voice-to-text conversion method. The network interface of the computer device is used to connect to and communicate with a terminal. Those skilled in the art will understand that the structure shown in Fig. 8 is only a block diagram of the part of the structure related to the solution of the present application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.
In this embodiment, the processor is configured to execute the specific functions of the acquisition module 2100, the processing module 2200 and the execution module 2300 in Fig. 7, and the memory stores the program code and the various data needed to execute these modules. The network interface is used for data transmission to and from user terminals or servers. The memory in this embodiment stores the program code and data required to execute all submodules of the voice-to-text conversion method, and the server can invoke this program code and data to perform the functions of all submodules.
The computer device segments the voice information according to the preset sentence-break rule and converts the segmented voice information into text. Segmenting the text in this way improves its readability and avoids unnecessary misreading or ambiguity.
The present invention further provides a storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the voice-to-text conversion method described in any of the above embodiments.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be implemented by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc or a read-only memory (ROM), or a random access memory (RAM), among others.
It should be understood that although the steps in the flowcharts of the accompanying drawings are shown in the sequence indicated by the arrows, they are not necessarily executed in that sequence. Unless explicitly stated herein, there is no strict restriction on the order in which these steps are executed, and they may be executed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages; these sub-steps or stages are not necessarily completed at the same moment but may be executed at different times, and their execution order is not necessarily sequential: they may be executed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
The above are only some embodiments of the present invention. It should be noted that those of ordinary skill in the art can make various improvements and modifications without departing from the principle of the present invention, and these improvements and modifications shall also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. A method for converting voice into text, characterized by comprising the following steps:
Obtaining voice information to be processed;
Segmenting the voice information according to a preset sentence-break rule;
Converting the segmented voice information into text.
2. The method for converting voice into text according to claim 1, characterized in that segmenting the voice information according to the preset sentence-break rule comprises:
Detecting the decibel values in the voice information;
When a decibel value in the voice information is lower than a preset decibel value, taking the position at which the decibel value is lower than the preset decibel value as a first segmentation point of the voice information;
Segmenting the voice information according to the first segmentation point.
3. The method for converting voice into text according to claim 2, characterized in that taking the position at which the decibel value is lower than the preset decibel value as the first segmentation point of the voice information comprises:
Determining the duration of the speech in the voice information whose decibel value is lower than the preset decibel value;
When the duration is greater than a preset duration, obtaining the speech segment whose decibel value is lower than the preset decibel value;
Taking any moment within the speech segment as the first segmentation point.
4. The method for converting voice into text according to claim 1, characterized in that segmenting the voice information according to the preset sentence-break rule comprises:
Determining whether a timbre change occurs in the voice information;
When a timbre change occurs in the voice information, taking the position of the timbre change as a second segmentation point of the voice information;
Segmenting the voice information according to the second segmentation point.
5. The method for converting voice into text according to claim 4, characterized in that converting the segmented voice information into text comprises:
Applying the same timbre label to segmented voice information that has the same timbre;
Converting the labeled, segmented speech into text by means of preset speech software;
Applying role labels to the converted text according to the timbre labels.
6. The method for converting voice into text according to claim 1, characterized in that converting the segmented voice information into text comprises:
Converting the segmented speech into a target text by means of preset speech software;
Obtaining the mood keyword in the target text;
Looking up, in a preset information table, the punctuation mark that is mapped to the mood keyword, and appending the punctuation mark after the target text.
7. The method for converting voice into text according to claim 1, characterized in that obtaining the voice information to be processed comprises:
Collecting the voice information of a user;
Performing noise reduction on the voice information by means of preset processing software.
8. A device for converting voice into text, characterized by comprising:
An acquisition module, configured to obtain voice information to be processed;
A processing module, configured to segment the voice information according to a preset sentence-break rule;
An execution module, configured to convert the segmented voice information into text.
9. A computer device, comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the method for converting voice into text according to any one of claims 1 to 7.
10. A storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the method for converting voice into text according to any one of claims 1 to 7.
CN201811526588.9A 2018-12-13 2018-12-13 Method, device, computer equipment and storage medium for converting voice into text Active CN109754808B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811526588.9A CN109754808B (en) 2018-12-13 2018-12-13 Method, device, computer equipment and storage medium for converting voice into text

Publications (2)

Publication Number Publication Date
CN109754808A 2019-05-14
CN109754808B 2024-02-13

Family

ID=66403800

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811526588.9A Active CN109754808B (en) 2018-12-13 2018-12-13 Method, device, computer equipment and storage medium for converting voice into text

Country Status (1)

Country Link
CN (1) CN109754808B (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110335612A (en) * 2019-07-11 2019-10-15 招商局金融科技有限公司 Minutes generation method, device and storage medium based on speech recognition
CN110827825A (en) * 2019-11-11 2020-02-21 广州国音智能科技有限公司 Punctuation prediction method, system, terminal and storage medium for speech recognition text
CN112151042A (en) * 2019-06-27 2020-12-29 中国电信股份有限公司 Voiceprint recognition method, device and system and computer readable storage medium
CN113408996A (en) * 2020-03-16 2021-09-17 上海博泰悦臻网络技术服务有限公司 Schedule management method, schedule management device and computer readable storage medium

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN1770260A (en) * 2004-11-01 2006-05-10 英业达股份有限公司 Speech waveform processing system and method
CN101178790A (en) * 2006-11-10 2008-05-14 胡鹏 Method for realizing synergic listen and type recording method by intelligent virtual punctuate
CN102522084A (en) * 2011-12-22 2012-06-27 广东威创视讯科技股份有限公司 Method and system for converting voice data into text files
CN102903361A (en) * 2012-10-15 2013-01-30 Itp创新科技有限公司 Instant call translation system and instant call translation method
CN104050160A (en) * 2014-03-12 2014-09-17 北京紫冬锐意语音科技有限公司 Machine and human translation combined spoken language translation method and device
CN104142915A (en) * 2013-05-24 2014-11-12 腾讯科技(深圳)有限公司 Punctuation adding method and system
CN105609107A (en) * 2015-12-23 2016-05-25 北京奇虎科技有限公司 Text processing method and device based on voice identification
CN106504746A (en) * 2016-10-28 2017-03-15 普强信息技术(北京)有限公司 A kind of method for extracting structuring traffic information from speech data
CN106656767A (en) * 2017-01-09 2017-05-10 武汉斗鱼网络科技有限公司 Method and system for increasing new anchor retention
CN106971723A (en) * 2017-03-29 2017-07-21 北京搜狗科技发展有限公司 Method of speech processing and device, the device for speech processes
CN108141498A (en) * 2015-11-25 2018-06-08 华为技术有限公司 A kind of interpretation method and terminal
CN108132995A (en) * 2017-12-20 2018-06-08 北京百度网讯科技有限公司 For handling the method and apparatus of audio-frequency information
CN108447486A (en) * 2018-02-28 2018-08-24 科大讯飞股份有限公司 A kind of voice translation method and device
CN108831481A (en) * 2018-08-01 2018-11-16 平安科技(深圳)有限公司 Symbol adding method, device, computer equipment and storage medium in speech recognition

Also Published As

Publication number Publication date
CN109754808B (en) 2024-02-13

Similar Documents

Publication Publication Date Title
CN109754808A (en) Method, apparatus, computer equipment and the storage medium of voice conversion text
US20240021202A1 (en) Method and apparatus for recognizing voice, electronic device and medium
US20190096402A1 (en) Method and apparatus for extracting information
CN111625635A (en) Question-answer processing method, language model training method, device, equipment and storage medium
CN112634876B (en) Speech recognition method, device, storage medium and electronic equipment
CN111951780B (en) Multitasking model training method for speech synthesis and related equipment
CN103871401A (en) Method for voice recognition and electronic equipment
CN108768824B (en) Information processing method and device
CN111401071A (en) Model training method and device, computer equipment and readable storage medium
CN111312231A (en) Audio detection method and device, electronic equipment and readable storage medium
CN110347866B (en) Information processing method, information processing device, storage medium and electronic equipment
CN112906381B (en) Dialog attribution identification method and device, readable medium and electronic equipment
CN111767740A (en) Sound effect adding method and device, storage medium and electronic equipment
CN112765460A (en) Conference information query method, device, storage medium, terminal device and server
CN112668333A (en) Named entity recognition method and device, and computer-readable storage medium
CN111667810A (en) Method and device for acquiring polyphone corpus, readable medium and electronic equipment
CN111738791B (en) Text processing method, device, equipment and storage medium
CN110245334B (en) Method and device for outputting information
CN110232920B (en) Voice processing method and device
CN111555960A (en) Method for generating information
CN111444321B (en) Question answering method, device, electronic equipment and storage medium
CN114242047A (en) Voice processing method and device, electronic equipment and storage medium
CN109768913A (en) Information processing method, device, computer equipment and storage medium
CN115116427B (en) Labeling method, voice synthesis method, training method and training device
CN116629236A (en) Backlog extraction method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant