CN109634935A - Method of speech processing, storage medium and device

Info

Publication number: CN109634935A
Application number: CN201811321774.9A
Authority: CN
Inventors: 刘炳林; 程勇; 孔浩
Original assignee: Chongqing Haite Technology Development Co Ltd
Current assignee: Chongqing Haite Technology Development Co Ltd
Priority date: 2018-11-07
Filing date: 2018-11-07
Publication date: 2019-04-16

Abstract

The present invention provides a voice data processing method, a storage medium and a device. The method comprises: Step 11: obtaining voice data to be analyzed that is associated with a detection record; Step 13: converting the voice data to be analyzed into recognition result text, based on a speech recognition model capable of recognizing engineering detection terminology; Step 15: extracting detection record attribute information from the recognition result text according to preset information extraction rules. With this method, the recognition result text of the voice data is converted automatically into the attribute information of a detection record. This raises the practical value of the recognition result text, reduces the text editing workload of engineering staff, and substantially lightens the burden of recording detection information during field work.

Description

Method of speech processing, storage medium and device

Technical field

The present invention relates to the field of computers, and in particular to a voice data processing method, storage medium and device.

Background technique

Engineering structures (bridges, tunnels, dams, ports and wharves, and buildings of all kinds) usually require periodic detection (or inspection; the same applies below), which often involves field work and the recording of detection information. Handheld devices are gradually being adopted for recording detection data. Taking bridge detection terminals as an example, a typical terminal is an app installed on a tablet computer or mobile phone whose main input mode is a structured software interface, as shown in Fig. 1. To record the defect data found during detection, the inspector must select items across several screens and fill in the controls one by one with the soft keyboard, switching between them to enter each defect attribute. A single detection record can require several items of data, sometimes more than ten, and entering them on site through the touch screen of a mobile device is inefficient. When a lot of information has to be recorded, the progress of the detection work can be seriously affected.

So that engineering staff can record detection information conveniently and efficiently during field work, and avoid inefficient, frequent touch operations when entering detection data, a more efficient technical solution for recording detection data is urgently needed.

Summary of the invention

In view of this, the present invention provides a voice data processing method, storage medium and device, to solve the problem of how to convert voice data into detection records automatically.

The present invention provides a voice data processing method, comprising:

Step 11: obtaining voice data to be analyzed that is associated with a detection record;

Step 13: converting the voice data to be analyzed into recognition result text, based on a speech recognition model capable of recognizing engineering detection terminology;

Step 15: extracting detection record attribute information from the recognition result text according to preset information extraction rules.

The present invention also provides a non-transitory computer-readable storage medium that stores instructions which, when executed by a processor, cause the processor to perform the steps of the above voice data processing method.

The present invention also provides a voice processing device, comprising a processor and the above non-transitory computer-readable storage medium.

Because the method of the present invention adds step 15, the recognition result text of the voice data can be converted automatically into the attribute information of a detection record. This raises the practical value of the recognition result text, reduces the text editing workload of engineering staff, and substantially lightens the burden of recording detection information during field work.

Brief description of the drawings

Fig. 1 shows the record-entry interface of an existing bridge detection recording terminal;

Fig. 2 is a flow chart of the voice data processing method of the present invention;

Fig. 3 shows one embodiment of the voice data processing method of the present invention;

Fig. 4 is a structural diagram of the voice data processing device of the present invention.

Detailed description of the embodiments

To make the objectives, technical solutions and advantages of the present invention clearer, the invention is described in detail below with reference to the drawings and specific embodiments.

Fig. 2 shows the voice data processing method of the present invention, which includes:

Step 11 (S11): obtaining voice data to be analyzed that is associated with a detection record;

Step 13 (S13): converting the voice data to be analyzed into recognition result text, based on a speech recognition model capable of recognizing engineering detection terminology;

Step 15 (S15): extracting detection record attribute information from the recognition result text according to preset information extraction rules.
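The flow of steps S11 to S15 can be sketched as a small pipeline. This is only an illustration under assumptions: the names `process_voice_data`, `DetectionResult`, `fake_recognize` and `fake_extract` are hypothetical, and the recognizer and extractor are stand-ins rather than a real speech engine or the rule engine of this method.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass
class DetectionResult:
    recognition_text: str        # output of step 13
    attributes: Dict[str, str]   # output of step 15


def process_voice_data(
    audio: bytes,
    recognize: Callable[[bytes], str],
    extract: Callable[[str], Dict[str, str]],
) -> DetectionResult:
    """Run steps 13 and 15 on the audio obtained in step 11."""
    text = recognize(audio)       # S13: speech -> recognition result text
    attributes = extract(text)    # S15: text -> detection record attributes
    return DetectionResult(text, attributes)


# Hypothetical stand-ins for a trained recognizer and an extraction rule set.
def fake_recognize(audio: bytes) -> str:
    return "1-2# beam bottom found 1 longitudinal crack"


def fake_extract(text: str) -> Dict[str, str]:
    return {"component_number": text.split(" bottom")[0]}


result = process_voice_data(b"\x00\x01", fake_recognize, fake_extract)
print(result.attributes)
```

Passing the recognizer and extractor in as callables keeps the sketch independent of any particular engine, which matches the later remark that the engine may be online or local.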

Detection record attribute information can be stored either in structured form or in unstructured form (for example, by storing the detection records one by one), but structured storage is preferred because it makes later, deeper development easier.

Detection record attribute information can be used for subsequent statistical analysis and reporting, for example to generate detection records and detection reports, or to query the quantity and distribution of each kind of defect.

An important object recorded in a detection record is the defects of the various components found during detection. For the recorded defect information to support statistical analysis and report output, the attributes of a defect usually have to be described as structured data (that is, the detection record attribute information is extracted and stored in structured form). If only the recognition result text is stored, without extracting the defect attribute information contained in it and storing the corresponding defect attributes, fast and accurate statistical analysis and report output on these defects is not possible.

Taking bridge detection as an example, a typical defect has the following attributes:

Defect type

Component number

Defect position

Defect description

These attributes can be split further into sub-attributes. For example, the defect description attribute of a defect of type "crack" may include the following sub-attributes:

Crack defect description:

Trend

Seam length

Seam width
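One way to hold these attributes in structured form is a small record type with a nested mapping for the sub-attributes. The sketch below is an assumption for illustration only: the class name `DefectRecord` and its field names are hypothetical and are not prescribed by the method.

```python
from dataclasses import asdict, dataclass, field
from typing import Dict


@dataclass
class DefectRecord:
    defect_type: str
    component_number: str
    position: str
    # Sub-attributes of the defect description, e.g. trend, seam length, seam width.
    description: Dict[str, str] = field(default_factory=dict)


record = DefectRecord(
    defect_type="crack",
    component_number="1-2# beam",
    position="bottom",
    description={"trend": "longitudinal", "seam_length": "1 m", "seam_width": "0.03 mm"},
)

# asdict() yields a plain nested dict that could be written to a database row or JSON.
print(asdict(record))
```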

Existing engineering detection terminals, for example common terminals for recording periodic bridge inspections, usually design a data structure for bridge defects and an interactive input interface built around its attributes, as shown in Fig. 1. The interface requires the user to enter the relevant attributes control by control and to type a large amount of text, so input efficiency is low.

The present invention takes voice input instead. The user describes the defect in continuous natural speech. After the speech recognition model produces the recognition result text, the preset information extraction rules extract the detection record attribute information from it, directly yielding the attribute information of the detection records contained in the text (usually defect attribute information), with no need for the user to tap through the user interface. The information obtained can be displayed as recognition result text, or the extracted attribute information can be shown in the corresponding interface controls. Internally, however, what is stored is detection record attribute data that supports fast query and statistics, usually structured data, preferably stored in a database.

With this scheme, a data entry process that used to require filling in controls one by one is reduced to reading out a spoken script, while the entered detection data can still be analyzed statistically and exported as a detection report. This greatly reduces the complexity of operation and greatly improves the entry and processing efficiency of detection records.

The defect descriptions entered by the user are ultimately used to generate detection records and detection reports, to run statistical queries, and so on. In general, storing only the recognition result text does not support quick and convenient statistics and report applications for the attributes involved. The required information therefore has to be extracted from the recognition result text before statistics are run, and stored in the detection record database. The extracted information usually includes the attribute information of defects, and may include other necessary information.

Optionally, the user can also edit and generate the detection report manually. In that case, the method lets the user freely call up or use the voice data to be analyzed, the subsequent recognition result text, and the detection record attribute information.

Detection record attributes may include: number, defect position and defect description. Correspondingly, the preset information extraction rules include at least:

Extraction rule 1: a component number extraction rule, used to extract a "first number expression" in the recognition result text as a "second number expression";

For example, define extraction rule 1-1: $number $component-category=[component number]. A "$" keyword stands for a class of preset keywords or a class of characters that satisfy a preset rule. The "=" keyword means that the information extracted by the pattern before "=" is assigned directly to the variable in the object after "=", here [component number]. For "1-2# beam", for example, [component number] is set to "1-2# beam".

Extraction rule 2: a defect position extraction rule, used to extract a "first position expression" in the recognition result text as a "second position expression";

For example, define extraction rule 2-1: <% |>[position]<found | has>, where "%" indicates that the keyword that follows may be absent (the same applies below). For "bottom found", for example, the variable in the object [position] is set to "bottom".

Extraction rule 3: a defect description extraction rule, used to extract a "first defect description" in the recognition result text as a "second defect description".

A defect description usually contains several items of attribute information. For a crack, the sub-attributes may be the type, trend, length and width of the crack; for concrete spalling, the attributes may include the spalled area.

For example, define defect description extraction rule 3-1 (with quantity): [quantity] $defect-type. For "1 longitudinal crack", the variable in the object [quantity] is set to "1".

Defect description extraction rule 3-2 (without quantity): $defect-type=[defect type]. For "1 longitudinal crack", the variable in the object [defect type] is set to "crack".

Under the above rules, if the user's recognition result text is "1 longitudinal crack found at the bottom of 1-2# beam, seam length 1m, seam width 0.03mm", the following detection record attribute information can be extracted:

Number: 1-2# beam

Defect description:

Defect type: crack

Quantity: 1 (item)

Trend: longitudinal

Seam length: 1 (m)

Seam width: 0.03 (mm)
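The extraction above can be approximated with regular expressions. The sketch below is a hedged illustration, not the rule engine of the method: the rule notation is replaced by Python named groups, the input is an English rendering in Chinese word order, and the keyword lists (beam, support, bottom, crack and so on) are assumptions chosen for this example.

```python
import re
from typing import Dict

# Each pattern loosely mirrors one extraction rule; group names become attribute names.
EXTRACTION_RULES = [
    # Rule 1-1: number + component category -> component number
    re.compile(r"(?P<component_number>\d+(?:-\d+)*#\s*(?:beam|support))"),
    # Rule 2-1: position keyword followed by "found"
    re.compile(r"(?P<position>bottom|top|side)\s+found"),
    # Rules 3-1 / 3-2: quantity, trend and defect type
    re.compile(r"(?P<quantity>\d+)\s+(?P<trend>longitudinal|transverse)\s+(?P<defect_type>crack)"),
    # Sub-attributes of the crack description
    re.compile(r"seam length\s+(?P<seam_length_m>\d+(?:\.\d+)?)\s*m\b"),
    re.compile(r"seam width\s+(?P<seam_width_mm>\d+(?:\.\d+)?)\s*mm\b"),
]


def extract_attributes(text: str) -> Dict[str, str]:
    """Apply every rule once and merge the named groups that matched."""
    attributes: Dict[str, str] = {}
    for rule in EXTRACTION_RULES:
        match = rule.search(text)
        if match:
            attributes.update(match.groupdict())
    return attributes


text = "1-2# beam bottom found 1 longitudinal crack, seam length 1m, seam width 0.03mm"
print(extract_attributes(text))
```

A rule that does not match simply contributes nothing, so partial descriptions still produce a partial record.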

The rules above are examples. Different rules can be set to support attribute extraction and storage under different speaking habits, and attribute information can also be extracted from several recognition result texts combined.

Optionally, the detection record attributes extracted from a piece of voice data are displayed one by one in the corresponding controls of the user interface. For example, the values of the attributes extracted above for each detection record, such as "quantity", "trend", "seam length" and "seam width", are displayed in the corresponding edit controls of that detection record (the "quantity", "type", "trend", "seam length" and "seam width" edit controls).

As shown in Fig. 3, the method further includes, after step 13 and before step 15 of Fig. 2:

Step 14: converting the recognition result text, based on preset conversion rules, into recognition result text that conforms to engineering detection language conventions.

Step 13 is further described below.

Taking bridge detection as an example, engineering detection terminology includes the specialized vocabulary associated with detection record attributes such as component name, (component) number, defect position and defect description.

A speech recognition model capable of recognizing engineering detection terminology must be built before recognition is performed, or obtained by improving an existing recognition model. Building or improving the model includes: constructing an engineering detection terminology lexicon, and feeding that lexicon into the speech recognition engine (the core component of the speech recognition model) for modeling and training, so that the engine acquires the ability to recognize engineering detection terminology.

Constructing the terminology lexicon includes adding substitute entries to it. The substitutes appear in the recognition result text output by the speech recognition engine, and are then recognized and converted in the subsequent conversion step 14.

The difficulty in speech recognition for engineering detection is that results matching the industry's conventions of expression cannot easily be obtained from a speech recognition engine alone; they have to be adjusted by manual editing, which is inefficient.

For example, a speech recognition engine recognizes component numbers inaccurately, and an inaccurate component number means the component type cannot be determined afterwards, which in turn means the data related to that component type cannot be filtered and loaded. A component number in engineering detection usually consists of a component serial number plus a component category. In "1-2# beam", for instance, "1-2" is the serial number of the beam and denotes the 2nd beam of the first span (the counting origin and rule are agreed separately). In special cases a carriageway identifier is added: "L1-2# beam" denotes the 2nd beam of the first span of the left width (left carriageway). With existing voice input methods the recognition accuracy for component numbers is so low, and the amount of correction required from the user so large, that they have no practical value. "1-2# beam" is recognized as "thick stick two is good beautiful", "L1-2# beam" as "having suffered a steel two is good beautiful", and "10-10# beam" as "Shi Gangshi is great good" (the engine guesses a personal name, "Shi Haoliang"); in each case the engine has output Chinese homophones of the spoken number. To solve this problem, optimized settings for the numbering lexicon are proposed, including:

Using the exhaustive method, every component number that might be used is added to the lexicon, for example beams 1-1# to 40-40# for spans 1 to 40, supports 1-1-1 to 40-40-2, and so on. Once bridge type, component type, span number and numbering levels are taken into account, the number of combinations becomes enormous. Such a lexicon is troublesome to maintain and modify, adapts poorly, is inefficient to recognize against, and contains many long compound entries, which also lowers recognition accuracy.

Instead, the present invention adds entries such as "one gang" and "one gang two" to the lexicon (gang, rendered literally as "thick stick", is the spoken Chinese word for the "-" in a number). After the speech recognition engine is trained on them, it outputs "one gang two" more reliably, and a preset conversion rule then converts "one gang two" into the expected "1-2". Similarly, the number "S1-2# beam" denotes uplink beam 1-2, and the letter S is recognized unreliably when spoken. An "uplink" entry can be defined directly, so that the user says "uplink one gang two hao beam" (hao, "number", is the spoken form of "#"); after the recognition result is output, a preset conversion rule converts "uplink one gang two hao beam" into the expected "S1-2# beam".

In a preferred embodiment, the speech related to component numbers is recognized by adding basic entries to the lexicon. A basic entry is a combination of two or more of a digit, "-", a "component category" entry, or their substitutes.

The basic entries include:

a. digit + "-", such as: 1-, 2-, ... n-.

b. digit + "-" + digit, such as: 1-1, 1-2, ... n-n.

c. "-" + digit, such as: -1, -2, ... -n.

d. "-" + digit + "#", such as: -1#, -2#, ... -n#.

e. digit + "#", such as: 1#, 2#, ... n#.

f. "#" + component category, such as: # beam, # support, ...

Once the basic entries needed for component number recognition are in place, the recognition accuracy for compound numbers assembled from those basic entries rises substantially.

However, because speech recognition engines have no standard, uniform pronunciation for digits and symbols, the results can still be unreliable. How, for example, should "1#" be pronounced so that the engine recognizes it: "1 well"? "pound sign"? "No. 1"? By industry convention "#" is read as hao ("number"), and only the reading "No. 1" matches what users expect; but the engine has no knowledge of this convention and may give priority to any of the homophone candidates that sound like "No. 1". In addition, some recognition engines do not allow digits and symbols to be added to a user-defined lexicon. To solve this problem, this scheme proposes a substitute-plus-conversion approach:

Unambiguous substitute entries are added to the speech recognition engine's lexicon as the required basic entries. The user only needs to speak according to the substitute entries, and the engine's output then contains those substitute entries. For example, if the recognition result text is "uplink one gang two hao beam" (gang and hao being the spoken forms of "-" and "#"), the text after step 14 is executed is "S1-2# beam".
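The substitute-plus-conversion idea can be sketched as a token lookup. This is a simplified illustration under assumptions: the recognition result is taken to be already split into romanized tokens, the tables `SYMBOL_SUBSTITUTES` and `NUMERALS` are hypothetical and deliberately tiny, and the last token is assumed to be the component category.

```python
# Substitute entries spoken by the user, mapped to the symbols they stand for.
SYMBOL_SUBSTITUTES = {"uplink": "S", "downlink": "X", "gang": "-", "hao": "#"}
# Spoken numerals, mapped to digits (only a few are listed for the example).
NUMERALS = {"one": "1", "two": "2", "three": "3", "four": "4"}


def convert_spoken_number(spoken: str) -> str:
    """Convert a spoken component number into its written form."""
    *code_tokens, category = spoken.split()
    code = "".join(
        SYMBOL_SUBSTITUTES.get(token, NUMERALS.get(token, token))
        for token in code_tokens
    )
    return f"{code} {category}"


print(convert_spoken_number("uplink one gang two hao beam"))
```

Because every spoken token is an ordinary word in the engine's lexicon, the engine never has to emit "-" or "#" itself; the symbols only appear after conversion.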

Substitute entries are not limited to numbers; they can also be used in other scenarios.

Examples of substitute entries for numbers are given below:

One-dimensional leading serial number: one gang, two gang, ...

Two-dimensional leading serial number: one gang one, one gang two, ... twenty gang forty.

One-dimensional suffix: gang one, gang two, ... gang forty.

One-dimensional suffix + hao: gang one hao, gang two hao, ... gang forty hao.

Component category name: span, platform, abutment, beam, support, ...

Hao + component category: hao span, hao platform, hao beam, hao support, ...

After the lexicon above is modeled and imported, and the speech recognition engine is trained on it, the accuracy of the numbers output by the engine improves greatly, and the engine adapts to various habits of reading numbers aloud. For a number read in one continuous run, such as "one gang two gang three hao support", the custom lexicon covers several ways the engine may segment the words automatically, such as:

one gang, two gang, three hao, support

one gang two, gang three hao, support

one gang, two gang three, hao support

one gang two, gang three, hao support

...
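The claim that the lexicon covers several segmentations of the same utterance can be checked with a small enumeration. The sketch below is illustrative only: it works on romanized tokens, builds a reduced lexicon from the entry patterns listed above (the numeral and category lists are assumptions), and says nothing about how a real engine segments speech.

```python
from typing import List, Set, Tuple

NUMBERS = ["one", "two", "three"]
CATEGORIES = ["beam", "support"]


def build_lexicon() -> Set[Tuple[str, ...]]:
    """Generate substitute entries following the entry patterns listed above."""
    lexicon: Set[Tuple[str, ...]] = set()
    for a in NUMBERS:
        lexicon.add((a, "gang"))          # one-dimensional leading serial number
        lexicon.add(("gang", a))          # one-dimensional suffix
        lexicon.add(("gang", a, "hao"))   # one-dimensional suffix + hao
        lexicon.add((a, "hao"))           # digit + hao (basic entry e)
        for b in NUMBERS:
            lexicon.add((a, "gang", b))   # two-dimensional leading serial number
    for c in CATEGORIES:
        lexicon.add((c,))                 # component category name
        lexicon.add(("hao", c))           # hao + component category
    return lexicon


def segmentations(tokens: List[str], lexicon: Set[Tuple[str, ...]]) -> List[List[str]]:
    """Return every way to split tokens into consecutive lexicon entries."""
    if not tokens:
        return [[]]
    results: List[List[str]] = []
    for end in range(1, len(tokens) + 1):
        head = tuple(tokens[:end])
        if head in lexicon:
            for rest in segmentations(tokens[end:], lexicon):
                results.append([" ".join(head)] + rest)
    return results


for seg in segmentations("one gang two gang three hao support".split(), build_lexicon()):
    print(seg)
```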

It can be seen that, with the lexicon of the speech recognition model built as above, component numbers are no longer a problem for the speech recognition engine. The speaker may also pause between words according to personal habit, since manual pauses resemble the segmentations above. When voice data to be analyzed that contains spoken engineering detection terminology is fed to the trained speech recognition engine, the voice data is converted into recognition result text containing that terminology.

The speech recognition engine can be an online cloud engine or a local engine. When network conditions are good, online recognition is more accurate; when there is no network, local recognition serves as the fallback. Local and networked recognition can also be combined to balance accuracy and efficiency.
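A minimal sketch of the online-first, local-fallback arrangement follows. The two recognizers are passed in as plain callables; the names `recognize_with_fallback`, `offline_cloud` and `local_engine` are hypothetical, and the assumption that a missing network surfaces as `ConnectionError` or `TimeoutError` would have to be adapted to whatever a real engine client raises.

```python
from typing import Callable

Recognizer = Callable[[bytes], str]


def recognize_with_fallback(audio: bytes, online: Recognizer, local: Recognizer) -> str:
    """Prefer the online engine; fall back to the local one when the network fails."""
    try:
        return online(audio)
    except (ConnectionError, TimeoutError):
        return local(audio)


# Hypothetical stand-ins used only to exercise the fallback path.
def offline_cloud(audio: bytes) -> str:
    raise ConnectionError("no network")


def local_engine(audio: bytes) -> str:
    return "one gang two hao beam"


print(recognize_with_fallback(b"", offline_cloud, local_engine))
```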

Step 14 is further described below.

Training the speech recognition model with a specialized lexicon solves the problem of low recognition rates for technical terms, but it does not solve the problem of industry conventions of expression. For example, a trained engine no longer misrecognizes "support" as "making", and recognizes the number as "one gang two hao support". For the speech recognition model this is already a correct result, but under engineering detection industry standards such wording is not acceptable and cannot be used in a detection report. According to the relevant industry standards, conventions and user requirements, the customary expression corresponding to "one gang two hao support" is "1-2# support", and that corresponding to "uplink 1-2 hao beam" is "S1-2# beam". A misrecognized "zero point square meter" output by the model likewise needs to be corrected to "0.8m²", and a mixed expression of Chinese words and digits such as "3 point 1 meters multiplied by 0.8 meters" needs to be converted to "3.1m*0.8m".

For defect position descriptions that relate to the span, such as "at 1/4L", which is read as "at a quarter L", the recognition result may be exactly "at a quarter L", but it may also be recognized as "at a quarter two"; recognizing it accurately is difficult.

To solve problems of this kind, a corresponding conversion scheme is proposed, comprising a number of preset conversion rules.

The present invention first recognizes transitional entries and then converts them, so that the final output meets expectations.

By defining preset conversion rules and applying them to the numbers, fixed expressions and the like in the recognition result text, the output is made to meet the requirements of engineering codes and standards, which greatly reduces the amount of manual correction.

In step 14, as shown in Fig. 3, the preset conversion rules include at least:

Conversion rule 1: convert a "first expression" into a "second expression". Conversion rule 1 includes conversion rule 2, conversion rule 3 and/or conversion rule 4, as well as other user-defined conversion rules.

Conversion rule 2: convert Chinese numerals into Arabic numerals;

For example, "zero point nine" is converted to 0.9, and the homophone text "Ling Dianjiu" also needs to be converted to 0.9. One optional implementation is to convert the recognition result to pinyin first, find runs of consecutive characters that are homophones of digits, and convert them to digits. In "ling dian jiu", for instance, all three syllables are pinyin of digit characters, so the text should be converted to the number 0.9.

Homophone characters of digits (0-9 and the decimal point ".") can be represented by the wildcard pnum. Detailed examples of conversion rule 2 are as follows:

$pnum $pnum=$1$2 // two adjacent digit-homophone characters are corrected to 2 digits

$pnum $number=$1$2 // a digit-homophone character followed by a digit is corrected to 2 digits

$number $pnum=$1$2 // a digit followed by a digit-homophone character is corrected to 2 digits

Two, at two=2 $ 1

$pnum $measurement-unit=$1$2 // a digit-homophone character followed by a measurement unit is corrected to digit + measurement unit

The rules above convert a digit and a character pronounced like a digit, two characters in all, into 2 digits. Applied recursively, they turn a run of consecutive digit-sounding characters into a number; for example, a result such as "some clothing 2" (in Chinese, homophones of "one point one" followed by "2") is converted to the correct number "1.12".
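The pinyin-based variant of conversion rule 2 can be sketched as follows. This is an assumption-laden illustration: the input is taken to be a list of tokens that are either pinyin syllables (as a pinyin converter, not shown here, might produce) or literal digits, the table `DIGIT_PINYIN` is hand-written, and a run-based scan replaces the recursive pairwise rules of the text. As in those rules, a digit-sounding token is converted only when it has at least one digit-like neighbor.

```python
from typing import List, Optional, Tuple

# Pinyin syllables of the digit characters and of the decimal point.
DIGIT_PINYIN = {
    "ling": "0", "yi": "1", "er": "2", "san": "3", "si": "4",
    "wu": "5", "liu": "6", "qi": "7", "ba": "8", "jiu": "9",
    "dian": ".",
}


def digit_value(token: str) -> Optional[str]:
    """Return the digit text for a literal digit or a digit-sounding syllable."""
    if token.isdigit() or token == ".":
        return token
    return DIGIT_PINYIN.get(token)


def convert_digit_runs(tokens: List[str]) -> List[str]:
    """Replace runs of two or more digit-like tokens with a single number string."""
    output: List[str] = []
    run: List[Tuple[str, str]] = []

    def flush() -> None:
        if len(run) >= 2:
            output.append("".join(value for _, value in run))
        else:
            output.extend(token for token, _ in run)  # isolated homophone: keep as is
        run.clear()

    for token in tokens:
        value = digit_value(token)
        if value is None:
            flush()
            output.append(token)
        else:
            run.append((token, value))
    flush()
    return output


print(convert_digit_runs(["ling", "dian", "jiu"]))
print(convert_digit_runs(["yi", "dian", "yi", "2"]))
```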

Conversion rule 3: convert spoken numbering marks into preset symbols. The preset symbols include the half-width or full-width characters "-", "#" and "~".

A component number read aloud usually takes the form serial number + "#" + component category, such as "1-2# beam". Within a number, "~" may also join a start and an end serial number to denote several consecutive components: "1-2~4# support" stands for 1-2# support, 1-3# support and 1-4# support, 3 component numbers in total. The symbols "-", "#" and "~" have no standard pronunciation in general-purpose voice input methods and are hard to recognize correctly. By the habits of on-site reporting in engineering detection, "-" is read as gang, "#" as hao and "~" as zhi ("to"), so the recognition result contains the corresponding characters or their homophones. Such a result has to be corrected by hand, which is inefficient. The solution provided by the present invention is to process it with numbering-mark conversion rules, including:

$number gang=$1- // a digit followed by a character pronounced "gang" is corrected to digit + "-"

gang $number=-$1 // a character pronounced "gang" followed by a digit is corrected to "-" + digit

$number hao=$1# // a digit followed by a character pronounced "hao" is corrected to digit + "#"

$number zhi=$1~ // a digit followed by a character pronounced "zhi" is corrected to digit + "~"

$number zi=$1~ // a digit followed by a character pronounced "zi" is corrected to digit + "~", to accommodate users who do not distinguish zh from z

With the conversion rules above, inputs such as "1 steel 2 beam", "1-2 hao beam" and "1-2 good beam" (in Chinese, "steel" and "good" are homophones of gang and hao) are all correctly processed into "1-2# beam", which conforms to industry convention, and "1-2 matter 4 hao support" (where "matter" is a homophone of zhi) is correctly processed into "1-2~4# support".
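The five numbering-mark rules map naturally onto ordered regular-expression substitutions. The sketch below assumes the marker positions in the text have already been reduced to their pinyin ("gang", "hao", "zhi", "zi") and that numbers are already digits; `MARK_RULES` and `convert_marks` are hypothetical names.

```python
import re

# (pattern, replacement) pairs in the order the rules are listed above.
MARK_RULES = [
    (re.compile(r"(\d)\s*gang\b\s*"), r"\1-"),        # digit + gang -> digit + "-"
    (re.compile(r"\bgang\s*(\d)"), r"-\1"),           # gang + digit -> "-" + digit
    (re.compile(r"(\d)\s*hao\b"), r"\1#"),            # digit + hao  -> digit + "#"
    (re.compile(r"(\d)\s*(?:zhi|zi)\b\s*"), r"\1~"),  # digit + zhi/zi -> digit + "~"
]


def convert_marks(text: str) -> str:
    """Apply each numbering-mark rule once, in order."""
    for pattern, replacement in MARK_RULES:
        text = pattern.sub(replacement, text)
    return text


print(convert_marks("1 gang 2 hao beam"))
print(convert_marks("1 gang 2 zhi 4 hao support"))
```

Requiring a digit next to each marker is the context constraint that keeps the syllables from being rewritten elsewhere in a sentence.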

The notation used for conversion rules in this embodiment is only an example and is not limiting; the same applies to the other rules.

Conversion rule 4: convert Chinese measurement units into the letter symbols of the International System of Units;

In engineering detection the geometric attributes of a defect, such as length, width and area, usually have to be measured, and these attributes generally have specific units such as "meter", "millimeter" and "square meter". When they are spoken in Chinese, the measurement units returned by the speech recognition engine are generally in Chinese, for example "1 meter" or "2 square meters", and the user has to change them to the required SI letter symbols, for example "meter" to "m" and "square meter" to "m²". Processing recognition results by hand in this way is inefficient.

By defining unit conversion rules and applying them to the recognition result text, the present invention makes entering measurement units by voice far more practical.

To avoid wrong conversions, context constraints are usually added to a conversion rule. For example, only a character combination of the form "digit + Chinese measurement unit" is converted to "digit + SI letter symbol". Such conversion rules look like this:

$number square meter=$1m²

$number millimeter=$1mm
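A unit rule with the "digit + unit" context constraint might be sketched like this. English unit words stand in for the Chinese ones, and the names `UNIT_SYMBOLS` and `convert_units` are hypothetical; longer unit names are listed before "meter" in the pattern so that "square meter" and "millimeter" are not partially matched.

```python
import re

UNIT_SYMBOLS = {"square meter": "m²", "millimeter": "mm", "meter": "m"}

# A number must precede the unit word; otherwise nothing is converted.
UNIT_PATTERN = re.compile(r"(\d+(?:\.\d+)?)\s*(square meter|millimeter|meter)s?\b")


def convert_units(text: str) -> str:
    """Rewrite 'number + unit word' as 'number + SI symbol'."""
    return UNIT_PATTERN.sub(lambda m: m.group(1) + UNIT_SYMBOLS[m.group(2)], text)


print(convert_units("seam length 1 meter, seam width 0.03 millimeters"))
```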

In addition, the preset conversion rules can also include other conversion expressions, conversion dictionaries, or agreed pairs of entries to be converted and their converted forms, which may contain wildcards that stand for particular characters. These are used heavily in, for example, descriptions of concrete defects. When the surveyor calls out "spalling with exposed rebar, area zero point eight by zero point six square meters", the recorder usually writes "spalling with exposed rebar, S: 0.8 × 0.6m²", where S stands for area. When the spoken content is entered by speech recognition, the pronunciation of letters such as S is often misrecognized, so the speaker says "area" directly. After processing by conversion rules 2 to 4 above, the recognition result text is "spalling with exposed rebar, area 0.8 × 0.6m²", and "area" in it is then replaced by "S:". In component numbers, "L" usually stands for "left width", "R" for "right width", "S" for "uplink" and "X" for "downlink". When these symbols are pronounced directly as English letters the recognition result is unreliable; "X", for example, is usually recognized as "Ai Kesi". Substitutes can be agreed for them, with conversion rules defined to convert each substitute into the target character, such as:

Area=S:

Uplink=S

Downlink=X

Left width=L

Right width=R

In this way, the required symbols, including various special characters, can be entered using the agreed pronunciations. To avoid wrong conversions, conversion rules can be managed and applied by category according to the scenario; a particular rule may apply only when recognizing component numbers and not when recognizing defect descriptions. Context checks can also be made in the definition or execution of a conversion rule to reduce errors.

Likewise, conversion expressions can be used to force the correction of frequent misrecognitions for a particular attribute. In a component number, for example, a recognized "production" is certainly "support" in reality. By maintaining a conversion rule and applying it when processing component numbers, the error is corrected:

Production=support
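
The substitute-word rules and their scoping by attribute can be sketched as follows. This is a minimal illustration, not the implementation described by the invention: the table name `RULES_BY_ATTRIBUTE`, the attribute keys, and the use of plain substring replacement are all assumptions, and the entries are English renderings of the examples above.

```python
# Hypothetical rule tables keyed by detection record attribute.
# Attribute names and entries are illustrative assumptions.
RULES_BY_ATTRIBUTE = {
    "number": [
        ("left carriageway", "L"),
        ("right carriageway", "R"),
        ("upbound", "S"),
        ("downbound", "X"),
        ("production", "support"),  # forced correction of a frequent misrecognition
    ],
    "defect_description": [
        ("area", "S:"),
    ],
}


def apply_substitutes(text: str, attribute: str) -> str:
    """Apply only the substitute rules registered for the given attribute."""
    for spoken, target in RULES_BY_ATTRIBUTE.get(attribute, []):
        text = text.replace(spoken, target)
    return text


description = apply_substitutes(
    "spalling with exposed rebar, area 0.8 × 0.6 m²", "defect_description"
)
number = apply_substitutes("upbound 1-2# beam", "number")
```

Because the "area" rule is registered only under the defect description attribute, the same word in a component number is left untouched, which is the kind of scoping the text describes.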

The following rules are some examples of transformation rule 1; the rule is not limited to them.

Rule 1-1: "$number meters multiplied by = $1m ×" converts expressions such as "3 meters multiplied by" into "3m ×";

Rule 1-2: "^two meters = 2 meters" replaces "two meters" with "2 meters";

Rule 1-3: "$number meters = $1m" rewrites a number followed by "meters" as the number followed by "m", for example "2 meters" becomes "2m".

For "three meters multiplied by two meters" in the recognition result text, calling the above transformation rules in sequence gives the following results:

After Rule 1-1 is executed, the text becomes "3m × two meters";

Rule 1-2 is then executed, giving "3m × 2 meters";

Rule 1-3 is then executed, giving "3m × 2m", the final result that meets the specification.
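
The sequential application of Rules 1-1 to 1-3 can be sketched with ordered regular-expression substitutions. This is an English-language analogue under stated assumptions: the regex syntax stands in for the patent's "$number" and "$1" notation, the names `ORDERED_RULES` and `convert_with_trace` are hypothetical, and the input starts from "3 meters" on the assumption that the leading numeral has already been converted to digits.

```python
import re

# A number with an optional decimal part, captured as group 1.
NUM = r"(\d+(?:\.\d+)?)"

# Ordered (pattern, replacement) pairs mirroring Rules 1-1, 1-2 and 1-3.
ORDERED_RULES = [
    (NUM + r" meters multiplied by ", r"\1m × "),  # Rule 1-1
    (r"\btwo meters\b", "2 meters"),               # Rule 1-2
    (NUM + r" meters\b", r"\1m"),                  # Rule 1-3
]


def convert_with_trace(text: str) -> list:
    """Apply every rule in order and return the text after each step."""
    steps = []
    for pattern, replacement in ORDERED_RULES:
        text = re.sub(pattern, replacement, text)
        steps.append(text)
    return steps


steps = convert_with_trace("3 meters multiplied by two meters")
# steps == ["3m × two meters", "3m × 2 meters", "3m × 2m"]
```

The order matters: if Rule 1-3 ran before Rule 1-1, "3 meters" would already have become "3m" and Rule 1-1 would no longer match.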

Suppose the recognition result text carries an attribute or can be segmented by attribute, the attributes including component "number", "defect position", "defect description", and so on. Dedicated transformation rules can then be established for each attribute, and recognition result text belonging to different attributes is converted with different transformation rules. The same recognition result text may therefore convert differently depending on which attribute it belongs to and which rules are applied. For example, for the number attribute a transformation rule can correct the "wind" in "1-2 wind" to "joint", but the "wind" character inside "weathering" in a defect description must not be corrected to "joint".

As shown in Fig. 3, the preset transformation rules may also include a user error correction dictionary. The user error correction dictionary records the correction entries obtained from the user's modification and replacement operations on the recognition result text, and is used to convert the "expression before modification" into the "expression after modification".

For example, an interface is provided for the user to edit the recognition result text. The content before and after editing is compared to identify the correspondence between each entry before modification and the entry after modification, and that correspondence is added to the preset transformation rules as a transformation rule for use in the conversion processing of step 14.

The user error correction dictionary can also be set up separately for each attribute of the detection record. Suppose the detection record, the recording data to be analyzed, or the recognition result text carries an attribute or can be segmented by attribute, the attributes including component "number", "defect description", and so on. Each attribute can then have its own error correction dictionary, which is applied to the recognition result text of that attribute.

The user error correction dictionary can also be managed by user identity, so that a personalized error correction dictionary is set up for each user to correct recognition errors caused by individual pronunciation characteristics.

For example, after the user's voice input, step 13 calls the speech recognition model, which returns the recognition result text "wind field one meter madness zero point one millimeters". After the conversion of step 14 the result is "wind field 1m madness 0.1mm". On review the user finds the result incorrect and changes it to "crack length 1m crack width 0.1mm". The entry correspondence before and after the user's modification is determined by comparison, for example with a text edit distance algorithm: it is easy to see that the user changed "wind field" to "crack length" and "madness" to "crack width". The correspondence between the entries before and after modification is added to the error correction dictionary and to the preset transformation rules:

Wind field=crack length

Madness=crack width

The newly added transformation rules can be used in subsequent executions of step 14. The next time the speech recognition model returns the recognition result text "wind field one meter", step 14 applies this transformation rule and replaces "wind field" with "crack length"; after the other transformation rules are executed, "one meter" is converted to "1m", and the final output is the result the user expects, "crack length 1m". By continually extending the preset transformation rules, output of the speech recognition model that contains recognition errors or does not follow the expression conventions can be converted in a single pass into correct, compliant text.
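
Learning correction entries from a user's edit can be sketched as below. The text mentions a text edit distance algorithm; this sketch swaps in Python's standard `difflib.SequenceMatcher` as a stand-in for that comparison. The function name `learn_corrections` and the word-level tokenization are assumptions suited to the English renderings used here; Chinese text would more likely be compared character by character or after word segmentation.

```python
from difflib import SequenceMatcher


def learn_corrections(before: str, after: str) -> dict:
    """Map each replaced run of words in `before` to the run that replaced it."""
    old_words, new_words = before.split(), after.split()
    matcher = SequenceMatcher(a=old_words, b=new_words, autojunk=False)
    corrections = {}
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        # Only "replace" spans yield a before/after entry pair.
        if tag == "replace":
            corrections[" ".join(old_words[i1:i2])] = " ".join(new_words[j1:j2])
    return corrections


learned = learn_corrections(
    "wind field 1m madness 0.1mm",
    "crack length 1m crack width 0.1mm",
)
# learned == {"wind field": "crack length", "madness": "crack width"}
```

The unchanged tokens "1m" and "0.1mm" act as anchors, so each edited span is isolated and can be stored as one dictionary entry.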

Further, to reduce erroneous matches by the preset transformation rules, the context before and after the modified content can be recognized and added, forming more reliable and more stable rules. In the above example a number immediately follows each modified entry, so this numeric feature is added to the transformation rule base:

Wind field $num=crack length $1

Madness $num=crack width $1

In this way, when the characters following "wind field" are not a number, no replacement takes place: "wind field is larger" is not replaced by "crack length is larger", whereas "wind field 2m" is converted to "crack length 2m".
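
A context-sensitive rule of this kind can be sketched with a regular expression that requires a number after the entry. The pattern syntax and the names `CONTEXT_RULES` and `apply_context_rules` are assumptions; the "$num" and "$1" placeholders of the rule notation are mapped to a capture group and a back-reference.

```python
import re

# A number with an optional decimal part, captured as group 1.
NUM = r"(\d+(?:\.\d+)?)"

# Each rule fires only when the misrecognized entry is followed by a number.
CONTEXT_RULES = [
    (re.compile(r"wind field " + NUM), r"crack length \1"),
    (re.compile(r"madness " + NUM), r"crack width \1"),
]


def apply_context_rules(text: str) -> str:
    """Replace an entry only where its numeric context is present."""
    for pattern, replacement in CONTEXT_RULES:
        text = pattern.sub(replacement, text)
    return text


converted = apply_context_rules("wind field 2m")          # "crack length 2m"
untouched = apply_context_rules("wind field is larger")   # unchanged
```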

When designing and applying transformation rules, Chinese numerals should not simply be replaced with digits wherever they appear. For example, the "eight" in "eight-character wall" (a splayed wing wall) should not be replaced, and the "four" in "four sides" (the surroundings) should not be converted. Taking contextual combinations into account when defining rules reduces erroneous conversions. In addition, for key entries that must not be converted incorrectly, an exclusion list or inverse conversion rules can be set, such as:

4周=四周 (restores "surroundings", literally "four sides")

8字墙=八字墙 (restores "splayed wing wall", literally "eight-character wall")

Inverse conversion rules are executed last among the preset transformation rules and convert fixed expressions that may have been wrongly converted back to the correct expression.
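
The effect of running inverse conversion rules last can be sketched as follows. The character-by-character numeral conversion is a deliberately naive assumption used only to provoke the errors, and the names `CHINESE_DIGITS`, `INVERSE_RULES`, and `convert_with_inverse_rules` are hypothetical. The example keeps the Chinese strings because the problem only arises in the original language.

```python
# Single-character Chinese numerals and their digits (naive mapping).
CHINESE_DIGITS = {"一": "1", "二": "2", "三": "3", "四": "4", "五": "5",
                  "六": "6", "七": "7", "八": "8", "九": "9"}

# Fixed expressions that the naive conversion damages, and their restored form.
INVERSE_RULES = [("4周", "四周"), ("8字墙", "八字墙")]


def naive_digits(text: str) -> str:
    """Replace every Chinese numeral character with a digit, without context."""
    return "".join(CHINESE_DIGITS.get(ch, ch) for ch in text)


def convert_with_inverse_rules(text: str) -> str:
    """Run the naive conversion, then restore protected expressions last."""
    text = naive_digits(text)
    for wrong, right in INVERSE_RULES:
        text = text.replace(wrong, right)
    return text


result = convert_with_inverse_rules("八字墙四周，缝长四米")
# result == "八字墙四周，缝长4米"
```

Only the measurement "四米" ends up as "4米"; the two fixed expressions are restored after the numeral step has run.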

With the method of the invention, the added step 14 improves the accuracy of the recognition output for engineering detection voice data and the rigor of its wording, so that the output text conforms to engineering specifications. This greatly reduces the manual editing of the text and increases the practical value of voice input recognition results.

Optionally, after step 15 the method further includes:

Step 16: based on the detection record attribute information, generate a detection report and/or a query statistics report.

The invention does not limit the terminal that executes steps 11, 13, 14, 15, and 16; any step may be executed on a mobile terminal, a PC, or a server.

The recording data to be analyzed in step 11 of the above method may be historical data, or real-time voice data generated by any recording device.

Optionally, after step 14 of Fig. 1 the method may further include:

Step 141: update the display content of the detection record with the converted recognition result text.

The display interface of the application presents the display content to the user for review and editing, generally showing detection records one by one. If the display content has format requirements, the format of the converted recognition result text is adjusted accordingly before it is displayed.

Suppose the current detection record includes N different editable attributes, N ≥ 1. The recording interface for voice data then provides N corresponding recording sub-buttons, each used to record one attribute of the current detection record; that is, the recording sub-buttons correspond one-to-one with the editable attributes. Each attribute can have its own preset transformation sub-rules and preset information extraction sub-rules, set according to its characteristics. Compared with processing everything together, handling each object separately prevents the processing methods of different objects from interfering with one another, reduces errors, and yields more accurate text processing results and detection record attribute extraction results.

Under this design, "based on the preset transformation rules" in step 14 can be adjusted to "based on the preset transformation sub-rules corresponding to the attribute".

Similarly, "according to the preset information extraction rules" in step 15 is adjusted to "according to the preset information extraction sub-rules corresponding to the attribute".

Further, a shared parent button is provided for the N recording sub-buttons. The parent button controls the showing and hiding of the sub-buttons as a group and/or adjusts their position. Alternatively, a recording function can be bound to a specific interaction with the parent button, such as a long press to record a photo remark or a general defect announcement.

For example, if the editable attributes of the current detection record include "number", "defect position", and "defect description", an independent recording sub-button is provided for each of "number", "defect position", and "defect description". A parent recording button is also provided, which controls the display of its two or more recording sub-buttons. Each recording sub-button corresponds to a different attribute of the current detection record, and voice data recorded through a recording sub-button is stored in the detection record storage area corresponding to that sub-button.

The correspondence between a recording sub-button and its attribute is indicated visually, for example by a matching color or a caption. For instance, the defect position recording sub-button carries the caption "defect position", and the component number recording sub-button carries the caption "number".

Further, before step 11 the method further includes:

Step 10: any recording sub-button generates the recording data to be analyzed for the corresponding attribute of the current detection record.

Step 10 may be configured so that the method of Fig. 1 is executed immediately after the recording data to be analyzed is generated. Alternatively, the method of Fig. 1 can be triggered at any time on historical recording data to be analyzed generated by any recording sub-button, so that recognition and extraction are performed again.

Based on the above recording sub-button design, step 14 may further include: setting the icon or text label of the recording data to be analyzed according to the converted recognition result text.

If the converted recognition result text is short, the whole text can be set as the icon or text label of the recording data to be analyzed. If it is long, then given the limited display space, the key content of the converted recognition result text can be set as the icon or text label.

In this way, the recording data to be analyzed, together with its icon or text label, can be displayed synchronously in the voice recording interface, so that the user can see at a glance the converted recognition result text or its key content. After the recording data to be analyzed is saved, it can be played back for proofreading and re-recognized and converted at the user's instruction.

If the converted recognition result text of a piece of recording data to be analyzed is "1-2# beam", its text label is displayed as "1-2# beam". If the converted recognition result text of another piece is "1 longitudinal crack, crack length 1m", its text label can be set to "1 longitudinal crack" or "crack, length 1m".
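
One way to derive such a label is sketched below. The text does not say how the "key content" is chosen, so taking the first comma-separated clause is purely an assumption, as are the function name `make_label` and the 20-character limit.

```python
def make_label(converted_text: str, max_chars: int = 20) -> str:
    """Use the whole text when it fits, otherwise its first clause, truncated."""
    text = converted_text.strip()
    if len(text) > max_chars:
        # Assumed heuristic: the first comma-separated clause is the key content.
        text = text.split(",")[0].strip()[:max_chars]
    return text


short_label = make_label("1-2# beam")
long_label = make_label("1 longitudinal crack, crack length 1m")
```

With these inputs the short text is kept whole and the longer one is reduced to "1 longitudinal crack", matching the first of the two label choices given above.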

After a recording sub-button generates recording data to be analyzed, the recording data and/or its associated converted recognition result text are saved and associated with the corresponding attribute of the current detection record. Saving the voice data together with its converted recognition result text makes it easy for the user to review, proofread, or re-recognize the converted text. When recognition is inconvenient on site, the saved recording can be used for speech recognition later.

Conventional voice input methods, such as those of iFLYTEK and Baidu, recognize the input voice data but do not save the recording. If the recording quality is poor or the recognition result is wrong, it is difficult to correct the error later from memory; this is one of the main reasons why speech recognition input has seen limited use in the detection field. The invention stores the recording data in association with the detection record (the converted recognition result text), which makes later playback, proofreading, and re-recognition easy and greatly improves the fault tolerance and reliability of speech recognition.

The processes of the invention, including recognition, detection record attribute extraction, audio playback and proofreading, and conversion, can be carried out on the mobile detection terminal, on the server side, or on a PC.

The attribute "number" describes the component serial number and category of the detection record. For ease of operation, when the attribute corresponding to the converted recognition result text is the number, the component category in the converted recognition result text is identified, the defect position templates and defect description templates corresponding to that category are looked up, and the corresponding templates are displayed when the user records a defect position or a defect description.

Providing templates helps users announce with uniform wording, avoids arbitrary recognition results, improves recognition accuracy, and improves the standardization and consistency of the detection organization's records.

The displayed templates can serve as a reference when the user announces, and the user can also tap a template to enter it directly.

When recording through the defect description recording sub-button, the defect description templates for the selected defect type are displayed, helping new users follow the standard wording and improving the standardization of detection records.

For example, the user enters "1-2# beam", and the component type is determined to be "beam".

Defect position template screening:

The defect position templates related to beams are screened out, such as:

"* m from the beam end in the direction of the larger stake number, * m from the downstream side, bottom surface"

"Beam bottom surface, at 1/4 L", etc.;

Defect description template screening:

The defect types of beams and their description templates are screened out, such as:

Crack:

1 longitudinal crack, crack length 4m, crack width 0.03mm

1 area of chicken-wire cracking, area 1.5m*1.2m

Voids and pits:

Voids and pits, area 0.8 ㎡

...
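
The category identification and template lookup can be sketched as a table keyed by component category. The table name `TEMPLATES`, the attribute keys, and the whole-word match used to find the category are assumptions; the template strings are English renderings of the examples above.

```python
import re

# Hypothetical template table: component category -> attribute -> templates.
TEMPLATES = {
    "beam": {
        "defect_position": [
            "* m from the beam end in the direction of the larger stake number, "
            "* m from the downstream side, bottom surface",
            "Beam bottom surface, at 1/4 L",
        ],
        "defect_description": [
            "1 longitudinal crack, crack length 4m, crack width 0.03mm",
            "Voids and pits, area 0.8 ㎡",
        ],
    },
}


def component_category(number_text: str):
    """Return the first known category named in the number text, else None."""
    for category in TEMPLATES:
        if re.search(r"\b" + re.escape(category) + r"\b", number_text):
            return category
    return None


def templates_for(number_text: str, attribute: str) -> list:
    """Look up the templates to display for the given attribute."""
    category = component_category(number_text)
    if category is None:
        return []
    return TEMPLATES[category].get(attribute, [])


position_templates = templates_for("1-2# beam", "defect_position")
```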

A template can be specific text, or it can contain wildcards, such as:

Voids and pits, area $Num ㎡

$Num in the template represents a number.
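
A template containing the $Num wildcard can be turned into a matcher as sketched below. Compiling the template into a regular expression is an assumption about how such a wildcard might be evaluated, and the names `template_to_regex` and `match_template` are hypothetical.

```python
import re

# A number with an optional decimal part, captured as one group.
NUM_PATTERN = r"(\d+(?:\.\d+)?)"


def template_to_regex(template: str, wildcard: str = "$Num"):
    """Escape the literal parts of the template and join them with a number group."""
    literal_parts = [re.escape(part) for part in template.split(wildcard)]
    return re.compile("^" + NUM_PATTERN.join(literal_parts) + "$")


def match_template(template: str, text: str):
    """Return the numbers filling the wildcards, or None if the text does not fit."""
    match = template_to_regex(template).match(text)
    return list(match.groups()) if match else None


values = match_template("Voids and pits, area $Num ㎡", "Voids and pits, area 0.8 ㎡")
# values == ["0.8"]
```

Splitting on the wildcard before escaping keeps the "$" of the placeholder from being treated as a regex anchor.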

A corresponding input interface can be provided for each template.

The defect position templates are displayed when the user records defect position voice, and the defect description templates are displayed when the user records defect description voice (for example, when the corresponding recording sub-button is pressed). They prompt the user to announce in the suggested manner so that records are standardized; the display mode and content are designed as needed.

So that the user can see at a glance the text information corresponding to a detection record, after step 14 the method further includes:

Step 17: display the recording data to be analyzed associated with each attribute of each detection record, together with the icon or text label content of that recording data; in the editing page of the detection record, display the converted recognition result text of the recording data to be analyzed associated with each attribute; and, in response to a user's drag command that adjusts the position of any piece of recording data to be analyzed, correspondingly adjust the position of the associated converted recognition result text in the editing page of the object attribute.

In the editing page of the object attribute, if the same attribute of the same object includes at least two converted recognition result texts, a separator is inserted between them.

Multiple voice segments can be recorded and saved for each attribute of a detection record, with multiple voice labels displayed accordingly. In response to the user's touch-and-drag operation on a voice label, the selected voice label is moved to the new target position, the order of the other affected voice labels is adjusted, and the converted recognition result text is recombined and displayed according to the order of the voice labels under the attribute.

For example, under the defect description attribute, the text label of defect description voice icon 1 is displayed as "spalling with exposed rebar" and the text label of defect description voice icon 2 as "S=0.5 ㎡", which combine into the continuous text "spalling with exposed rebar, S=0.5 ㎡". When the user drags the voice label "S=0.5 ㎡" in front of "spalling with exposed rebar", the voice labels swap display order and the combined text becomes "S=0.5 ㎡, spalling with exposed rebar".
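
Reordering the voice segments of one attribute and recombining their text can be sketched as a list operation. The function names `move_segment` and `combine`, and the ", " separator, are assumptions for illustration.

```python
def move_segment(segments: list, src: int, dst: int) -> list:
    """Return a new list with the segment at index `src` moved to index `dst`."""
    reordered = list(segments)
    reordered.insert(dst, reordered.pop(src))
    return reordered


def combine(segments: list, separator: str = ", ") -> str:
    """Join the converted texts of the segments in display order."""
    return separator.join(segments)


segments = ["spalling with exposed rebar", "S=0.5 ㎡"]
before_drag = combine(segments)
after_drag = combine(move_segment(segments, 1, 0))
```

Returning a new list instead of mutating the original keeps the previous order available, which would also make an undo step straightforward.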

Segmented recording is preferred for the following reason: during on-site detection, the geometric features of a defect have to be measured one by one, so the several attributes of a defect generally cannot be announced in one go, and some can only be announced after calculation or estimation. For example, a "spalling with exposed rebar" defect is found and announced first; its area is estimated afterwards, and then "S=0.5 ㎡" is announced. The converted recognition result texts are combined for display so that the user can edit the text. Different items generally need to be separated by ","; the "," can be added automatically, or rules for adding it can be set as needed.

In summary, the method of the invention supports the following operations:

i. For each voice segment (recording data to be analyzed), it can be set as needed whether speech processing (the method of Fig. 1) is performed, and whether the speech processing result updates the current detection record or an attribute of the current detection record;

ii. An icon or text label related to the converted recognition result text is generated for each voice segment (recording data to be analyzed);

iii. Voice segments (recording data to be analyzed) support drag operations. A voice segment is operated on according to the direction, distance, and end position of the drag, including adjusting its order, changing its corresponding attribute, and deleting it;

1. When an icon is dragged up or down and the drag distance exceeds a preset threshold, or the end position of the drag meets a preset condition, the voice segment is deleted. For example, a trash can icon is shown on the interface during dragging, and dropping the icon on the trash can deletes the voice segment. Undo and redo are supported for voice segment operations;

2. When an icon is dragged horizontally, punctuation marks are added or removed, or the order of the voice segments is adjusted, according to the drag distance and the icon's position relative to the other icons in the same area;

For example, when an icon that has another icon on its left is dragged to the right by a distance within the preset threshold range, a "," is added to the left of the icon if there is none.

After the order of the voice segments (recording data to be analyzed) has been adjusted by dragging, the corresponding converted recognition result texts are recombined and displayed in the order of the voice segments.
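
The drag handling described in items 1 and 2 can be sketched as a small classifier. The text gives no concrete thresholds or coordinate conventions, so the pixel thresholds, the parameter names, and the decision order below are all assumptions, and the enum `DragAction` is hypothetical.

```python
from enum import Enum


class DragAction(Enum):
    DELETE = "delete"
    INSERT_COMMA = "insert_comma"
    REORDER = "reorder"
    NONE = "none"


def classify_drag(dx: float, dy: float, has_left_neighbor: bool,
                  has_comma_on_left: bool, delete_threshold: float = 80.0,
                  comma_threshold: float = 40.0) -> DragAction:
    """Decide what a drag of (dx, dy) on a voice segment icon should do."""
    if abs(dy) > abs(dx):
        # Mostly vertical drag: delete once it passes the threshold.
        return DragAction.DELETE if abs(dy) > delete_threshold else DragAction.NONE
    if dx > 0 and has_left_neighbor and comma_threshold >= dx:
        # Short rightward drag next to another icon: add a missing comma.
        return DragAction.NONE if has_comma_on_left else DragAction.INSERT_COMMA
    if abs(dx) > comma_threshold:
        # Longer horizontal drag: treat as a reordering request.
        return DragAction.REORDER
    return DragAction.NONE
```

A real interface would also consider the drop position (for example a trash can icon), which this sketch leaves out.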

The invention also provides a voice processing device and a system containing the device, including a processor and the above non-transitory computer-readable storage medium.

As shown in Fig. 4, the voice processing device includes:

Voice acquisition module: acquires the recording data to be analyzed associated with a detection record;

Speech recognition module: converts the voice data to be analyzed into recognition result text, based on a speech recognition model capable of recognizing engineering detection terminology;

Extraction module: extracts detection record attribute information from the recognition result text according to the preset information extraction rules.

Optionally, between the speech recognition module and the extraction module the device further includes:

Text conversion module: based on the preset transformation rules, converts the recognition result text into recognition result text that conforms to the engineering detection terminology specification.

Optionally, after the extraction module the device further includes:

Detection report generation module: generates a detection report and/or a query statistics report based on the detection record attribute information.
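
The way these modules chain together can be sketched with pluggable callables. The class name `VoiceProcessingDevice`, its field names, and the stub functions in the usage lines are hypothetical; in particular the `recognize` stub merely decodes bytes and stands in for a real speech recognition model.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class VoiceProcessingDevice:
    recognize: Callable[[bytes], str]               # speech recognition module
    extract: Callable[[str], Dict[str, str]]        # extraction module
    convert: Optional[Callable[[str], str]] = None  # optional text conversion module

    def process(self, recording: bytes) -> Dict[str, str]:
        """Run recognition, optional conversion, then attribute extraction."""
        text = self.recognize(recording)
        if self.convert is not None:
            text = self.convert(text)
        return self.extract(text)


# Stub modules for illustration only.
device = VoiceProcessingDevice(
    recognize=lambda audio: audio.decode("utf-8"),
    extract=lambda text: {"defect_description": text},
    convert=lambda text: text.replace("one meter", "1m"),
)
attributes = device.process("crack length one meter".encode("utf-8"))
```

Keeping the conversion module optional mirrors the description, where it sits between recognition and extraction only in some embodiments.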

Optionally, the preset transformation rules in the text conversion module include at least:

Transformation rule 1: convert the "first expression" into the "second expression".

Transformation rule 1 may include transformation rule 2, transformation rule 3, and/or transformation rule 4:

Transformation rule 2: convert Chinese numerals into Arabic numerals;

Transformation rule 3: convert textual numbering marks into predetermined symbols, the predetermined symbols including half-width or full-width "-", "#", or "~".

Transformation rule 4: convert Chinese measurement units into the letter symbols of the International System of Units.

Further, the preset transformation rules also include a user error correction dictionary. The user error correction dictionary records the correction entries obtained from the user's modification and replacement operations on the converted recognition result text, and is used to convert the "expression before modification" into the "expression after modification".

Optionally, the above preset information extraction rules include at least:

Extraction rule 3: define a defect description extraction rule, used to extract the "first defect description" in the recognition result text as the "second defect description".

Optionally, the current detection record includes N different editable attributes, N ≥ 1. The device further includes N recording sub-buttons set in correspondence with the attributes of the current detection record, and a shared parent button provided for the N recording sub-buttons; the parent button controls the showing and hiding of the sub-buttons as a group and/or adjusts their position. The device further includes:

Recording module: any recording sub-button generates the recording data to be analyzed for the corresponding attribute of the current detection record. The recording sub-buttons and the parent button are provided in the recording module.

Optionally, the text conversion module is further configured to set the icon or text label of the recording data to be analyzed according to the content of the converted recognition result text.

Further, the recording module saves the recording data to be analyzed and associates it with the corresponding attribute of the current detection record. The text conversion module saves the converted recognition result text and associates it with the corresponding attribute of the current detection record.

Optionally, the attributes include at least number, defect position, and defect description, the number describing the component category and serial number of the detection record, and the text conversion module is further configured as follows:

When the attribute corresponding to the converted recognition result text is the number, the component category in the converted recognition result text is identified, the defect position templates and defect description templates corresponding to the component category are looked up, and the recording module is triggered to display the corresponding templates when the user records a defect position or a defect description.

Optionally, the device further includes:

Display module: displays the recording data to be analyzed associated with each attribute of each detection record, together with the icon or text label content of that recording data; in the editing page of the detection record, displays the converted recognition result text of the recording data to be analyzed associated with each attribute; and, in response to a user's drag command that adjusts the position of any piece of recording data to be analyzed, correspondingly adjusts the position of the associated converted recognition result text in the editing page of the object attribute.

Further, in the editing page of the object attribute, if the same attribute of the same object includes at least two converted recognition result texts, a separator is inserted between them.

Further, the preset transformation rules include N preset transformation sub-rules corresponding to the N attributes;

"Based on the preset transformation rules" is adjusted to: based on the preset transformation sub-rules corresponding to the attribute.

Further, the preset information extraction rules include N preset information extraction sub-rules corresponding to the N attributes; accordingly, "according to the preset information extraction rules" is adjusted to: according to the preset information extraction sub-rules corresponding to the attribute.

In addition to the above modules, the device may also include:

Detection record management module: used to manage historical detection records, supporting operations such as updating, deleting, and repositioning detection records. It should be noted that the embodiments of the voice data processing device of the invention follow the same principle as the embodiments of the voice data processing method, and related parts can be cross-referenced.

It should be noted that the embodiments of the invention take bridges as an example; in practical application the invention is also applicable to the engineering detection of tunnels, ports and wharves, dams, building structures, and the like.

The foregoing is merely a description of preferred embodiments of the invention and is not intended to limit its scope. Any modification, equivalent substitution, or improvement made within the spirit and principles of the technical solution of the invention shall fall within the scope of protection of the invention.

Claims

1. A speech processing method, characterized by comprising:

Step 11: acquiring recording data to be analyzed that is associated with a detection record;

Step 13: based on a speech recognition model capable of recognizing engineering detection terminology, converting the voice data to be analyzed into recognition result text;

Step 15: extracting detection record attribute information from the recognition result text according to preset information extraction rules.

2. The method according to claim 1, characterized in that after the step 15 the method further comprises:

Step 16: based on the detection record attribute information, generating a detection report and/or a query statistics report.

3. The method according to claim 1, characterized in that after the step 13 and before the step 15 the method further comprises:

Step 14: based on preset transformation rules, converting the recognition result text into recognition result text that conforms to the engineering detection terminology specification.

4. The method according to claim 3, characterized in that the preset transformation rules include at least:

Transformation rule 1: converting the "first expression" into the "second expression".

5. The method according to claim 4, characterized in that the transformation rule 1 includes transformation rule 2, transformation rule 3, and/or transformation rule 4:

Transformation rule 2: converting Chinese numerals into Arabic numerals;

Transformation rule 3: converting textual numbering marks into predetermined symbols, the predetermined symbols including half-width or full-width "-", "#", or "~";

Transformation rule 4: converting Chinese measurement units into the letter symbols of the International System of Units.

6. according to the method described in claim 4, it is characterized in that, the default transformation rule further includes user's error correction dictionary, User's error correction dictionary record user uses the error correction entry obtained after the modification replacement operation for having converted recognition result text " expressing after modification " is converted in that " will express before modification ".

7. the method according to claim 1, wherein the presupposed information extracting rule includes at least:

Extracting rule 1: defining component number extracting rule, for " the first number is expressed " in recognition result text to be extracted as " the second number expression "；

Extracting rule 2: Define defects position extracting rule, for " the first position expression " in recognition result text to be extracted as " second position expression "；

Extracting rule 3: Define defects describe extracting rule, for " description of the first defect " in recognition result text to be extracted as " description of the second defect ".

8. according to any method of claim 3-6, which is characterized in that current detection record includes different N number of compiles Collect attribute, N >=1；N number of recording being correspondingly arranged the method also includes the attribute recorded with the current detection by Button；And for the shared father's button of N number of sub- button setting one of recording, the entirety that father's button is used to control sub- button is aobvious Show and hide and/or position adjust；Before the step 11 further include:

Step 10: any sub- button of recording generates the recording data to be analyzed that the current detection records corresponding attribute.

9. according to the method described in claim 8, it is characterized in that, the step 14 further include: according to having converted recognition result The icon or word tag of the recording data to be analyzed is arranged in text.

10. according to the method described in claim 8, it is characterized in that, after the step 10 further include: save described to be analyzed Recording data, and the corresponding Attribute Association that the recording data to be analyzed is recorded with the current detection.

11. The method according to claim 8, characterized in that the attributes include at least a number, a defect position and a defect description, the number being used to describe the component category and serial number of the detection record, and step 14 further includes:

When the attribute corresponding to the converted recognition result text is the number, identifying the component category in the converted recognition result text, looking up the defect position template and the defect description template corresponding to the component category, and displaying the corresponding template when the user records the defect position or the defect description.
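The template lookup of claim 11 amounts to a table keyed by component category. A minimal Python sketch under that assumption follows; the tables `POSITION_TEMPLATES` and `DESCRIPTION_TEMPLATES`, the categories, and the template strings are hypothetical placeholders rather than content from the patent.

```python
# Hypothetical template tables keyed by component category.
POSITION_TEMPLATES = {
    "pier": "__ m above the pier base, __ face",
    "girder": "__ m from the __ end, __ web",
}
DESCRIPTION_TEMPLATES = {
    "pier": "concrete spalling, area __ m2",
    "girder": "crack, length __ m, width __ mm",
}


def templates_for_number(number_text):
    """Identify the component category in a number text and look up its templates."""
    lowered = number_text.lower()
    for category in POSITION_TEMPLATES:
        if category in lowered:
            return (category,
                    POSITION_TEMPLATES[category],
                    DESCRIPTION_TEMPLATES[category])
    # No known category: the caller would show no template.
    return None, None, None


category, position_template, description_template = templates_for_number("Girder-2")
```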

12. The method according to claim 9, characterized in that the method includes, after step 14:

Step 17: displaying each detection record, the recording data to be analyzed associated with each attribute, and the icon or text label content of the recording data to be analyzed; displaying, in the edit page of the corresponding detection record, the converted recognition result text of the recording data to be analyzed associated with each attribute; and, in response to a user command that drags any recording data to be analyzed to adjust its position, correspondingly adjusting the position, in the edit page of the object attribute, of the converted recognition result text associated with that recording data to be analyzed.

13. The method according to claim 12, characterized in that, in the edit page of the object attribute, if the same attribute of the same object includes at least two converted recognition result texts, a separator is inserted between the converted recognition result texts.
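The drag reordering of claim 12 and the separator of claim 13 can be illustrated together: the converted texts of one attribute are kept as an ordered list, a drag moves one element, and the edit page shows the list joined by a separator. The sketch below assumes this list representation; the helper names `move_item` and `render_attribute` and the `"; "` separator are hypothetical choices.

```python
def move_item(items, old_index, new_index):
    """Return a new list with one item dragged to a new position."""
    reordered = list(items)
    reordered.insert(new_index, reordered.pop(old_index))
    return reordered


def render_attribute(texts, separator="; "):
    """Join an attribute's converted texts, inserting a separator between them."""
    return separator.join(texts)


texts = ["crack, width 2 mm", "spalling, area 0.1 m2"]
before = render_attribute(texts)
after = render_attribute(move_item(texts, 1, 0))  # drag the second clip to the front
```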

14. The method according to claim 8, characterized in that the preset conversion rule includes N preset conversion sub-rules corresponding to the attributes;

said being based on the preset conversion rule includes: being based on the preset conversion sub-rule corresponding to the attribute.

15. The method according to claim 8, characterized in that the preset information extraction rule includes N preset information extraction sub-rules corresponding to the attributes;

said extracting according to the preset information extraction rule includes: extracting according to the preset information extraction sub-rule corresponding to the attribute.
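Claims 14 and 15 both select a sub-rule by the attribute that a recording belongs to. One way to picture this is a pair of lookup tables keyed by attribute name, as in the Python sketch below. The tables, the attribute names, and the sample rules are hypothetical; in particular, falling back to the unchanged text when no sub-rule exists is an assumption of this sketch, not something the claims state.

```python
import re

# Hypothetical per-attribute conversion sub-rules (claim 14).
CONVERSION_SUB_RULES = {
    "number": lambda text: text.replace("number ", "#"),
    "defect_description": lambda text: text.replace("millimeters", "mm"),
}

# Hypothetical per-attribute extraction sub-rules (claim 15).
EXTRACTION_SUB_RULES = {
    "number": re.compile(r"#\d+"),
    "defect_description": re.compile(r"crack.*"),
}


def process(attribute, text):
    """Convert and extract using only the sub-rules of the given attribute."""
    convert = CONVERSION_SUB_RULES.get(attribute, lambda t: t)
    converted = convert(text)
    pattern = EXTRACTION_SUB_RULES.get(attribute)
    match = pattern.search(converted) if pattern else None
    # Assumed fallback: keep the converted text when nothing is extracted.
    return match.group(0) if match else converted


number_value = process("number", "pier number 3")
description_value = process("defect_description", "a crack 2 millimeters wide")
position_value = process("defect_position", "left end")
```

Because each recording is already tied to one attribute by its sub-button, only that attribute's sub-rules need to run, which keeps rules for one attribute from misfiring on text dictated for another.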

16. A non-transitory computer-readable storage medium storing instructions, characterized in that the instructions, when executed by a processor, cause the processor to perform the steps of the voice data processing method according to any one of claims 1 to 15.

17. A voice processing apparatus, characterized by comprising a processor and the non-transitory computer-readable storage medium according to claim 16.