CN109710904A - Text accuracy calculation method, device, and computer equipment based on semantic parsing - Google Patents
- Publication number
- CN109710904A (CN201811348583.1A)
- Authority
- CN
- China
- Prior art keywords
- text
- track
- editing distance
- distance matrix
- character
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/26—Speech to text systems
Landscapes
- Engineering & Computer Science (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Human Computer Interaction (AREA)
- Physics & Mathematics (AREA)
- Acoustics & Sound (AREA)
- Multimedia (AREA)
- Machine Translation (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
This application relates to the technical field of semantic parsing, and in particular to a text accuracy calculation method, device, and computer equipment based on semantic parsing. When a template text has been transcribed starting from its starting point, an edit distance matrix is established and the value of each element in it is calculated; a track matrix is generated from the calculation track of the value of each element in the edit distance matrix; the similarity of each track in the track matrix is calculated, and the track with the highest similarity is selected as the first track; the end point on the template text corresponding to the partial transcription text is obtained from the first track, which yields a new template text; the partial transcription text is then compared with the new template text to calculate the accuracy of the partial transcription text. The aim is to solve the problem that existing transcription accuracy algorithms compare the transcribed text with the full template text and therefore cannot accurately calculate the transcription accuracy when only part of the text has been transcribed.
Description
Technical field
This application relates to the technical field of semantic parsing, and in particular to a text accuracy calculation method, device, and computer equipment based on semantic parsing.
Background
When measuring the transcription accuracy of an ASR (speech recognition) engine, the commonly used algorithm is the edit distance algorithm. It calculates the similarity between the transcription text and the template text (the transcription accuracy) by counting the minimum number of edit operations needed to turn the transcription text into the template text (the edit operations are: replacing one character with another, inserting a character, and deleting a character). However, in scenarios that focus on the real-time transcription accuracy of the ASR engine, the result of this algorithm is unsatisfactory. Because the algorithm always compares the transcribed text with the full template text, it cannot accurately calculate the transcription accuracy of the transcribed portion when only part of the text has been transcribed. Edit distance is therefore not suitable for scenarios that focus on the real-time transcription accuracy of an ASR engine.
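The limitation described above can be illustrated with a short sketch. It uses the standard Levenshtein edit distance; the normalisation `1 - distance / reference length` and the sample strings are assumptions made for illustration only, since the text does not state how the distance is turned into an accuracy.

```python
def edit_distance(source: str, target: str) -> int:
    """Minimum number of single-character substitutions, insertions and deletions."""
    previous = list(range(len(target) + 1))
    for i, s_char in enumerate(source, start=1):
        current = [i]
        for j, t_char in enumerate(target, start=1):
            cost = 0 if s_char == t_char else 1
            current.append(min(previous[j] + 1,          # delete s_char
                               current[j - 1] + 1,       # insert t_char
                               previous[j - 1] + cost))  # substitute or match
        previous = current
    return previous[-1]


def accuracy(transcript: str, reference: str) -> float:
    # Assumed normalisation (not given in the source): 1 - distance / reference length.
    if not reference:
        return 1.0 if not transcript else 0.0
    return max(0.0, 1.0 - edit_distance(transcript, reference) / len(reference))


template = "thequickbrownfox"
partial = "thequick"  # only the first half has been transcribed so far

full_score = accuracy(partial, template)        # compared with the full template
prefix_score = accuracy(partial, template[:8])  # compared with the matching prefix
print(full_score, prefix_score)
```

Although every transcribed character is correct, the comparison against the full template is penalised for the characters that have not been spoken yet, which is the problem the application sets out to address.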
Summary of the application
In view of the shortcomings of the prior art, this application proposes a text accuracy calculation method, device, and computer equipment based on semantic parsing. The aim is to solve the problem that existing transcription accuracy algorithms compare the transcribed text with the full template text and therefore cannot accurately calculate the transcription accuracy when only part of the text has been transcribed.
The technical solution proposed by this application is:
A text accuracy calculation method based on semantic parsing, the method comprising:
Obtaining the partial transcription text that has been transcribed starting from the starting point of the template text;
Establishing an edit distance matrix, using the length of the template text in characters plus two as the number of columns and the length of the partial transcription text in characters plus two as the number of rows;
Calculating the value of each element in the edit distance matrix according to the partial transcription text and the template text;
Recording the calculation track of the value of each element in the edit distance matrix, and generating a track matrix corresponding to the edit distance matrix;
Calculating the similarity of each track in the track matrix, and selecting the track on which the partial transcription text is most similar to the template text, to obtain the first track;
Determining, according to the first track, the end point on the template text corresponding to the partial transcription text, to obtain the first end point;
Obtaining a new template text from the template text according to the starting point of the template text and the first end point;
Comparing the partial transcription text with the new template text, and calculating the accuracy of the partial transcription text with the edit distance algorithm.
Further, after the step of establishing an edit distance matrix using the length of the template text in characters plus two as the number of columns and the length of the partial transcription text in characters plus two as the number of rows, and before the step of calculating the value of each element in the edit distance matrix according to the partial transcription text and the template text, the method comprises:
Entering the characters of the template text starting from the third element of the first row of the edit distance matrix;
Entering the characters of the partial transcription text starting from the third element of the first column of the edit distance matrix;
Defining the value of the second element in the second row of the edit distance matrix as 0;
Initializing the value of each element of the second row of the edit distance matrix, starting from the value 0 of the second element in the second row and incrementing by 1 each time;
Initializing the value of each element of the second column of the edit distance matrix, starting from the value 0 of the second element in the second column and incrementing by 1 each time.
Further, the value of each element of the edit distance matrix that has not been initialized is determined by the value of one of the elements to its left, upper left, or above it, and the step of calculating the value of each element in the edit distance matrix according to the partial transcription text and the template text comprises:
Identifying the column index and row index of the third element in the third column of the edit distance matrix;
Identifying the character of the template text corresponding to that column index and the character of the partial transcription text corresponding to that row index;
Judging whether the template text character corresponding to the column index of the third element in the third column of the edit distance matrix equals the partial transcription text character corresponding to its row index;
If the two characters are equal, the value of the third element in the third column of the edit distance matrix is the value of the element at its upper left;
If the two characters are not equal, the value of the third element in the third column of the edit distance matrix is the minimum of the values of the elements to its left, upper left, and above it, plus 1;
Calculating in turn the value of the fourth element in the third column of the edit distance matrix, and so on, until the value of each element in the edit distance matrix has been calculated.
Further, the step of recording the calculation track of the value of each element in the edit distance matrix and generating a track matrix corresponding to the edit distance matrix comprises:
Recording the calculation track of the value of each element in the edit distance matrix;
Marking, according to the calculation track of the value of each element in the edit distance matrix, the origin of the value of each element in the edit distance matrix;
After the marking is complete, generating the track matrix corresponding to the edit distance matrix.
Further, the step of calculating the similarity of each track in the track matrix comprises:
Identifying, for each track in the track matrix, the number of characters of the partial transcription text that are equal to the corresponding characters of the template text, to obtain the equal-character count;
Comparing, for each track in the track matrix, the length of the partial transcription text in characters with the length of the corresponding template text in characters, and taking the longer of the two as the total character count;
Calculating, for each track in the track matrix, the ratio of the equal-character count to the corresponding total character count, to obtain the similarity of each track in the track matrix.
Further, the step of determining, according to the first track, the end point on the template text corresponding to the partial transcription text, to obtain the first end point, comprises:
Marking the last element of the first track;
Marking the character of the template text corresponding to the last element of the first track, to obtain the first end point.
Further, the step of obtaining a new template text from the template text according to the starting point of the template text and the first end point comprises:
Marking the first character of the template text as the starting point;
Extracting the characters between the starting point of the template text and the first end point, where the extracted characters include both the character at the starting point and the character at the first end point;
Generating text from the extracted characters, to obtain the new template text.
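The extraction step above is inclusive at both ends, which is easy to get wrong with half-open slices. A minimal sketch follows, assuming the first end point is available as a 0-based character index into the template text; the function name `slice_new_template` and the sample string are hypothetical and not taken from the application.

```python
def slice_new_template(template: str, first_end_index: int) -> str:
    """Return the characters from the starting point (index 0) up to and
    including the character at the first end point."""
    if not 0 <= first_end_index < len(template):
        raise ValueError("first end point must lie inside the template text")
    # Python slices exclude the stop index, so add 1 to keep the end character.
    return template[: first_end_index + 1]


new_template = slice_new_template("abcdef", 2)
print(new_template)
```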
This application also provides a text accuracy calculation device based on semantic parsing, the device comprising:
A first acquisition module, for obtaining the partial transcription text that has been transcribed starting from the starting point of the template text;
An establishing module, for establishing an edit distance matrix using the length of the template text in characters plus two as the number of columns and the length of the partial transcription text in characters plus two as the number of rows;
A first calculation module, for calculating the value of each element in the edit distance matrix according to the partial transcription text and the template text;
A generation module, for recording the calculation track of the value of each element in the edit distance matrix and generating a track matrix corresponding to the edit distance matrix;
A screening module, for calculating the similarity of each track in the track matrix and selecting the track on which the partial transcription text is most similar to the template text, to obtain the first track;
An obtaining module, for determining, according to the first track, the end point on the template text corresponding to the partial transcription text, to obtain the first end point;
A second acquisition module, for obtaining a new template text from the template text according to the starting point of the template text and the first end point;
A second calculation module, for comparing the partial transcription text with the new template text and calculating the accuracy of the partial transcription text with the edit distance algorithm.
This application also provides computer equipment comprising a memory and a processor, where the memory stores a computer program and the processor implements the steps of any of the methods described above when executing the computer program.
This application also provides a computer-readable storage medium on which a computer program is stored, where the steps of any of the methods described above are implemented when the computer program is executed by a processor.
With the above technical solution, the beneficial effects of this application are as follows. When a template text has been transcribed starting from its starting point, an edit distance matrix is established and the value of each element in it is calculated; a track matrix is generated from the calculation track of the value of each element in the edit distance matrix; the similarity of each track in the track matrix is calculated, and the track with the highest similarity is selected as the first track; the end point on the template text corresponding to the partial transcription text is obtained from the first track, which yields a new template text; the partial transcription text is then compared with the new template text to calculate the accuracy of the partial transcription text. This is intended to solve the problem that existing transcription accuracy algorithms compare the transcribed text with the full template text and therefore cannot accurately calculate the transcription accuracy when only part of the text has been transcribed.
Brief description of the drawings
Fig. 1 is a flow chart of the text accuracy calculation method based on semantic parsing provided by an embodiment of this application;
Fig. 2 is a functional module diagram of the text accuracy calculation device based on semantic parsing provided by an embodiment of this application;
Fig. 3 is a schematic structural block diagram of the computer equipment provided by an embodiment of this application.
Detailed description of the embodiments
To make the objects, technical solutions, and advantages of this application clearer, the application is described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the application and are not intended to limit it.
As shown in Fig. 1, an embodiment of this application proposes a text accuracy calculation method based on semantic parsing, the method comprising the following steps:
Step S101, obtain the partial transcription text that has been transcribed starting from the starting point of the template text.
Transcription starts from the starting point of the template text, but the template text is not transcribed in full. That is, transcription starts from the first character of the template text, and its end point is not the last character of the template text but some other character in the template text. Because not all characters of the template text have been transcribed, the text obtained by transcribing from the starting point of the template text is called the partial transcription text. The template text is a correct text that serves as the reference against which the partial transcription text is compared.
Transcription here means converting speech into text with an ASR (speech recognition) engine.
Step S102, establish an edit distance matrix, using the length of the template text in characters plus two as the number of columns and the length of the partial transcription text in characters plus two as the number of rows.
In this embodiment, the template text is written text with punctuation removed, and the partial transcription text is likewise written text with punctuation removed.
The length of the template text in characters is obtained and increased by two to give the number of columns. The length of the partial transcription text in characters is obtained and increased by two to give the number of rows. The edit distance matrix is then established with these dimensions. The two extra columns and two extra rows are there so that the template text and the partial transcription text can be entered in the first row and the first column respectively, and the initialized values can be entered in the second row and the second column.
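The sizing rule can be sketched as follows. The source does not say how punctuation is removed, so this sketch assumes that any character whose Unicode general category starts with "P" counts as punctuation; the helper names `strip_punctuation` and `matrix_shape` are hypothetical.

```python
import unicodedata


def strip_punctuation(text: str) -> str:
    # Assumption: punctuation = Unicode general categories Pc, Pd, Ps, Pe, Pi, Pf, Po.
    return "".join(ch for ch in text if not unicodedata.category(ch).startswith("P"))


def matrix_shape(template: str, partial: str) -> tuple[int, int]:
    """Return (rows, columns): partial length + 2 rows, template length + 2 columns."""
    clean_template = strip_punctuation(template)
    clean_partial = strip_punctuation(partial)
    return len(clean_partial) + 2, len(clean_template) + 2


shape = matrix_shape("今天,天气很好。", "今天,天气")
print(shape)
```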
Specifically, after step S102 and before step S103, the method comprises:
Entering the characters of the template text starting from the third element of the first row of the edit distance matrix;
Entering the characters of the partial transcription text starting from the third element of the first column of the edit distance matrix;
Defining the value of the second element in the second row of the edit distance matrix as 0;
Initializing the value of each element of the second row of the edit distance matrix, starting from the value 0 of the second element in the second row and incrementing by 1 each time;
Initializing the value of each element of the second column of the edit distance matrix, starting from the value 0 of the second element in the second column and incrementing by 1 each time.
The characters of the template text are entered in the first row of the edit distance matrix, specifically starting from the third element of the first row. Correspondingly, the characters of the partial transcription text are entered in the first column of the edit distance matrix, starting from its third element. Entering the two texts from the third element of the first row and of the first column gives every character of the template text and every character of the partial transcription text a corresponding position in the edit distance matrix, and also fixes the positions of the initialized values in the second row and the second column. First, the value of the second element in the second row of the edit distance matrix is defined as 0. Then, starting from that 0 and incrementing by 1 each time, the value of each element of the second row is initialized; for example, the second, third, fourth, and fifth elements of the second row are 0, 1, 2, and 3 respectively. Defining the second element of the second row as 0 also defines the second element of the second column as 0, because the two occupy the same position, that is, they are the same element. Starting from that 0 and incrementing by 1 each time, the value of each element of the second column is initialized; for example, the second, third, fourth, and fifth elements of the second column are 0, 1, 2, and 3 respectively. Once the second column and the second row have been initialized, the values of the remaining elements of the edit distance matrix can be calculated.
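The layout described above can be sketched as a list of lists. This is only an illustration of the described arrangement; the function name `init_matrix`, the use of `None` for empty cells, and the sample strings are assumptions.

```python
def init_matrix(template: str, partial: str) -> list[list]:
    rows = len(partial) + 2   # partial transcription length plus two
    cols = len(template) + 2  # template length plus two
    matrix = [[None] * cols for _ in range(rows)]

    # First row, from the third element onward: template characters.
    for j, ch in enumerate(template):
        matrix[0][j + 2] = ch
    # First column, from the third element onward: partial transcription characters.
    for i, ch in enumerate(partial):
        matrix[i + 2][0] = ch
    # Second row and second column: 0, 1, 2, ... starting at their shared second element.
    for j in range(1, cols):
        matrix[1][j] = j - 1
    for i in range(1, rows):
        matrix[i][1] = i - 1
    return matrix


m = init_matrix("abcd", "ab")
for row in m:
    print(row)
```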
Step S103, calculate the value of each element in the edit distance matrix according to the partial transcription text and the template text.
That is, for each element of the edit distance matrix, whether the corresponding character of the partial transcription text equals the corresponding character of the template text determines how the value of that element is calculated, and the value of each element in the edit distance matrix is calculated accordingly.
In this embodiment, the value of each element of the edit distance matrix that has not been initialized is determined by the value of one of the elements to its left, upper left, or above it. Step S103 comprises:
Identifying the column index and row index of the third element in the third column of the edit distance matrix;
Identifying the character of the template text corresponding to that column index and the character of the partial transcription text corresponding to that row index;
Judging whether the template text character corresponding to the column index of the third element in the third column of the edit distance matrix equals the partial transcription text character corresponding to its row index;
If the two characters are equal, the value of the third element in the third column of the edit distance matrix is the value of the element at its upper left;
If the two characters are not equal, the value of the third element in the third column of the edit distance matrix is the minimum of the values of the elements to its left, upper left, and above it, plus 1;
Calculating in turn the value of the fourth element in the third column of the edit distance matrix, and so on, until the value of each element in the edit distance matrix has been calculated.
Because the value of each uninitialized element is determined by the value of one of the elements to its left, upper left, or above it, the only element whose left, upper-left, and upper neighbors all hold values when calculation starts is the third element in the third column of the edit distance matrix. In this embodiment, calculation therefore starts with that element. Its column index and row index are identified, and from them the template text character corresponding to the column index and the partial transcription text character corresponding to the row index. The two characters are then compared, and the result determines the value of the element. If the two characters are equal, the value of the third element in the third column is the value of the element at its upper left. If they are not equal, the value is the minimum of the values of the elements to its left, upper left, and above it, plus 1. After the value of the third element in the third column has been calculated, the value of the fourth element in the third column is calculated, and so on down the column. When every element of the third column has been calculated, the elements of the fourth column are calculated, and so on, until the elements of the last column have been calculated, which completes the calculation of every element in the edit distance matrix.
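The fill procedure just described can be sketched as follows, using the same layout with two extra rows and columns. The function name `build_edit_distance_matrix` and the sample strings are assumptions; the recurrence and the column-by-column order follow the description above.

```python
def build_edit_distance_matrix(template: str, partial: str) -> list[list]:
    rows, cols = len(partial) + 2, len(template) + 2
    m = [[None] * cols for _ in range(rows)]
    m[0][2:] = list(template)          # header row: template characters
    for i, ch in enumerate(partial):   # header column: partial transcription characters
        m[i + 2][0] = ch
    for j in range(1, cols):           # initialized second row
        m[1][j] = j - 1
    for i in range(1, rows):           # initialized second column
        m[i][1] = i - 1

    # Column by column, top to bottom, starting at the third element of the third column.
    for j in range(2, cols):
        for i in range(2, rows):
            if m[0][j] == m[i][0]:
                m[i][j] = m[i - 1][j - 1]  # equal characters: copy the upper-left value
            else:
                # unequal: minimum of left, upper-left and upper neighbors, plus 1
                m[i][j] = min(m[i][j - 1], m[i - 1][j - 1], m[i - 1][j]) + 1
    return m


matrix = build_edit_distance_matrix("abcd", "abd")
for row in matrix:
    print(row)
```

The bottom-right element holds the ordinary edit distance between the two strings, while the other elements of the last row hold the distances between the partial transcription and each prefix of the template.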
Step S104, record the calculation track of the value of each element in the edit distance matrix, and generate a track matrix corresponding to the edit distance matrix.
While the value of each element in the edit distance matrix is being calculated, the calculation track of that value is recorded, that is, which element's value determined it. Once the value of each element in the edit distance matrix has been calculated, the recording of the calculation tracks is also complete, and the track matrix corresponding to the edit distance matrix is generated.
In this embodiment, step S104 comprises:
Recording the calculation track of the value of each element in the edit distance matrix;
Marking, according to the calculation track of the value of each element in the edit distance matrix, the origin of the value of each element in the edit distance matrix;
After the marking is complete, generating the track matrix corresponding to the edit distance matrix.
The calculation track of the value of each element in the edit distance matrix is recorded, and the origin of each value is marked accordingly. In this embodiment, lt indicates that the element was calculated from the element at its upper left, l indicates that it was calculated from the element to its left, and t indicates that it was calculated from the element above it. For example, if the third element in the third column of the edit distance matrix was determined by the second element in the second column, lt is entered for the third element in the third column; if it was determined by the third element in the second column, l is entered; and if it was determined by the second element in the third column, t is entered. This marks the origin of the value of the third element in the third column. After the marking is complete, the track matrix corresponding to the edit distance matrix is generated.
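A minimal sketch of the labelling is given below. For brevity it omits the header row and header column that hold the characters, so index 0 here corresponds to the initialized second row and second column. The source does not say which neighbor wins when several share the minimum, so the preference order lt, l, t used here is an assumption, as are the function name and the choice to leave the initialized border cells unlabelled.

```python
def build_distance_and_track(template: str, partial: str):
    rows, cols = len(partial) + 1, len(template) + 1
    dist = [[0] * cols for _ in range(rows)]
    track = [[""] * cols for _ in range(rows)]  # "" = initialized cell, no origin recorded
    for j in range(1, cols):
        dist[0][j] = j
    for i in range(1, rows):
        dist[i][0] = i

    for j in range(1, cols):          # column by column, as in the description
        for i in range(1, rows):
            if template[j - 1] == partial[i - 1]:
                dist[i][j], track[i][j] = dist[i - 1][j - 1], "lt"
            else:
                candidates = [
                    (dist[i - 1][j - 1], "lt"),  # upper left
                    (dist[i][j - 1], "l"),       # left
                    (dist[i - 1][j], "t"),       # top
                ]
                # min() returns the first minimum, giving the assumed lt > l > t preference.
                value, label = min(candidates, key=lambda c: c[0])
                dist[i][j], track[i][j] = value + 1, label
    return dist, track


dist, track = build_distance_and_track("abcd", "abd")
for row in track:
    print(row)
```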
In this embodiment, the step of marking, according to the calculation track of the value of each element in the edit distance matrix, the origin of the value of each element in the edit distance matrix comprises:
Each time the calculation track of the value of one element in the edit distance matrix is recorded, marking the origin of the value of that element;
Continuing until the origin of the value of each element in the edit distance matrix has been marked.
Each time the calculation track of the value of one element in the edit distance matrix is recorded, the origin of that element's value is marked immediately. In other words, the calculation tracks are recorded and the value origins are marked at the same time.
In some embodiments, the step of marking, according to the calculation track of the value of each element in the edit distance matrix, the origin of the value of each element in the edit distance matrix comprises:
After the calculation tracks of the values of all elements in the edit distance matrix have been recorded, marking the origin of the value of each element in the edit distance matrix according to those calculation tracks.
In completing editing distance matrix after the calculating track of the value of each element, it can just trigger and start to execute label editor
The value of each element generates origin in distance matrix, until the value for completing each element in label editing distance matrix generates origin.?
It is exactly that the calculating track of the value of each element in not completing editing distance matrix will not be marked in editing distance matrix each
The value of element generates origin.
Step S105: calculating the similarity of each track in the track matrix, and selecting the track for which the partial transcription text has the highest similarity to the template text, to obtain a first track.
After the track matrix is generated, the similarity of each track in the track matrix is calculated. Once the similarity of every track has been calculated, the track with the highest similarity between the partial transcription text and the template text is selected as the first track. The first track is regarded as the track on the template text that corresponds to the partial transcription text.
In the present embodiment, the step of calculating the similarity of each track in the track matrix comprises:
identifying, for each track in the track matrix, the number of characters of the partial transcription text that are equal to the corresponding characters of the template text, to obtain an equal character number;
comparing, for each track in the track matrix, the character length of the partial transcription text with the character length of the corresponding template text, and choosing the longer length as the character total;
calculating, for each track in the track matrix, the ratio of the equal character number to the corresponding character total, to obtain the similarity of each track in the track matrix.
After the track matrix is generated, the number of characters of the partial transcription text that are equal to the corresponding characters of the template text is identified for each track, giving the equal character number of that track. The character length of the partial transcription text is then compared with the character length of the corresponding template text in each track, and the longer of the two is chosen as the character total: if the partial transcription text is longer than the corresponding template text in a track, the length of the partial transcription text is the character total for that track; if it is shorter, the length of the template text is the character total. Finally, the ratio of the equal character number to the character total is calculated for each track, which gives the similarity of each track in the track matrix.
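The ratio described above can be written in a few lines. The sketch below is illustrative only; the function name `track_similarity` and its parameters are hypothetical. The text covers the cases where one length is strictly greater than the other; taking the maximum also covers equal lengths, which is an assumption.

```python
def track_similarity(equal_chars: int, partial_len: int, template_len: int) -> float:
    """Similarity of one track: equal character number divided by the character total.

    The character total is the longer of the two character lengths.
    """
    total = max(partial_len, template_len)
    if total == 0:
        return 0.0  # assumed convention for two empty texts
    return equal_chars / total


# 4 equal characters, partial text of 5 characters, template segment of 6 characters
similarity = track_similarity(4, 5, 6)  # 4 / 6
```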
Step S106: determining, according to the first track, the terminal point on the template text that corresponds to the partial transcription text, to obtain a first terminal point.
After the first track is obtained, because the first track has a terminal point in the track matrix, the terminal point on the template text corresponding to the partial transcription text is determined according to the first track, which gives the first terminal point.
In the present embodiment, step S106 comprises:
marking the last element in the first track;
marking, according to the last element in the first track, the corresponding character of the template text, to obtain the first terminal point.
After the first track is obtained, its last element is marked. The character of the template text in the column of that last element is then obtained and marked, which gives the first terminal point.
Step S107: obtaining a new template text from the template text according to the initial point of the template text and the first terminal point.
After the first terminal point is obtained, the text between the initial point of the template text and the first terminal point, including the characters at the initial point and at the first terminal point, is taken from the template text as the new template text.
In the present embodiment, step S107 comprises:
marking the first character of the template text as the initial point;
intercepting the characters between the initial point of the template text and the first terminal point, wherein the intercepted characters include the character at the initial point of the template text and the character at the first terminal point;
generating text from the intercepted characters, to obtain the new template text.
After the first terminal point is obtained, the first character of the template text is marked as the initial point, and the characters between the initial point and the first terminal point are intercepted, including the character at the initial point and the character at the first terminal point. Text in the same format as the template text is then generated from the intercepted characters, which gives the new template text.
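Intercepting the new template text amounts to an inclusive slice. In the hedged sketch below, the function name `new_template_text` is hypothetical and the first terminal point is assumed to be given as a 0-based character index into the template text.

```python
def new_template_text(template: str, first_terminal_index: int) -> str:
    """Return the characters from the initial point (index 0) up to and
    including the character at the first terminal point."""
    if not 0 <= first_terminal_index < len(template):
        raise ValueError("first terminal point must lie inside the template text")
    return template[: first_terminal_index + 1]  # +1 makes the end inclusive


segment = new_template_text("abcdefg", 3)  # characters at indices 0..3
```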
Step S108: comparing the partial transcription text with the new template text, and calculating the accuracy rate of the partial transcription text by the editing distance algorithm.
After the new template text is obtained, the partial transcription text is compared with the new template text rather than with the whole template text, and the accuracy rate of the partial transcription text is calculated by the editing distance algorithm. This solves the problem that existing transcription accuracy algorithms compare the transcribed text with the full template text and therefore cannot accurately calculate the transcription accuracy rate when only part of the text has been transcribed.
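The effect of comparing against the new template text instead of the full template text can be illustrated with a short sketch. The accuracy formula used here, 1 minus the editing distance divided by the length of the reference text, is an assumption: the description states that accuracy is calculated by the editing distance algorithm but does not give the formula. The function names are hypothetical.

```python
def edit_distance(a: str, b: str) -> int:
    """Editing (Levenshtein) distance between a and b, computed row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            if ca == cb:
                curr.append(prev[j - 1])  # equal: copy the upper-left value
            else:
                curr.append(min(prev[j - 1], curr[j - 1], prev[j]) + 1)
        prev = curr
    return prev[-1]


def accuracy(partial: str, reference: str) -> float:
    """Assumed accuracy formula: 1 - distance / len(reference), floored at 0."""
    if not reference:
        return 0.0
    return max(0.0, 1.0 - edit_distance(partial, reference) / len(reference))


full_template = "abcdefghij"
partial = "abc"                                   # only the first three characters were transcribed
against_full = accuracy(partial, full_template)   # penalised for the untranscribed remainder
against_new = accuracy(partial, "abc")            # compared with the new template text only
```

Under this assumed formula, a perfectly transcribed prefix scores poorly against the full template text but scores 1.0 against the new template text, which is the discrepancy the method is meant to remove.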
In conclusion establishing editing distance matrix when the initial point of template text starts by transcription, editing distance square is calculated
The value of each element in battle array generates track matrix according to the calculating track of the value of each element in editing distance matrix, calculates track square
The similarity of each track in battle array, the highest track of screening similarity obtain the first track, obtain part according to the first track
Transcription text corresponding terminal on template text, to obtain new template text, then part transcription text and new template is literary
Originally it compares, the accuracy rate of calculating section transcription text, it is intended to the transcription accuracy rate algorithm for solving existing text, it will
The full text of text and template text that transcription comes out compares, when in part, text is come out by transcription, Bu Nengzhun
The problem of really calculating the transcription accuracy rate of text.
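The steps summarised above can be put together in one sketch. Several points are interpretive assumptions rather than statements of the description: a "track" is taken to be the path traced back from each element of the last row (where the whole partial transcription text has been consumed) to the matrix origin; equal characters are counted on "lt" steps whose characters match; the initialized row and column are given "l" and "t" marks so that tracing can reach the origin; and ties between origins are broken by the order l, lt, t. The function name `select_new_template` is hypothetical, and both texts are assumed to be non-empty.

```python
from typing import Tuple


def select_new_template(partial: str, template: str) -> Tuple[str, float]:
    """Return (new template text, similarity of the first track)."""
    n, m = len(partial), len(template)
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    mark = [[""] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        dist[0][j], mark[0][j] = j, "l"   # assumed mark for the initialized row
    for i in range(1, n + 1):
        dist[i][0], mark[i][0] = i, "t"   # assumed mark for the initialized column
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if partial[i - 1] == template[j - 1]:
                dist[i][j], mark[i][j] = dist[i - 1][j - 1], "lt"
            else:
                prev, mark[i][j] = min(
                    (dist[i - 1][j - 1], "lt"), (dist[i][j - 1], "l"), (dist[i - 1][j], "t")
                )
                dist[i][j] = prev + 1

    best_j, best_sim = 0, -1.0
    for j in range(1, m + 1):             # one candidate track per template prefix
        i, k, equal = n, j, 0
        while i > 0 or k > 0:             # trace the marks back to the origin
            step = mark[i][k]
            if step == "lt":
                equal += partial[i - 1] == template[k - 1]
                i, k = i - 1, k - 1
            elif step == "l":
                k -= 1
            else:
                i -= 1
        sim = equal / max(n, j)           # equal character number / character total
        if sim > best_sim:
            best_j, best_sim = j, sim
    return template[:best_j], best_sim    # prefix ending at the first terminal point


new_template, similarity = select_new_template("abd", "abcdefg")
```

In this example the partial transcription text "abd" is matched to the template prefix "abcd", so the accuracy rate would be computed against "abcd" rather than against all of "abcdefg".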
As shown in Fig. 2, an embodiment of the present application provides a text accuracy rate calculation device 1 based on semantic parsing. The device 1 includes a first acquisition module 11, an establishing module 12, a first computing module 13, a generation module 14, a screening module 15, an obtaining module 16, a second acquisition module 17 and a second computing module 18.
The first acquisition module 11 is configured to obtain the partial transcription text transcribed starting from the initial point of the template text.
Transcription starts from the initial point of the template text, but the template text is not transcribed in full. That is, transcription starts from the first character of the template text, and the point at which transcription ends is not the last character of the template text but any character in the template text other than the last one. Because not all characters of the template text are transcribed, the text obtained by transcription starting from the initial point of the template text is called the partial transcription text.
The template text is a correct text, used for comparison with the partial transcription text.
The transcription referred to above means converting speech into text by an ASR (speech recognition) engine.
The establishing module 12 is configured to establish an editing distance matrix whose number of columns is the character length of the template text plus two and whose number of rows is the character length of the partial transcription text plus two.
In the present embodiment, the template text is written text with punctuation marks removed. The partial transcription text is likewise written text with punctuation marks removed.
The character length of the template text is obtained and increased by two to give the number of columns. The character length of the partial transcription text is obtained and increased by two to give the number of rows. The editing distance matrix is then established with these numbers of columns and rows. The two extra columns and two extra rows are added so that the template text and the partial transcription text can be entered in the first row and the first column respectively, and the initialized values can be entered in the second row and the second column.
Specifically, the device 1 includes:
a first input module, configured to input the characters of the template text starting from the third element of the first row of the editing distance matrix;
a second input module, configured to input the characters of the partial transcription text starting from the third element of the first column of the editing distance matrix;
a definition module, configured to define the value of the second element in the second row of the editing distance matrix as 0;
a first initialization module, configured to initialize the values of the elements of the second row of the editing distance matrix by incrementing successively by 1, starting from the value 0 of the second element in the second row;
a second initialization module, configured to initialize the values of the elements of the second column of the editing distance matrix by incrementing successively by 1, starting from the value 0 of the second element in the second column.
The characters of the template text are entered in the first row of the editing distance matrix, starting from the third element of the first row. Correspondingly, the characters of the partial transcription text are entered in the first column of the editing distance matrix, starting from the third element of the first column. Entering the two texts from the third element of the first row and of the first column gives every character of the template text and every character of the partial transcription text a corresponding position in the editing distance matrix, and also fixes the positions of the initialized values in the second row and the second column. First, the value of the second element in the second row of the editing distance matrix is defined as 0. Then, starting from that value, the values of the elements of the second row are initialized by incrementing successively by 1; for example, the values of the second, third, fourth and fifth elements in the second row are 0, 1, 2 and 3 respectively. Defining the value of the second element in the second row as 0 also defines the value of the second element in the second column as 0, because the two occupy the same position and are the same element. Starting from that value, the values of the elements of the second column are initialized by incrementing successively by 1; for example, the values of the second, third, fourth and fifth elements in the second column are 0, 1, 2 and 3 respectively. After the values of the second column and the second row have been initialized, the calculation of the value of each element in the editing distance matrix can begin.
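The layout described above, with the characters in the first row and first column and the initialized values in the second row and second column, can be sketched as follows. The function name `init_editing_distance_matrix` is hypothetical, and `None` is used here to stand for cells that hold no value yet.

```python
from typing import List, Optional, Union

Cell = Optional[Union[str, int]]


def init_editing_distance_matrix(partial: str, template: str) -> List[List[Cell]]:
    """Build a (len(partial) + 2) x (len(template) + 2) matrix with headers and initial values."""
    rows, cols = len(partial) + 2, len(template) + 2
    matrix: List[List[Cell]] = [[None] * cols for _ in range(rows)]

    # First row, from the third element: characters of the template text.
    for j, ch in enumerate(template):
        matrix[0][j + 2] = ch
    # First column, from the third element: characters of the partial transcription text.
    for i, ch in enumerate(partial):
        matrix[i + 2][0] = ch
    # Second row, from the second element: 0, 1, 2, ...
    for j in range(1, cols):
        matrix[1][j] = j - 1
    # Second column, from the second element: 0, 1, 2, ...
    for i in range(1, rows):
        matrix[i][1] = i - 1
    return matrix


matrix = init_editing_distance_matrix("ab", "abc")
```

For the partial transcription text "ab" and the template text "abc", the result has 4 rows and 5 columns; the second element of the second row is the shared 0 from which both the second row and the second column count upward.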
The first computing module 13 is configured to calculate the value of each element in the editing distance matrix according to the partial transcription text and the template text.
That is, for each element in the editing distance matrix, whether the corresponding character of the partial transcription text equals the corresponding character of the template text determines how the value of that element is calculated, and the value of each element in the editing distance matrix is calculated accordingly.
In the present embodiment, the value of each element in the editing distance matrix that has not been initialized is determined by the value of one of the elements to its left, to its upper left, or above it. The first computing module 13 includes:
a first identification module, configured to identify the column number and the row number of the third element in the third column of the editing distance matrix;
a second identification module, configured to identify the character of the template text corresponding to that column number and the character of the partial transcription text corresponding to that row number;
a first judgment module, configured to judge whether the character of the template text corresponding to the column number of the third element in the third column of the editing distance matrix is equal to the character of the partial transcription text corresponding to the row number of that element; if the two characters are equal, the value of the third element in the third column of the editing distance matrix is the value of the element to its upper left; if the two characters are not equal, the value of the third element in the third column of the editing distance matrix is the minimum of the values of the elements to its left, to its upper left and above it, plus 1;
a first sub-computing module, configured to calculate in turn the value of the fourth element in the third column of the editing distance matrix and so on, until the value of every element in the editing distance matrix has been calculated.
Because the value of each element that has not been initialized is determined by the value of one of the elements to its left, to its upper left, or above it, the only element whose left, upper-left and upper neighbours all hold values when calculation starts is the third element in the third column of the editing distance matrix. In other words, calculation starts from the third element in the third column. In the embodiment, to calculate this element, its column number and row number are first identified. The character of the template text corresponding to the column number and the character of the partial transcription text corresponding to the row number are then identified. After the corresponding characters are obtained, it is judged whether the two characters are equal, and the result determines the value of the third element in the third column of the editing distance matrix. If the character of the template text corresponding to the column number is equal to the character of the partial transcription text corresponding to the row number, the value of the third element in the third column is the value of the element to its upper left. If the two characters are not equal, the value of the third element in the third column is the minimum of the values of the elements to its left, to its upper left and above it, plus 1. After the value of the third element in the third column has been calculated, the value of the fourth element in the third column is calculated, and so on. Once the values of all elements of the third column have been calculated, the values of the elements of the fourth column are calculated, and the process continues until the values of the elements in the last column of the editing distance matrix have been calculated, at which point the value of every element in the editing distance matrix has been calculated.
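The column-by-column calculation described above can be sketched as follows. The function name `compute_editing_distance_matrix` is hypothetical; the matrix uses the layout from the description (characters in the first row and first column, initialized values in the second row and second column), with `None` standing for unused header cells.

```python
from typing import List


def compute_editing_distance_matrix(partial: str, template: str) -> List[list]:
    """Build the matrix with headers, then fill it column by column from the third column."""
    rows, cols = len(partial) + 2, len(template) + 2
    m: List[list] = [[None] * cols for _ in range(rows)]
    m[0][2:] = list(template)              # template characters in the first row
    for i, ch in enumerate(partial):
        m[i + 2][0] = ch                   # partial transcription characters in the first column
    m[1][1:] = list(range(cols - 1))       # second row: 0, 1, 2, ...
    for i in range(1, rows):
        m[i][1] = i - 1                    # second column: 0, 1, 2, ...

    for j in range(2, cols):               # third column first, then the fourth, ...
        for i in range(2, rows):           # third element first, then the fourth, ...
            if m[0][j] == m[i][0]:
                # Characters equal: take the value of the upper-left element.
                m[i][j] = m[i - 1][j - 1]
            else:
                # Characters differ: minimum of left, upper-left and upper elements, plus 1.
                m[i][j] = min(m[i][j - 1], m[i - 1][j - 1], m[i - 1][j]) + 1
    return m


m = compute_editing_distance_matrix("kitten", "sitting")
distance = m[-1][-1]  # bottom-right element: editing distance between the two texts
```

Filling column by column or row by row gives the same values, because each element depends only on neighbours that are already filled in either order.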
The generation module 14 is configured to record the calculation track of the value of each element in the editing distance matrix and to generate a track matrix corresponding to the editing distance matrix.
While the value of each element in the editing distance matrix is being calculated, the calculation track of that value is recorded, that is, which element's value determined it. When the values of all elements in the editing distance matrix have been calculated, the recording of the calculation tracks is also complete, and the track matrix corresponding to the editing distance matrix is generated.
In the present embodiment, the generation module 14 includes:
a first recording module, configured to record the calculation track of the value of each element in the editing distance matrix;
a first marking module, configured to mark the generation origin of the value of each element in the editing distance matrix according to the calculation track of the value of each element in the editing distance matrix;
a first generation module, configured to generate, after marking is complete, the track matrix corresponding to the editing distance matrix.
The calculation track of the value of each element in the editing distance matrix is recorded, and the generation origin of the value of each element is marked according to that calculation track. In the present embodiment, lt indicates that the element was calculated from the element to its upper left, l indicates that it was calculated from the element to its left, and t indicates that it was calculated from the element above it. For example, if the third element in the third column of the editing distance matrix is determined by the second element in the second column, lt is entered for the third element in the third column; if it is determined by the third element in the second column, l is entered; and if it is determined by the second element in the third column, t is entered. In this way the generation origin of the third element in the third column of the editing distance matrix is marked. After marking is complete, the track matrix corresponding to the editing distance matrix is generated.
In the present embodiment, the first marking module includes:
a first sub-marking module, configured to mark the generation origin of the value of an element in the editing distance matrix each time the calculation track of the value of that element is recorded;
a first sub-marking completion module, configured to continue until the generation origin of the value of every element in the editing distance matrix has been marked.
As soon as the calculation track of the value of an element in the editing distance matrix is recorded, the generation origin of the value of that element is marked immediately. That is, the calculation tracks of the element values are recorded and their generation origins are marked at the same time.
In some embodiments, the first marking module includes:
a second sub-marking module, configured to mark, after the calculation tracks of the values of all elements in the editing distance matrix have been recorded, the generation origin of the value of each element in the editing distance matrix according to those calculation tracks.
Marking of the generation origins is triggered only after the calculation tracks of the values of all elements in the editing distance matrix have been recorded, and continues until the generation origin of the value of every element has been marked. That is, no generation origin is marked while the recording of the calculation tracks is still incomplete.
The screening module 15 is configured to calculate the similarity of each track in the track matrix and to select the track for which the partial transcription text has the highest similarity to the template text, to obtain a first track.
After the track matrix is generated, the similarity of each track in the track matrix is calculated. Once the similarity of every track has been calculated, the track with the highest similarity between the partial transcription text and the template text is selected as the first track. The first track is regarded as the track on the template text that corresponds to the partial transcription text.
In the present embodiment, the screening module 15 includes:
a third identification module, configured to identify, for each track in the track matrix, the number of characters of the partial transcription text that are equal to the corresponding characters of the template text, to obtain an equal character number;
a first comparison module, configured to compare, for each track in the track matrix, the character length of the partial transcription text with the character length of the corresponding template text, and to choose the longer length as the character total;
a third computing module, configured to calculate, for each track in the track matrix, the ratio of the equal character number to the corresponding character total, to obtain the similarity of each track in the track matrix.
After the track matrix is generated, the number of characters of the partial transcription text that are equal to the corresponding characters of the template text is identified for each track, giving the equal character number of that track. The character length of the partial transcription text is then compared with the character length of the corresponding template text in each track, and the longer of the two is chosen as the character total: if the partial transcription text is longer than the corresponding template text in a track, the length of the partial transcription text is the character total for that track; if it is shorter, the length of the template text is the character total. Finally, the ratio of the equal character number to the character total is calculated for each track, which gives the similarity of each track in the track matrix.
The obtaining module 16 is configured to determine, according to the first track, the terminal point on the template text that corresponds to the partial transcription text, to obtain a first terminal point.
After the first track is obtained, because the first track has a terminal point in the track matrix, the terminal point on the template text corresponding to the partial transcription text is determined according to the first track, which gives the first terminal point.
In the present embodiment, the obtaining module 16 includes:
a second marking module, configured to mark the last element in the first track;
a first obtaining module, configured to mark, according to the last element in the first track, the corresponding character of the template text, to obtain the first terminal point.
After the first track is obtained, its last element is marked. The character of the template text in the column of that last element is then obtained and marked, which gives the first terminal point.
The second acquisition module 17 is configured to obtain a new template text from the template text according to the initial point of the template text and the first terminal point.
After the first terminal point is obtained, the text between the initial point of the template text and the first terminal point, including the characters at the initial point and at the first terminal point, is taken from the template text as the new template text.
In the present embodiment, the second acquisition module 17 includes:
a third marking module, configured to mark the first character of the template text as the initial point;
an interception module, configured to intercept the characters between the initial point of the template text and the first terminal point, wherein the intercepted characters include the character at the initial point of the template text and the character at the first terminal point;
a second sub-acquisition module, configured to generate text from the intercepted characters, to obtain the new template text.
After the first terminal point is obtained, the first character of the template text is marked as the initial point, and the characters between the initial point and the first terminal point are intercepted, including the character at the initial point and the character at the first terminal point. Text in the same format as the template text is then generated from the intercepted characters, which gives the new template text.
The second computing module 18 is configured to compare the partial transcription text with the new template text and to calculate the accuracy rate of the partial transcription text by the editing distance algorithm.
After the new template text is obtained, the partial transcription text is compared with the new template text rather than with the whole template text, and the accuracy rate of the partial transcription text is calculated by the editing distance algorithm. This solves the problem that existing transcription accuracy algorithms compare the transcribed text with the full template text and therefore cannot accurately calculate the transcription accuracy rate when only part of the text has been transcribed.
In conclusion establishing editing distance matrix when the initial point of template text starts by transcription, editing distance square is calculated
The value of each element in battle array generates track matrix according to the calculating track of the value of each element in editing distance matrix, calculates track square
The similarity of each track in battle array, the highest track of screening similarity obtain the first track, obtain part according to the first track
Transcription text corresponding terminal on template text, to obtain new template text, then part transcription text and new template is literary
Originally it compares, the accuracy rate of calculating section transcription text, it is intended to the transcription accuracy rate algorithm for solving existing text, it will
The full text of text and template text that transcription comes out compares, when in part, text is come out by transcription, Bu Nengzhun
The problem of really calculating the transcription accuracy rate of text.
As shown in Fig. 3, an embodiment of the present application also provides a computer device. The computer device may be a server, and its internal structure may be as shown in Fig. 3. The computer device includes a processor, a memory, a network interface and a database connected by a system bus. The processor of the computer device provides computing and control capability. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores data such as the models used by the text accuracy rate calculation method based on semantic parsing. The network interface of the computer device communicates with external terminals over a network connection. When executed by the processor, the computer program implements a text accuracy rate calculation method based on semantic parsing.
The processor executes the following steps of the text accuracy rate calculation method based on semantic parsing: obtaining the partial transcription text transcribed starting from the initial point of the template text; establishing an editing distance matrix whose number of columns is the character length of the template text plus two and whose number of rows is the character length of the partial transcription text plus two; calculating the value of each element in the editing distance matrix according to the partial transcription text and the template text; recording the calculation track of the value of each element in the editing distance matrix, and generating a track matrix corresponding to the editing distance matrix; calculating the similarity of each track in the track matrix, and selecting the track for which the partial transcription text has the highest similarity to the template text, to obtain a first track; determining, according to the first track, the terminal point on the template text corresponding to the partial transcription text, to obtain a first terminal point; obtaining a new template text from the template text according to the initial point of the template text and the first terminal point; comparing the partial transcription text with the new template text, and calculating the accuracy rate of the partial transcription text by the editing distance algorithm.
In one embodiment, after the step of establishing the edit distance matrix with the character length of the template text plus two as the number of columns and the character length of the partial transcription text plus two as the number of rows, and before the step of calculating the value of each element in the edit distance matrix according to the partial transcription text and the template text, the method includes:
Entering the characters of the template text starting from the third element of the first row of the edit distance matrix;
Entering the characters of the partial transcription text starting from the third element of the first column of the edit distance matrix;
Defining the value of the second element in the second row of the edit distance matrix as 0;
Starting from the value 0 of the second element in the second row and incrementing by 1 at each step, initializing the value of each element of the second row of the edit distance matrix;
Starting from the value 0 of the second element in the second column and incrementing by 1 at each step, initializing the value of each element of the second column of the edit distance matrix.
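The initialization described above can be sketched in Python as follows. This is a minimal illustration, not the applicant's implementation: the function name `init_edit_distance_matrix` and the use of `None` for cells that are not yet filled are assumptions, and the sketch assumes that the template characters run along the first row while the partial transcription characters run down the first column.

```python
def init_edit_distance_matrix(template, partial):
    """Build a (len(partial) + 2) x (len(template) + 2) matrix with header cells."""
    rows = len(partial) + 2
    cols = len(template) + 2
    matrix = [[None] * cols for _ in range(rows)]

    # Template characters along the first row, starting at the third element.
    for j, ch in enumerate(template):
        matrix[0][j + 2] = ch
    # Partial transcription characters down the first column (assumed layout).
    for i, ch in enumerate(partial):
        matrix[i + 2][0] = ch

    # Second row: 0, 1, 2, ... starting at its second element.
    for j in range(1, cols):
        matrix[1][j] = j - 1
    # Second column: 0, 1, 2, ... starting at its second element.
    for i in range(1, rows):
        matrix[i][1] = i - 1
    return matrix
```

The two extra rows and columns hold the character labels and the base-case distances, which is why both dimensions are the text length plus two.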
In one embodiment, the value of each uninitialized element in the edit distance matrix is determined by the value of one of the elements to its left, at its upper left, and above it, and the step of calculating the value of each element in the edit distance matrix according to the partial transcription text and the template text includes:
Identifying the column number and row number of the third element in the third column of the edit distance matrix;
Identifying the character of the template text and the character of the partial transcription text that correspond to that column number and row number, respectively;
Judging whether the character of the template text corresponding to the column number of the third element in the third column of the edit distance matrix is equal to the character of the partial transcription text corresponding to its row number;
If the two characters are equal, the value of the third element in the third column of the edit distance matrix is the value of the element at its upper left;
If the two characters are not equal, the value of the third element in the third column of the edit distance matrix is the minimum of the values of the elements to its left, at its upper left, and above it, plus 1;
Calculating the value of the fourth element in the third column of the edit distance matrix, and then each subsequent element in turn, until the value of every element in the edit distance matrix has been calculated.
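The recurrence described above can be illustrated with a short Python sketch. It is a simplified illustration under stated assumptions: the function name `fill_edit_distance` is hypothetical, and the label row and label column are omitted, so `d[i][j]` here corresponds to the element at row `i + 2`, column `j + 2` of the matrix in the text (1-based positions).

```python
def fill_edit_distance(template, partial):
    """Return the numeric part of the edit distance matrix (no label row/column)."""
    n, m = len(partial), len(template)
    d = [[0] * (m + 1) for _ in range(n + 1)]

    # Base cases: the initialized row and column count up from 0.
    for j in range(m + 1):
        d[0][j] = j
    for i in range(n + 1):
        d[i][0] = i

    # Fill column by column, top to bottom, as the text describes.
    for j in range(1, m + 1):
        for i in range(1, n + 1):
            if partial[i - 1] == template[j - 1]:
                # Equal characters: copy the upper-left value.
                d[i][j] = d[i - 1][j - 1]
            else:
                # Unequal characters: minimum of left, upper left, above, plus 1.
                d[i][j] = min(d[i][j - 1], d[i - 1][j - 1], d[i - 1][j]) + 1
    return d
```

For the inputs `"kitten"` and `"sitting"` the bottom-right value is 3, which matches the usual Levenshtein distance for that pair.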
In one embodiment, the step of recording the calculation track of the value of each element in the edit distance matrix and generating the track matrix corresponding to the edit distance matrix includes:
Recording the calculation track of the value of each element in the edit distance matrix;
Marking, according to the calculation track of the value of each element in the edit distance matrix, the source from which the value of each element was generated;
After the marking is completed, generating the track matrix corresponding to the edit distance matrix.
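One way to record where each value came from is to store a direction label next to each distance. The sketch below is an illustration only: the labels `"diag"`, `"left"`, and `"up"`, the function name `build_track_matrix`, and the tie-breaking order (upper left first, then left, then above) are assumptions that the text does not specify.

```python
def build_track_matrix(template, partial):
    """Return (d, track): distances and the source direction of each value."""
    n, m = len(partial), len(template)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    track = [[None] * (m + 1) for _ in range(n + 1)]

    # Initialized cells: the first row grows from the left, the first column from above.
    for j in range(1, m + 1):
        d[0][j], track[0][j] = j, "left"
    for i in range(1, n + 1):
        d[i][0], track[i][0] = i, "up"

    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if partial[i - 1] == template[j - 1]:
                d[i][j], track[i][j] = d[i - 1][j - 1], "diag"
            else:
                candidates = (
                    (d[i - 1][j - 1], "diag"),
                    (d[i][j - 1], "left"),
                    (d[i - 1][j], "up"),
                )
                # min() keeps the first minimum, so ties prefer "diag" (assumed).
                value, source = min(candidates, key=lambda c: c[0])
                d[i][j], track[i][j] = value + 1, source
    return d, track
```

Following the labels backwards from any element reconstructs the chain of elements that produced its value.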
In one embodiment, the step of calculating the similarity of each track in the track matrix includes:
Identifying, for each track in the track matrix, the number of characters of the partial transcription text that are equal to the corresponding characters of the template text, to obtain the number of equal characters;
Comparing, for each track in the track matrix, the character length of the partial transcription text with the character length of the corresponding template text, and taking the longer of the two as the total number of characters;
Calculating, for each track in the track matrix, the ratio of the number of equal characters to the corresponding total number of characters, to obtain the similarity of each track in the track matrix.
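The similarity calculation can be sketched as below. The text does not define exactly how tracks are enumerated, so this sketch makes an explicit assumption: each track ends in the last row of the matrix (the whole partial transcription consumed) at a different template column, and is traced back to the origin through the recorded directions. The function names and the tie-breaking order are likewise hypothetical.

```python
def build(template, partial):
    """Distances plus source directions ("diag", "left", "up") for every cell."""
    n, m = len(partial), len(template)
    d = [[0] * (m + 1) for _ in range(n + 1)]
    t = [[None] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        d[0][j], t[0][j] = j, "left"
    for i in range(1, n + 1):
        d[i][0], t[i][0] = i, "up"
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            if partial[i - 1] == template[j - 1]:
                d[i][j], t[i][j] = d[i - 1][j - 1], "diag"
            else:
                cands = ((d[i - 1][j - 1], "diag"), (d[i][j - 1], "left"), (d[i - 1][j], "up"))
                value, source = min(cands, key=lambda c: c[0])
                d[i][j], t[i][j] = value + 1, source
    return d, t


def track_similarities(template, partial):
    """Similarity of the track ending at each template column 1..len(template)."""
    _, t = build(template, partial)
    sims = []
    for end_col in range(1, len(template) + 1):
        i, j, equal = len(partial), end_col, 0
        while i > 0 or j > 0:
            step = t[i][j]
            if step == "diag":
                if partial[i - 1] == template[j - 1]:
                    equal += 1  # count matching characters on the track
                i, j = i - 1, j - 1
            elif step == "left":
                j -= 1
            else:
                i -= 1
        total = max(len(partial), end_col)  # the longer of the two lengths
        sims.append(equal / total)
    return sims
```

With the template `"abcdef"` and the partial transcription `"abc"`, the track ending at the third template character scores 1.0, while the track spanning the whole template scores only 0.5, which is the effect the method relies on.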
In one embodiment, the step of determining, according to the first track, the end point on the template text that corresponds to the partial transcription text, to obtain the first end point, includes:
Marking the last element in the first track;
Marking the character of the template text that corresponds to the last element in the first track, to obtain the first end point.
In one embodiment, the step of obtaining the new template text from the template text according to the starting point of the template text and the first end point includes:
Marking the first character of the template text as the starting point;
Extracting the characters between the starting point of the template text and the first end point, where the extracted characters include the character at the starting point and the character at the first end point;
Generating text from the extracted characters to obtain the new template text.
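The end-point selection, the truncation of the template, and the final comparison can be sketched together. Several details here are assumptions rather than statements from the text: the per-track similarities are passed in as a list indexed by template character position, ties go to the earliest end point, and the accuracy is taken as 1 minus the edit distance divided by the length of the new template text (this passage does not give the accuracy formula). All function names are hypothetical.

```python
def first_end_point(similarities):
    """Index (0-based) of the template character where the best track ends."""
    # max() returns the first maximum, so ties go to the earliest end point.
    return max(range(len(similarities)), key=lambda k: similarities[k])


def new_template_text(template, end_index):
    """Characters from the starting point through the end point, both inclusive."""
    return template[: end_index + 1]


def levenshtein(a, b):
    """Plain edit distance using two rolling rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + cost))
        prev = cur
    return prev[-1]


def accuracy(partial, new_template):
    """Assumed formula: 1 - distance / len(new_template), clamped at 0."""
    if not new_template:
        return 1.0 if not partial else 0.0
    return max(0.0, 1 - levenshtein(partial, new_template) / len(new_template))
```

For example, if the best track ends at index 2 of the template `"abcdef"`, the new template text is `"abc"`, and a partial transcription `"abd"` is then scored against those three characters only instead of against all six.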
Those skilled in the art will understand that the structure shown in Fig. 3 is only a block diagram of the part of the structure relevant to the solution of the present application, and does not limit the computer device to which the solution is applied.
When the template text has been transcribed from its starting point, the computer device of the embodiment of the present application establishes an edit distance matrix, calculates the value of each element in the edit distance matrix, generates a track matrix according to the calculation track of the value of each element, calculates the similarity of each track in the track matrix, and selects the track with the highest similarity to obtain a first track. From the first track it obtains the end point on the template text that corresponds to the partial transcription text, and thereby a new template text; it then compares the partial transcription text with the new template text and calculates the accuracy of the partial transcription text. This is intended to solve the problem that existing transcription accuracy algorithms compare the transcribed text with the full template text, and therefore cannot accurately calculate the transcription accuracy when only part of the text has been transcribed.
An embodiment of the present application also provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements a text accuracy calculation method based on semantic analysis, specifically: obtaining a partial transcription text transcribed from the starting point of a template text; establishing an edit distance matrix whose number of columns is the character length of the template text plus two and whose number of rows is the character length of the partial transcription text plus two; calculating the value of each element in the edit distance matrix according to the partial transcription text and the template text; recording the calculation track of the value of each element in the edit distance matrix and generating a track matrix corresponding to the edit distance matrix; calculating the similarity of each track in the track matrix and selecting the track with the highest similarity between the partial transcription text and the template text to obtain a first track; determining, according to the first track, the end point on the template text that corresponds to the partial transcription text, to obtain a first end point; obtaining a new template text from the template text according to the starting point of the template text and the first end point; and comparing the partial transcription text with the new template text and calculating the accuracy of the partial transcription text by an edit distance algorithm.
When the template text has been transcribed from its starting point, the storage medium of the embodiment of the present application establishes an edit distance matrix, calculates the value of each element in the edit distance matrix, generates a track matrix according to the calculation track of the value of each element, calculates the similarity of each track in the track matrix, and selects the track with the highest similarity to obtain a first track. From the first track it obtains the end point on the template text that corresponds to the partial transcription text, and thereby a new template text; it then compares the partial transcription text with the new template text and calculates the accuracy of the partial transcription text. This is intended to solve the problem that existing transcription accuracy algorithms compare the transcribed text with the full template text, and therefore cannot accurately calculate the transcription accuracy when only part of the text has been transcribed.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, database, or other media provided in the present application and used in the embodiments may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The above are only preferred embodiments of the present application and are not intended to limit it. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present application shall fall within its scope of protection.
Claims (10)
1. A text accuracy calculation method based on semantic analysis, characterized in that the method comprises:
Obtaining a partial transcription text transcribed from the starting point of a template text;
Establishing an edit distance matrix whose number of columns is the character length of the template text plus two and whose number of rows is the character length of the partial transcription text plus two;
Calculating the value of each element in the edit distance matrix according to the partial transcription text and the template text;
Recording the calculation track of the value of each element in the edit distance matrix, and generating a track matrix corresponding to the edit distance matrix;
Calculating the similarity of each track in the track matrix, and selecting the track with the highest similarity between the partial transcription text and the template text to obtain a first track;
Determining, according to the first track, the end point on the template text that corresponds to the partial transcription text, to obtain a first end point;
Obtaining a new template text from the template text according to the starting point of the template text and the first end point;
Comparing the partial transcription text with the new template text, and calculating the accuracy of the partial transcription text by an edit distance algorithm.
2. The text accuracy calculation method based on semantic analysis according to claim 1, characterized in that, after the step of establishing the edit distance matrix with the character length of the template text plus two as the number of columns and the character length of the partial transcription text plus two as the number of rows, and before the step of calculating the value of each element in the edit distance matrix according to the partial transcription text and the template text, the method comprises:
Entering the characters of the template text starting from the third element of the first row of the edit distance matrix;
Entering the characters of the partial transcription text starting from the third element of the first column of the edit distance matrix;
Defining the value of the second element in the second row of the edit distance matrix as 0;
Starting from the value 0 of the second element in the second row and incrementing by 1 at each step, initializing the value of each element of the second row of the edit distance matrix;
Starting from the value 0 of the second element in the second column and incrementing by 1 at each step, initializing the value of each element of the second column of the edit distance matrix.
3. The text accuracy calculation method based on semantic analysis according to claim 2, characterized in that the value of each uninitialized element in the edit distance matrix is determined by the value of one of the elements to its left, at its upper left, and above it, and the step of calculating the value of each element in the edit distance matrix according to the partial transcription text and the template text comprises:
Identifying the column number and row number of the third element in the third column of the edit distance matrix;
Identifying the character of the template text and the character of the partial transcription text that correspond to that column number and row number, respectively;
Judging whether the character of the template text corresponding to the column number of the third element in the third column of the edit distance matrix is equal to the character of the partial transcription text corresponding to its row number;
If the two characters are equal, the value of the third element in the third column of the edit distance matrix is the value of the element at its upper left;
If the two characters are not equal, the value of the third element in the third column of the edit distance matrix is the minimum of the values of the elements to its left, at its upper left, and above it, plus 1;
Calculating the value of the fourth element in the third column of the edit distance matrix, and then each subsequent element in turn, until the value of every element in the edit distance matrix has been calculated.
4. The text accuracy calculation method based on semantic analysis according to claim 1, characterized in that the step of recording the calculation track of the value of each element in the edit distance matrix and generating the track matrix corresponding to the edit distance matrix comprises:
Recording the calculation track of the value of each element in the edit distance matrix;
Marking, according to the calculation track of the value of each element in the edit distance matrix, the source from which the value of each element was generated;
After the marking is completed, generating the track matrix corresponding to the edit distance matrix.
5. The text accuracy calculation method based on semantic analysis according to claim 1, characterized in that the step of calculating the similarity of each track in the track matrix comprises:
Identifying, for each track in the track matrix, the number of characters of the partial transcription text that are equal to the corresponding characters of the template text, to obtain the number of equal characters;
Comparing, for each track in the track matrix, the character length of the partial transcription text with the character length of the corresponding template text, and taking the longer of the two as the total number of characters;
Calculating, for each track in the track matrix, the ratio of the number of equal characters to the corresponding total number of characters, to obtain the similarity of each track in the track matrix.
6. The text accuracy calculation method based on semantic analysis according to claim 1, characterized in that the step of determining, according to the first track, the end point on the template text that corresponds to the partial transcription text, to obtain the first end point, comprises:
Marking the last element in the first track;
Marking the character of the template text that corresponds to the last element in the first track, to obtain the first end point.
7. The text accuracy calculation method based on semantic analysis according to claim 6, characterized in that the step of obtaining the new template text from the template text according to the starting point of the template text and the first end point comprises:
Marking the first character of the template text as the starting point;
Extracting the characters between the starting point of the template text and the first end point, where the extracted characters include the character at the starting point and the character at the first end point;
Generating text from the extracted characters to obtain the new template text.
8. A text accuracy calculation apparatus based on semantic analysis, characterized in that the apparatus comprises:
A first acquisition module, configured to obtain a partial transcription text transcribed from the starting point of a template text;
An establishing module, configured to establish an edit distance matrix whose number of columns is the character length of the template text plus two and whose number of rows is the character length of the partial transcription text plus two;
A first calculation module, configured to calculate the value of each element in the edit distance matrix according to the partial transcription text and the template text;
A generation module, configured to record the calculation track of the value of each element in the edit distance matrix and generate a track matrix corresponding to the edit distance matrix;
A screening module, configured to calculate the similarity of each track in the track matrix and select the track with the highest similarity between the partial transcription text and the template text to obtain a first track;
An obtaining module, configured to determine, according to the first track, the end point on the template text that corresponds to the partial transcription text, to obtain a first end point;
A second acquisition module, configured to obtain a new template text from the template text according to the starting point of the template text and the first end point;
A second calculation module, configured to compare the partial transcription text with the new template text and calculate the accuracy of the partial transcription text by an edit distance algorithm.
9. A computer device, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program implements the steps of the method according to any one of claims 1 to 7 when executed by a processor.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811348583.1A CN109710904B (en) | 2018-11-13 | 2018-11-13 | Text accuracy rate calculation method and device based on semantic analysis and computer equipment |
PCT/CN2018/124398 WO2020098098A1 (en) | 2018-11-13 | 2018-12-27 | Semantic analysis-based text accuracy calculation method, device and computer device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201811348583.1A CN109710904B (en) | 2018-11-13 | 2018-11-13 | Text accuracy rate calculation method and device based on semantic analysis and computer equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109710904A true CN109710904A (en) | 2019-05-03 |
CN109710904B CN109710904B (en) | 2023-11-14 |
Family
ID=66254868
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201811348583.1A Active CN109710904B (en) | 2018-11-13 | 2018-11-13 | Text accuracy rate calculation method and device based on semantic analysis and computer equipment |
Country Status (2)
Country | Link |
---|---|
CN (1) | CN109710904B (en) |
WO (1) | WO2020098098A1 (en) |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2999768B1 (en) * | 1999-03-04 | 2000-01-17 | 株式会社エイ・ティ・アール音声翻訳通信研究所 | Speech recognition error correction device |
CN104464736A (en) * | 2014-12-15 | 2015-03-25 | 北京百度网讯科技有限公司 | Error correction method and device for voice recognition text |
US20170133008A1 (en) * | 2015-11-05 | 2017-05-11 | Le Holdings (Beijing) Co., Ltd. | Method and apparatus for determining a recognition rate |
CN106847288A (en) * | 2017-02-17 | 2017-06-13 | 上海创米科技有限公司 | The error correction method and device of speech recognition text |
Family Cites Families (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8001136B1 (en) * | 2007-07-10 | 2011-08-16 | Google Inc. | Longest-common-subsequence detection for common synonyms |
CN103699591A (en) * | 2013-12-11 | 2014-04-02 | 湖南大学 | Page body extraction method based on sample page |
CN108399163B (en) * | 2018-03-21 | 2021-01-12 | 北京理工大学 | Text similarity measurement method combining word aggregation and word combination semantic features |
Non-Patent Citations (1)
Title |
---|
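Zhang Junsheng; Shi Chongde; Xu Hongjiao; Gao Yingfan; He Yanqing: "An automatic grading method for subjective questions based on short text similarity calculation", Library and Information Service (图书情报工作), no. 19, pages 31-37 * |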
Also Published As
Publication number | Publication date |
---|---|
CN109710904B (en) | 2023-11-14 |
WO2020098098A1 (en) | 2020-05-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109510737B (en) | | Protocol interface testing method and device, computer equipment and storage medium |
CN109446514A (en) | | Construction method, device and the computer equipment of news property identification model |
US3711863A (en) | | Source code comparator computer program |
CN105653517A (en) | | Recognition rate determining method and apparatus |
CN112651238A (en) | | Training corpus expansion method and device and intention recognition model training method and device |
CN110188761A (en) | | Recognition methods, device, computer equipment and the storage medium of identifying code |
CN109033058B (en) | | Contract text verification method, apparatus, computer device and storage medium |
CN109783785B (en) | | Method and device for generating experiment detection report and computer equipment |
CN105930159A (en) | | Image-based interface code generation method and system |
CN107273032A (en) | | Information typesetting method, device and equipment and computer storage medium |
CN109002768A (en) | | Medical bill class text extraction method based on the identification of neural network text detection |
CN110413961A (en) | | The method, apparatus and computer equipment of text scoring are carried out based on disaggregated model |
CN103488482A (en) | | Method and device for generating test cases |
CN109933754A (en) | | Search method, apparatus, computer equipment and the storage medium of change to the contract part |
CN110010121A (en) | | Verify method, apparatus, computer equipment and the storage medium of the art that should answer |
US11907656B2 (en) | | Machine based expansion of contractions in text in digital media |
CN111357015B (en) | | Text conversion method, apparatus, computer device, and computer-readable storage medium |
CN113343677A (en) | | Intention identification method and device, electronic equipment and storage medium |
CN108400980A (en) | | User ID authentication method, device, computer equipment and storage medium |
CN106066881B (en) | | Data processing method and device |
CN109657210A (en) | | Text accuracy rate calculation method, device, computer equipment based on semanteme parsing |
CN116029080A (en) | | Chip storage device design and verification method and device and electronic equipment |
CN106250755A (en) | | For generating the method and device of identifying code |
CN105095826B (en) | | A kind of character recognition method and device |
CN110399601B (en) | | Method and device for identifying document sequence, storage medium and electronic equipment |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||