CN109710904A - Text accuracy calculation method, apparatus, and computer device based on semantic parsing - Google Patents

Info

Publication number
CN109710904A
Authority
CN
China
Prior art keywords
text
track
editing distance
distance matrix
character
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811348583.1A
Other languages
Chinese (zh)
Other versions
CN109710904B (en
Inventor
吴建财
邹芳
邢艳
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd filed Critical Ping An Technology Shenzhen Co Ltd
Priority to CN201811348583.1A priority Critical patent/CN109710904B/en
Priority to PCT/CN2018/124398 priority patent/WO2020098098A1/en
Publication of CN109710904A publication Critical patent/CN109710904A/en
Application granted granted Critical
Publication of CN109710904B publication Critical patent/CN109710904B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/26 Speech to text systems

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

This application relates to the technical field of semantic parsing, and in particular to a text accuracy calculation method, apparatus, and computer device based on semantic parsing. When a template text has been transcribed starting from its start point, an edit distance matrix is established and the value of each element in it is calculated. A track matrix is generated from the calculation track of each element's value, the similarity of each track in the track matrix is calculated, and the track with the highest similarity is selected as the first track. The end point on the template text that corresponds to the partially transcribed text is obtained from the first track, which yields a new template text. The partially transcribed text is then compared with the new template text to calculate the accuracy of the partially transcribed text. The aim is to solve the problem that existing transcription accuracy algorithms compare the transcribed text against the full template text and therefore cannot accurately calculate the transcription accuracy when only part of the text has been transcribed.

Description

Text accuracy calculation method, apparatus, and computer device based on semantic parsing
Technical field
This application relates to the technical field of semantic parsing, and in particular to a text accuracy calculation method, apparatus, and computer device based on semantic parsing.
Background
When measuring the transcription accuracy of an ASR (speech recognition) engine, the commonly used algorithm is the edit distance algorithm. It counts the minimum number of edit operations needed to turn the transcribed text into the template text (an edit operation is replacing one character with another, inserting a character, or deleting a character) and uses that count to calculate the similarity between the transcribed text and the template text, that is, the transcription accuracy. In scenarios that focus on the real-time transcription accuracy of an ASR engine, however, the result of this algorithm is unsatisfactory. The algorithm always compares the transcribed text against the full template text, so when only part of the text has been transcribed it cannot accurately calculate the transcription accuracy of that part. Plain edit distance is therefore not suitable for scenarios that focus on the real-time transcription accuracy of an ASR engine.
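The limitation described above can be illustrated with a minimal Python sketch of the classic edit distance baseline. The function names and the normalization `1 - distance / longer length` are assumptions made for illustration; the source does not state how the distance is converted into an accuracy figure.

```python
def edit_distance(a: str, b: str) -> int:
    """Classic Levenshtein distance, computed with a rolling row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete from a
                            curr[j - 1] + 1,      # insert into a
                            prev[j - 1] + cost))  # replace or keep
        prev = curr
    return prev[-1]


def full_text_accuracy(transcribed: str, template: str) -> float:
    """Assumed normalization: 1 - distance / length of the longer text."""
    longest = max(len(transcribed), len(template))
    if longest == 0:
        return 1.0
    return 1 - edit_distance(transcribed, template) / longest


# A perfectly transcribed first half is scored against the whole template,
# so the three untranscribed characters count as three errors.
print(full_text_accuracy("abc", "abcdef"))
```

Even though every transcribed character is correct in this example, the score is penalized for the part of the template that has not been reached yet, which is the behavior the application sets out to avoid.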
Summary of the application
In view of the shortcomings of the prior art, this application proposes a text accuracy calculation method, apparatus, and computer device based on semantic parsing. It aims to solve the problem that existing transcription accuracy algorithms compare the transcribed text against the full template text and therefore cannot accurately calculate the transcription accuracy when only part of the text has been transcribed.
The technical solution proposed by this application is as follows:
A text accuracy calculation method based on semantic parsing, the method comprising:
obtaining a partially transcribed text that has been transcribed starting from the start point of a template text;
establishing an edit distance matrix whose number of columns is the character length of the template text plus two and whose number of rows is the character length of the partially transcribed text plus two;
calculating the value of each element in the edit distance matrix according to the partially transcribed text and the template text;
recording the calculation track of the value of each element in the edit distance matrix, and generating a track matrix corresponding to the edit distance matrix;
calculating the similarity of each track in the track matrix, and selecting the track along which the partially transcribed text is most similar to the template text, to obtain a first track;
determining, according to the first track, the end point on the template text that corresponds to the partially transcribed text, to obtain a first end point;
obtaining a new template text from the template text according to the start point of the template text and the first end point;
comparing the partially transcribed text with the new template text, and calculating the accuracy of the partially transcribed text by the edit distance algorithm.
Further, after the step of establishing the edit distance matrix whose number of columns is the character length of the template text plus two and whose number of rows is the character length of the partially transcribed text plus two, and before the step of calculating the value of each element in the edit distance matrix according to the partially transcribed text and the template text, the method comprises:
entering the characters of the template text starting from the third element of the first row of the edit distance matrix;
entering the characters of the partially transcribed text starting from the third element of the first column of the edit distance matrix;
defining the value of the second element in the second row of the edit distance matrix as 0;
initializing the value of each element of the second row of the edit distance matrix by incrementing by 1 in sequence, starting from the value 0 of the second element in the second row;
initializing the value of each element of the second column of the edit distance matrix by incrementing by 1 in sequence, starting from the value 0 of the second element in the second column.
Further, the value of each element of the edit distance matrix that has not been initialized is determined by the value of one of the elements to its left, at its upper left, or above it, and the step of calculating the value of each element in the edit distance matrix according to the partially transcribed text and the template text comprises:
identifying the column number and row number of the third element in the third column of the edit distance matrix;
identifying the character of the template text that corresponds to that column number and the character of the partially transcribed text that corresponds to that row number;
judging whether the character of the template text corresponding to the column number of the third element in the third column is equal to the character of the partially transcribed text corresponding to its row number;
if the two characters are equal, taking the value of the element at the upper left as the value of the third element in the third column;
if the two characters are not equal, taking the minimum of the values of the elements to the left, at the upper left, and above, plus 1, as the value of the third element in the third column;
calculating in turn the value of the fourth element in the third column, and so on, until the value of each element in the edit distance matrix has been calculated.
Further, the step of recording the calculation track of the value of each element in the edit distance matrix and generating a track matrix corresponding to the edit distance matrix comprises:
recording the calculation track of the value of each element in the edit distance matrix;
marking, according to the calculation track of the value of each element, the source from which the value of each element in the edit distance matrix was generated;
after the marking is complete, generating the track matrix corresponding to the edit distance matrix.
Further, the step of calculating the similarity of each track in the track matrix comprises:
for each track in the track matrix, identifying the number of characters of the partially transcribed text that are equal to the corresponding characters of the template text, to obtain an equal-character count;
for each track in the track matrix, comparing the character length of the partially transcribed text with the character length of the corresponding template text, and taking the longer of the two as the total character count;
calculating, for each track in the track matrix, the ratio of the equal-character count to the corresponding total character count, to obtain the similarity of each track in the track matrix.
Further, the step of determining, according to the first track, the end point on the template text that corresponds to the partially transcribed text, to obtain a first end point, comprises:
marking the last element of the first track;
marking, according to the last element of the first track, the corresponding character of the template text, to obtain the first end point.
Further, the step of obtaining a new template text from the template text according to the start point of the template text and the first end point comprises:
marking the first character of the template text as the start point;
extracting the characters between the start point of the template text and the first end point, including the character at the start point and the character at the first end point;
generating a text from the extracted characters, to obtain the new template text.
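The inclusive extraction described in the steps above can be sketched in a few lines of Python. The function name and the use of 0-based character indices for the start point and the first end point are assumptions made for illustration.

```python
def new_template_text(template: str, first_end: int, start: int = 0) -> str:
    """Return the characters from `start` to `first_end`, both inclusive.

    `start` and `first_end` are hypothetical 0-based indices into the
    template text; the source describes them as marked characters.
    """
    return template[start:first_end + 1]


# With the first end point on the third character, the new template text
# is the first three characters of the template.
print(new_template_text("abcdef", first_end=2))
```

Python slices exclude the upper bound, so `first_end + 1` is needed to keep the character at the first end point.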
This application also provides a text accuracy calculation apparatus based on semantic parsing, the apparatus comprising:
a first obtaining module, configured to obtain a partially transcribed text that has been transcribed starting from the start point of a template text;
an establishing module, configured to establish an edit distance matrix whose number of columns is the character length of the template text plus two and whose number of rows is the character length of the partially transcribed text plus two;
a first calculation module, configured to calculate the value of each element in the edit distance matrix according to the partially transcribed text and the template text;
a generation module, configured to record the calculation track of the value of each element in the edit distance matrix and to generate a track matrix corresponding to the edit distance matrix;
a screening module, configured to calculate the similarity of each track in the track matrix and to select the track along which the partially transcribed text is most similar to the template text, to obtain a first track;
an obtaining module, configured to determine, according to the first track, the end point on the template text that corresponds to the partially transcribed text, to obtain a first end point;
a second obtaining module, configured to obtain a new template text from the template text according to the start point of the template text and the first end point;
a second calculation module, configured to compare the partially transcribed text with the new template text and to calculate the accuracy of the partially transcribed text by the edit distance algorithm.
This application also provides a computer device comprising a memory and a processor, the memory storing a computer program, wherein the processor implements the steps of any of the methods described above when executing the computer program.
This application also provides a computer-readable storage medium on which a computer program is stored, wherein the steps of any of the methods described above are implemented when the computer program is executed by a processor.
The beneficial effects of the above technical solution are as follows. When a template text has been transcribed starting from its start point, an edit distance matrix is established and the value of each element in it is calculated. A track matrix is generated from the calculation track of each element's value, the similarity of each track in the track matrix is calculated, and the track with the highest similarity is selected as the first track. The end point on the template text that corresponds to the partially transcribed text is obtained from the first track, which yields a new template text, and the partially transcribed text is then compared with the new template text to calculate its accuracy. This addresses the problem that existing transcription accuracy algorithms compare the transcribed text against the full template text and therefore cannot accurately calculate the transcription accuracy when only part of the text has been transcribed.
Brief description of the drawings
Fig. 1 is a flowchart of the text accuracy calculation method based on semantic parsing provided by an embodiment of this application;
Fig. 2 is a functional module diagram of the text accuracy calculation apparatus based on semantic parsing provided by an embodiment of this application;
Fig. 3 is a schematic structural block diagram of the computer device provided by an embodiment of this application.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of this application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the application and not to limit it.
As shown in Fig. 1, an embodiment of this application proposes a text accuracy calculation method based on semantic parsing, the method comprising the following steps:
Step S101: obtain a partially transcribed text that has been transcribed starting from the start point of a template text.
Transcription starts from the start point of the template text, but the template text has not been transcribed in full. That is, transcription begins at the first character of the template text, while the point at which it ends is not the last character of the template text but some other character in the template text. Because not all characters of the template text have been transcribed, the text obtained by transcription starting from the start point of the template text is called the partially transcribed text.
The template text is a correct reference text against which the partially transcribed text is compared.
Transcription here refers to converting speech into text by an ASR (speech recognition) engine.
Step S102: establish an edit distance matrix whose number of columns is the character length of the template text plus two and whose number of rows is the character length of the partially transcribed text plus two.
In this embodiment, the template text is written text with punctuation marks removed, and the partially transcribed text is likewise written text with punctuation marks removed.
The character length of the template text is obtained and increased by two to give the number of columns. The character length of the partially transcribed text is obtained and increased by two to give the number of rows. The edit distance matrix is then established with these dimensions. The two extra columns and two extra rows are there so that the template text and the partially transcribed text can be entered in the first row and the first column respectively, and so that the initialized values can be entered in the second row and the second column.
Specifically, after step S102 and before step S103, the method comprises:
entering the characters of the template text starting from the third element of the first row of the edit distance matrix;
entering the characters of the partially transcribed text starting from the third element of the first column of the edit distance matrix;
defining the value of the second element in the second row of the edit distance matrix as 0;
initializing the value of each element of the second row of the edit distance matrix by incrementing by 1 in sequence, starting from the value 0 of the second element in the second row;
initializing the value of each element of the second column of the edit distance matrix by incrementing by 1 in sequence, starting from the value 0 of the second element in the second column.
The characters of the template text are entered in the first row of the edit distance matrix, starting from the third element of the first row. Correspondingly, the characters of the partially transcribed text are entered in the first column of the edit distance matrix, starting from the third element of the first column. Starting both texts at the third element gives every character of the template text and every character of the partially transcribed text a corresponding position in the edit distance matrix, and also fixes the positions of the initialized values in the second row and second column. First, the value of the second element in the second row of the edit distance matrix is defined as 0. Then, starting from that value 0 and incrementing by 1 in sequence, the value of each element of the second row is initialized; for example, the second, third, fourth, and fifth elements of the second row are 0, 1, 2, and 3 respectively. Defining the value of the second element in the second row as 0 also defines the value of the second element in the second column as 0, because the two occupy the same position and are the same element. Starting from that value 0 and incrementing by 1 in sequence, the value of each element of the second column is initialized; for example, the second, third, fourth, and fifth elements of the second column are 0, 1, 2, and 3 respectively. Once the second column and the second row have been initialized, the calculation of the value of each element in the edit distance matrix can begin.
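The layout and initialization described above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name is hypothetical, the matrix is a list of lists, and unused cells are left as `None`.

```python
def build_edit_distance_matrix(template: str, partial: str):
    """Build the labelled matrix: (len(partial) + 2) rows, (len(template) + 2) columns."""
    rows, cols = len(partial) + 2, len(template) + 2
    matrix = [[None] * cols for _ in range(rows)]

    # First row: template characters from the third element onwards.
    matrix[0][2:] = list(template)
    # First column: partially transcribed characters from the third element onwards.
    for i, ch in enumerate(partial):
        matrix[i + 2][0] = ch

    # Second row and second column: 0, 1, 2, ... starting at their shared element.
    for j in range(1, cols):
        matrix[1][j] = j - 1
    for i in range(1, rows):
        matrix[i][1] = i - 1
    return matrix


for row in build_edit_distance_matrix("abc", "ab"):
    print(row)
```

The cells from the third row and third column onwards stay empty at this stage; they are the elements whose values are calculated in step S103.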
Step S103: calculate the value of each element in the edit distance matrix according to the partially transcribed text and the template text.
For each element of the edit distance matrix, whether the corresponding character of the partially transcribed text equals the corresponding character of the template text determines how the value of that element is calculated, and the value is then calculated accordingly.
In this embodiment, the value of each element of the edit distance matrix that has not been initialized is determined by the value of one of the elements to its left, at its upper left, or above it. Step S103 comprises:
identifying the column number and row number of the third element in the third column of the edit distance matrix;
identifying the character of the template text that corresponds to that column number and the character of the partially transcribed text that corresponds to that row number;
judging whether the character of the template text corresponding to the column number of the third element in the third column is equal to the character of the partially transcribed text corresponding to its row number;
if the two characters are equal, taking the value of the element at the upper left as the value of the third element in the third column;
if the two characters are not equal, taking the minimum of the values of the elements to the left, at the upper left, and above, plus 1, as the value of the third element in the third column;
calculating in turn the value of the fourth element in the third column, and so on, until the value of each element in the edit distance matrix has been calculated.
Because the value of each uninitialized element is determined by the value of one of the elements to its left, at its upper left, or above it, the calculation must start at an element whose three neighbors already hold values. At the outset the only such element is the third element of the third column of the edit distance matrix, so in this embodiment that element is calculated first. Its column number and row number are identified; the column number gives the corresponding character of the template text and the row number gives the corresponding character of the partially transcribed text. The two characters are then compared. If they are equal, the value of the third element of the third column is the value of the element at its upper left. If they are not equal, its value is the minimum of the values of the elements to its left, at its upper left, and above it, plus 1. The value of the fourth element of the third column is calculated next, and so on down the third column. The elements of the fourth column are then calculated in the same way, and the process continues until the elements of the last column have been calculated, at which point every element of the edit distance matrix has a value.
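The fill rule of step S103 can be sketched in Python as follows. For brevity this sketch drops the label row and label column, so index 0 here corresponds to the second row and second column of the matrix described above; that compact layout and the function name are assumptions made for illustration.

```python
def fill_values(template: str, partial: str):
    """Fill the numeric part of the edit distance matrix, column by column."""
    rows, cols = len(partial) + 1, len(template) + 1
    d = [[0] * cols for _ in range(rows)]
    for j in range(cols):
        d[0][j] = j            # initialized row: 0, 1, 2, ...
    for i in range(rows):
        d[i][0] = i            # initialized column: 0, 1, 2, ...

    for j in range(1, cols):           # one column at a time, left to right
        for i in range(1, rows):       # top to bottom within the column
            if partial[i - 1] == template[j - 1]:
                d[i][j] = d[i - 1][j - 1]              # equal: copy upper left
            else:
                d[i][j] = min(d[i][j - 1],             # left
                              d[i - 1][j - 1],         # upper left
                              d[i - 1][j]) + 1         # above
    return d


for row in fill_values("abcd", "ab"):
    print(row)
```

For the template `abcd` and the partially transcribed text `ab`, the last row contains a 0 in the column for `b`, which reflects that the transcription matches the template exactly up to that character, while the bottom-right value 2 is the distance to the full template.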
Step S104: record the calculation track of the value of each element in the edit distance matrix, and generate a track matrix corresponding to the edit distance matrix.
While the value of each element in the edit distance matrix is being calculated, the calculation track of that value is recorded, that is, which element's value determined it. Once the value of every element has been calculated, the recording of the calculation tracks is also complete, and the track matrix corresponding to the edit distance matrix is generated.
In this embodiment, step S104 comprises:
recording the calculation track of the value of each element in the edit distance matrix;
marking, according to the calculation track of the value of each element, the source from which the value of each element in the edit distance matrix was generated;
after the marking is complete, generating the track matrix corresponding to the edit distance matrix.
The calculation track of the value of each element in the edit distance matrix is recorded, and the source of each value is marked according to that track. In this embodiment, lt indicates that the element was calculated from the element at its upper left, l indicates that it was calculated from the element to its left, and t indicates that it was calculated from the element above it. For example, if the third element in the third column of the edit distance matrix was determined by the second element in the second column, lt is entered for it; if it was determined by the third element in the second column, l is entered for it; and if it was determined by the second element in the third column, t is entered for it. This marks the source of the third element in the third column. After all elements have been marked, the track matrix corresponding to the edit distance matrix is generated.
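The lt / l / t marking can be sketched in Python alongside the value calculation. Two points here are assumptions not stated in the source: the compact layout without the label row and column, and the tie-breaking order (upper left, then left, then above) used when several neighbors share the minimum value.

```python
def fill_with_tracks(template: str, partial: str):
    """Return (values, marks); marks hold 'lt', 'l' or 't' for each calculated cell."""
    rows, cols = len(partial) + 1, len(template) + 1
    d = [[0] * cols for _ in range(rows)]
    marks = [[""] * cols for _ in range(rows)]   # initialized cells stay unmarked
    for j in range(cols):
        d[0][j] = j
    for i in range(rows):
        d[i][0] = i

    for j in range(1, cols):
        for i in range(1, rows):
            if partial[i - 1] == template[j - 1]:
                d[i][j], marks[i][j] = d[i - 1][j - 1], "lt"
            else:
                # Assumed tie-break: min() keeps the first minimal candidate.
                candidates = [(d[i - 1][j - 1], "lt"),
                              (d[i][j - 1], "l"),
                              (d[i - 1][j], "t")]
                best, mark = min(candidates, key=lambda c: c[0])
                d[i][j], marks[i][j] = best + 1, mark
    return d, marks


values, marks = fill_with_tracks("abcd", "ab")
for row in marks:
    print(row)
```

Marking each cell at the moment its value is calculated corresponds to the record-and-mark-together variant; the marks could equally be derived in a second pass once all values are known.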
In the present embodiment, in the calculating track according to the value of each element in the editing distance matrix, institute is marked In the step of stating the value generation origin of each element in editing distance matrix, comprising:
It is every record the calculating track of the value of an element in the editing distance matrix when, mark the editing distance square The value of the element generates origin in battle array;
Until the value of each element in the editing distance matrix is marked to generate origin.
As soon as the calculating track of the value of element in every record editing distance matrix, marking at once should in editing distance matrix The value of element generates origin, it is, recording the calculating track of the value of each element in editing distance matrix on one side, label is compiled on one side The value for collecting each element in distance matrix generates origin.
In some embodiments, the step of marking the origin of the value of each element in the edit distance matrix according to the calculation track of that value comprises:
after the calculation tracks of the values of all elements in the edit distance matrix have been recorded, marking the origin of the value of each element in the edit distance matrix according to those calculation tracks.
Marking is triggered only after the calculation tracks of the values of all elements in the edit distance matrix have been recorded, and it continues until the origin of every element's value has been marked. In other words, no origin is marked while the recording of the calculation tracks is still incomplete.
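The mark-after-completion variant can be sketched as a second pass over a finished matrix. This is an illustrative Python sketch only: the function name mark_after_completion is hypothetical, the matrix is assumed to omit the character header row and column, and the order in which neighbours are tested is an assumption.

```python
from typing import List


def mark_after_completion(dist: List[List[int]], partial: str, template: str) -> List[List[str]]:
    """Derive origin markers from an already completed edit distance matrix."""
    marks = [[""] * len(row) for row in dist]
    for i in range(1, len(dist)):
        for j in range(1, len(dist[i])):
            if partial[i - 1] == template[j - 1]:
                marks[i][j] = "lt"  # equal characters copy the upper-left value
            elif dist[i][j] == dist[i - 1][j - 1] + 1:
                marks[i][j] = "lt"
            elif dist[i][j] == dist[i][j - 1] + 1:
                marks[i][j] = "l"
            else:
                marks[i][j] = "t"
    return marks


# Completed matrix for partial "ab" against template "ab".
example = [[0, 1, 2], [1, 0, 1], [2, 1, 0]]
example_marks = mark_after_completion(example, "ab", "ab")
```

Compared with marking during calculation, this keeps the value calculation free of bookkeeping at the cost of a second traversal.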
Step S105: calculate the similarity of each track in the track matrix, and select the track for which the similarity between the partial transcription text and the template text is highest, to obtain the first track.
After the track matrix is generated, the similarity of each track in the track matrix is calculated. Once the similarities of all tracks have been calculated, the track with the highest similarity between the partial transcription text and the template text is selected as the first track. The first track is regarded as the track of the partial transcription text on the template text.
In this embodiment, the step of calculating the similarity of each track in the track matrix comprises:
identifying, for each track in the track matrix, the number of characters of the partial transcription text that are equal to the corresponding characters of the template text, to obtain the number of equal characters;
comparing, for each track in the track matrix, the character length of the partial transcription text with the character length of the corresponding template text, and taking the longer of the two as the total number of characters;
calculating, for each track in the track matrix, the ratio of the number of equal characters to the corresponding total number of characters, to obtain the similarity of each track in the track matrix.
After the track matrix is generated, the number of characters of the partial transcription text that are equal to the corresponding characters of the template text is identified for each track, giving the number of equal characters. The character length of the partial transcription text is then compared with the character length of the corresponding template text for each track, and the longer of the two is taken as the total number of characters. If, for a given track, the partial transcription text is longer than the corresponding template text, the length of the partial transcription text is taken as the total number of characters; if it is shorter, the length of the template text is taken as the total number of characters (when the two lengths are equal, either gives the same total). Finally, the ratio of the number of equal characters to the corresponding total number of characters is calculated for each track, and this ratio is the similarity of that track.
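The similarity ratio described above can be sketched as follows. This is a hedged Python sketch: the representation of a track as a list of (row, column, marker) tuples, the function name track_similarity, and the reading that the "corresponding template text" of a track is the template prefix ending at the track's last column are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical track representation: (row, column, marker), 1-based,
# ordered from the start of the texts to the end of the track.
Track = List[Tuple[int, int, str]]


def track_similarity(track: Track, partial: str, template: str) -> float:
    """Ratio of equal characters to the longer of the two compared lengths."""
    if not track:
        return 0.0
    # Equal characters: diagonal steps whose two characters match.
    equal = sum(
        1
        for row, col, mark in track
        if mark == "lt" and partial[row - 1] == template[col - 1]
    )
    # Assumed: the template length for this track is its last column.
    end_col = track[-1][1]
    total = max(len(partial), end_col)
    return equal / total if total else 0.0
```

For example, a track that matches all three characters of "abc" and ends at the third template column scores 3/3, while the same matches followed by one extra template column score 3/4.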
Step S106: according to the first track, determine the end point on the template text that corresponds to the partial transcription text, to obtain the first end point.
Because the first track has an end point in the track matrix, once the first track is obtained, the end point on the template text that corresponds to the partial transcription text can be determined from it, which gives the first end point.
In this embodiment, step S106 comprises:
marking the last element of the first track;
marking, according to the last element of the first track, the corresponding character of the template text, to obtain the first end point.
After the first track is obtained, its last element is marked. The character of the template text in the column of that last element is then looked up and marked, which gives the first end point.
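Looking up the first end point from the last element of the first track can be sketched as below. This is an illustration only: the (row, column, marker) track representation with 1-based columns and the function name first_end_point are assumptions, not names from the source.

```python
from typing import List, Tuple


def first_end_point(first_track: List[Tuple[int, int, str]], template: str) -> Tuple[int, str]:
    """Return the 0-based template index and character in the last element's column."""
    if not first_track:
        raise ValueError("the first track is empty")
    _, last_col, _ = first_track[-1]  # last element of the first track
    if not 1 <= last_col <= len(template):
        raise ValueError("the last element does not lie in a template column")
    index = last_col - 1
    return index, template[index]


# The track ends in the third template column, so the end point is "c".
end_index, end_char = first_end_point([(1, 1, "lt"), (2, 2, "lt"), (3, 3, "lt")], "abcdef")
```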
Step S107: according to the starting point of the template text and the first end point, obtain a new template text from the template text.
After the first end point is obtained, the text between the starting point of the template text and the first end point, including the characters at the starting point and at the first end point themselves, is taken from the template text as the new template text.
In this embodiment, step S107 comprises:
marking the first character of the template text as the starting point;
intercepting the characters between the starting point of the template text and the first end point, wherein the intercepted characters include the character at the starting point of the template text and the character at the first end point;
generating text from the intercepted characters, to obtain the new template text.
After the first end point is obtained, the first character of the template text is marked as the starting point, and the characters between the starting point and the first end point are intercepted, including the character at the starting point and the character at the first end point. Text in the same format as the template text is then generated from the intercepted characters, which gives the new template text.
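The inclusive interception can be sketched with a slice. This is a minimal Python sketch; the function name cut_new_template and the use of a 0-based index for the first end point are assumptions for illustration.

```python
def cut_new_template(template: str, end_index: int) -> str:
    """Keep the template from its first character up to and including end_index."""
    if not 0 <= end_index < len(template):
        raise ValueError("end_index must point at a character of the template")
    # Python slices exclude the stop position, hence the + 1 to keep the end point.
    return template[: end_index + 1]


new_template = cut_new_template("abcdef", 2)  # keeps "a", "b" and "c"
```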
Step S108: compare the partial transcription text with the new template text, and calculate the accuracy of the partial transcription text by the edit distance algorithm.
After the new template text is obtained, the partial transcription text is compared with the new template text rather than with the full template text, and the accuracy of the partial transcription text is calculated by the edit distance algorithm. This solves the problem that existing transcription accuracy algorithms compare the transcribed text with the full template text and therefore cannot accurately calculate the transcription accuracy when only part of the text has been transcribed.
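The final comparison can be sketched as follows. The section does not state how the accuracy is derived from the edit distance, so the formula used here (1 minus the distance divided by the length of the new template text, floored at 0) is an assumption, as are the function names edit_distance and accuracy.

```python
def edit_distance(a: str, b: str) -> int:
    """Edit distance using the rule from the description, kept in two rows."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            if ca == cb:
                cur.append(prev[j - 1])  # equal: copy the upper-left value
            else:
                cur.append(1 + min(prev[j - 1], prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def accuracy(partial: str, new_template: str) -> float:
    """Assumed accuracy formula: 1 - distance / len(new_template), floored at 0."""
    if not new_template:
        return 1.0 if not partial else 0.0
    return max(0.0, 1.0 - edit_distance(partial, new_template) / len(new_template))
```

Because the denominator is the length of the new template text rather than of the full template text, characters that have not yet been transcribed do not count as errors.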
In conclusion establishing editing distance matrix when the initial point of template text starts by transcription, editing distance square is calculated The value of each element in battle array generates track matrix according to the calculating track of the value of each element in editing distance matrix, calculates track square The similarity of each track in battle array, the highest track of screening similarity obtain the first track, obtain part according to the first track Transcription text corresponding terminal on template text, to obtain new template text, then part transcription text and new template is literary Originally it compares, the accuracy rate of calculating section transcription text, it is intended to the transcription accuracy rate algorithm for solving existing text, it will The full text of text and template text that transcription comes out compares, when in part, text is come out by transcription, Bu Nengzhun The problem of really calculating the transcription accuracy rate of text.
As shown in Fig. 2, an embodiment of the present application provides a text accuracy calculation device 1 based on semantic parsing. The device 1 comprises a first acquisition module 11, an establishing module 12, a first computing module 13, a generation module 14, a screening module 15, an obtaining module 16, a second acquisition module 17, and a second computing module 18.
First acquisition module 11, configured to acquire the partial transcription text that has been transcribed starting from the starting point of the template text.
Transcription starts from the starting point of the template text, but the template text is not transcribed in full. That is, transcription starts from the first character of the template text, but its end point is not the last character of the template text; it may be any character of the template text other than the last one. Because not all characters of the template text have been transcribed, the text obtained by transcription from the starting point of the template text is called the partial transcription text.
The template text is a correct text against which the partial transcription text is compared.
Transcription here refers to converting speech into text by an ASR (automatic speech recognition) engine.
Establishing module 12, configured to establish an edit distance matrix whose number of columns is the character length of the template text plus two characters and whose number of rows is the character length of the partial transcription text plus two characters.
In this embodiment, the template text is written text from which punctuation marks have been removed. The partial transcription text is likewise written text from which punctuation marks have been removed.
The character length of the template text is obtained and increased by two characters to give the number of columns. The character length of the partial transcription text is obtained and increased by two characters to give the number of rows. The edit distance matrix is then established with that number of columns and that number of rows. The two extra columns and two extra rows are added so that the template text and the partial transcription text can be entered in the first row and the first column respectively, and the initialized values can be entered in the second row and the second column.
Specifically, the device 1 comprises:
First input module, configured to enter the characters of the template text starting from the third element of the first row of the edit distance matrix;
Second input module, configured to enter the characters of the partial transcription text starting from the third element of the first column of the edit distance matrix;
Definition module, configured to define the value of the second element in the second row of the edit distance matrix as 0;
First initialization module, configured to initialize the value of each element of the second row of the edit distance matrix by starting from the value 0 of the second element in the second row and incrementing by 1 for each subsequent element;
Second initialization module, configured to initialize the value of each element of the second column of the edit distance matrix by starting from the value 0 of the second element in the second column and incrementing by 1 for each subsequent element.
The characters of the template text are entered in the first row of the edit distance matrix, starting from the third element of the first row. Correspondingly, the characters of the partial transcription text are entered in the first column of the edit distance matrix, starting from the third element of the first column. Entering the two texts from the third element of the first row and of the first column gives every character of the template text and of the partial transcription text a corresponding position in the edit distance matrix, and also provides the positional reference for the values initialized in the second row and the second column. First, the value of the second element in the second row of the edit distance matrix is defined as 0. Then, starting from that value and incrementing by 1 for each subsequent element, the values of the elements of the second row are initialized; for example, the second, third, fourth and fifth elements of the second row are 0, 1, 2 and 3 respectively. Defining the second element of the second row as 0 in effect also defines the second element of the second column as 0, because the two occupy the same position, that is, they are the same element. Starting from that value and incrementing by 1 for each subsequent element, the values of the elements of the second column are initialized; for example, the second, third, fourth and fifth elements of the second column are 0, 1, 2 and 3 respectively. After the values of the second row and the second column have been initialized, the value of each remaining element in the edit distance matrix can be calculated.
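The layout and initialization just described can be sketched as follows. This is a hedged Python sketch: the function name build_initialised_matrix is hypothetical, and using None for cells that are empty or not yet calculated is a representation chosen here for illustration.

```python
from typing import List, Optional, Union

Cell = Optional[Union[str, int]]


def build_initialised_matrix(partial: str, template: str) -> List[List[Cell]]:
    """Build a (len(partial) + 2) x (len(template) + 2) matrix with headers and initial values."""
    rows, cols = len(partial) + 2, len(template) + 2
    matrix: List[List[Cell]] = [[None] * cols for _ in range(rows)]

    # First row: template characters from the third element onward.
    for j, ch in enumerate(template):
        matrix[0][j + 2] = ch
    # First column: partial transcription characters from the third element onward.
    for i, ch in enumerate(partial):
        matrix[i + 2][0] = ch
    # Second row and second column: 0, 1, 2, ... starting at their shared second element.
    for j in range(1, cols):
        matrix[1][j] = j - 1
    for i in range(1, rows):
        matrix[i][1] = i - 1
    return matrix
```

For the partial transcription text "ab" and the template text "abc" this yields a 4 x 5 matrix whose second row reads 0, 1, 2, 3 from its second element onward.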
First computing module 13, configured to calculate the value of each element in the edit distance matrix according to the partial transcription text and the template text.
That is, in the edit distance matrix, whether a character of the partial transcription text is equal to the corresponding character of the template text determines how the value of each element is calculated, and the value of each element in the edit distance matrix is then calculated accordingly.
In this embodiment, the value of each element of the edit distance matrix that has not been initialized is determined by the value of one of the elements to its left, to its upper left, or above it. The first computing module 13 comprises:
First identification module, configured to identify the column number and the row number of the third element in the third column of the edit distance matrix;
Second identification module, configured to identify the character of the template text and the character of the partial transcription text that correspond, respectively, to the column number and the row number of the third element in the third column of the edit distance matrix;
First judgment module, configured to judge whether the character of the template text corresponding to the column number of the third element in the third column of the edit distance matrix is equal to the character of the partial transcription text corresponding to the row number of that element; if the two characters are equal, the value of the third element in the third column is the value of the element to its upper left; if the two characters are not equal, the value of the third element in the third column is the minimum of the values of the elements to its left, to its upper left and above it, plus 1;
First sub-computing module, configured to calculate in turn the value of the fourth element in the third column of the edit distance matrix and of each subsequent element, until the value of every element in the edit distance matrix has been calculated.
Because the value of each element that has not been initialized is determined by the value of one of the elements to its left, to its upper left, or above it, the only element whose left, upper-left and upper neighbours all hold values at the start of the calculation is the third element in the third column of the edit distance matrix. In this embodiment the calculation therefore starts with that element. Its column number and row number are identified, and then the character of the template text corresponding to that column number and the character of the partial transcription text corresponding to that row number are identified. The two characters are compared to determine the value of the element. If the two characters are equal, the value of the third element in the third column is the value of the element to its upper left. If the two characters are not equal, the value of the third element in the third column is the minimum of the values of the elements to its left, to its upper left and above it, plus 1. After the value of the third element in the third column has been calculated, the value of the fourth element in the third column is calculated, and so on. When the values of all elements in the third column have been calculated, the values of the elements in the fourth column are calculated, and the process continues until the values of the elements in the last column have been calculated, at which point the value of every element in the edit distance matrix has been calculated.
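The column-by-column calculation above can be sketched on the full layout with header row and header column. This is an illustrative Python sketch, not the patented implementation: the function name build_and_fill is hypothetical and None is used here only to denote the two unused corner cells.

```python
from typing import List


def build_and_fill(partial: str, template: str) -> List[list]:
    """Build the matrix with headers, initialise it, then fill it column by column."""
    rows, cols = len(partial) + 2, len(template) + 2
    m: List[list] = [[None] * cols for _ in range(rows)]
    m[0][2:] = list(template)            # first row: template characters
    for i, ch in enumerate(partial):     # first column: transcription characters
        m[i + 2][0] = ch
    for j in range(1, cols):             # second row: 0, 1, 2, ...
        m[1][j] = j - 1
    for i in range(1, rows):             # second column: 0, 1, 2, ...
        m[i][1] = i - 1

    # Start at the third element of the third column, go down the column,
    # then move on to the next column.
    for j in range(2, cols):
        for i in range(2, rows):
            if m[0][j] == m[i][0]:
                m[i][j] = m[i - 1][j - 1]  # equal: value of the upper-left element
            else:
                m[i][j] = min(m[i][j - 1], m[i - 1][j - 1], m[i - 1][j]) + 1
    return m
```

For the partial transcription text "ab" and the template text "abc", the last row of the result is "b", 2, 1, 0, 1, so the smallest value sits in the column of the template character "b".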
Generation module 14, configured to record the calculation track of the value of each element in the edit distance matrix and to generate the track matrix corresponding to the edit distance matrix.
While the value of each element in the edit distance matrix is being calculated, the calculation track of that value is recorded, that is, which element's value determined the value of each element. When the values of all elements in the edit distance matrix have been calculated, the recording of the calculation tracks is also complete, and the track matrix corresponding to the edit distance matrix is generated.
In this embodiment, the generation module 14 comprises:
First recording module, configured to record the calculation track of the value of each element in the edit distance matrix;
First marking module, configured to mark the origin of the value of each element in the edit distance matrix according to the calculation track of that value;
First generation module, configured to generate the track matrix corresponding to the edit distance matrix after marking is complete.
The calculation track of the value of each element in the edit distance matrix is recorded and, according to that track, the origin of each element's value is marked. In this embodiment, "lt" indicates that the element was calculated from the element to its upper left, "l" indicates that it was calculated from the element to its left, and "t" indicates that it was calculated from the element above it. For example, if the third element in the third column of the edit distance matrix is determined by the second element in the second column, "lt" is entered for that element; if it is determined by the third element in the second column, "l" is entered; and if it is determined by the second element in the third column, "t" is entered. This marks the origin of the third element in the third column of the edit distance matrix. After marking is complete, the track matrix corresponding to the edit distance matrix is generated.
In this embodiment, the first marking module comprises:
First marking sub-module, configured to mark the origin of the value of an element in the edit distance matrix each time the calculation track of the value of that element is recorded;
First marking completion sub-module, configured to continue until the origin of the value of every element in the edit distance matrix has been marked.
That is, as soon as the calculation track of the value of one element in the edit distance matrix is recorded, the origin of that element's value is marked immediately; the calculation tracks are recorded and the origins are marked side by side.
In some embodiments, the first marking module comprises:
Second marking sub-module, configured to mark the origin of the value of each element in the edit distance matrix according to the calculation tracks of those values, after the calculation tracks of the values of all elements in the edit distance matrix have been recorded.
Marking is triggered only after the calculation tracks of the values of all elements in the edit distance matrix have been recorded, and it continues until the origin of every element's value has been marked. In other words, no origin is marked while the recording of the calculation tracks is still incomplete.
Screening module 15, configured to calculate the similarity of each track in the track matrix and to select the track for which the similarity between the partial transcription text and the template text is highest, to obtain the first track.
After the track matrix is generated, the similarity of each track in the track matrix is calculated. Once the similarities of all tracks have been calculated, the track with the highest similarity between the partial transcription text and the template text is selected as the first track. The first track is regarded as the track of the partial transcription text on the template text.
In this embodiment, the screening module 15 comprises:
Third identification module, configured to identify, for each track in the track matrix, the number of characters of the partial transcription text that are equal to the corresponding characters of the template text, to obtain the number of equal characters;
First comparison module, configured to compare, for each track in the track matrix, the character length of the partial transcription text with the character length of the corresponding template text, and to take the longer of the two as the total number of characters;
Third computing module, configured to calculate, for each track in the track matrix, the ratio of the number of equal characters to the corresponding total number of characters, to obtain the similarity of each track in the track matrix.
After the track matrix is generated, the number of characters of the partial transcription text that are equal to the corresponding characters of the template text is identified for each track, giving the number of equal characters. The character length of the partial transcription text is then compared with the character length of the corresponding template text for each track, and the longer of the two is taken as the total number of characters. If, for a given track, the partial transcription text is longer than the corresponding template text, the length of the partial transcription text is taken as the total number of characters; if it is shorter, the length of the template text is taken as the total number of characters (when the two lengths are equal, either gives the same total). Finally, the ratio of the number of equal characters to the corresponding total number of characters is calculated for each track, and this ratio is the similarity of that track.
Obtaining module 16, configured to determine, according to the first track, the end point on the template text that corresponds to the partial transcription text, to obtain the first end point.
Because the first track has an end point in the track matrix, once the first track is obtained, the end point on the template text that corresponds to the partial transcription text can be determined from it, which gives the first end point.
In this embodiment, the obtaining module 16 comprises:
Second marking module, configured to mark the last element of the first track;
First obtaining module, configured to mark, according to the last element of the first track, the corresponding character of the template text, to obtain the first end point.
After the first track is obtained, its last element is marked. The character of the template text in the column of that last element is then looked up and marked, which gives the first end point.
Second acquisition module 17, configured to obtain a new template text from the template text according to the starting point of the template text and the first end point.
After the first end point is obtained, the text between the starting point of the template text and the first end point, including the characters at the starting point and at the first end point themselves, is taken from the template text as the new template text.
In this embodiment, the second acquisition module 17 comprises:
Third marking module, configured to mark the first character of the template text as the starting point;
Interception module, configured to intercept the characters between the starting point of the template text and the first end point, wherein the intercepted characters include the character at the starting point of the template text and the character at the first end point;
Second acquisition sub-module, configured to generate text from the intercepted characters, to obtain the new template text.
After the first end point is obtained, the first character of the template text is marked as the starting point, and the characters between the starting point and the first end point are intercepted, including the character at the starting point and the character at the first end point. Text in the same format as the template text is then generated from the intercepted characters, which gives the new template text.
Second computing module 18, configured to compare the partial transcription text with the new template text and to calculate the accuracy of the partial transcription text by the edit distance algorithm.
After the new template text is obtained, the partial transcription text is compared with the new template text rather than with the full template text, and the accuracy of the partial transcription text is calculated by the edit distance algorithm. This solves the problem that existing transcription accuracy algorithms compare the transcribed text with the full template text and therefore cannot accurately calculate the transcription accuracy when only part of the text has been transcribed.
In conclusion establishing editing distance matrix when the initial point of template text starts by transcription, editing distance square is calculated The value of each element in battle array generates track matrix according to the calculating track of the value of each element in editing distance matrix, calculates track square The similarity of each track in battle array, the highest track of screening similarity obtain the first track, obtain part according to the first track Transcription text corresponding terminal on template text, to obtain new template text, then part transcription text and new template is literary Originally it compares, the accuracy rate of calculating section transcription text, it is intended to the transcription accuracy rate algorithm for solving existing text, it will The full text of text and template text that transcription comes out compares, when in part, text is come out by transcription, Bu Nengzhun The problem of really calculating the transcription accuracy rate of text.
As shown in Fig. 3, an embodiment of the present application also provides a computer device. The computer device may be a server, and its internal structure may be as shown in Fig. 3. The computer device comprises a processor, a memory, a network interface and a database connected by a system bus. The processor of the computer device provides computing and control capabilities. The memory of the computer device comprises a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a computer program and a database. The internal memory provides an environment for running the operating system and the computer program stored in the non-volatile storage medium. The database of the computer device stores data such as the model used by the text accuracy calculation method based on semantic parsing. The network interface of the computer device communicates with external terminals over a network connection. When executed by the processor, the computer program implements a text accuracy calculation method based on semantic parsing.
The processor executes the following steps of the text accuracy calculation method based on semantic parsing: acquiring the partial transcription text that has been transcribed starting from the starting point of the template text; establishing an edit distance matrix whose number of columns is the character length of the template text plus two characters and whose number of rows is the character length of the partial transcription text plus two characters; calculating the value of each element in the edit distance matrix according to the partial transcription text and the template text; recording the calculation track of the value of each element in the edit distance matrix and generating the track matrix corresponding to the edit distance matrix; calculating the similarity of each track in the track matrix and selecting the track for which the similarity between the partial transcription text and the template text is highest, to obtain the first track; determining, according to the first track, the end point on the template text that corresponds to the partial transcription text, to obtain the first end point; obtaining a new template text from the template text according to the starting point of the template text and the first end point; and comparing the partial transcription text with the new template text and calculating the accuracy of the partial transcription text by the edit distance algorithm.
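The sequence of steps above can be strung together in one short sketch. This is a minimal Python illustration under several assumptions that the description leaves open: a track is taken to be the marked path that ends in the last row at a given template column, ties between neighbours are broken in the order upper left, left, above, the earliest column wins when similarities tie, and the accuracy is taken as 1 minus the edit distance divided by the length of the new template text. The function name transcription_accuracy is hypothetical, and the character header row and column are omitted.

```python
from typing import Tuple


def transcription_accuracy(partial: str, template: str) -> Tuple[float, str]:
    """Return (accuracy, new template text) for a partially transcribed template."""
    if not partial or not template:
        raise ValueError("both texts must be non-empty")
    n, m = len(partial), len(template)

    # Edit distance values and origin markers ("lt", "l", "t").
    dist = [[0] * (m + 1) for _ in range(n + 1)]
    mark = [[""] * (m + 1) for _ in range(n + 1)]
    for j in range(1, m + 1):
        dist[0][j], mark[0][j] = j, "l"   # assumed markers for initialised cells
    for i in range(1, n + 1):
        dist[i][0], mark[i][0] = i, "t"
    for j in range(1, m + 1):
        for i in range(1, n + 1):
            if partial[i - 1] == template[j - 1]:
                dist[i][j], mark[i][j] = dist[i - 1][j - 1], "lt"
            else:
                value, origin = min(
                    (dist[i - 1][j - 1], "lt"),
                    (dist[i][j - 1], "l"),
                    (dist[i - 1][j], "t"),
                    key=lambda c: c[0],
                )
                dist[i][j], mark[i][j] = value + 1, origin

    def equal_chars(end_col: int) -> int:
        """Follow the markers back from (n, end_col) and count equal characters."""
        i, j, equal = n, end_col, 0
        while i > 0 or j > 0:
            step = mark[i][j]
            if step == "lt":
                if partial[i - 1] == template[j - 1]:
                    equal += 1
                i, j = i - 1, j - 1
            elif step == "l":
                j -= 1
            else:
                i -= 1
        return equal

    # Select the track with the highest similarity (first track).
    best_col, best_sim = 1, -1.0
    for end_col in range(1, m + 1):
        sim = equal_chars(end_col) / max(n, end_col)
        if sim > best_sim:
            best_col, best_sim = end_col, sim

    # First end point is column best_col; cut the new template inclusively.
    new_template = template[:best_col]
    # dist[n][best_col] is the edit distance between partial and new_template.
    acc = max(0.0, 1.0 - dist[n][best_col] / len(new_template))
    return acc, new_template
```

With the template text "thequickbrownfoxjumps" and the partial transcription text "thequikbrown", the sketch cuts the template at "thequickbrown" and reports an accuracy of 12/13, whereas a comparison with the full template text would also count the nine untranscribed characters as errors.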
In one embodiment, after the step of establishing the edit distance matrix whose number of columns is the character length of the template text plus two and whose number of rows is the character length of the partial transcription text plus two, and before the step of calculating the value of each element in the edit distance matrix according to the partial transcription text and the template text, the method includes:
entering the characters of the template text starting from the third element of the first row of the edit distance matrix;
entering the characters of the partial transcription text starting from the third element of the first column of the edit distance matrix;
defining the value of the second element in the second row of the edit distance matrix as 0;
initializing the value of each element of the second row of the edit distance matrix by starting from the value 0 of the second element in the second row and incrementing by 1 for each successive element;
initializing the value of each element of the second column of the edit distance matrix by starting from the value 0 of the second element in the second column and incrementing by 1 for each successive element.
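The initialization steps above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name `init_edit_matrix` and the use of `None` for unfilled cells are assumptions, and the partial transcription characters are placed in the first column because the number of rows is derived from that text.

```python
def init_edit_matrix(template: str, partial: str):
    """Build the (len(partial)+2) x (len(template)+2) matrix and initialize it.

    Row 0 and column 0 hold the characters; row 1 and column 1 hold 0, 1, 2, ...
    Cells not yet computed are None (an assumption of this sketch).
    """
    rows, cols = len(partial) + 2, len(template) + 2
    m = [[None] * cols for _ in range(rows)]

    # Template characters start at the third element of the first row.
    for j, ch in enumerate(template):
        m[0][j + 2] = ch

    # Partial transcription characters start at the third element of the first column.
    for i, ch in enumerate(partial):
        m[i + 2][0] = ch

    # Second row: 0 at its second element, then +1 per element.
    for j in range(1, cols):
        m[1][j] = j - 1

    # Second column: 0 at its second element, then +1 per element.
    for i in range(1, rows):
        m[i][1] = i - 1

    return m


if __name__ == "__main__":
    for row in init_edit_matrix("abc", "ab"):
        print(row)
```

The two extra rows and columns are what the "plus two" in the matrix dimensions accounts for: one for the characters and one for the initial 0, 1, 2, ... values.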
In one embodiment, the value of each uninitialized element in the edit distance matrix is determined by the value of one of the elements to its left, at its upper left, and above it, and the step of calculating the value of each element in the edit distance matrix according to the partial transcription text and the template text includes:
identifying the column number and the row number of the third element in the third column of the edit distance matrix;
identifying the character of the template text and the character of the partial transcription text to which the column number and the row number of that element respectively correspond;
judging whether the character of the template text corresponding to the column number of the third element in the third column is equal to the character of the partial transcription text corresponding to the row number of that element;
if the two characters are equal, setting the value of the third element in the third column to the value of the element at its upper left;
if the two characters are not equal, setting the value of the third element in the third column to the minimum of the values of the elements to its left, at its upper left, and above it, plus 1;
calculating in turn the value of the fourth element in the third column and of each subsequent element, until the value of every element in the edit distance matrix has been calculated.
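The recurrence described above can be sketched as below. For brevity this sketch drops the character row and column and keeps only the numeric part of the matrix, so index 0 here corresponds to the second row or column of the matrix in the text; the function name `edit_distance_table` is hypothetical. Cells are filled column by column, matching the order in the steps above.

```python
def edit_distance_table(template: str, partial: str):
    """Fill the numeric edit distance table for partial (rows) vs template (columns)."""
    rows, cols = len(partial) + 1, len(template) + 1
    d = [[0] * cols for _ in range(rows)]

    # Initialized border: 0, 1, 2, ... along the first numeric row and column.
    for j in range(cols):
        d[0][j] = j
    for i in range(rows):
        d[i][0] = i

    # Column by column, top to bottom within each column.
    for j in range(1, cols):
        for i in range(1, rows):
            if partial[i - 1] == template[j - 1]:
                # Equal characters: copy the upper-left value.
                d[i][j] = d[i - 1][j - 1]
            else:
                # Unequal characters: minimum of left, upper-left, above, plus 1.
                d[i][j] = min(d[i][j - 1], d[i - 1][j - 1], d[i - 1][j]) + 1
    return d


if __name__ == "__main__":
    table = edit_distance_table("kitten", "sitting")
    print(table[-1][-1])
```

The bottom-right cell is the edit distance between the two complete strings; the other cells of the last row give the distance between the partial transcription and each prefix of the template.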
In one embodiment, the step of recording the calculation track of the value of each element in the edit distance matrix and generating the track matrix corresponding to the edit distance matrix includes:
recording the calculation track of the value of each element in the edit distance matrix;
marking, according to the calculation track, the origin from which the value of each element in the edit distance matrix was generated;
after the marking is completed, generating the track matrix corresponding to the edit distance matrix.
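One way to picture the track matrix is as a table of back-pointers recorded while the edit distance values are computed. The sketch below is an assumption about the representation: the labels "diag", "left", and "up", the function name `build_track_matrix`, and the tie-breaking order (upper left, then left, then above) are all illustrative choices that the text does not specify.

```python
def build_track_matrix(template: str, partial: str):
    """Return (distance table, track table) where each track cell names its origin."""
    rows, cols = len(partial) + 1, len(template) + 1
    d = [[0] * cols for _ in range(rows)]
    track = [[None] * cols for _ in range(rows)]  # origin cell (0, 0) stays None

    for j in range(1, cols):
        d[0][j] = j
        track[0][j] = "left"
    for i in range(1, rows):
        d[i][0] = i
        track[i][0] = "up"

    for j in range(1, cols):
        for i in range(1, rows):
            if partial[i - 1] == template[j - 1]:
                d[i][j] = d[i - 1][j - 1]
                track[i][j] = "diag"
            else:
                candidates = [
                    (d[i - 1][j - 1], "diag"),
                    (d[i][j - 1], "left"),
                    (d[i - 1][j], "up"),
                ]
                # min() keeps the first candidate on ties (assumed tie-break).
                best, origin = min(candidates, key=lambda c: c[0])
                d[i][j] = best + 1
                track[i][j] = origin
    return d, track


if __name__ == "__main__":
    _, t = build_track_matrix("abc", "ab")
    for row in t:
        print(row)
```

Following the recorded origins backwards from any cell reconstructs the track that produced its value.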
In one embodiment, the step of calculating the similarity of each track in the track matrix includes:
identifying, for each track in the track matrix, the number of characters of the partial transcription text that are equal to the corresponding characters of the template text, to obtain an equal-character count;
comparing, for each track in the track matrix, the character length of the partial transcription text with the character length of the corresponding template text, and taking the longer of the two lengths as the total character count;
calculating, for each track in the track matrix, the ratio of the equal-character count to the corresponding total character count, to obtain the similarity of each track in the track matrix.
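The similarity ratio above can be sketched as follows. The representation of a track as a list of aligned character pairs, with `None` marking a position where one side has no character, is an assumption made for illustration, as is the function name `track_similarity`.

```python
def track_similarity(aligned_pairs):
    """Similarity of one track: equal characters divided by the longer text length.

    aligned_pairs is a list of (partial_char, template_char) tuples; None on one
    side means that text contributes no character at that step (assumed encoding).
    """
    equal = sum(1 for p, t in aligned_pairs if p is not None and p == t)
    partial_len = sum(1 for p, _ in aligned_pairs if p is not None)
    template_len = sum(1 for _, t in aligned_pairs if t is not None)
    total = max(partial_len, template_len)  # the longer length is the total
    return equal / total if total else 0.0


if __name__ == "__main__":
    # Partial text "ab" aligned against template segment "axc".
    print(track_similarity([("a", "a"), ("b", "x"), (None, "c")]))
```

In this example one character matches and the longer text has three characters, so the ratio is one third.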
In one embodiment, the step of determining, according to the first track, the end point on the template text corresponding to the partial transcription text, to obtain the first end point, includes:
marking the last element in the first track;
marking, according to the last element in the first track, the corresponding character of the template text, to obtain the first end point.
In one embodiment, the step of obtaining the new template text from the template text according to the start point of the template text and the first end point includes:
marking the first character of the template text as the start point;
extracting the characters between the start point of the template text and the first end point, where the extracted characters include both the character at the start point and the character at the first end point;
generating text from the extracted characters, to obtain the new template text.
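The extraction step can be sketched as below, assuming the first end point is known as the column index of the last element of the first track in the full matrix. Because template characters begin at the third element of the first row, column `c` is taken to correspond to `template[c - 2]`; the function name `new_template_from_end_column` is hypothetical.

```python
def new_template_from_end_column(template: str, end_col: int) -> str:
    """Cut the template from its first character through the end point, inclusive.

    end_col is a 0-based column index in the (len(template) + 2)-column matrix.
    """
    end_index = end_col - 2  # skip the character column and the initialized column
    if not 0 <= end_index < len(template):
        raise ValueError("end column does not map to a template character")
    return template[: end_index + 1]  # inclusive of the end-point character


if __name__ == "__main__":
    print(new_template_from_end_column("abcdef", 4))
```

The slice is inclusive on both ends, consistent with the requirement that the start-point and end-point characters both belong to the new template text.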
Those skilled in the art will understand that the structure shown in FIG. 3 is only a block diagram of the part of the structure relevant to the solution of the present application, and does not limit the computer device to which the solution is applied.
When a text is transcribed starting from the start point of the template text, the computer device of the embodiment of the present application establishes an edit distance matrix, calculates the value of each element in the edit distance matrix, generates a track matrix from the calculation track of each element value, calculates the similarity of each track in the track matrix, and selects the track with the highest similarity as the first track. According to the first track, it obtains the end point on the template text corresponding to the partial transcription text and thereby obtains the new template text, then compares the partial transcription text with the new template text and calculates the accuracy of the partial transcription text. This is intended to solve the problem that existing transcription accuracy algorithms compare the transcribed text against the full template text and therefore cannot accurately calculate the transcription accuracy when only part of the text has been transcribed.
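The overall flow can be sketched end to end as below. This is an illustrative reading, not the applicant's implementation. It makes several assumptions: a candidate track is taken to end in the last row at each template column, ties in similarity go to the earliest column, and accuracy is computed as `1 - distance / max(len(partial), len(new_template))`, a formula the text does not state. The name `partial_accuracy` is hypothetical.

```python
def partial_accuracy(template: str, partial: str):
    """Return (new_template, accuracy) for a partial transcription of template."""
    m, n = len(partial), len(template)
    if m == 0 or n == 0:
        return "", 0.0

    d = [[0] * (n + 1) for _ in range(m + 1)]
    back = [[None] * (n + 1) for _ in range(m + 1)]  # back-pointers (track matrix)
    for j in range(n + 1):
        d[0][j] = j
    for i in range(m + 1):
        d[i][0] = i

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            if partial[i - 1] == template[j - 1]:
                d[i][j] = d[i - 1][j - 1]
                back[i][j] = (i - 1, j - 1)
            else:
                neighbours = [(i - 1, j - 1), (i, j - 1), (i - 1, j)]
                prev = min(neighbours, key=lambda c: d[c[0]][c[1]])
                d[i][j] = d[prev[0]][prev[1]] + 1
                back[i][j] = prev

    def equal_chars(i, j):
        # Walk the track backwards and count diagonal steps on equal characters.
        count = 0
        while i > 0 and j > 0:
            pi, pj = back[i][j]
            if (pi, pj) == (i - 1, j - 1) and partial[i - 1] == template[j - 1]:
                count += 1
            i, j = pi, pj
        return count

    # Similarity of the track ending at column j: equal chars / longer length.
    best_j = max(range(1, n + 1), key=lambda j: equal_chars(m, j) / max(m, j))

    new_template = template[:best_j]
    accuracy = 1 - d[m][best_j] / max(m, best_j)  # assumed accuracy formula
    return new_template, accuracy


if __name__ == "__main__":
    print(partial_accuracy("hello world", "hxllo"))
```

With the template "hello world" and the partial transcription "hxllo", the sketch selects "hello" as the new template text, so the single substitution is scored against five characters rather than against the full eleven-character template.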
An embodiment of the present application further provides a computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program implements a text accuracy calculation method based on semantic analysis, specifically: obtaining a partial transcription text transcribed starting from the start point of a template text; establishing an edit distance matrix whose number of columns is the character length of the template text plus two and whose number of rows is the character length of the partial transcription text plus two; calculating the value of each element in the edit distance matrix according to the partial transcription text and the template text; recording the calculation track of the value of each element in the edit distance matrix and generating a track matrix corresponding to the edit distance matrix; calculating the similarity of each track in the track matrix and selecting the track with the highest similarity between the partial transcription text and the template text as a first track; determining, according to the first track, the end point on the template text corresponding to the partial transcription text, to obtain a first end point; obtaining a new template text from the template text according to the start point of the template text and the first end point; and comparing the partial transcription text with the new template text and calculating the accuracy of the partial transcription text by an edit distance algorithm.
In one embodiment, after the step of establishing the edit distance matrix whose number of columns is the character length of the template text plus two and whose number of rows is the character length of the partial transcription text plus two, and before the step of calculating the value of each element in the edit distance matrix according to the partial transcription text and the template text, the method includes:
entering the characters of the template text starting from the third element of the first row of the edit distance matrix;
entering the characters of the partial transcription text starting from the third element of the first column of the edit distance matrix;
defining the value of the second element in the second row of the edit distance matrix as 0;
initializing the value of each element of the second row of the edit distance matrix by starting from the value 0 of the second element in the second row and incrementing by 1 for each successive element;
initializing the value of each element of the second column of the edit distance matrix by starting from the value 0 of the second element in the second column and incrementing by 1 for each successive element.
In one embodiment, the value of each uninitialized element in the edit distance matrix is determined by the value of one of the elements to its left, at its upper left, and above it, and the step of calculating the value of each element in the edit distance matrix according to the partial transcription text and the template text includes:
identifying the column number and the row number of the third element in the third column of the edit distance matrix;
identifying the character of the template text and the character of the partial transcription text to which the column number and the row number of that element respectively correspond;
judging whether the character of the template text corresponding to the column number of the third element in the third column is equal to the character of the partial transcription text corresponding to the row number of that element;
if the two characters are equal, setting the value of the third element in the third column to the value of the element at its upper left;
if the two characters are not equal, setting the value of the third element in the third column to the minimum of the values of the elements to its left, at its upper left, and above it, plus 1;
calculating in turn the value of the fourth element in the third column and of each subsequent element, until the value of every element in the edit distance matrix has been calculated.
In one embodiment, the step of recording the calculation track of the value of each element in the edit distance matrix and generating the track matrix corresponding to the edit distance matrix includes:
recording the calculation track of the value of each element in the edit distance matrix;
marking, according to the calculation track, the origin from which the value of each element in the edit distance matrix was generated;
after the marking is completed, generating the track matrix corresponding to the edit distance matrix.
In one embodiment, the step of calculating the similarity of each track in the track matrix includes:
identifying, for each track in the track matrix, the number of characters of the partial transcription text that are equal to the corresponding characters of the template text, to obtain an equal-character count;
comparing, for each track in the track matrix, the character length of the partial transcription text with the character length of the corresponding template text, and taking the longer of the two lengths as the total character count;
calculating, for each track in the track matrix, the ratio of the equal-character count to the corresponding total character count, to obtain the similarity of each track in the track matrix.
In one embodiment, the step of determining, according to the first track, the end point on the template text corresponding to the partial transcription text, to obtain the first end point, includes:
marking the last element in the first track;
marking, according to the last element in the first track, the corresponding character of the template text, to obtain the first end point.
In one embodiment, the step of obtaining the new template text from the template text according to the start point of the template text and the first end point includes:
marking the first character of the template text as the start point;
extracting the characters between the start point of the template text and the first end point, where the extracted characters include both the character at the start point and the character at the first end point;
generating text from the extracted characters, to obtain the new template text.
When a text is transcribed starting from the start point of the template text, the storage medium of the embodiment of the present application establishes an edit distance matrix, calculates the value of each element in the edit distance matrix, generates a track matrix from the calculation track of each element value, calculates the similarity of each track in the track matrix, and selects the track with the highest similarity as the first track. According to the first track, it obtains the end point on the template text corresponding to the partial transcription text and thereby obtains the new template text, then compares the partial transcription text with the new template text and calculates the accuracy of the partial transcription text. This is intended to solve the problem that existing transcription accuracy algorithms compare the transcribed text against the full template text and therefore cannot accurately calculate the transcription accuracy when only part of the text has been transcribed.
Those of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The computer program can be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. Any reference to memory, storage, a database, or other media used in the present application and its embodiments may include non-volatile and/or volatile memory. Non-volatile memory may include read-only memory (ROM), programmable ROM (PROM), electrically programmable ROM (EPROM), electrically erasable programmable ROM (EEPROM), or flash memory. Volatile memory may include random access memory (RAM) or external cache memory. By way of illustration and not limitation, RAM is available in many forms, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), Synchlink DRAM (SLDRAM), Rambus direct RAM (RDRAM), direct Rambus dynamic RAM (DRDRAM), and Rambus dynamic RAM (RDRAM).
The foregoing is merely the preferred embodiments of the present application and is not intended to limit the present application. Any modification, equivalent replacement, improvement, and the like made within the spirit and principle of the present application shall fall within the scope of protection of the present application.

Claims (10)

1. A text accuracy calculation method based on semantic analysis, characterized in that the method comprises:
obtaining a partial transcription text transcribed starting from the start point of a template text;
establishing an edit distance matrix whose number of columns is the character length of the template text plus two and whose number of rows is the character length of the partial transcription text plus two;
calculating the value of each element in the edit distance matrix according to the partial transcription text and the template text;
recording the calculation track of the value of each element in the edit distance matrix, and generating a track matrix corresponding to the edit distance matrix;
calculating the similarity of each track in the track matrix, and selecting the track with the highest similarity between the partial transcription text and the template text, to obtain a first track;
determining, according to the first track, the end point on the template text corresponding to the partial transcription text, to obtain a first end point;
obtaining a new template text from the template text according to the start point of the template text and the first end point;
comparing the partial transcription text with the new template text, and calculating the accuracy of the partial transcription text by an edit distance algorithm.
2. The text accuracy calculation method based on semantic analysis according to claim 1, characterized in that, after the step of establishing the edit distance matrix whose number of columns is the character length of the template text plus two and whose number of rows is the character length of the partial transcription text plus two, and before the step of calculating the value of each element in the edit distance matrix according to the partial transcription text and the template text, the method comprises:
entering the characters of the template text starting from the third element of the first row of the edit distance matrix;
entering the characters of the partial transcription text starting from the third element of the first column of the edit distance matrix;
defining the value of the second element in the second row of the edit distance matrix as 0;
initializing the value of each element of the second row of the edit distance matrix by starting from the value 0 of the second element in the second row and incrementing by 1 for each successive element;
initializing the value of each element of the second column of the edit distance matrix by starting from the value 0 of the second element in the second column and incrementing by 1 for each successive element.
3. The text accuracy calculation method based on semantic analysis according to claim 2, characterized in that the value of each uninitialized element in the edit distance matrix is determined by the value of one of the elements to its left, at its upper left, and above it, and the step of calculating the value of each element in the edit distance matrix according to the partial transcription text and the template text comprises:
identifying the column number and the row number of the third element in the third column of the edit distance matrix;
identifying the character of the template text and the character of the partial transcription text to which the column number and the row number of the third element in the third column of the edit distance matrix respectively correspond;
judging whether the character of the template text corresponding to the column number of the third element in the third column of the edit distance matrix is equal to the character of the partial transcription text corresponding to the row number of the third element in the third column of the edit distance matrix;
if the character of the template text corresponding to the column number of the third element in the third column of the edit distance matrix is equal to the character of the partial transcription text corresponding to the row number of that element, the value of the third element in the third column of the edit distance matrix is the value of the element at its upper left;
if the character of the template text corresponding to the column number of the third element in the third column of the edit distance matrix is not equal to the character of the partial transcription text corresponding to the row number of that element, the value of the third element in the third column of the edit distance matrix is the minimum of the values of the elements to its left, at its upper left, and above it, plus 1;
calculating in turn the value of the fourth element in the third column of the edit distance matrix and of each subsequent element, until the value of every element in the edit distance matrix has been calculated.
4. The text accuracy calculation method based on semantic analysis according to claim 1, characterized in that the step of recording the calculation track of the value of each element in the edit distance matrix and generating the track matrix corresponding to the edit distance matrix comprises:
recording the calculation track of the value of each element in the edit distance matrix;
marking, according to the calculation track of the value of each element in the edit distance matrix, the origin from which the value of each element in the edit distance matrix was generated;
after the marking is completed, generating the track matrix corresponding to the edit distance matrix.
5. The text accuracy calculation method based on semantic analysis according to claim 1, characterized in that the step of calculating the similarity of each track in the track matrix comprises:
identifying, for each track in the track matrix, the number of characters of the partial transcription text that are equal to the corresponding characters of the template text, to obtain an equal-character count;
comparing, for each track in the track matrix, the character length of the partial transcription text with the character length of the corresponding template text, and taking the longer of the two lengths as the total character count;
calculating, for each track in the track matrix, the ratio of the equal-character count to the corresponding total character count, to obtain the similarity of each track in the track matrix.
6. The text accuracy calculation method based on semantic analysis according to claim 1, characterized in that the step of determining, according to the first track, the end point on the template text corresponding to the partial transcription text, to obtain the first end point, comprises:
marking the last element in the first track;
marking, according to the last element in the first track, the corresponding character of the template text, to obtain the first end point.
7. The text accuracy calculation method based on semantic analysis according to claim 6, characterized in that the step of obtaining the new template text from the template text according to the start point of the template text and the first end point comprises:
marking the first character of the template text as the start point;
extracting the characters between the start point of the template text and the first end point, wherein the characters between the start point of the template text and the first end point include the character at the start point of the template text and the character at the first end point;
generating text from the extracted characters, to obtain the new template text.
8. A text accuracy calculation device based on semantic analysis, characterized in that the device comprises:
a first acquisition module, configured to obtain a partial transcription text transcribed starting from the start point of a template text;
an establishing module, configured to establish an edit distance matrix whose number of columns is the character length of the template text plus two and whose number of rows is the character length of the partial transcription text plus two;
a first calculation module, configured to calculate the value of each element in the edit distance matrix according to the partial transcription text and the template text;
a generation module, configured to record the calculation track of the value of each element in the edit distance matrix and generate a track matrix corresponding to the edit distance matrix;
a screening module, configured to calculate the similarity of each track in the track matrix and select the track with the highest similarity between the partial transcription text and the template text, to obtain a first track;
an obtaining module, configured to determine, according to the first track, the end point on the template text corresponding to the partial transcription text, to obtain a first end point;
a second acquisition module, configured to obtain a new template text from the template text according to the start point of the template text and the first end point;
a second calculation module, configured to compare the partial transcription text with the new template text and calculate the accuracy of the partial transcription text by an edit distance algorithm.
9. A computer device, comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method according to any one of claims 1 to 7 when executing the computer program.
10. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the steps of the method according to any one of claims 1 to 7.
CN201811348583.1A 2018-11-13 2018-11-13 Text accuracy rate calculation method and device based on semantic analysis and computer equipment Active CN109710904B (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201811348583.1A CN109710904B (en) 2018-11-13 2018-11-13 Text accuracy rate calculation method and device based on semantic analysis and computer equipment
PCT/CN2018/124398 WO2020098098A1 (en) 2018-11-13 2018-12-27 Semantic analysis-based text accuracy calculation method, device and computer device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811348583.1A CN109710904B (en) 2018-11-13 2018-11-13 Text accuracy rate calculation method and device based on semantic analysis and computer equipment

Publications (2)

Publication Number Publication Date
CN109710904A true CN109710904A (en) 2019-05-03
CN109710904B CN109710904B (en) 2023-11-14

Family

ID=66254868

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811348583.1A Active CN109710904B (en) 2018-11-13 2018-11-13 Text accuracy rate calculation method and device based on semantic analysis and computer equipment

Country Status (2)

Country Link
CN (1) CN109710904B (en)
WO (1) WO2020098098A1 (en)

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8001136B1 (en) * 2007-07-10 2011-08-16 Google Inc. Longest-common-subsequence detection for common synonyms
CN103699591A (en) * 2013-12-11 2014-04-02 湖南大学 Page body extraction method based on sample page
CN108399163B (en) * 2018-03-21 2021-01-12 北京理工大学 Text similarity measurement method combining word aggregation and word combination semantic features

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2999768B1 (en) * 1999-03-04 2000-01-17 株式会社エイ・ティ・アール音声翻訳通信研究所 Speech recognition error correction device
CN104464736A (en) * 2014-12-15 2015-03-25 北京百度网讯科技有限公司 Error correction method and device for voice recognition text
US20170133008A1 (en) * 2015-11-05 2017-05-11 Le Holdings (Beijing) Co., Ltd. Method and apparatus for determining a recognition rate
CN106847288A (en) * 2017-02-17 2017-06-13 上海创米科技有限公司 The error correction method and device of speech recognition text

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
张均胜;石崇德;徐红姣;高影繁;何彦青;: "一种基于短文本相似度计算的主观题自动阅卷方法", 图书情报工作, no. 19, pages 31 - 37 *

Also Published As

Publication number Publication date
CN109710904B (en) 2023-11-14
WO2020098098A1 (en) 2020-05-22

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant