CN104199805A - Text splicing method and device - Google Patents



Publication number
CN104199805A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410461259.6A
Other languages
Chinese (zh)
Other versions
CN104199805B (en)
Inventor
李德斌
王巨宏
许勇
全琦
黄志斌
杨大威
谭志鹏
吴现
杨言
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tsinghua University
Tencent Technology Shenzhen Co Ltd
Application filed by Tsinghua University, Tencent Technology Shenzhen Co Ltd filed Critical Tsinghua University
Priority to CN201410461259.6A priority Critical patent/CN104199805B/en
Publication of CN104199805A publication Critical patent/CN104199805A/en
Application granted granted Critical
Publication of CN104199805B publication Critical patent/CN104199805B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Abstract

The invention discloses a text splicing method and device. The method includes: obtaining texts to be spliced; and performing the following operations on every two adjacent texts, a first text and a second text, among the texts to be spliced, where the splicing order places the first text before the second text: searching the first text and the second text for at least one row of identical text strings, where the identical rows include the last row of the first text and the first row of the second text; and, if such rows are found, deleting them from the first text or from the second text and then splicing the first text and the second text in the splicing order. The invention solves the technical problem that texts spliced by existing text splicing methods have low continuity.

Description

Text splicing method and device
Technical field
The present invention relates to the field of computers, and in particular to a text splicing method and device.
Background art
At present, when optical character recognition (OCR) is used to recognize the text of a paper document, the usual approach is to capture the text with a camera, stitch the multiple captured text images into one complete text image by means of image registration and image fusion, and then perform OCR on the complete image to obtain machine-readable full-text information corresponding to the text on the paper document. The captured text images may contain portions that repeat from one image to the next. Image registration refers to the process of matching and superimposing multiple images acquired at different times, by different sensors (imaging devices), or under different conditions (position, angle, illumination). Image fusion refers to processing image data about the same target collected from multiple source channels so as to extract as much useful information as possible from each channel and finally synthesize a high-quality image.
However, existing OCR methods rely on machine learning, so the recognition result may be affected by the external environment. For example, differences in illumination or angle when the text is captured may cause the recognized text to differ from the original text on the paper document, which greatly reduces the accuracy of OCR. In addition, existing text recognition schemes do not handle well those text images whose text is damaged at the image border, so neither the accuracy of the finally recognized text nor the continuity between the spliced texts can be guaranteed.
No effective solution to the above problems has yet been proposed.
Summary of the invention
The embodiments of the present invention provide a text splicing method and device, so as to at least solve the technical problem that existing text splicing methods yield low continuity between the spliced texts.
According to one aspect of the embodiments of the present invention, a text splicing method is provided, comprising: obtaining texts to be spliced; and performing the following operations on every two adjacent texts, a first text and a second text, among the texts to be spliced, wherein the splicing order is such that the first text is spliced before the second text: searching for at least one row of text strings that is identical in the first text and the second text, wherein the at least one row includes the last row of the first text and the first row of the second text; and, if the identical at least one row is found, deleting it from the first text or from the second text, and splicing the first text and the second text in the splicing order after the deletion.
According to another aspect of the embodiments of the present invention, a text splicing device is also provided, comprising: an acquiring unit, configured to obtain texts to be spliced; and a splicing unit, configured to perform, through the following modules, the operations on every two adjacent texts, a first text and a second text, among the texts to be spliced, wherein the splicing order is such that the first text is spliced before the second text: a search module, configured to search for at least one row of text strings that is identical in the first text and the second text, wherein the at least one row includes the last row of the first text and the first row of the second text; and a first splicing module, configured to, when the identical at least one row is found, delete it from the first text or from the second text, and splice the first text and the second text in the splicing order after the deletion.
In the embodiments of the present invention, every two adjacent texts, a first text and a second text, among the obtained texts to be spliced are searched for at least one identical row of text strings; the identical rows are deleted from the first text or from the second text, and the texts are then spliced. In the prior art, text images are spliced directly, so the text strings recognized by OCR may contain repeated text strings. These repetitions make the whole text discontinuous and also make it impossible to guarantee the accuracy of the recognized text strings when they are spliced. With the embodiments of the present invention, the texts to be spliced no longer contain repeated text strings, which improves the continuity of the spliced text.
In addition, in the embodiments of the present invention, incomplete text strings can be filtered out and deleted when the text is recognized, which further ensures that no incomplete text string interferes with the splicing and improves the accuracy of text splicing.
Brief description of the drawings
The accompanying drawings described herein provide a further understanding of the present invention and form a part of this application. The schematic embodiments of the present invention and their description are used to explain the present invention and do not unduly limit it. In the drawings:
Fig. 1 is a schematic flowchart of an optional text splicing method according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of an optional text splicing according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of another optional text splicing according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of yet another optional text splicing according to an embodiment of the present invention;
Fig. 5 is a schematic diagram of yet another optional text splicing according to an embodiment of the present invention;
Fig. 6 is a schematic diagram of yet another optional text splicing according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of yet another optional text splicing according to an embodiment of the present invention;
Fig. 8 is a schematic diagram of yet another optional text splicing according to an embodiment of the present invention; and
Fig. 9 is a schematic diagram of an optional text splicing device according to an embodiment of the present invention.
Detailed description
To help those skilled in the art better understand the solution of the present invention, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the scope of protection of the present invention.
It should be noted that the terms "first", "second" and the like in the description, the claims and the above drawings are used to distinguish similar objects and do not necessarily describe a specific order or sequence. It should be understood that data so termed may be interchanged where appropriate, so that the embodiments described herein can be implemented in orders other than those illustrated or described here. In addition, the terms "comprise" and "have" and any variations of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device that comprises a series of steps or units is not necessarily limited to the steps or units expressly listed, but may include other steps or units that are not expressly listed or that are inherent to the process, method, product or device.
Embodiment 1
According to an embodiment of the present invention, a text splicing method is provided. As shown in Fig. 1, the method comprises:
S102, obtaining texts to be spliced;
S104, performing the following operations on every two adjacent texts, a first text and a second text, among the texts to be spliced, wherein the splicing order is such that the first text is spliced before the second text:
S1042, searching for at least one row of text strings that is identical in the first text and the second text, wherein the at least one row includes the last row of the first text and the first row of the second text;
S1044, if the identical at least one row is found, deleting it from the first text or from the second text, and splicing the first text and the second text in the splicing order after the deletion.
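As a non-limiting illustration, steps S102 to S1044 can be sketched in Python as follows. The sketch assumes that each text is already available as a list of rows, that rows are compared for exact equality, and that the identical rows are deleted from the second text; the names `find_overlap` and `splice` are introduced here for illustration only and do not appear in the original disclosure.

```python
def find_overlap(first, second):
    """Return the largest n for which the last n rows of `first`
    equal the first n rows of `second` (0 if there is none)."""
    best = 0
    for n in range(1, min(len(first), len(second)) + 1):
        if first[-n:] == second[:n]:
            best = n
    return best


def splice(texts):
    """Splice texts (lists of rows, given in splicing order),
    deleting the repeated rows from the start of each later text."""
    if not texts:
        return []
    result = list(texts[0])
    # Each pair of adjacent texts is treated as (first text, second text).
    for first, second in zip(texts, texts[1:]):
        n = find_overlap(first, second)
        result.extend(second[n:])  # skip the n identical rows
    return result


text_1 = ["a", "b", "c", "d"]
text_2 = ["b", "c", "d", "e"]  # shares 3 rows with text_1
text_3 = ["f"]                 # shares no rows with text_2
print(splice([text_1, text_2, text_3]))
```

Exact equality is used only to keep the sketch short; it would not tolerate OCR errors in the repeated rows.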
Optionally, in this embodiment, the text splicing method may be, but is not limited to being, implemented on a terminal, where the terminal may include, but is not limited to, at least one of the following: a mobile phone, a notebook computer, a tablet computer, a PC. Further, the method may be, but is not limited to being, applied to text splicing after optical character recognition (OCR). For example, images of the texts to be spliced are captured by a camera and recognized by OCR into the texts to be spliced, where the texts to be spliced may include, but are not limited to, one or more texts. For example, as shown in Fig. 2, the texts T to be spliced may include a first text Text_1, a second text Text_2 and a third text Text_3. The splicing order is that Text_1 precedes Text_2 and Text_2 precedes Text_3, and Text_1 and Text_2 share 3 identical rows of text strings. With the text splicing method provided in this embodiment, the identical text strings are deleted and the remaining texts are spliced to obtain the spliced text T', which improves the continuity of the spliced text. The above is only an example, and this embodiment places no limitation on it.
Optionally, in this embodiment, searching for the at least one row of text strings identical in the first text and the second text may include, but is not limited to: matching, row by row, at least one row of the first text that includes its last row against at least one row of the second text that includes its first row, and taking the match with the largest number of rows as the identical at least one row. Optionally, the at least one row may be, but is not limited to, one row, two rows, or multiple consecutive rows. For example, when only one row is identical, that row can be deleted directly so that the texts are spliced seamlessly. As another example, when multiple consecutive rows are identical, the search is repeated to find the largest number of identical rows between the two texts, so that all identical text strings are obtained and deleted and no omission leaves the spliced text discontinuous.
Optionally, in this embodiment, the matching may include, but is not limited to, string matching that starts from the last row of the first text and the first row of the second text and grows by one row at a time. For example, the last row of the first text is matched against the first row of the second text; then the last two rows of the first text are matched against the first two rows of the second text; then the last three rows against the first three rows; and so on, until the text with the smaller total number of rows has been traversed once.
Optionally, in this embodiment, the matching may include, but is not limited to: single-row matching and multi-row matching.
Optionally, in this embodiment, single-row matching may include, but is not limited to: comparing the edit distance between one row of text strings from each of the two texts to determine whether the single-row match succeeds. The edit distance may be, but is not limited to, the minimum number of edit operations needed to change one text string into the other, where an edit operation may be replacing one character with another, inserting a character, or deleting a character. The single-row match may be judged successful when, for example, the edit distance between the two rows is less than or equal to a predetermined threshold. For example, suppose the last row of the first text is android:name=".gui.CodeActivity" and the first row of the second text is android:name=".guu.CodeActivity12". The edit distance between these two rows is 3: the character "i" in "gui" is replaced with "u", and the two characters "12" are added at the end. If the predetermined threshold is 5, the edit distance 3 is less than the threshold, so the single-row match is judged successful.
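As a non-limiting illustration, the single-row matching described above can be sketched in Python using the standard dynamic-programming computation of the Levenshtein edit distance. The function names `edit_distance` and `rows_match` are assumptions introduced here; the threshold of 5 and the two example rows are taken from the example above.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character replacements, insertions
    and deletions needed to change string a into string b."""
    prev = list(range(len(b) + 1))  # distances from "" to prefixes of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # replace or keep
        prev = curr
    return prev[-1]


def rows_match(a: str, b: str, threshold: int = 5) -> bool:
    """A single-row match succeeds when the edit distance
    is less than or equal to the predetermined threshold."""
    return edit_distance(a, b) <= threshold


last_row_of_first = 'android:name=".gui.CodeActivity"'
first_row_of_second = 'android:name=".guu.CodeActivity12"'
print(edit_distance(last_row_of_first, first_row_of_second))  # 3
print(rows_match(last_row_of_first, first_row_of_second))     # True
```

A fixed absolute threshold treats short and long rows alike; whether the threshold should scale with row length is not addressed in the text and is left open here.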
Optionally, in this embodiment, multi-row matching may include, but is not limited to: taking the text strings with the largest number of matched rows as the final matched strings, that is, as the identical text strings between the two texts. Optionally, whether a multi-row match succeeds may be judged by, for example, calculating the proportion of rows with a successful single-row match among all rows taking part in the multi-row match; when this proportion is greater than a predetermined threshold, the multi-row match is judged successful.
Optionally, in this embodiment, if the identical at least one row of text strings is found, deleting it from the first text or from the second text and splicing the first text and the second text in the splicing order after the deletion includes, but is not limited to, at least one of the following:
1) deleting the at least one row from the first text, and splicing the first text, with the at least one row deleted, and the second text, wherein the last row of the first text after the deletion is spliced before the first row of the second text; or
2) deleting the at least one row from the second text, and splicing the first text and the second text, with the at least one row deleted, wherein the last row of the first text is spliced before the first row of the second text after the deletion.
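As a non-limiting illustration, the two deletion options above, together with the case in which no identical rows are found, can be sketched in Python as follows. The function `splice_pair` and its `delete_from` parameter are assumptions introduced here, and the number of identical rows is assumed to have been determined beforehand.

```python
def splice_pair(first, second, overlap, delete_from="second"):
    """Splice two adjacent texts (lists of rows).

    overlap     -- number of identical rows found (0 if none)
    delete_from -- "first": drop the rows from the end of the first text
                   "second": drop the rows from the start of the second text
    """
    if overlap == 0:
        return first + second             # nothing to delete
    if delete_from == "first":
        return first[:-overlap] + second  # option 1)
    if delete_from == "second":
        return first + second[overlap:]   # option 2)
    raise ValueError("delete_from must be 'first' or 'second'")


# The row "123" is a hypothetical placeholder for the rest of the first text.
first = ["123", "XXXXXX", "Yyyyyyyyy", "zzz"]
second = ["XXXXXX", "Yyyyyyyyy", "zzz", "456789"]
print(splice_pair(first, second, overlap=3, delete_from="first"))
print(splice_pair(first, second, overlap=3, delete_from="second"))
```

When the repeated rows are exactly equal, both options give the same result. When they were matched only approximately, the choice decides which recognized copy of the rows survives in the spliced text.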
This is described with the following example. Suppose the identical text strings "XXXXXX; Yyyyyyyyy; zzz" are found in the first text and the second text. The identical text strings may be deleted from the first text, and the last row of the first text after the deletion is spliced before the first row of the second text, as shown in Fig. 3. Alternatively, the identical text strings may be deleted from the second text, and the last row of the first text is spliced before the first row of the second text after the deletion, as shown in Fig. 4.
Optionally, in this embodiment, after searching for the at least one row of text strings identical in the first text and the second text, the method further comprises: if no identical row exists, splicing the first text and the second text in the splicing order, wherein the last row of the first text is spliced before the first row of the second text.
This is described with the following example. Suppose no identical text string is found in the first text and the second text. The two texts are then spliced directly in the splicing order, with the last row of the first text (for example, "zzz") spliced before the first row of the second text (for example, "456789"), as shown in Fig. 5.
With the embodiment provided by this application, every two adjacent texts, a first text and a second text, among the obtained texts to be spliced are searched for at least one identical row of text strings; the identical rows are deleted from the first text or from the second text, and the texts are then spliced. The spliced text therefore no longer contains repeated text strings, which improves its continuity and the user experience.
As an optional solution, searching for the at least one row of text strings identical in the first text and the second text comprises:
S1, repeating the following steps until N is greater than the total number of rows of whichever of the first text and the second text has fewer rows, where the initial value of N is 1:
S12, obtaining the number of rows P whose text strings are identical between a first text string, consisting of N consecutive rows of the first text including its last row, and a second text string, consisting of N consecutive rows of the second text including its first row;
S14, storing P and the corresponding N, and setting N = N + 1;
S2, obtaining the largest value P_max among the stored values of P, obtaining from the stored values of N the value N_target corresponding to P_max, and taking the N_target consecutive rows of the first text that include its last row and the N_target consecutive rows of the second text that include its first row as the found at least one row of text strings identical in the first text and the second text.
This is described with the following example. As shown in Fig. 6, suppose the first text contains 6 rows of text strings and the second text contains 7 rows, and the two texts share 4 consecutive identical rows. Searching for the identical rows then comprises the following steps:
S1, matching the last row of the first text against the first row of the second text and determining whether the match succeeds; the number of identical rows P = 1 is stored together with the number of rows compared, N = 1;
S2, matching the last two rows of the first text against the first two rows of the second text and determining whether the match succeeds; the number of identical rows P = 0 is stored together with the number of rows compared, N = 2;
S3, matching the last three rows of the first text against the first three rows of the second text and determining whether the match succeeds; the number of identical rows P = 0 is stored together with the number of rows compared, N = 3;
S4, matching the last four rows of the first text against the first four rows of the second text and determining whether the match succeeds; the number of identical rows P = 4 is stored together with the number of rows compared, N = 4.
The matching continues in the same way. Because the first text has fewer rows than the second text, the matching is repeated until the first text, with its 6 rows, has been traversed.
The matching shows that the largest stored value of P is P_max = 4, with corresponding N_target = 4. The text strings corresponding to N = 4, namely "6, 4, 5, 6", are taken as the found text strings identical in the first text and the second text.
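As a non-limiting illustration, the search over N and the example of Fig. 6 can be sketched in Python as follows. The sketch assumes that P counts the positions at which the two N-row windows hold exactly equal rows, and that ties between equal values of P are resolved in favour of the larger N; neither detail is fixed by the text. The first two rows of the first text ("1", "2") and the last three rows of the second text ("7", "8", "9") are hypothetical placeholders, and the name `find_identical_rows` is introduced here.

```python
def find_identical_rows(first, second):
    """Return (n_target, stored), where stored lists (P, N) for every N
    from 1 up to the row count of the shorter text."""
    stored = []
    n = 1
    while n <= min(len(first), len(second)):
        tail = first[-n:]   # N rows of the first text, including its last row
        head = second[:n]   # N rows of the second text, including its first row
        p = sum(1 for a, b in zip(tail, head) if a == b)
        stored.append((p, n))
        n += 1
    if not stored:
        return 0, stored
    p_max, n_target = max(stored)  # largest P; ties go to the larger N
    return (n_target if p_max > 0 else 0), stored


first = ["1", "2", "6", "4", "5", "6"]        # 6 rows
second = ["6", "4", "5", "6", "7", "8", "9"]  # 7 rows
n_target, stored = find_identical_rows(first, second)
print(stored)             # [(1, 1), (0, 2), (0, 3), (4, 4), (0, 5), (0, 6)]
print(first[-n_target:])  # ['6', '4', '5', '6']
```

The stored pairs reproduce the values given in the example: P = 1 for N = 1, P = 0 for N = 2 and N = 3, and P = 4 for N = 4.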
With the embodiment provided by this application, the text strings in the first text and the second text are compared in the above manner to find the text strings identical in the two texts, so that the identical text strings are identified accurately and the accuracy of text splicing is improved.
As an optional solution, storing P and the corresponding N comprises:
S1, determining whether the ratio P/N is greater than a predetermined ratio threshold;
S2, if the ratio P/N is greater than the predetermined ratio threshold, storing P and the corresponding N.
Optionally, in this embodiment, whether a multi-row match succeeds may be judged by, for example, calculating the proportion of rows with a successful single-row match among all rows taking part in the multi-row match; when this proportion is greater than a predetermined threshold, the multi-row match is judged successful.
This is described with the above example from the angle of multi-row matching. As shown in Fig. 6, suppose the first text contains 6 rows of text strings and the second text contains 7 rows, and suppose the predetermined ratio threshold is 0.8. When the last row of the first text is matched against the first row of the second text, the ratio is P/N = 1, which is greater than the threshold, so the match succeeds. When the last two rows of the first text are matched against the first two rows of the second text, the ratio is P/N = 0, which is less than the threshold, so the match fails. When the last three rows are matched against the first three rows, the ratio is again P/N = 0, and the match fails. When the last four rows are matched against the first four rows, the ratio is P/N = 1, which is greater than the threshold, so the match succeeds. Matching then continues until the traversal is complete. Only these two matches succeed. Because the fourth match has the larger number of rows, its four rows of text strings are taken as the final matched strings, that is, as the text strings identical between the first text and the second text.
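As a non-limiting illustration, storing P and N only when the ratio P/N exceeds the predetermined ratio threshold can be sketched in Python as follows. The predicate `same_row` stands in for the single-row match and defaults to exact equality for brevity; an edit-distance test could be passed instead. The name `find_overlap_by_ratio`, the placeholder rows, and the second example with one misrecognized row are assumptions introduced here.

```python
def find_overlap_by_ratio(first, second, ratio_threshold=0.8,
                          same_row=lambda a, b: a == b):
    """Return the number of rows N of the successful multi-row match
    with the largest P, or 0 if no multi-row match succeeds."""
    stored = []
    for n in range(1, min(len(first), len(second)) + 1):
        p = sum(1 for a, b in zip(first[-n:], second[:n]) if same_row(a, b))
        if p / n > ratio_threshold:  # store only successful matches
            stored.append((p, n))
    if not stored:
        return 0
    _, n_target = max(stored)        # largest P; ties go to the larger N
    return n_target


# Example of Fig. 6 with hypothetical placeholder rows around the overlap.
first = ["1", "2", "6", "4", "5", "6"]
second = ["6", "4", "5", "6", "7", "8", "9"]
print(find_overlap_by_ratio(first, second))  # 4

# One of four repeated rows misrecognized: P/N = 3/4 = 0.75.
a = ["a", "b", "c", "d", "e"]
b = ["b", "c", "X", "e", "f"]
print(find_overlap_by_ratio(a, b, ratio_threshold=0.8))  # 0
print(find_overlap_by_ratio(a, b, ratio_threshold=0.7))  # 4
```

The second example shows the trade-off in choosing the ratio threshold: a lower threshold tolerates more recognition errors in the repeated rows but also raises the risk of accepting a spurious match.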
With the embodiment provided by this application, the ratio is used to judge whether a multi-row match succeeds, which further ensures the accuracy of text matching, so that the identical text strings can be deleted accurately and the accuracy of text splicing is improved.
As an optional solution, obtaining the texts to be spliced comprises:
S1, obtaining one or more text images to be recognized, wherein each text image to be recognized corresponds to one text among the texts to be spliced;
S2, performing a recognition operation on each text image to be recognized to obtain the corresponding text among the texts to be spliced, wherein the recognition operation may include, but is not limited to, the following three modes:
As one optional implementation, the first row in the text image to be recognized is checked:
S1, determining whether a first distance between the first row in the text image to be recognized and the upper boundary of the text image is less than or equal to a first distance threshold;
S2, if the first distance is less than or equal to the first distance threshold, marking the first row in the text image to be recognized;
S3, recognizing the text image to be recognized as a current text, and deleting the marked row from the current text to obtain the corresponding text among the texts to be spliced.
As another optional implementation, the last row in the text image to be recognized is checked:
S1, determining whether a second distance between the last row in the text image to be recognized and the lower boundary of the text image is less than or equal to a second distance threshold;
S2, if the second distance is less than or equal to the second distance threshold, marking the last row in the text image to be recognized;
S3, recognizing the text image to be recognized as a current text, and deleting the marked row from the current text to obtain the corresponding text among the texts to be spliced.
As yet another optional implementation, the first row and the last row in the text image to be recognized are both checked:
S1, determining whether the first distance between the first row in the text image to be recognized and the upper boundary of the text image is less than or equal to the first distance threshold, and determining whether the second distance between the last row in the text image to be recognized and the lower boundary of the text image is less than or equal to the second distance threshold;
S2, if the first distance is less than or equal to the first distance threshold, marking the first row in the text image to be recognized; if the second distance is less than or equal to the second distance threshold, marking the last row in the text image to be recognized;
S3, recognizing the text image to be recognized as a current text, and deleting the marked rows from the current text to obtain the corresponding text among the texts to be spliced.
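As a non-limiting illustration, the three modes above can be sketched in Python as follows. The sketch assumes that the recognition step yields, for each row, its text and the vertical extent of the row in pixels measured from the top of the image; the `Row` structure, the function `drop_border_rows` and the pixel values are hypothetical and are not part of the original disclosure or of any particular OCR library.

```python
from dataclasses import dataclass


@dataclass
class Row:
    text: str
    top: int     # y coordinate of the row's upper edge, in pixels
    bottom: int  # y coordinate of the row's lower edge, in pixels


def drop_border_rows(rows, image_height, first_threshold, second_threshold,
                     check_first=True, check_last=True):
    """Mark the first/last row when it lies too close to the upper/lower
    image boundary, then return the text of the unmarked rows."""
    if not rows:
        return []
    marked = set()
    first_distance = rows[0].top                     # to the upper boundary
    second_distance = image_height - rows[-1].bottom  # to the lower boundary
    if check_first and first_distance <= first_threshold:
        marked.add(0)
    if check_last and second_distance <= second_threshold:
        marked.add(len(rows) - 1)
    return [row.text for i, row in enumerate(rows) if i not in marked]


rows = [Row("cut off at the top", 0, 8),
        Row("complete row", 20, 30),
        Row("cut off at the bottom", 92, 100)]
print(drop_border_rows(rows, image_height=100,
                       first_threshold=2, second_threshold=2))
```

Setting `check_first` or `check_last` to `False` gives the modes in which only the last row or only the first row is checked.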
Alternatively, text image to be identified being identified as to current text comprises: adopt OCR that text image to be identified is identified as to current text.
Alternatively, in the present embodiment, obtain text image to be identified by camera, recycling OCR identification, is identified as current text to be spliced by the above-mentioned text image getting.
But the above-mentioned text image to be identified getting may exist text incompleteness, as shown in Figure 7,, in the time that text image is identified, also need to filter above-mentioned incomplete text, to obtain text to be spliced.
Alternatively, in the present embodiment, the incomplete text-string that needs to delete can include but not limited to following one of at least: first between the first row in text image to be identified and the coboundary of text image to be identified is apart from the text-string that is less than or equal to the last column in text-string, the text image to be identified of the first distance threshold and the second distance between the lower boundary of text image to be identified and is less than or equal to second distance threshold value.Wherein, above-mentioned the first distance threshold can include but not limited to: the width of white space between the last column between the first row in above-mentioned text image to be identified and the coboundary of text image to be identified in the width of white space, above-mentioned text image to be identified and the lower boundary of text image to be identified.For example, as shown in Figure 8, between the last column in above-mentioned text image to be identified and the lower boundary of text image to be identified, the width h of white space will be served as second distance threshold value.
Optionally, in this embodiment, the first distance threshold and the second distance threshold may take the same value or different values, depending on the application scenario. For example, the threshold may be, but is not limited to, one tenth of the above blank region, which ensures that deleting incomplete text strings leaves the other text strings unaffected and thereby preserves the accuracy of text splicing.
In the embodiment provided by this application, the incomplete text strings in the text are identified in the above manner so that they can be deleted accurately, which in turn improves the accuracy of text recognition and text splicing.
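The border check described above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes the OCR step already yields one bounding box per row, and the names `Row`, `drop_incomplete_rows`, `image_height`, `first_threshold`, and `second_threshold` are hypothetical. For brevity the sketch marks and deletes rows after recognition, whereas the text marks rows on the image first.

```python
from dataclasses import dataclass


@dataclass
class Row:
    text: str      # recognized text of the row (hypothetical OCR output)
    top: float     # y-coordinate of the row's upper edge, in pixels
    bottom: float  # y-coordinate of the row's lower edge, in pixels


def drop_incomplete_rows(rows, image_height, first_threshold, second_threshold):
    """Return the row texts, minus a first/last row that sits too close to the border."""
    if not rows:
        return []
    marked = set()
    # First distance: gap between the first row and the upper boundary (y = 0).
    if rows[0].top <= first_threshold:
        marked.add(0)
    # Second distance: gap between the last row and the lower boundary.
    if image_height - rows[-1].bottom <= second_threshold:
        marked.add(len(rows) - 1)
    # Delete the marked rows and keep the rest in order.
    return [row.text for i, row in enumerate(rows) if i not in marked]


# Illustrative data: the first row starts 1 px below the top edge, so it is treated as cut off.
rows = [Row("cut-off top", 1, 19), Row("line 2", 30, 48), Row("line 3", 60, 78)]
kept = drop_incomplete_rows(rows, image_height=100, first_threshold=3, second_threshold=3)
```

Here only the first row is marked, because the last row ends 22 px above the lower boundary, well beyond the assumed threshold of 3 px.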
It should be noted that, for simplicity of description, each of the foregoing method embodiments is expressed as a series of combined actions; however, those skilled in the art should understand that the present invention is not limited by the described order of actions, because according to the present invention some steps may be performed in another order or simultaneously. Those skilled in the art should also understand that the embodiments described in the specification are all preferred embodiments, and the actions and modules involved are not necessarily required by the present invention.
From the above description of the embodiments, those skilled in the art will clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, or by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions that cause a terminal device (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods described in the embodiments of the present invention.
Embodiment 2
According to an embodiment of the present invention, a text splicing device is also provided. As shown in Fig. 9, the device comprises:
1) an acquiring unit 902, configured to obtain the texts to be spliced;
2) a splicing unit 904, configured to perform, by means of the following modules, the operation on every two adjacent texts, a first text and a second text, among the texts to be spliced, wherein the splicing order of the first text and the second text is that the first text is spliced before the second text:
(1) a search module 9042, configured to search for at least one row of text strings that is identical in the first text and the second text, wherein the at least one row of text strings comprises the last row of text strings of the first text and the first row of text strings of the second text;
(2) a first splicing module 9044, configured to, when the identical at least one row of text strings is found, delete the at least one row of text strings from the first text or the second text, and splice the first text and the second text according to the splicing order after the deletion.
Optionally, in this embodiment, the above text splicing method may be, but is not limited to being, used to splice text on a terminal, where the terminal may include, but is not limited to, at least one of the following: a mobile phone, a notebook computer, a tablet computer, and a PC. Further, the text splicing method may be, but is not limited to being, applied to text splicing after optical character recognition (OCR). For example, images of the text to be spliced that are captured by a camera are recognized by OCR as the text to be spliced, where the text to be spliced may include, but is not limited to, one or more texts. For example, as shown in Fig. 2, the text T to be spliced may include a first text Text_1, a second text Text_2, and a third text Text_3, where the splicing order is that Text_1 precedes Text_2 and Text_2 precedes Text_3, and Text_1 and Text_2 share 3 identical rows of text strings. With the text splicing method provided in this embodiment, the identical text strings can be deleted, and the texts remaining after the deletion are spliced to obtain the spliced text T', thereby improving the continuity of the spliced text. The above is only an example, and this embodiment imposes no limitation in this respect.
Optionally, in this embodiment, the manner of searching for at least one row of text strings that is identical in the first text and the second text may include, but is not limited to: matching, row by row, at least one row of text strings of the first text that includes its last row against at least one row of text strings of the second text that includes its first row, and taking the matching result with the largest number of rows as the at least one row of text strings identical in the first text and the second text. Optionally, in this embodiment, the at least one row of text strings may include, but is not limited to: one row, or multiple consecutive rows, of text strings. For example, when only one row of text strings is identical, that row can be deleted directly so that the texts are spliced seamlessly. As another example, when multiple consecutive rows of text strings are identical, the search must be repeated to find the identical text strings with the largest number of rows between the two texts, so that all identical text strings are obtained in full and then deleted, and no omission leaves the spliced text discontinuous.
Optionally, in this embodiment, the matching may include, but is not limited to: starting from the last row of text strings of the first text and the first row of text strings of the second text, respectively, matching the text strings of the two texts while increasing the number of rows by one each time. For example, the last row of the first text is first matched against the first row of the second text; then the last two rows of the first text are matched against the first two rows of the second text; then the last three rows of the first text are matched against the first three rows of the second text; and so on, until whichever of the first text and the second text has fewer total rows has been traversed.
Optionally, in this embodiment, the matching manner may include, but is not limited to: single-row matching and multi-row matching.
Optionally, in this embodiment, single-row matching may include, but is not limited to: determining whether the single-row match succeeds by comparing the edit distance between two rows of text strings in the two texts. The edit distance may include, but is not limited to: the minimum number of editing operations required to change one text string into another, where an editing operation may include, but is not limited to, replacing one character with another, inserting a character, or deleting a character. Whether the single-row match succeeds may be determined by, but is not limited to, the following: when the edit distance between the two rows of text strings is less than or equal to a predetermined threshold, the single-row match is determined to succeed. For example, suppose the last row of text strings of the first text is android:name=".gui.CodeActivity" and the first row of text strings of the second text is android:name=".guu.CodeActivity12". The edit distance between these two rows is 3: the character "i" in "gui" is replaced with "u", and the two characters "12" are added at the end. Assuming the predetermined threshold is 5, the edit distance 3 is less than the threshold 5, so the single-row match is determined to succeed.
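The single-row check can be illustrated with a standard Levenshtein distance. The sketch below is one common dynamic-programming formulation, not necessarily the one used by the applicants; the function names `edit_distance` and `rows_match` are hypothetical, and the default threshold of 5 is taken from the example in the text.

```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: fewest substitutions, insertions and deletions turning a into b."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute, or keep if equal
        prev = curr
    return prev[-1]


def rows_match(a: str, b: str, threshold: int = 5) -> bool:
    """A single-row match succeeds when the edit distance does not exceed the threshold."""
    return edit_distance(a, b) <= threshold


row_a = 'android:name=".gui.CodeActivity"'
row_b = 'android:name=".guu.CodeActivity12"'
distance = edit_distance(row_a, row_b)  # one substitution plus two insertions
matched = rows_match(row_a, row_b)
```

Tolerating a small edit distance rather than requiring exact equality lets rows that differ only by a few OCR misreadings still count as the same row.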
Optionally, in this embodiment, multi-row matching may include, but is not limited to: taking the text strings with the largest number of matched rows as the final matched strings, that is, the identical text strings between the two texts. Optionally, in this embodiment, whether a multi-row match succeeds may be determined by, but is not limited to, the following: calculating the proportion of rows that succeed in single-row matching among the total rows of the multi-row match, and determining that the multi-row match succeeds when the proportion is greater than a predetermined threshold.
In the embodiment provided by this application, every two adjacent texts, a first text and a second text, among the obtained texts to be spliced are searched to determine whether at least one row of identical text strings exists; the at least one row of text strings identical in the first text and the second text is obtained and deleted from the first text or the second text, and the texts remaining after the deletion are then spliced. The spliced text therefore no longer contains repeated text strings, which improves the continuity of the spliced text and the user experience.
As an optional solution, the search module comprises:
1) a processing submodule, configured to repeat the following steps until N exceeds the total number of rows of whichever of the first text and the second text has fewer rows, the initial value of N being 1:
(1) obtain the number P of rows whose text strings are identical between a first text string, consisting of N consecutive rows of the first text including its last row, and a second text string, consisting of N consecutive rows of the second text including its first row;
(2) store P and the corresponding N, and set N=N+1;
2) a determining submodule, configured to obtain the maximum value P_max from the stored values of P, obtain from the stored values of N the value N_target corresponding to P_max, and take the N_target consecutive rows of the first text including its last row and the N_target consecutive rows of the second text including its first row as the found at least one row of text strings identical in the first text and the second text.
This is described in detail with the following example. As shown in Fig. 6, suppose the first text comprises 6 rows of text strings and the second text comprises 7 rows of text strings, and the two texts share 4 consecutive identical rows. Searching for the at least one row of text strings identical in the first text and the second text comprises the following steps:
S1: match the last row of the first text against the first row of the second text, determine whether the match succeeds, and store the number of identical rows P=1 together with the number of rows compared N=1;
S2: match the last two rows of the first text against the first two rows of the second text, determine whether the match succeeds, and store the number of identical rows P=0 together with the number of rows compared N=2;
S3: match the last three rows of the first text against the first three rows of the second text, determine whether the match succeeds, and store the number of identical rows P=0 together with the number of rows compared N=3;
S4: match the last four rows of the first text against the first four rows of the second text, determine whether the match succeeds, and store the number of identical rows P=4 together with the number of rows compared N=4.
And so on. Because the first text has fewer total rows than the second text, the matching is repeated until the first text, which has 6 rows in total, has been traversed.
From the above matching, the maximum of the stored values of P is P_max=4, and the corresponding N_target=4; the text strings "6, 4, 5, 6" corresponding to N=4 are taken as the text strings found to be identical in the first text and the second text.
In the embodiment provided by this application, the text strings in the first text and the second text are compared in the above manner to obtain the text strings identical in the two texts, so that identical text strings are identified accurately and the accuracy of text splicing is improved.
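The incremental search over N can be sketched as below. This is a minimal illustration that compares rows by exact equality; the function names `overlap_counts` and `find_overlap` are hypothetical, and the rows other than "6", "4", "5", "6" are invented here only to fill out a 6-row and a 7-row text like those in the example.

```python
def overlap_counts(first, second):
    """For N = 1..min(len), count the rows P that agree between the last N rows of
    `first` and the first N rows of `second`; return the stored (P, N) pairs."""
    stored = []
    n = 1
    while n <= min(len(first), len(second)):
        tail = first[-n:]   # N consecutive rows including the last row of the first text
        head = second[:n]   # N consecutive rows including the first row of the second text
        p = sum(1 for a, b in zip(tail, head) if a == b)
        stored.append((p, n))
        n += 1
    return stored


def find_overlap(first, second):
    """Return N_target, the N stored alongside the largest P (0 if nothing matches)."""
    stored = overlap_counts(first, second)
    if not stored:
        return 0
    p_max, n_target = max(stored, key=lambda pn: pn[0])
    return n_target if p_max > 0 else 0


first_text = ["1", "2", "6", "4", "5", "6"]         # 6 rows (rows "1", "2" are made up)
second_text = ["6", "4", "5", "6", "7", "8", "9"]   # 7 rows (rows "7".."9" are made up)
counts = overlap_counts(first_text, second_text)
n_target = find_overlap(first_text, second_text)
```

With this data the stored values of P for N = 1 to 4 are 1, 0, 0 and 4, as in steps S1 to S4, and the two remaining comparisons add no further matches.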
As an optional solution, the processing submodule stores P and the corresponding N through the following steps:
S1: determine whether the ratio P/N is greater than a predetermined ratio threshold;
S2: if the ratio P/N is greater than the predetermined ratio threshold, store P and the corresponding N.
Optionally, in this embodiment, whether a multi-row match succeeds may be determined by, but is not limited to, the following: calculating the proportion of rows that succeed in single-row matching among the total rows of the multi-row match, and determining that the multi-row match succeeds when the proportion is greater than a predetermined threshold.
This is described with the above example. As shown in Fig. 6, suppose the first text comprises 6 rows of text strings and the second text comprises 7 rows of text strings, and consider multi-row matching. When the last row of the first text is matched against the first row of the second text, the ratio P/N=1; assuming the predetermined ratio threshold is 0.8, P/N is greater than the threshold and the match succeeds. When the last two rows of the first text are matched against the first two rows of the second text, P/N=0, which is less than the threshold, so the match fails. When the last three rows of the first text are matched against the first three rows of the second text, P/N=0, which is less than the threshold, so the match fails. When the last four rows of the first text are matched against the first four rows of the second text, P/N=1, which is greater than the threshold, so the match succeeds. Matching then continues until the traversal is complete. Of all the matches, only the two above succeed; because the fourth match covers the most rows, its four rows of text strings are taken as the final matched strings, that is, the text strings identical between the first text and the second text.
In the embodiment provided by this application, using the ratio to determine whether a multi-row match succeeds further ensures the accuracy of text matching, so that identical text strings can be deleted accurately and the accuracy of text splicing is improved.
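The ratio test can be layered on the same loop. The sketch below is an assumption-laden illustration: `find_overlap_with_ratio` and its parameters are hypothetical names, rows are compared by exact equality through a pluggable `rows_equal` callback, and the default threshold of 0.8 is the value assumed in the example above.

```python
def find_overlap_with_ratio(first, second, ratio_threshold=0.8,
                            rows_equal=lambda a, b: a == b):
    """Store (P, N) only when P / N exceeds the ratio threshold, then return the N
    stored with the largest P (0 when no multi-row match succeeds)."""
    stored = []
    for n in range(1, min(len(first), len(second)) + 1):
        # P: rows that pass single-row matching among the N rows compared.
        p = sum(1 for a, b in zip(first[-n:], second[:n]) if rows_equal(a, b))
        if p / n > ratio_threshold:
            stored.append((p, n))
    if not stored:
        return 0
    return max(stored, key=lambda pn: pn[0])[1]


# Same illustrative 6-row and 7-row texts; only "6", "4", "5", "6" comes from the example.
first_text = ["1", "2", "6", "4", "5", "6"]
second_text = ["6", "4", "5", "6", "7", "8", "9"]
n_target = find_overlap_with_ratio(first_text, second_text)
```

Passing a tolerant comparison as `rows_equal`, such as an edit-distance check, would let a multi-row match succeed even when a minority of rows contain recognition errors.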
As an optional solution, the first splicing module comprises:
1) a first deletion submodule, configured to delete the at least one row of text strings from the first text and splice the first text, from which the at least one row of text strings has been deleted, with the second text, wherein the last row of the first text after the deletion is spliced before the first row of the second text; or
2) a second deletion submodule, configured to delete the at least one row of text strings from the second text and splice the first text with the second text, from which the at least one row of text strings has been deleted, wherein the last row of the first text is spliced before the first row of the second text after the deletion.
This is described in detail with the following example. Suppose the identical text strings found in the first text and the second text are "XXXXXX; Yyyyyyyyy; zzz". The identical text strings may be deleted from the first text, and the last row of the first text after the deletion is spliced before the first row of the second text, as shown in Fig. 3. Alternatively, the identical text strings may be deleted from the second text, and the last row of the first text is spliced before the first row of the second text after the deletion, as shown in Fig. 4.
In the embodiment provided by this application, the found identical at least one row of text strings is deleted from the first text or the second text, and the texts remaining after the deletion are then spliced, so that the spliced text no longer contains repeated text strings and its continuity is improved.
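The two deletion options can be sketched as follows. This is only an illustration under assumptions: the function name `splice` and the `delete_from` parameter are hypothetical, the overlap length is taken as already known, and the rows "123456" and "456789" surrounding the shared rows are placeholders chosen for the demonstration.

```python
def splice(first, second, overlap, delete_from="first"):
    """Remove `overlap` shared rows from one side, then put `first` before `second`."""
    if delete_from == "first":
        # Drop the trailing rows of the first text (slice by length so overlap=0 is safe).
        first = first[:len(first) - overlap]
    elif delete_from == "second":
        # Drop the leading rows of the second text.
        second = second[overlap:]
    else:
        raise ValueError("delete_from must be 'first' or 'second'")
    return first + second


shared = ["XXXXXX", "Yyyyyyyyy", "zzz"]
first_text = ["123456"] + shared    # the first text ends with the shared rows
second_text = shared + ["456789"]   # the second text starts with the shared rows

from_first = splice(first_text, second_text, overlap=3, delete_from="first")
from_second = splice(first_text, second_text, overlap=3, delete_from="second")
```

When the shared rows are exactly equal, both options give the same spliced text; they differ only when matching is tolerant, in which case the choice decides whose copy of the overlapping rows survives.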
As an optional solution, the above device further comprises:
1) a second splicing module, configured to, after the search for at least one row of identical text strings in the first text and the second text, and when no identical at least one row of text strings exists, splice the first text and the second text according to the splicing order, wherein the last row of the first text is spliced before the first row of the second text.
This is described in detail with the following example. Suppose no identical text strings are found in the first text and the second text; the first text and the second text are then spliced directly according to the splicing order, wherein the last row of the first text (for example, "zzz") is spliced before the first row of the second text (for example, "456789"), as shown in Fig. 5.
In the embodiment provided by this application, texts in which no identical at least one row of text strings is found are spliced directly, which preserves the continuity of the spliced text.
As an optional solution, the acquiring unit 902 comprises:
1) a first acquisition module, configured to obtain one or more text images to be identified, wherein each text image to be identified corresponds to one text among the texts to be spliced;
2) a first identification module, configured to perform, by means of the following submodules, a recognition operation on each text image to be identified to obtain the corresponding text among the texts to be spliced:
(1) a first judging submodule, configured to determine whether the first distance between the first row of the text image to be identified and the upper boundary of the text image to be identified is less than or equal to the first distance threshold;
(2) a first marking submodule, configured to mark the first row in the text image to be identified when the first distance is less than or equal to the first distance threshold;
(3) a first recognition submodule, configured to recognize the text image to be identified as a current text and delete the marked row from the current text to obtain the corresponding text among the texts to be spliced.
As an optional solution, the acquiring unit 902 comprises:
1) a second acquisition module, configured to obtain one or more text images to be identified, wherein each text image to be identified corresponds to one text among the texts to be spliced;
2) a second identification module, configured to perform, by means of the following submodules, a recognition operation on each text image to be identified to obtain the corresponding text among the texts to be spliced:
(1) a second judging submodule, configured to determine whether the second distance between the last row of the text image to be identified and the lower boundary of the text image to be identified is less than or equal to the second distance threshold;
(2) a second marking submodule, configured to mark the last row in the text image to be identified when the second distance is less than or equal to the second distance threshold;
(3) a second recognition submodule, configured to recognize the text image to be identified as a current text and delete the marked row from the current text to obtain the corresponding text among the texts to be spliced.
Optionally, in this embodiment, the recognition submodule recognizes the text image to be identified as the current text by the following step: using OCR to recognize the text image to be identified as the current text.
Optionally, in this embodiment, the text image to be identified is captured by a camera and then recognized by OCR as the current text to be spliced.
However, the captured text image to be identified may contain incomplete text, as shown in Fig. 7, so when the text image is recognized the incomplete text also needs to be filtered out to obtain the text to be spliced.
Optionally, in this embodiment, the incomplete text strings to be deleted may include, but are not limited to, at least one of the following: the text string of the first row in the text image to be identified, when the first distance between that row and the upper boundary of the text image to be identified is less than or equal to the first distance threshold; and the text string of the last row in the text image to be identified, when the second distance between that row and the lower boundary of the text image to be identified is less than or equal to the second distance threshold. The distance thresholds may include, but are not limited to: the width of the blank region between the first row in the text image to be identified and its upper boundary, and the width of the blank region between the last row in the text image to be identified and its lower boundary. For example, as shown in Fig. 8, the width h of the blank region between the last row in the text image to be identified and its lower boundary is used as the second distance threshold.
Optionally, in this embodiment, the first distance threshold and the second distance threshold may take the same value or different values, depending on the application scenario. For example, the threshold may be, but is not limited to, one tenth of the above blank region, which ensures that deleting incomplete text strings leaves the other text strings unaffected and thereby preserves the accuracy of text splicing.
In the embodiment provided by this application, the incomplete text strings in the text are identified in the above manner so that they can be deleted accurately, which in turn improves the accuracy of text recognition and text splicing.
The sequence numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
Embodiment 3
According to an embodiment of the present invention, a terminal for implementing the above text splicing method is also provided. The terminal comprises:
1) a memory, configured to store the obtained texts to be spliced, and the final target text obtained by deleting the identical at least one row of text strings from the texts to be spliced and splicing them according to the splicing order;
2) a processor, configured to perform the following operations on every two adjacent texts, a first text and a second text, among the texts to be spliced, wherein the splicing order of the first text and the second text is that the first text is spliced before the second text:
S1: search for at least one row of text strings that is identical in the first text and the second text, wherein the at least one row of text strings comprises the last row of text strings of the first text and the first row of text strings of the second text;
S2: if the identical at least one row of text strings is found, delete the at least one row of text strings from the first text or the second text, and splice the first text and the second text according to the splicing order after the deletion.
Optionally, in this embodiment, the memory may also be used to store other data stored during the text splicing method of Embodiment 1.
Optionally, for specific examples of this embodiment, refer to the examples described in Embodiment 1 and Embodiment 2; they are not repeated here.
Embodiment 4
An embodiment of the present invention also provides a storage medium for implementing the text splicing method. Optionally, in this embodiment, the storage medium may be, but is not limited to being, applied in a terminal that performs text splicing after optical character recognition (OCR), where the terminal may include, but is not limited to, at least one of the following: a mobile phone, a notebook computer, a tablet computer, and a PC. The above is only an example, and this embodiment imposes no limitation in this respect.
Optionally, in this embodiment, the storage medium is configured to store program code for performing the following steps:
S1: obtain the texts to be spliced;
S2: perform the following operations on every two adjacent texts, a first text and a second text, among the texts to be spliced, wherein the splicing order of the first text and the second text is that the first text is spliced before the second text:
S22: search for at least one row of text strings that is identical in the first text and the second text, wherein the at least one row of text strings comprises the last row of text strings of the first text and the first row of text strings of the second text;
S24: if the identical at least one row of text strings is found, delete the at least one row of text strings from the first text or the second text, and splice the first text and the second text according to the splicing order after the deletion.
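Steps S1 to S24 can be sketched end to end as below. This is a simplified illustration rather than the stored program itself: `overlap_rows` and `splice_all` are hypothetical names, rows are compared by exact equality, the overlap is always deleted from the second text of each pair, and the three sample texts are invented so that Text_1 and Text_2 share 3 rows while Text_2 and Text_3 share none.

```python
def overlap_rows(first, second):
    """Return the N whose trailing/leading N rows agree in the most positions (0 if none)."""
    best_p, best_n = 0, 0
    for n in range(1, min(len(first), len(second)) + 1):
        p = sum(1 for a, b in zip(first[-n:], second[:n]) if a == b)
        if p > best_p:
            best_p, best_n = p, n
    return best_n


def splice_all(texts):
    """Splice a list of texts (each a list of rows) in order, dropping repeated rows."""
    if not texts:
        return []
    result = list(texts[0])
    # Process every two adjacent texts: the earlier one is spliced before the later one.
    for first, second in zip(texts, texts[1:]):
        n = overlap_rows(first, second)
        # Delete the shared rows from the second text; with n == 0 this is plain concatenation.
        result.extend(second[n:])
    return result


text_1 = ["a", "b", "c", "d"]
text_2 = ["b", "c", "d", "e"]   # first three rows repeat the end of text_1
text_3 = ["f", "g"]             # no rows shared with text_2
spliced = splice_all([text_1, text_2, text_3])
```

Each pair is compared using the original, unmodified texts, so an overlap found between one pair does not influence the search for the next pair.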
Optionally, in this embodiment, the storage medium may include, but is not limited to: a USB flash drive, a read-only memory (ROM), a random access memory (RAM), a removable hard disk, a magnetic disk, an optical disc, or any other medium that can store program code.
Optionally, for specific examples of this embodiment, refer to the examples described in Embodiment 1 and Embodiment 2; they are not repeated here.
The sequence numbers of the above embodiments of the present invention are for description only and do not indicate the relative merits of the embodiments.
If the integrated unit in the above embodiments is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in the above computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions that cause one or more computer devices (which may be personal computers, servers, network devices, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention.
In the above embodiments of the present invention, the description of each embodiment has its own emphasis; for parts not described in detail in one embodiment, refer to the related descriptions of other embodiments.
In the several embodiments provided in this application, it should be understood that the disclosed client may be implemented in other ways. The device embodiments described above are merely illustrative. For example, the division into units is only a division by logical function, and other divisions are possible in an actual implementation: multiple units or components may be combined or integrated into another system, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, units, or modules, and may be electrical or take other forms.
The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically on its own, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
The above are only preferred embodiments of the present invention. It should be noted that those of ordinary skill in the art may make several improvements and refinements without departing from the principle of the present invention, and these improvements and refinements shall also fall within the protection scope of the present invention.

Claims (16)

1. A text splicing method, characterized by comprising:
obtaining texts to be spliced;
performing the following operations on every two adjacent texts, a first text and a second text, among the texts to be spliced, wherein the splicing order of the first text and the second text is that the first text is spliced before the second text:
searching for at least one row of text strings that is identical in the first text and the second text, wherein the at least one row of text strings comprises the last row of text strings of the first text and the first row of text strings of the second text;
if the identical at least one row of text strings is found, deleting the at least one row of text strings from the first text or the second text, and splicing the first text and the second text according to the splicing order after the deletion.
2. The method according to claim 1, characterized in that searching for the at least one row of text strings that is identical in the first text and the second text comprises:
matching, row by row, at least one row of text strings of the first text that includes the last row against at least one row of text strings of the second text that includes the first row;
taking the matching result with the largest number of rows obtained by the row-by-row matching as the at least one row of text strings identical in the first text and the second text.
3. The method according to claim 2, characterized in that taking the matching result with the largest number of rows obtained by the row-by-row matching as the at least one row of identical text strings in the first text and the second text comprises:
Repeating the following steps until N is greater than the smaller of the total number of rows of the first text and the total number of rows of the second text, wherein the initial value of N is 1:
Obtaining the number P of rows whose text strings are identical between first text strings, being N consecutive rows of the first text including the last row of the first text, and second text strings, being N consecutive rows of the second text including the first row of the second text;
Storing P and the corresponding N, and setting N = N + 1;
Obtaining the largest value P_max from the stored values of P, obtaining from the stored values of N the value N_target corresponding to P_max, and taking the N_target consecutive rows of the first text including the last row of the first text and the N_target consecutive rows of the second text including the first row of the second text as the found at least one row of identical text strings in the first text and the second text.
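The iteration of claims 2 and 3 can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the patented implementation: the function name `find_overlap`, the representation of a text as a list of row strings, and the rule of keeping the smallest N when several values of N share the same maximal P are all introduced here for illustration.

```python
from typing import List, Tuple


def find_overlap(first: List[str], second: List[str]) -> Tuple[int, int]:
    """Return (n_target, p_max) for two texts given as lists of rows.

    For each N from 1 up to the smaller row count, the last N rows of
    `first` are compared position by position with the first N rows of
    `second`; P is the number of positions whose strings are identical.
    """
    candidates = []  # stored (N, P) pairs
    limit = min(len(first), len(second))
    for n in range(1, limit + 1):
        tail = first[-n:]   # N consecutive rows ending at the last row
        head = second[:n]   # N consecutive rows starting at the first row
        p = sum(1 for a, b in zip(tail, head) if a == b)
        candidates.append((n, p))

    if not candidates:
        return 0, 0
    # Assumption: on ties, max() keeps the first maximal pair, i.e. the smallest N.
    n_target, p_max = max(candidates, key=lambda pair: pair[1])
    if p_max == 0:
        return 0, 0  # nothing matched, so there is no overlap to report
    return n_target, p_max
```

For example, `find_overlap(["a", "b", "c", "d"], ["c", "d", "e"])` evaluates N = 1, 2, 3 and finds the only matching rows at N = 2, so the last two rows of the first text are taken as the overlap.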
4. The method according to claim 3, characterized in that storing P and the corresponding N comprises:
Judging whether the ratio P/N is greater than a predetermined ratio threshold;
If the ratio P/N is greater than the predetermined ratio threshold, storing P and the corresponding N.
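The storage condition of claim 4 can be illustrated with a short sketch. The function name `filter_candidates`, the (N, P) tuple format, and the sample threshold of 0.7 are assumptions made for this example; the claim itself does not give a threshold value.

```python
from typing import Iterable, List, Tuple


def filter_candidates(
    candidates: Iterable[Tuple[int, int]], ratio_threshold: float
) -> List[Tuple[int, int]]:
    """Keep only the (N, P) pairs whose ratio P/N exceeds the threshold."""
    return [(n, p) for n, p in candidates if n > 0 and p / n > ratio_threshold]


# Hypothetical (N, P) pairs: N rows compared, P of them identical.
pairs = [(1, 0), (2, 2), (3, 1), (4, 3)]
kept = filter_candidates(pairs, ratio_threshold=0.7)
```

Here only the pairs (2, 2) and (4, 3) are kept, because their ratios 1.0 and 0.75 exceed 0.7. Filtering on P/N rather than on P alone discards windows in which only a small fraction of the compared rows coincide.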
5. The method according to claim 1, characterized in that deleting the at least one row of text strings from the first text or the second text, and splicing the first text and the second text after the deletion according to the splicing order, comprises:
Deleting the at least one row of text strings from the first text, and splicing the first text, from which the at least one row of text strings has been deleted, with the second text, wherein the last row of the first text after the deletion is spliced before the first row of the second text; or
Deleting the at least one row of text strings from the second text, and splicing the first text with the second text, from which the at least one row of text strings has been deleted, wherein the last row of the first text is spliced before the first row of the second text after the deletion.
6. The method according to claim 1, characterized in that, after searching whether the first text and the second text contain at least one row of identical text strings, the method further comprises:
If the at least one row of identical text strings does not exist, splicing the first text and the second text according to the splicing order, wherein the last row of the first text is spliced before the first row of the second text.
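The deletion and splicing alternatives of claims 5 and 6 can be sketched as one function. This is an illustration only: the name `splice`, the `delete_from` parameter, and the convention that `overlap` is the number of overlapping rows already found (0 when none was found) are assumptions, not part of the claims.

```python
from typing import List


def splice(first: List[str], second: List[str], overlap: int,
           delete_from: str = "second") -> List[str]:
    """Join two texts, dropping `overlap` duplicated rows from one of them."""
    if overlap < 0 or overlap > min(len(first), len(second)):
        raise ValueError("overlap must be between 0 and the shorter row count")
    if delete_from not in ("first", "second"):
        raise ValueError("delete_from must be 'first' or 'second'")
    if overlap == 0:
        # No identical rows were found: plain concatenation in splicing order.
        return first + second
    if delete_from == "first":
        # Drop the trailing rows of the first text, then append the second text.
        return first[:len(first) - overlap] + second
    # Drop the leading rows of the second text, then append it to the first text.
    return first + second[overlap:]
```

When the overlapping rows are exactly identical, both choices of `delete_from` give the same result, for instance `["a", "b", "c", "d"]` for `splice(["a", "b", "c"], ["b", "c", "d"], 2)`. The choice only matters if some of the overlapping rows differ between the two texts.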
7. The method according to claim 1, characterized in that obtaining the texts to be spliced comprises:
Obtaining one or more text images to be recognized, wherein each text image to be recognized corresponds to one text among the texts to be spliced;
Performing the following recognition operation on each text image to be recognized to obtain the corresponding text among the texts to be spliced:
Judging whether a first distance between the first row in the text image to be recognized and the upper boundary of the text image to be recognized is less than or equal to a first distance threshold;
If the first distance is less than or equal to the first distance threshold, marking the first row in the text image to be recognized;
Recognizing the text image to be recognized as a current text, and deleting the marked row from the current text to obtain the corresponding text among the texts to be spliced.
8. The method according to claim 1, characterized in that obtaining the texts to be spliced comprises:
Obtaining one or more text images to be recognized, wherein each text image to be recognized corresponds to one text among the texts to be spliced;
Performing the following recognition operation on each text image to be recognized to obtain the corresponding text among the texts to be spliced:
Judging whether a second distance between the last row in the text image to be recognized and the lower boundary of the text image to be recognized is less than or equal to a second distance threshold;
If the second distance is less than or equal to the second distance threshold, marking the last row in the text image to be recognized;
Recognizing the text image to be recognized as a current text, and deleting the marked row from the current text to obtain the corresponding text among the texts to be spliced.
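The boundary checks of claims 7 and 8 can be sketched together as follows. Everything about the data layout is an assumption made for illustration: the `Row` class, pixel coordinates with the origin at the top of the image, rows sorted from top to bottom, and the function name `drop_edge_rows`. The claims present the two checks as separate alternatives; the sketch applies both only for brevity. Presumably a row this close to an image edge may be cut off, although the claims themselves do not state the reason.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Row:
    text: str      # recognized string of the row
    top: float     # y coordinate of the row's upper edge, in pixels
    bottom: float  # y coordinate of the row's lower edge, in pixels


def drop_edge_rows(rows: List[Row], image_height: float,
                   top_threshold: float, bottom_threshold: float) -> List[str]:
    """Return the recognized row strings, omitting marked first/last rows."""
    marked = set()
    if rows:
        # First distance: from the first row to the upper boundary (y = 0).
        if rows[0].top <= top_threshold:
            marked.add(0)
        # Second distance: from the last row to the lower boundary.
        if image_height - rows[-1].bottom <= bottom_threshold:
            marked.add(len(rows) - 1)
    return [row.text for i, row in enumerate(rows) if i not in marked]
```

With rows at y ranges 1 to 20, 30 to 50 and 60 to 99 in an image of height 100 and both thresholds set to 5, the first and last rows are marked and only the middle row remains in the text to be spliced.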
9. A text splicing device, characterized by comprising:
An acquiring unit, configured to obtain texts to be spliced;
A splicing unit, configured to perform, by means of the following modules, the operations on every two adjacent texts, a first text and a second text, among the texts to be spliced, wherein the splicing order of the first text and the second text is that the first text is spliced before the second text:
A searching module, configured to search the first text and the second text for at least one row of identical text strings, wherein the at least one row of text strings comprises the last row of text strings of the first text and the first row of text strings of the second text;
A first splicing module, configured to, when the at least one row of identical text strings is found, delete the at least one row of text strings from the first text or the second text, and splice the first text and the second text after the deletion according to the splicing order.
10. The device according to claim 9, characterized in that the searching module searches the first text and the second text for the at least one row of identical text strings by the following steps:
Matching, row by row, at least one row of text strings of the first text, including the last row, against at least one row of text strings of the second text, including the first row;
Taking the matching result with the largest number of rows obtained by the row-by-row matching as the at least one row of identical text strings in the first text and the second text.
11. The device according to claim 10, characterized in that the searching module takes, by means of the following submodules, the matching result with the largest number of rows obtained by the row-by-row matching as the at least one row of identical text strings in the first text and the second text:
A processing submodule, configured to repeat the following steps until N is greater than the smaller of the total number of rows of the first text and the total number of rows of the second text, wherein the initial value of N is 1:
Obtaining the number P of rows whose text strings are identical between first text strings, being N consecutive rows of the first text including the last row of the first text, and second text strings, being N consecutive rows of the second text including the first row of the second text;
Storing P and the corresponding N, and setting N = N + 1;
A determining submodule, configured to obtain the largest value P_max from the stored values of P, obtain from the stored values of N the value N_target corresponding to P_max, and take the N_target consecutive rows of the first text including the last row of the first text and the N_target consecutive rows of the second text including the first row of the second text as the found at least one row of identical text strings in the first text and the second text.
12. The device according to claim 11, characterized in that the processing submodule stores P and the corresponding N by the following steps:
Judging whether the ratio P/N is greater than a predetermined ratio threshold;
If the ratio P/N is greater than the predetermined ratio threshold, storing P and the corresponding N.
13. The device according to claim 9, characterized in that the first splicing module comprises:
A first deleting submodule, configured to delete the at least one row of text strings from the first text, and splice the first text, from which the at least one row of text strings has been deleted, with the second text, wherein the last row of the first text after the deletion is spliced before the first row of the second text; or
A second deleting submodule, configured to delete the at least one row of text strings from the second text, and splice the first text with the second text, from which the at least one row of text strings has been deleted, wherein the last row of the first text is spliced before the first row of the second text after the deletion.
14. The device according to claim 9, characterized by further comprising:
A second splicing module, configured to, after searching whether the first text and the second text contain at least one row of identical text strings and when the at least one row of identical text strings does not exist, splice the first text and the second text according to the splicing order, wherein the last row of the first text is spliced before the first row of the second text.
15. The device according to claim 9, characterized in that the acquiring unit comprises:
A first acquiring module, configured to obtain one or more text images to be recognized, wherein each text image to be recognized corresponds to one text among the texts to be spliced;
A second identification module, configured to perform, by means of the following submodules, the recognition operation on each text image to be recognized to obtain the corresponding text among the texts to be spliced:
A first judging submodule, configured to judge whether a first distance between the first row of the text image to be recognized and the upper boundary of the text image to be recognized is less than or equal to a first distance threshold;
A first marking submodule, configured to mark the first row in the text image to be recognized when the first distance is less than or equal to the first distance threshold;
A first recognition submodule, configured to recognize the text image to be recognized as a current text, and delete the marked row from the current text to obtain the corresponding text among the texts to be spliced.
16. The device according to claim 9, characterized in that the acquiring unit comprises:
A second acquiring module, configured to obtain one or more text images to be recognized, wherein each text image to be recognized corresponds to one text among the texts to be spliced;
A second identification module, configured to perform the following recognition operation on each text image to be recognized to obtain the corresponding text among the texts to be spliced:
A second judging submodule, configured to judge whether a second distance between the last row of the text image to be recognized and the lower boundary of the text image to be recognized is less than or equal to a second distance threshold;
A second marking submodule, configured to mark the last row in the text image to be recognized when the second distance is less than or equal to the second distance threshold;
A second recognition submodule, configured to recognize the text image to be recognized as a current text, and delete the marked row from the current text to obtain the corresponding text among the texts to be spliced.
CN201410461259.6A 2014-09-11 2014-09-11 Text joining method and device Active CN104199805B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410461259.6A CN104199805B (en) 2014-09-11 2014-09-11 Text joining method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410461259.6A CN104199805B (en) 2014-09-11 2014-09-11 Text joining method and device

Publications (2)

Publication Number Publication Date
CN104199805A true CN104199805A (en) 2014-12-10
CN104199805B CN104199805B (en) 2017-10-20

Family

ID=52085100

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410461259.6A Active CN104199805B (en) 2014-09-11 2014-09-11 Text joining method and device

Country Status (1)

Country Link
CN (1) CN104199805B (en)

Cited By (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105761211A (en) * 2016-03-30 2016-07-13 努比亚技术有限公司 Method and device for splicing frames of mobile terminal
CN106162298A (en) * 2015-03-27 2016-11-23 天脉聚源(北京)科技有限公司 A kind of method and system realizing barrage
CN106503634A (en) * 2016-10-11 2017-03-15 讯飞智元信息科技有限公司 A kind of image alignment method and device
CN110427891A (en) * 2019-08-05 2019-11-08 中国工商银行股份有限公司 The method, apparatus, system and medium of contract for identification
CN110852084A (en) * 2018-07-27 2020-02-28 杭州海康威视数字技术股份有限公司 Text generation method, device and equipment
CN111783645A (en) * 2020-06-30 2020-10-16 北京百度网讯科技有限公司 Character recognition method and device, electronic equipment and computer readable storage medium

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101393646B (en) * 2007-09-21 2011-09-21 北大方正集团有限公司 Method for implementing splice of Uygur artistic word
CN100555275C (en) * 2007-09-25 2009-10-28 北大方正集团有限公司 A kind of imposition method and device
CN101645096A (en) * 2008-08-07 2010-02-10 北京大学 Imposition method
CN101876967B (en) * 2010-03-25 2012-05-02 深圳市万兴软件有限公司 Method for generating PDF text paragraphs

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
高鸿: "文档图像拼接技术研究", 《中国优秀硕士学位论文全文数据库 信息科技辑》 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106162298A (en) * 2015-03-27 2016-11-23 天脉聚源(北京)科技有限公司 A kind of method and system realizing barrage
CN105761211A (en) * 2016-03-30 2016-07-13 努比亚技术有限公司 Method and device for splicing frames of mobile terminal
CN106503634A (en) * 2016-10-11 2017-03-15 讯飞智元信息科技有限公司 A kind of image alignment method and device
CN110852084A (en) * 2018-07-27 2020-02-28 杭州海康威视数字技术股份有限公司 Text generation method, device and equipment
CN110852084B (en) * 2018-07-27 2021-04-02 杭州海康威视数字技术股份有限公司 Text generation method, device and equipment
CN110427891A (en) * 2019-08-05 2019-11-08 中国工商银行股份有限公司 The method, apparatus, system and medium of contract for identification
CN110427891B (en) * 2019-08-05 2022-06-10 中国工商银行股份有限公司 Method, apparatus, system and medium for identifying contract
CN111783645A (en) * 2020-06-30 2020-10-16 北京百度网讯科技有限公司 Character recognition method and device, electronic equipment and computer readable storage medium

Also Published As

Publication number Publication date
CN104199805B (en) 2017-10-20

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant