CN110147429A - Text comparison method, device, computer equipment and storage medium
- Publication number
- CN110147429A (application number CN201910297625.1A)
- Authority
- CN
- China
- Prior art keywords
- text
- match point
- axis
- traversal
- region
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/334—Query execution
- G06F16/3344—Query execution using natural language analysis
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Data Mining & Analysis (AREA)
- Databases & Information Systems (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
This application relates to the field of big data and discloses a text comparison method, device, computer equipment and storage medium. The method includes: obtaining a first text and a second text, converting the first text and the second text each into a single line of text, and mapping the converted first text and second text onto the X-axis and the Y-axis respectively; performing a traversal query on the first text on the X-axis and the second text on the Y-axis to obtain match point information for identical characters in the first text and the second text; and compiling statistics from the match point information to obtain a text comparison result. By mapping the texts to be compared onto a two-dimensional plane and locating identical characters between the texts according to the shortest distance between them, the application improves the efficiency of text comparison and reduces its complexity.
Description
Technical field
This application relates to the field of big data, and in particular to a text comparison method, device, computer equipment and storage medium.
Background art
In everyday use, text comparison is a fairly common problem with a wide range of application scenarios, such as comparing papers. The core of text comparison is finding the differences between two given texts (which may be byte streams and the like). At present, mainstream approaches to finding the differences between texts fall into two broad classes. One is based on edit distance, such as the LD (Levenshtein distance) algorithm. The other is based on the longest common subsequence (LCS), such as the Needleman/Wunsch algorithm. However, these algorithms are all relatively complex, consume considerable resources, and are inefficient.
Summary of the invention
The purpose of the application is to address the deficiencies of the prior art by providing a text comparison method, device, computer equipment and storage medium that map the texts to be compared onto a two-dimensional plane and locate identical characters between the texts according to the shortest distance between them, thereby improving the efficiency of text comparison and reducing its complexity.
To achieve the above purpose, the technical solution of the application provides a text comparison method, device, computer equipment and storage medium.
The application discloses a text comparison method, comprising the following steps:
obtaining a first text and a second text, converting the first text and the second text each into a single line of text, and mapping the converted first text and second text onto the X-axis and the Y-axis respectively;
performing a traversal query on the first text on the X-axis and the second text on the Y-axis to obtain match point information for identical characters in the first text and the second text;
compiling statistics from the match point information for identical characters in the first text and the second text to obtain a text comparison result.
Preferably, mapping the converted first text and second text onto the X-axis and the Y-axis respectively includes:
mapping the converted first text onto the X-axis in any quadrant, and mapping the converted second text onto the Y-axis in the same quadrant as the first text;
mapping the first character of the converted first text to any coordinate point on the X-axis within that quadrant, and mapping the first character of the converted second text to any coordinate point on the Y-axis within that quadrant.
Preferably, performing a traversal query on the first text on the X-axis and the second text on the Y-axis to obtain match point information for identical characters in the first text and the second text includes:
performing a traversal query on the first text on the X-axis and the second text on the Y-axis to obtain first match point information;
obtaining a traversal region according to the first match point information, and performing a traversal query on the first text and the second text within the traversal region to obtain the remaining match point information.
Preferably, performing a traversal query on the first text on the X-axis and the second text on the Y-axis to obtain the first match point information includes:
performing a traversal query on the first text on the X-axis and the second text on the Y-axis to obtain the coordinate points corresponding to identical characters in the first text and the second text;
finding, among the coordinate points corresponding to the identical characters, the coordinate point nearest to the origin, and marking that coordinate point as the first match point.
Preferably, obtaining a traversal region according to the first match point information and performing a traversal query on the first text and the second text within the traversal region to obtain the remaining match point information includes:
obtaining the coordinate point corresponding to the last characters of the first text and the second text, taking the rectangular region between that coordinate point and the coordinate point of the first match point as the traversal region, and performing a traversal query on the first text and the second text within the traversal region;
when a new match point is obtained, updating the traversal region and continuing the traversal query within the new traversal region until no further match point appears.
Preferably, updating the traversal region when a new match point is obtained and continuing the traversal query within the new traversal region until no further match point appears includes:
when a new match point is obtained, taking the rectangular region between the coordinate point corresponding to the last characters of the first text and the second text and the coordinate point of the new match point as the new traversal region;
performing a traversal query on the part of the new traversal region other than the new match point itself, until no further match point appears.
Preferably, compiling statistics from the match point information for identical characters in the first text and the second text to obtain a text comparison result includes:
counting the number of match points from the match point information for identical characters in the first text and the second text;
obtaining the lengths (in characters) of the first text and the second text, and obtaining the text comparison result from the smaller of the two lengths and the number of match points.
The application also discloses a text comparison device, the device comprising:
a text mapping module, configured to obtain a first text and a second text, convert the first text and the second text each into a single line of text, and map the converted first text and second text onto the X-axis and the Y-axis respectively;
a match point query module, configured to perform a traversal query on the first text on the X-axis and the second text on the Y-axis to obtain match point information for identical characters in the first text and the second text;
a text comparison module, configured to compile statistics from the match point information for identical characters in the first text and the second text to obtain a text comparison result.
The application also discloses computer equipment comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the text comparison method described above.
The application also discloses a storage medium readable and writable by a processor, the storage medium storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of the text comparison method described above.
The beneficial effect of the application is as follows: by mapping the texts to be compared onto a two-dimensional plane and locating identical characters between the texts according to the shortest distance between them, the application improves the efficiency of text comparison and reduces its complexity.
Brief description of the drawings
Fig. 1 is a flowchart of a text comparison method according to an embodiment of the application;
Fig. 2 is a flowchart of a text comparison method according to an embodiment of the application;
Fig. 3 is a flowchart of a text comparison method according to an embodiment of the application;
Fig. 4 is a flowchart of a text comparison method according to an embodiment of the application;
Fig. 5 is a flowchart of a text comparison method according to an embodiment of the application;
Fig. 6 is a flowchart of a text comparison method according to an embodiment of the application;
Fig. 7 is a flowchart of a text comparison method according to an embodiment of the application;
Fig. 8 is a structural diagram of a text comparison device according to an embodiment of the application.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here serve only to explain the application and are not intended to limit it.
Those skilled in the art will understand that, unless expressly stated otherwise, the singular forms "a", "an", "said" and "the" used here may also include the plural. It should be further understood that the word "comprising" used in the description of the application indicates the presence of the stated features, integers, steps, operations, elements and/or components, but does not exclude the presence or addition of one or more other features, integers, steps, operations, elements, components and/or groups thereof.
The flow of a text comparison method according to an embodiment of the application is shown in Fig. 1. The embodiment includes the following steps:
Step s101: obtain a first text and a second text, convert the first text and the second text each into a single line of text, and map the converted first text and second text onto the X-axis and the Y-axis respectively;
Specifically, the original text obtained usually contains multiple lines, and because margins are set differently, the number of characters per line is likely to differ. Therefore, after the two texts to be compared are obtained, both can be converted into single-line text, that is, all lines are merged into one line. After conversion, the texts are mapped onto the X-axis and the Y-axis respectively, for example the characters of the first text onto the X-axis and the characters of the second text onto the Y-axis. For ease of calculation, the coordinate of each character can be an integer, with each character occupying one number. For example, the first character of the first text can have the coordinate (1, 0), the second character (2, 0), and so on; similarly, the first character of the second text can have the coordinate (0, 1) and the second character (0, 2).
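The conversion and mapping in step s101 can be sketched as follows. This is a minimal illustration rather than the patent's implementation: the function names `to_single_line`, `map_to_x_axis` and `map_to_y_axis` are hypothetical, and dropping line breaks without inserting a separator is an assumption, since the description does not say how lines are joined.

```python
def to_single_line(text: str) -> str:
    """Merge a multi-line text into one line (assumption: line breaks are simply dropped)."""
    return "".join(text.splitlines())


def map_to_x_axis(text: str, start: int = 1) -> list[tuple[int, int, str]]:
    """Place each character on the X-axis at integer positions start, start+1, ..."""
    return [(start + i, 0, ch) for i, ch in enumerate(text)]


def map_to_y_axis(text: str, start: int = 1) -> list[tuple[int, int, str]]:
    """Place each character on the Y-axis at integer positions start, start+1, ..."""
    return [(0, start + i, ch) for i, ch in enumerate(text)]


first = to_single_line("ab\ncd")
x_points = map_to_x_axis(first)      # first character at (1, 0), second at (2, 0), ...
y_points = map_to_y_axis("abd")      # first character at (0, 1), second at (0, 2), ...
```

With `start=1` the coordinates follow the example in the description; passing `start=0` would map the first character of each text to the origin instead.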
Step s102: perform a traversal query on the first text on the X-axis and the second text on the Y-axis to obtain match point information for identical characters in the first text and the second text;
Specifically, the first match point in the two texts must be found first. A match point is a point at which the characters of the two texts are identical; the first match point is the one nearest to the starting coordinate. The starting coordinate depends on the initial mapping positions of the two texts on the X-axis and the Y-axis. For example, if the first characters of the first text and the second text both correspond to the origin, the starting coordinate can be the origin; if they do not correspond to the origin, the coordinates of the first characters of the first text and the second text can be used as the starting coordinate. When the first text and the second text are mapped, both must be placed in the same quadrant.
Specifically, the first match point can be found by performing a traversal query over the characters of the first text on the X-axis and the characters of the second text on the Y-axis, finding all identical characters in the two texts and recording their coordinates. For example, if character A of the first text on the X-axis is identical to character B of the second text on the Y-axis, and A has the coordinate (a, 0) while B has the coordinate (0, b), then the coordinate of this pair of identical characters is (a, b). The coordinates of the remaining identical characters in the two texts are obtained in the same way. The coordinate nearest to the starting coordinate can then be looked up among the coordinates of all identical characters; that nearest coordinate is the first match point coordinate.
Specifically, after the first match point information is obtained, the traversal query continues over the first text and the second text to find the other match points. Once all match point information in the first text and the second text has been obtained, it is stored; the match point information includes the coordinate of each match point and the character corresponding to it.
Step s103: compile statistics from the match point information for identical characters in the first text and the second text to obtain a text comparison result.
Specifically, after all match point information in the first text and the second text has been obtained, the number of match points is counted first. The length of the first text is then compared with the length of the second text to obtain the smaller of the two; if both texts have the same length, either may be used. Finally, the number of match points is divided by that length to obtain the similarity of the two texts.
Specifically, a similarity threshold can also be preset. After the similarity of the two texts is obtained, it can be compared with the preset threshold: if the similarity is not less than the threshold, the two texts can be considered consistent; otherwise they can be considered inconsistent.
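The statistics in step s103 reduce to one division and one comparison, sketched below. The function names and the default threshold of 0.8 are illustrative assumptions; the description does not specify a threshold value or how empty texts are handled.

```python
def similarity(match_count: int, first_text: str, second_text: str) -> float:
    """Number of match points divided by the length of the shorter text."""
    shorter = min(len(first_text), len(second_text))
    if shorter == 0:
        # Assumption: an empty text has no match points, so similarity is 0.
        return 0.0
    return match_count / shorter


def is_consistent(sim: float, threshold: float = 0.8) -> bool:
    """Texts count as consistent when similarity is not less than the threshold."""
    return sim >= threshold


sim = similarity(3, "abcd", "abcdef")   # 3 match points, shorter text has 4 characters
```

For example, three match points against a shorter text of four characters give a similarity of 0.75, which falls below the assumed threshold of 0.8 but would pass a threshold of 0.75.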
In this embodiment, the texts to be compared are mapped onto a two-dimensional plane and identical characters between the texts are located according to the shortest distance between them, which improves the efficiency of text comparison and reduces its complexity.
Fig. 2 is a flowchart of a text comparison method according to an embodiment of the application. As shown, in step s101, mapping the converted first text and second text onto the X-axis and the Y-axis respectively includes:
Step s201: map the converted first text onto the X-axis in any quadrant, and map the converted second text onto the Y-axis in the same quadrant as the first text;
Specifically, when the texts are mapped, the first text and the second text can be mapped to any quadrant, provided that both are in the same quadrant. Once the quadrant is chosen, the first text is mapped onto the X-axis and the second text onto the Y-axis.
Step s202: map the first character of the converted first text to any coordinate point on the X-axis within the chosen quadrant, and map the first character of the converted second text to any coordinate point on the Y-axis within the chosen quadrant.
Specifically, when the first text is mapped onto the X-axis, a coordinate must be chosen for its first character, and that coordinate must be consistent with the chosen quadrant. For example, if the first quadrant is chosen, the coordinate of the first character of the first text can be chosen from the range [0, ∞). Similarly, when the second text is mapped onto the Y-axis, a coordinate can be chosen for its first character in the same way.
Specifically, for simplicity, the first quadrant may be chosen for the first text and the second text. When the texts are mapped onto the two-dimensional plane, the first character of the first text can be mapped to the origin or to a point close to it, such as the coordinate point (1, 0), and the first character of the second text can be mapped to the origin or to a point close to it, such as the coordinate point (0, 1).
In this embodiment, mapping the text information onto a plane makes it easy to locate the identical parts of the texts by their coordinates, which improves the efficiency of text comparison.
Fig. 3 is a flowchart of a text comparison method according to an embodiment of the application. As shown, in step s102, performing a traversal query on the first text on the X-axis and the second text on the Y-axis to obtain match point information for identical characters in the first text and the second text includes:
Step s301: perform a traversal query on the first text on the X-axis and the second text on the Y-axis to obtain the first match point information;
Specifically, the first match point can be found by performing a traversal query over the characters of the first text on the X-axis and the characters of the second text on the Y-axis, finding all identical characters in the two texts and recording their coordinates. For example, if character A of the first text on the X-axis is identical to character B of the second text on the Y-axis, and A has the coordinate (a, 0) while B has the coordinate (0, b), then the coordinate of this pair of identical characters is (a, b). The coordinates of the remaining identical characters in the two texts are obtained in the same way. The traversal query can first take the first character of the first text and, starting from the first character of the second text, search the second text for every character identical to it and record the coordinates. It then takes the second character of the first text and again searches the second text from its first character, recording the coordinates of every character identical to that second character, and so on, until all characters of the first text have been processed;
Specifically, after the coordinates of all identical characters in the first text and the second text have been found, the coordinate nearest to the starting coordinate can be looked up among them; that nearest coordinate is the first match point coordinate. The starting coordinate depends on the initial mapping positions of the two texts on the X-axis and the Y-axis. For example, if the first characters of the first text and the second text both correspond to the origin, the starting coordinate can be the origin; if they do not correspond to the origin, the coordinates of the first characters of the first text and the second text can be used as the starting coordinate.
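The nested traversal described above, which records a coordinate (a, b) for every pair of identical characters, might look like the following sketch. The name `match_coordinates` and the `start` parameter (the coordinate given to the first character of each text) are assumptions made for illustration.

```python
def match_coordinates(first_text: str, second_text: str, start: int = 1) -> list[tuple[int, int]]:
    """Return (a, b) for every pair of identical characters.

    a is the X-axis position of the character in first_text,
    b is the Y-axis position of the identical character in second_text.
    """
    coords = []
    for i, ch_x in enumerate(first_text):        # take each character of the first text in turn
        for j, ch_y in enumerate(second_text):   # scan the second text from its first character
            if ch_x == ch_y:
                coords.append((start + i, start + j))
    return coords


coords = match_coordinates("abc", "bab")
```

For the texts "abc" and "bab", the character "a" yields (1, 2), "b" yields (2, 1) and (2, 3), and "c" yields nothing. The scan visits every pair of characters, so its cost grows with the product of the two text lengths.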
Step s302: obtain a traversal region according to the first match point information, and perform a traversal query on the first text and the second text within the traversal region to obtain the remaining match point information.
Specifically, after the first match point is obtained, the rectangular region between the coordinate corresponding to the last characters of the first text and the second text and the coordinate of the first match point can be used as the traversal region, and a traversal query is performed on the first text and the second text within that region to obtain the remaining match point information. Once all match point information in the first text and the second text has been obtained, it is stored; the match point information includes the coordinate of each match point and the character corresponding to it.
In this embodiment, the first match point is located, a traversal region is derived from it, and the remaining match points are obtained within that region, which effectively improves the efficiency of text comparison.
Fig. 4 is a flowchart of a text comparison method according to an embodiment of the application. As shown, in step s301, performing a traversal query on the first text on the X-axis and the second text on the Y-axis to obtain the first match point information includes:
Step s401: perform a traversal query on the first text on the X-axis and the second text on the Y-axis to obtain the coordinate points corresponding to identical characters in the first text and the second text;
Specifically, the first match point can be found by performing a traversal query over the characters of the first text on the X-axis and the characters of the second text on the Y-axis, finding all identical characters in the two texts and recording their coordinates. For example, if character A of the first text on the X-axis is identical to character B of the second text on the Y-axis, and A has the coordinate (a, 0) while B has the coordinate (0, b), then the coordinate of this pair of identical characters is (a, b). The coordinates of the remaining identical characters in the two texts are obtained in the same way. The traversal query can first take the first character of the first text and, starting from the first character of the second text, search the second text for every character identical to it and record the coordinates. It then takes the second character of the first text and again searches the second text from its first character, recording the coordinates of every character identical to that second character, and so on, until all characters of the first text have been processed.
Step s402: among the coordinate points corresponding to the identical characters, find the coordinate point nearest to the origin and mark it as the first match point.
Specifically, after the coordinates of all identical characters in the first text and the second text have been found, the coordinate nearest to the starting coordinate can be looked up among them. The distance can be calculated with the Pythagorean theorem, and the coordinate nearest to the starting coordinate is the first match point coordinate. The starting coordinate depends on the initial mapping positions of the two texts on the X-axis and the Y-axis. For example, if the first characters of the first text and the second text both correspond to the origin, the starting coordinate can be the origin; if they do not correspond to the origin, the coordinates of the first characters of the first text and the second text can be used as the starting coordinate.
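Selecting the first match point by distance can be sketched with `math.hypot`, which computes the Pythagorean distance directly. The function name, the default starting coordinate of (1, 1) (formed from first characters at (1, 0) and (0, 1)), and the tie-breaking behaviour are assumptions; the description does not say which point wins when two are equally near.

```python
import math


def first_match_point(coords, start=(1, 1)):
    """Return the coordinate nearest to the starting coordinate, or None if there is none.

    Assumption: on a tie, the coordinate that appears first in coords is kept.
    """
    if not coords:
        return None
    return min(coords, key=lambda p: math.hypot(p[0] - start[0], p[1] - start[1]))


nearest = first_match_point([(3, 3), (2, 2), (4, 1)])
```

Because only the ordering of distances matters, comparing squared distances would give the same result without the square root.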
In this embodiment, comparing the distances of the coordinates of all identical characters allows the first match point to be obtained quickly, which effectively improves the efficiency of text comparison.
Fig. 5 is a flowchart of a text comparison method according to an embodiment of the application. As shown, in step s302, obtaining a traversal region according to the first match point information and performing a traversal query on the first text and the second text within the traversal region to obtain the remaining match point information includes:
Step s501: obtain the coordinate point corresponding to the last characters of the first text and the second text, take the rectangular region between that coordinate point and the coordinate point of the first match point as the traversal region, and perform a traversal query on the first text and the second text within the traversal region;
Specifically, the coordinate point of the last character of the first text can be read from the X-axis, and the coordinate point of the last character of the second text can be read from the Y-axis. Together they give the coordinate point corresponding to the last characters of the two texts. For example, if the last character of the first text is at (A, 0) and the last character of the second text is at (0, B), the coordinate point corresponding to the last characters of the two texts is (A, B). The rectangular region between this coordinate point and the coordinate point of the first match point is used as the traversal region. For example, if the first match point is at (C, D), the traversal region extends from (C, D) to (A, B). Once the traversal region is determined, the traversal query on the first text and the second text can be performed within it.
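The rectangle from (C, D) to (A, B) can be represented as a pair of corner points, as in the sketch below. The names `traversal_region` and `in_region` are hypothetical, and treating the rectangle's boundary as inclusive is an assumption; the description does not state whether points on the edges belong to the region.

```python
def traversal_region(match_point, first_text, second_text, start=1):
    """Return ((C, D), (A, B)): the match point and the last-character corner."""
    a = start + len(first_text) - 1    # X position of the last character of the first text
    b = start + len(second_text) - 1   # Y position of the last character of the second text
    return (match_point, (a, b))


def in_region(point, region):
    """True if point lies inside the rectangle (assumption: boundary included)."""
    (c, d), (a, b) = region
    return c <= point[0] <= a and d <= point[1] <= b


region = traversal_region((2, 1), "abcd", "abc")
```

With the first match point at (2, 1) and texts of four and three characters, the region runs from (2, 1) to (4, 3), so a candidate such as (1, 2) is excluded from further searching.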
Step s502: when a new match point is obtained, update the traversal region and continue the traversal query within the new traversal region until no further match point appears.
Specifically, when the traversal query is performed within the traversal region, a new match point can be obtained by calculating the shortest distance to the first match point. When a new match point is obtained, the traversal region can be redefined as in step s501: the new traversal region is determined by the new match point and the coordinate point corresponding to the last characters of the two texts. For example, if the new match point is at (E, F), the traversal region extends from (E, F) to (A, B). When the traversal query is performed within the new traversal region, the next match point can be obtained by calculating the shortest distance to the new match point, and so on, until no further match point appears.
In this embodiment, updating the traversal region narrows the range of the traversal query, which reduces the amount of computation and improves the efficiency of text comparison.
Fig. 6 is a flowchart of a text comparison method according to an embodiment of the application. As shown, in step s502, updating the traversal region when a new match point is obtained and continuing the traversal query within the new traversal region until no further match point appears includes:
Step s601: when a new match point is obtained, take the rectangular region between the coordinate point corresponding to the last characters of the first text and the second text and the coordinate point of the new match point as the new traversal region;
Specifically, when a new match point is obtained, the traversal region can be redefined as in step s501: the new traversal region is determined by the new match point and the coordinate point corresponding to the last characters of the two texts. For example, if the new match point is at (E, F), the traversal region extends from (E, F) to (A, B).
Step s602: perform a traversal query on the part of the new traversal region other than the new match point itself, until no further match point appears.
Specifically, when the traversal query is performed within the new traversal region, the next match point can be obtained by calculating the shortest distance to the new match point, and so on, until no further match point appears. During the traversal query within the new traversal region, the new match point itself can be excluded from the region, that is, it does not need to be queried again.
Specifically, repeating the traversal query yields new match points one after another, and the traversal region is updated according to each new match point. When the next match point is determined, the shortest distance is calculated between the coordinates of the identical characters and the previous match point. If a round of traversal query and shortest-distance calculation produces no new match point, the traversal query can end, and the text comparison is complete.
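The loop of steps s501 to s602 can be sketched end to end as below. This is an interpretation under stated assumptions, not the patent's definitive procedure: the name `find_match_points` is hypothetical, the starting coordinate is taken as `(start, start)`, ties in distance go to the earliest candidate, and a candidate must lie strictly beyond the previous match point on both axes, so that no character is matched twice. The description itself only says the previous match point is excluded from the new region.

```python
import math


def find_match_points(first_text: str, second_text: str, start: int = 1):
    """Collect match points by repeatedly taking the nearest candidate in a shrinking region."""
    # All coordinates (a, b) where the characters of the two texts are identical.
    coords = [
        (start + i, start + j)
        for i, ch_x in enumerate(first_text)
        for j, ch_y in enumerate(second_text)
        if ch_x == ch_y
    ]
    matches = []
    anchor = (start, start)   # assumed starting coordinate
    candidates = coords
    while candidates:
        # Nearest candidate to the previous match point (or to the starting coordinate).
        best = min(candidates, key=lambda p: math.hypot(p[0] - anchor[0], p[1] - anchor[1]))
        matches.append(best)
        anchor = best
        # New traversal region: strictly beyond the new match point on both axes (assumption).
        candidates = [p for p in coords if p[0] > best[0] and p[1] > best[1]]
    return matches


points = find_match_points("abcd", "bcd")
```

For "abcd" and "bcd" the loop finds (2, 1), then (3, 2), then (4, 3), after which the region is empty and the search stops. Dividing the three match points by the shorter length of three would give a similarity of 1.0 under the statistic of step s103.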
In this embodiment, all match point information is obtained by repeating the traversal query, which effectively improves the efficiency of text comparison and reduces its complexity.
Fig. 7 is a flowchart of a text comparison method according to an embodiment of the application. As shown, in step s103, compiling statistics from the match point information for identical characters in the first text and the second text to obtain a text comparison result includes:
Step s701: count the number of match points according to the match point information of identical characters in the first text and the second text.
Specifically, once all match point information for the first text and the second text has been obtained, the total number of match points can be counted first.
Step s702: obtain the text lengths of the first text and the second text, and obtain the text comparison result according to the smaller of the two text lengths and the number of match points.
Specifically, the text length of the first text can be compared with that of the second text to obtain the smaller of the two; if the two texts have the same length, either length can be used. Finally, the number of match points is divided by that text length to obtain the similarity of the two texts.
In this embodiment, the similarity between the two texts is obtained from the number of match points, which can effectively reduce the complexity of text comparison.
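The calculation in steps s701 and s702 can be sketched as a single division. The function name `similarity` is hypothetical, and returning 0.0 when the shorter text is empty is an assumption, since the description does not cover that case.

```python
def similarity(match_count: int, first: str, second: str) -> float:
    """Divide the number of match points by the smaller text length."""
    shorter = min(len(first), len(second))
    if shorter == 0:
        # Assumption: an empty text has no similarity to anything.
        return 0.0
    return match_count / shorter
```

With three match points between "abcd" and "abxd", both of length 4, this gives a similarity of 0.75.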
The structure of a text comparison apparatus according to an embodiment of the present application is shown in Fig. 8 and comprises a text mapping module 801, a match point query module 802 and a text comparison module 803. The text mapping module 801 is connected to the match point query module 802, and the match point query module 802 is connected to the text comparison module 803. The text mapping module 801 is configured to obtain a first text and a second text, convert the first text and the second text into single-line texts respectively, and map the converted first text and second text to the X-axis and the Y-axis respectively. The match point query module 802 is configured to perform traversal queries on the first text on the X-axis and the second text on the Y-axis to obtain match point information of identical characters in the first text and the second text. The text comparison module 803 is configured to perform statistics according to the match point information of identical characters in the first text and the second text to obtain a text comparison result.
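As an illustration of what the text mapping module 801 does, the sketch below joins each text into a single line and assigns every character a coordinate on its axis. The names `to_single_line` and `map_to_axes` are hypothetical. The sketch assumes that line breaks are simply dropped and that coordinates start at 1 in the first quadrant; the application allows any quadrant and any starting coordinate point.

```python
from typing import Dict, Tuple


def to_single_line(text: str) -> str:
    """Join all lines of a text into one line, dropping the line breaks."""
    return "".join(text.splitlines())


def map_to_axes(first: str, second: str) -> Tuple[Dict[int, str], Dict[int, str]]:
    """Map the first text to X coordinates and the second text to Y coordinates."""
    x_line = to_single_line(first)
    y_line = to_single_line(second)
    # Assumption: the first character of each text sits at coordinate 1.
    x_axis = {i: ch for i, ch in enumerate(x_line, start=1)}
    y_axis = {j: ch for j, ch in enumerate(y_line, start=1)}
    return x_axis, y_axis
```

A match point query module could then compare `x_axis[x]` with `y_axis[y]` for each coordinate point (x, y) in the traversal region.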
An embodiment of the present application also discloses a computer device. The computer device comprises a memory and a processor; computer-readable instructions are stored in the memory, and when the computer-readable instructions are executed by one or more processors, the one or more processors perform the steps of the text comparison method described in the above embodiments.
An embodiment of the present application also discloses a storage medium. The storage medium can be read and written by a processor and stores computer-readable instructions; when the computer-readable instructions are executed by one or more processors, the one or more processors perform the steps of the text comparison method described in the above embodiments.
A person of ordinary skill in the art will understand that all or part of the processes in the methods of the above embodiments can be completed by a computer program instructing the relevant hardware. The program can be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc or a read-only memory (Read-Only Memory, ROM), or it may be a random access memory (Random Access Memory, RAM) or the like.
The technical features of the embodiments described above can be combined arbitrarily. For brevity, not all possible combinations of the technical features in the above embodiments are described; however, as long as a combination of these technical features involves no contradiction, it should be considered within the scope of this specification.
The embodiments described above express only several implementations of the present application. Their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the patent. It should be noted that a person of ordinary skill in the art can make various modifications and improvements without departing from the concept of the present application, and these all fall within its scope of protection. Therefore, the scope of protection of this patent shall be subject to the appended claims.
Claims (10)
1. A text comparison method, characterized by comprising the following steps:
obtaining a first text and a second text, converting the first text and the second text into single-line texts respectively, and mapping the converted first text and second text to the X-axis and the Y-axis respectively;
performing traversal queries on the first text on the X-axis and the second text on the Y-axis to obtain match point information of identical characters in the first text and the second text;
performing statistics according to the match point information of identical characters in the first text and the second text to obtain a text comparison result.
2. The text comparison method according to claim 1, characterized in that mapping the converted first text and second text to the X-axis and the Y-axis respectively comprises:
mapping the converted first text to the X-axis in any quadrant, and mapping the converted second text to the Y-axis in the same quadrant as the first text;
mapping the first character of the converted first text to any coordinate point on the X-axis in that quadrant, and mapping the first character of the converted second text to any coordinate point on the Y-axis in that quadrant.
3. The text comparison method according to claim 2, characterized in that performing traversal queries on the first text on the X-axis and the second text on the Y-axis to obtain match point information of identical characters in the first text and the second text comprises:
performing traversal queries on the first text on the X-axis and the second text on the Y-axis to obtain first match point information;
obtaining a traversal region according to the first match point information, and performing traversal queries on the first text and the second text within the traversal region to obtain the remaining match point information.
4. The text comparison method according to claim 3, characterized in that performing traversal queries on the first text on the X-axis and the second text on the Y-axis to obtain first match point information comprises:
performing traversal queries on the first text on the X-axis and the second text on the Y-axis to obtain the coordinate points corresponding to identical characters in the first text and the second text;
finding, among the coordinate points corresponding to identical characters, the coordinate point nearest to the origin, and marking that coordinate point as the first match point.
5. The text comparison method according to claim 3, characterized in that obtaining a traversal region according to the first match point information, and performing traversal queries on the first text and the second text within the traversal region to obtain the remaining match point information, comprises:
obtaining the coordinate point corresponding to the last characters of the first text and the second text, taking the rectangular region between that coordinate point and the coordinate point of the first match point as the traversal region, and performing traversal queries on the first text and the second text within the traversal region;
when a new match point is obtained, updating the traversal region and continuing the traversal queries on the new traversal region until no further match point appears.
6. The text comparison method according to claim 5, characterized in that, when a new match point is obtained, updating the traversal region and continuing the traversal queries on the new traversal region until no further match point appears comprises:
when a new match point is obtained, taking the rectangular region between the coordinate point corresponding to the last characters of the first text and the second text and the coordinate point of the new match point as the new traversal region;
performing traversal queries on the new traversal region, excluding the new match point, until no further match point appears.
7. The text comparison method according to claim 1, characterized in that performing statistics according to the match point information of identical characters in the first text and the second text to obtain a text comparison result comprises:
counting the number of match points according to the match point information of identical characters in the first text and the second text;
obtaining the text lengths of the first text and the second text, and obtaining the text comparison result according to the smaller of the two text lengths and the number of match points.
8. A text comparison apparatus, characterized in that the apparatus comprises:
a text mapping module, configured to obtain a first text and a second text, convert the first text and the second text into single-line texts respectively, and map the converted first text and second text to the X-axis and the Y-axis respectively;
a match point query module, configured to perform traversal queries on the first text on the X-axis and the second text on the Y-axis to obtain match point information of identical characters in the first text and the second text;
a text comparison module, configured to perform statistics according to the match point information of identical characters in the first text and the second text to obtain a text comparison result.
9. A computer device, characterized in that the computer device comprises a memory and a processor, computer-readable instructions are stored in the memory, and when the computer-readable instructions are executed by one or more processors, the one or more processors perform the steps of the text comparison method according to any one of claims 1 to 7.
10. A storage medium, characterized in that the storage medium can be read and written by a processor and stores computer-readable instructions, and when the computer-readable instructions are executed by one or more processors, the one or more processors perform the steps of the text comparison method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910297625.1A CN110147429B (en) | 2019-04-15 | 2019-04-15 | Text comparison method, apparatus, computer device and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110147429A (en) | 2019-08-20 |
CN110147429B (en) | 2023-08-15 |
Family
ID=67588900
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910297625.1A Active CN110147429B (en) | 2019-04-15 | 2019-04-15 | Text comparison method, apparatus, computer device and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110147429B (en) |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102063510A (en) * | 2011-01-17 | 2011-05-18 | 珠海全志科技有限公司 | Method for searching matched character string |
CN106980870A (en) * | 2016-12-30 | 2017-07-25 | 中国银联股份有限公司 | Text matches degree computational methods between short text |
CN107085568A (en) * | 2017-03-29 | 2017-08-22 | 腾讯科技(深圳)有限公司 | A kind of text similarity method of discrimination and device |
CN107315817A (en) * | 2017-06-30 | 2017-11-03 | 华自科技股份有限公司 | Electronic drawing text matching technique, device, storage medium and computer equipment |
CN107679219A (en) * | 2017-10-19 | 2018-02-09 | 广州视睿电子科技有限公司 | Matching process and device, interactive intelligent tablet computer and storage medium |
CN108170684A (en) * | 2018-01-22 | 2018-06-15 | 京东方科技集团股份有限公司 | Text similarity computing method and system, data query system and computer product |
CN108182222A (en) * | 2017-12-26 | 2018-06-19 | 东软集团股份有限公司 | A kind of text matching technique and device |
CN108920580A (en) * | 2018-06-25 | 2018-11-30 | 腾讯科技(深圳)有限公司 | Image matching method, device, storage medium and terminal |
Also Published As
Publication number | Publication date |
---|---|
CN110147429B (en) | 2023-08-15 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||