CN110147429B - Text comparison method, apparatus, computer device and storage medium - Google Patents

Publication number
CN110147429B
Authority
CN
China
Prior art keywords
text
axis
matching point
matching
traversal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910297625.1A
Other languages
Chinese (zh)
Other versions
CN110147429A (en)
Inventor
余宁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ping An Technology Shenzhen Co Ltd
Original Assignee
Ping An Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ping An Technology Shenzhen Co Ltd
Priority to CN201910297625.1A
Publication of CN110147429A
Application granted
Publication of CN110147429B
Legal status: Active
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The application relates to the field of big data and discloses a text comparison method, a text comparison apparatus, a computer device and a storage medium. The method comprises: acquiring a first text and a second text, converting each of them into a single line of characters, and mapping the converted first text and second text to an X axis and a Y axis respectively; performing a traversal query on the first text on the X axis and the second text on the Y axis to obtain matching point information of the characters that are the same in the first text and the second text; and counting according to that matching point information to obtain a text comparison result. By mapping the texts to be compared onto a two-dimensional plane and locating the same characters according to the shortest distance between their coordinate points, the application improves the efficiency of text comparison and reduces its complexity.

Description

Text comparison method, apparatus, computer device and storage medium
Technical Field
The present application relates to the field of big data, and in particular, to a text comparison method, apparatus, computer device, and storage medium.
Background
Text comparison is a common problem in everyday applications and has a wide range of application scenarios, such as comparing papers. At its core, text comparison determines the differences between two given texts (which may be byte streams and the like). Currently, the mainstream algorithms for comparing texts fall into two main categories. One is based on edit distance, such as the Levenshtein distance (LD) algorithm. The other is based on the longest common subsequence, such as the Needleman/Wunsch algorithm. However, these algorithms are complex, consume considerable resources, and are inefficient.
Disclosure of Invention
The application aims at overcoming the defects of the prior art, and provides a text comparison method, a device, computer equipment and a storage medium.
In order to achieve the above objective, the technical scheme of the present application provides a text comparison method, a device, a computer device and a storage medium.
The application discloses a text comparison method, which comprises the following steps:
acquiring a first text and a second text, respectively converting the first text and the second text into single-line characters, and respectively mapping the converted first text and second text to an X axis and a Y axis;
performing traversal query on the first text on the X axis and the second text on the Y axis to obtain matching point information of the same characters in the first text and the second text;
and counting according to the matching point information of the same characters in the first text and the second text to obtain a text comparison result.
Preferably, the mapping the converted first text and the converted second text to the X-axis and the Y-axis includes:
mapping the converted first text to any quadrant of an X axis, and mapping the converted second text to the same quadrant of a Y axis as the first text;
and the first character of the converted first text corresponds to any coordinate point on the quadrant to which the X axis belongs, and the first character of the converted second text corresponds to any coordinate point on the quadrant to which the Y axis belongs.
Preferably, performing the traversal query on the first text on the X axis and the second text on the Y axis to obtain the matching point information of the same characters in the first text and the second text includes:
performing a traversal query on the first text on the X axis and the second text on the Y axis to obtain first matching point information;
and acquiring a traversal region according to the first matching point information, and performing a traversal query on the first text and the second text within the traversal region to obtain the remaining matching point information.
Preferably, the traversing query is performed on the first text on the X axis and the second text on the Y axis, to obtain first matching point information, including:
traversing and inquiring the first text on the X axis and the second text on the Y axis to obtain coordinate points corresponding to the same characters in the first text and the second text;
and inquiring a coordinate point closest to the origin in the coordinate points corresponding to the same characters, and marking the coordinate point closest to the origin as a first matching point.
Preferably, acquiring the traversal region according to the first matching point information and performing the traversal query on the first text and the second text within the traversal region to obtain the remaining matching point information includes:
acquiring the coordinate point corresponding to the last characters of the first text and the second text, taking the rectangular region between that coordinate point and the coordinate point of the first matching point as the traversal region, and performing a traversal query on the first text and the second text within the traversal region;
when a new matching point is acquired, updating the traversal region and continuing the traversal query within the new traversal region until no further matching point appears.
Preferably, updating the traversal region when a new matching point is acquired and continuing the traversal query within the new traversal region until no further matching point appears includes:
when a new matching point is acquired, taking the rectangular region between the coordinate point corresponding to the last characters of the first text and the second text and the coordinate point of the new matching point as the new traversal region;
and performing a traversal query on the part of the new traversal region other than the new matching point, until no further matching point appears.
Preferably, counting according to the matching point information of the same characters in the first text and the second text to obtain the text comparison result includes:
counting the number of matching points according to the matching point information of the same characters in the first text and the second text;
and acquiring the text lengths of the first text and the second text, and acquiring a text comparison result according to the smaller text length in the text lengths and the number of the matching points.
The application also discloses a text comparison device, which comprises:
a text mapping module, configured to acquire a first text and a second text, convert each of them into a single line of characters, and map the converted first text and second text to an X axis and a Y axis respectively;
a matching point query module, configured to perform a traversal query on the first text on the X axis and the second text on the Y axis to obtain matching point information of the same characters in the first text and the second text;
and a text comparison module, configured to count according to the matching point information of the same characters in the first text and the second text to obtain a text comparison result.
The application also discloses a computer device comprising a memory and a processor, wherein the memory stores computer readable instructions which, when executed by one or more of the processors, cause the one or more processors to perform the steps of the text comparison method described above.
The application also discloses a storage medium which can be read and written by a processor, wherein the storage medium stores computer instructions which, when executed by one or more processors, cause the one or more processors to execute the steps of the text comparison method.
The beneficial effects of the application are as follows: the texts to be compared are mapped onto a two-dimensional plane, and the same characters in the texts are located according to the shortest distance between their coordinate points, so that the efficiency of text comparison is improved and its complexity is reduced.
Drawings
FIG. 1 is a flow chart of a text comparison method according to an embodiment of the application;
FIG. 2 is a flow chart of a text comparison method according to an embodiment of the application;
FIG. 3 is a flow chart of a text comparison method according to an embodiment of the application;
FIG. 4 is a flow chart of a text comparison method according to an embodiment of the application;
FIG. 5 is a flow chart of a text comparison method according to an embodiment of the application;
FIG. 6 is a flow chart of a text comparison method according to an embodiment of the application;
FIG. 7 is a flow chart of a text comparison method according to an embodiment of the application;
FIG. 8 is a schematic structural diagram of a text comparison apparatus according to an embodiment of the application.
Detailed Description
The present application will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present application more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the application.
As used herein, the singular forms "a", "an" and "the" are intended to include the plural forms as well, unless expressly stated otherwise, as understood by those skilled in the art. It will be further understood that the terms "comprises" and/or "comprising," when used in this specification, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
A text comparison method flow of the embodiment of the application is shown in FIG. 1, and the embodiment comprises the following steps:
Step s101, a first text and a second text are obtained, the first text and the second text are each converted into a single line of characters, and the converted first text and second text are mapped to an X axis and a Y axis respectively;
Specifically, a text as initially obtained usually contains multiple lines of characters, and because margin settings differ, the number of characters per line may differ as well. Therefore, after the two texts to be compared are obtained, each can be converted into a single line of characters, that is, its multiple lines are joined into one line. After conversion, the characters are mapped onto the X axis and the Y axis: for example, the characters of the first text are mapped onto the X axis and the characters of the second text onto the Y axis. For convenience of calculation, each character may occupy one integer coordinate. For example, on the X axis the first character of the first text may be at (1, 0), the second character at (2, 0), and so on; on the Y axis the first character of the second text may be at (0, 1), the second character at (0, 2), and so on.
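Step s101 can be illustrated with a minimal Python sketch. The function names `to_single_line` and `map_to_axes` and the `start` parameter are hypothetical and not taken from the application; the sketch assumes that line breaks are simply dropped and that the first character sits at coordinate 1, as in the example above.

```python
def to_single_line(text: str) -> str:
    """Join all lines of a multi-line text into one line of characters."""
    return "".join(text.splitlines())


def map_to_axes(first: str, second: str, start: int = 1):
    """Place first-text characters on the X axis and second-text characters on the Y axis.

    Each character occupies one integer coordinate, beginning at `start`
    (an assumed default; the source allows other starting positions).
    """
    x_axis = {(start + i, 0): ch for i, ch in enumerate(first)}
    y_axis = {(0, start + j): ch for j, ch in enumerate(second)}
    return x_axis, y_axis


first = to_single_line("ab\ncd")
second = to_single_line("a\nbd")
x_axis, y_axis = map_to_axes(first, second)
```

Here `x_axis` maps (1, 0) to the first character of the first text and `y_axis` maps (0, 1) to the first character of the second text.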
Step s102, performing traversal query on the first text on the X axis and the second text on the Y axis to obtain matching point information of the same characters in the first text and the second text;
Specifically, a first matching point in the two texts needs to be found. The first matching point is the point at which the two texts have the same character and which is closest to the start coordinate. The start coordinate depends on the initial mapping positions of the two texts on the X axis and the Y axis: if the first characters of the first text and the second text are mapped to the origin, the start coordinate may be the origin; otherwise, the coordinate corresponding to the first characters of the first text and the second text may be taken as the start coordinate. In either case, the first text and the second text must be placed in the same quadrant when they are mapped.
Specifically, the first matching point may be found by traversing the characters of the first text on the X axis and the characters of the second text on the Y axis, finding all characters that are the same in the two texts, and recording the corresponding coordinates. For example, if the A-th character of the first text on the X axis is the same as the B-th character of the second text on the Y axis, the former has coordinates (A, 0) and the latter has coordinates (0, B), so the coordinate point corresponding to this pair of same characters is (A, B). The coordinates of the remaining same characters in the two texts are obtained in the same way. Among the coordinates of all the same characters, the one closest to the start coordinate is then looked up, and that closest coordinate is the coordinate of the first matching point.
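The collection of same-character coordinates described above can be sketched as a nested traversal. The function name `same_char_points` and the tuple layout are assumptions made for illustration; coordinates are 1-based, so the first characters of both texts are assumed to sit at coordinate 1.

```python
def same_char_points(first: str, second: str, start: int = 1):
    """Return (x, y, character) for every pair of equal characters.

    x is the position of the character in the first text (X axis) and
    y is its position in the second text (Y axis).
    """
    points = []
    for i, ch_x in enumerate(first):          # walk the first text on the X axis
        for j, ch_y in enumerate(second):     # for each, walk the second text on the Y axis
            if ch_x == ch_y:
                points.append((start + i, start + j, ch_x))
    return points


points = same_char_points("abca", "bad")
```

Each returned tuple carries both the coordinates and the character, which matches the matching point information the description says is stored.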
Specifically, after the first matching point information is obtained, the traversal query can continue on the first text and the second text to find the remaining matching point information. After all the matching point information in the first text and the second text is obtained, it is stored; the matching point information comprises the coordinates of each matching point and the character corresponding to it.
Step s103, counting according to the matching point information of the same characters in the first text and the second text, and obtaining a text comparison result.
Specifically, after all the matching point information in the first text and the second text is obtained, the number of matching points is counted first. The text lengths of the first text and the second text are then compared and the smaller of the two is taken; if the two lengths are equal, either one may be used. Finally, the number of matching points is divided by that text length to obtain the similarity of the two texts.
Specifically, a similarity threshold may be preset. After the similarity of the two texts is obtained, it is compared with the preset threshold: if the similarity is not smaller than the threshold, the two texts may be considered consistent; otherwise, they may be considered inconsistent.
In this embodiment, the texts to be compared are mapped onto a two-dimensional plane, and the same characters in the texts are located according to the shortest distance between their coordinate points, so that the efficiency of text comparison is improved and its complexity is reduced.
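The similarity calculation and threshold check can be written down in a few lines. This is only a sketch: the function names and the default threshold of 0.8 are assumptions, since the application does not give a concrete threshold value, and returning 0.0 for an empty text is a choice made here to avoid division by zero.

```python
def similarity(match_count: int, first: str, second: str) -> float:
    """Number of matching points divided by the shorter text length."""
    shorter = min(len(first), len(second))
    if shorter == 0:
        return 0.0  # assumed convention for empty input
    return match_count / shorter


def is_consistent(match_count: int, first: str, second: str,
                  threshold: float = 0.8) -> bool:
    """Texts count as consistent when similarity is not below the threshold."""
    return similarity(match_count, first, second) >= threshold


score = similarity(3, "abcd", "abcde")
```

With three matching points and a shorter text of four characters, the similarity is 3 / 4.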
Fig. 2 is a schematic flow chart of a text comparison method according to an embodiment of the present application, as shown in the drawing, the step s101 of mapping the converted first text and the converted second text to an X axis and a Y axis respectively includes:
Step s201, mapping the converted first text to the X axis of any quadrant, and mapping the converted second text to the Y axis of the same quadrant as the first text;
Specifically, when the texts are mapped, the first text and the second text may be mapped to any quadrant, provided that both are in the same quadrant. Once the quadrant is determined, the first text is mapped to the X axis and the second text to the Y axis.
Step s202, the first character of the converted first text corresponds to any coordinate point on the X axis within the selected quadrant, and the first character of the converted second text corresponds to any coordinate point on the Y axis within the selected quadrant.
Specifically, when the first text is mapped to the X axis, the coordinate of its first character needs to be selected, and this coordinate must lie in the selected quadrant. For example, if the first quadrant is selected, the coordinate of the first character of the first text may be chosen from the range [0, +∞). Similarly, when the second text is mapped to the Y axis, the coordinate of its first character is selected in the same way.
Specifically, for ease of understanding, the first quadrant may be selected for both texts. When the texts are mapped onto the two-dimensional plane, the first character of the first text may be mapped to the origin or to a point close to the origin, such as (1, 0), and the first character of the second text may be mapped to the origin or to a point close to the origin, such as (0, 1).
In this embodiment, mapping the texts onto a plane makes it easy to locate the positions of the same characters through their coordinates, which improves text comparison efficiency.
Fig. 3 is a flow chart of a text comparison method according to an embodiment of the present application, as shown in the drawing, the step s102 of performing traversal query on the first text on the X-axis and the second text on the Y-axis to obtain matching point information of the same text in the first text and the second text, including:
step s301, performing traversal query on the first text on the X axis and the second text on the Y axis to obtain first matching point information;
Specifically, the first matching point may be found by performing a traversal query on the characters of the first text on the X axis and the characters of the second text on the Y axis, finding all characters that are the same in the two texts, and recording the corresponding coordinates. For example, if the A-th character of the first text on the X axis is the same as the B-th character of the second text on the Y axis, the former has coordinates (A, 0) and the latter has coordinates (0, B), so the coordinate point corresponding to this pair of same characters is (A, B). The coordinates of the remaining same characters are obtained in the same way. The traversal query may proceed as follows: take the first character of the first text, traverse the second text from its first character, find every character in the second text that is the same as the first character of the first text, and record the coordinates; then take the second character of the first text, again traverse the second text from its first character, find every character in the second text that is the same as the second character of the first text, and record the coordinates; and so on, until every character of the first text has been queried;
Specifically, after the coordinates of all the same characters in the first text and the second text are found, the coordinate closest to the start coordinate, namely the first matching point coordinate, can be looked up among them. The start coordinate depends on the initial mapping positions of the two texts on the X axis and the Y axis: if the first characters of the first text and the second text are mapped to the origin, the start coordinate may be the origin; otherwise, the coordinate corresponding to the first characters of the first text and the second text may be taken as the start coordinate.
Step s302, obtaining a traversal region according to the first matching point information, and performing traversal query on the first text and the second text on the traversal region to obtain the rest matching point information.
Specifically, after the first matching point is obtained, the rectangular region between the coordinate point corresponding to the last characters of the first text and the second text and the coordinate point of the first matching point is taken as the traversal region, and a traversal query is performed on the first text and the second text within the traversal region to obtain the remaining matching point information. After all the matching point information in the first text and the second text is obtained, it is stored; the matching point information comprises the coordinates of each matching point and the character corresponding to it.
In this embodiment, the text comparison efficiency can be effectively improved by querying the first matching point and acquiring the traversal region according to the first matching point to acquire the rest matching points.
Fig. 4 is a flow chart of a text comparison method according to an embodiment of the present application, as shown in the drawing, the step s301 of performing traversal query on the first text on the X axis and the second text on the Y axis to obtain first matching point information includes:
step s401, performing traversal query on the first text on the X axis and the second text on the Y axis, and obtaining coordinate points corresponding to the same characters in the first text and the second text;
Specifically, the first matching point may be found by performing a traversal query on the characters of the first text on the X axis and the characters of the second text on the Y axis, finding all characters that are the same in the two texts, and recording the corresponding coordinates. For example, if the A-th character of the first text on the X axis is the same as the B-th character of the second text on the Y axis, the former has coordinates (A, 0) and the latter has coordinates (0, B), so the coordinate point corresponding to this pair of same characters is (A, B). The coordinates of the remaining same characters are obtained in the same way. The traversal query may proceed as follows: take the first character of the first text, traverse the second text from its first character, find every character in the second text that is the same as the first character of the first text, and record the coordinates; then take the second character of the first text, again traverse the second text from its first character, find every character in the second text that is the same as the second character of the first text, and record the coordinates; and so on, until every character of the first text has been queried.
And step s402, searching a coordinate point closest to the origin among the coordinate points corresponding to the same characters, and marking the coordinate point closest to the origin as a first matching point.
Specifically, after the coordinates of all the same characters in the first text and the second text are found, the coordinate closest to the start coordinate can be looked up among them; the distance can be calculated with the Pythagorean theorem, and the coordinate with the shortest distance to the start coordinate is the first matching point coordinate. The start coordinate depends on the initial mapping positions of the two texts on the X axis and the Y axis: if the first characters of the first text and the second text are mapped to the origin, the start coordinate may be the origin; otherwise, the coordinate corresponding to the first characters of the first text and the second text may be taken as the start coordinate.
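As an illustration of the shortest-distance lookup, the following sketch picks the candidate with the smallest Euclidean distance to the start coordinate. The name `first_matching_point` is hypothetical, and the tie-breaking behaviour (the earliest candidate in the list wins) is an assumption, since the application does not say how ties are resolved.

```python
import math


def first_matching_point(points, start=(0, 0)):
    """Return the (x, y) candidate closest to `start`, or None if there is none.

    Distance is the Pythagorean (Euclidean) distance; on a tie, the
    candidate that appears first in `points` is returned.
    """
    if not points:
        return None
    return min(points, key=lambda p: math.hypot(p[0] - start[0], p[1] - start[1]))


nearest = first_matching_point([(4, 2), (1, 2), (3, 3)])
```

For the three candidates above, the squared distances to the origin are 20, 5 and 18, so (1, 2) is selected.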
In this embodiment, the shortest distance comparison is performed on the coordinates of all the same characters, so that the first matching point can be quickly obtained, and the text comparison efficiency is effectively improved.
Fig. 5 is a flow chart of a text comparison method according to an embodiment of the present application, as shown in the drawing, the step s302 of obtaining a traversal region according to the first matching point information, and performing traversal query on the first text and the second text on the traversal region to obtain remaining matching point information, where the step includes:
Step s501, obtaining the coordinate point corresponding to the last characters of the first text and the second text, taking the rectangular region between that coordinate point and the coordinate point of the first matching point as the traversal region, and performing a traversal query on the first text and the second text within the traversal region;
Specifically, the coordinate point of the last character of the first text is obtained from the X axis and the coordinate point of the last character of the second text from the Y axis, and from these two the coordinate point corresponding to the last characters of the two texts is obtained. For example, if the last character of the first text is at (A, 0) and the last character of the second text is at (0, B), the coordinate point corresponding to the last characters of the two texts is (A, B). The rectangular region between this coordinate point and the coordinate point of the first matching point is taken as the traversal region; for example, if the first matching point is (C, D), the traversal region extends from (C, D) to (A, B). Once the traversal region is determined, the traversal query is performed on the first text and the second text within it.
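The rectangular traversal region can be expressed as a simple membership test. In this sketch the function name `in_traversal_region` is hypothetical, and the boundary handling is an assumption: the rectangle's edges are treated as inclusive and only the matching point itself is left out, which is one reading of the description.

```python
def in_traversal_region(point, match, last):
    """True if `point` lies in the rectangle from `match` to `last`.

    `match` is the current matching point (C, D) and `last` is the point
    (A, B) formed by the last characters of the two texts. Edges are
    inclusive (an assumption); the matching point itself is excluded.
    """
    if point == match:
        return False
    x, y = point
    return match[0] <= x <= last[0] and match[1] <= y <= last[1]


inside = in_traversal_region((3, 4), match=(2, 2), last=(5, 6))
```

A stricter variant that requires both coordinates to be greater than those of the matching point would prevent a character from being matched twice; the application does not state which variant is intended.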
Step s502, when a new matching point is obtained, updating the traversal region, and continuing to perform traversal inquiry on the new traversal region until no next matching point appears.
Specifically, during the traversal query within the traversal region, a new matching point may be obtained by calculating the shortest distance to the first matching point. When a new matching point is obtained, the traversal region is redetermined as in step s501: the new traversal region is determined by the new matching point and the coordinate point corresponding to the last characters of the two texts. For example, if the new matching point is (E, F) and the coordinate point corresponding to the last characters of the two texts is (A, B), the new traversal region extends from (E, F) to (A, B). During the traversal query within the new traversal region, the next matching point may be obtained by calculating the shortest distance to the new matching point, and so on until no further matching point appears.
In this embodiment, updating the traversal region narrows the range of the traversal query, reduces the amount of calculation, and improves text comparison efficiency.
Fig. 6 is a flow chart of a text comparison method according to an embodiment of the present application, as shown in the drawing, in step s502, when a new matching point is obtained, updating the traversal region, and continuing to perform traversal query on the new traversal region until no next matching point appears, including:
Step s601, when a new matching point is obtained, taking the rectangular region between the coordinate point corresponding to the last characters of the first text and the second text and the coordinate point of the new matching point as a new traversal region;
Specifically, when a new matching point is obtained, the traversal region is redetermined as in step s501: the new traversal region is determined by the new matching point and the coordinate point corresponding to the last characters of the two texts. For example, if the new matching point is (E, F) and the coordinate point corresponding to the last characters of the two texts is (A, B), the new traversal region extends from (E, F) to (A, B).
Step s602, performing traversal inquiry on the area except the new matching point on the new traversal area until no next matching point appears.
Specifically, when performing the traversal query on the new traversal region, the next matching point can be obtained by calculating the shortest distance between the new matching point and the new matching point, and so on until no next matching point appears, wherein when performing the traversal query on the new traversal region, the new matching point can be excluded from the traversal region, i.e. the traversal query on the new matching point is not needed.
Specifically, after the query is traversed through circulation, new matching points are continuously obtained, the traversal area is updated according to the new matching points, the shortest distance is calculated according to the shortest distance between the same text coordinates and the last matching point when the next matching point is determined, after the query is traversed and the shortest distance is calculated, if no new matching point is generated at this time, the query is traversed, and the text comparison is finished.
In the embodiment, all the matching point information is acquired by circularly traversing the query, so that the text comparison efficiency can be effectively improved, and the complexity is reduced.
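The loop described above can be sketched as follows. This is only an illustration under several assumptions that the application does not state: characters sit at 1-based integer coordinates, "shortest distance" is Euclidean distance, ties go to the first point found, and candidate points must lie strictly beyond the previous matching point on both axes so that no character is matched twice. The names `nearest_match` and `find_matching_points` are hypothetical.

```python
import math
from typing import List, Optional, Tuple

Point = Tuple[int, int]


def nearest_match(first: str, second: str, anchor: Point,
                  xs: range, ys: range) -> Optional[Point]:
    """Return the point in the region xs * ys whose characters are identical
    and which is closest to `anchor`, or None if the region has no match."""
    best: Optional[Point] = None
    best_dist = math.inf
    for x in xs:
        for y in ys:
            if first[x - 1] != second[y - 1]:
                continue
            dist = math.hypot(x - anchor[0], y - anchor[1])  # assumed metric
            if dist < best_dist:  # strict "<" keeps the first point on ties
                best, best_dist = (x, y), dist
    return best


def find_matching_points(first: str, second: str) -> List[Point]:
    """Collect matching points by repeatedly shrinking the traversal region."""
    a, b = len(first), len(second)  # (A, B): last-character coordinates
    # First matching point: identical characters closest to the origin.
    current = nearest_match(first, second, (0, 0),
                            range(1, a + 1), range(1, b + 1))
    matches: List[Point] = []
    while current is not None:
        matches.append(current)
        e, f = current
        # Updated region: from the new matching point towards (A, B).
        # Assumption: start strictly beyond (E, F) on both axes.
        current = nearest_match(first, second, current,
                                range(e + 1, a + 1), range(f + 1, b + 1))
    return matches


print(find_matching_points("abcde", "abxde"))
# [(1, 1), (2, 2), (4, 4), (5, 5)]
```

The loop stops as soon as a region contains no identical characters, which corresponds to the condition that no further matching point appears.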
Fig. 7 is a schematic flow chart of a text comparison method according to an embodiment of the present application. As shown in the figure, step s103, in which statistics are computed from the matching point information of the identical characters in the first text and the second text to obtain a text comparison result, includes:
Step s701, counting the number of matching points according to the matching point information of the identical characters in the first text and the second text;
Specifically, after all the matching point information in the first text and the second text has been obtained, the total number of matching points is counted first.
Step s702, obtaining the text lengths of the first text and the second text, and obtaining a text comparison result according to the smaller of the two text lengths and the number of matching points.
Specifically, the text length of the first text is compared with that of the second text to find the smaller of the two; if the two texts have the same length, either length may be used. The similarity of the two texts is then obtained by dividing the total number of matching points by that text length.
In this embodiment, the similarity between two texts is obtained from the number of matching points, which effectively reduces the complexity of text comparison.
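The calculation in step s702 amounts to a single division, sketched below. The function name `similarity` is illustrative, and returning 0.0 when either text is empty is an assumption, since the application does not address that case.

```python
def similarity(match_count: int, first: str, second: str) -> float:
    """Divide the number of matching points by the smaller text length."""
    target_length = min(len(first), len(second))  # equal lengths: either one
    if target_length == 0:
        # Assumption: an empty text yields similarity 0 (not in the source).
        return 0.0
    return match_count / target_length


# Four matching points between two five-character texts.
print(similarity(4, "abcde", "abxde"))  # 0.8
```

Because the divisor is the shorter length, a short text that is entirely contained in a longer one scores 1.0.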
The structure of a text comparison apparatus according to an embodiment of the present application is shown in Fig. 8. The apparatus includes:
a text mapping module 801, a matching point query module 802, and a text comparison module 803. The text mapping module 801 is connected to the matching point query module 802, and the matching point query module 802 is connected to the text comparison module 803. The text mapping module 801 is configured to obtain a first text and a second text, convert each of them into a single line of text, and map the converted first text and second text to an X axis and a Y axis respectively. The matching point query module 802 is configured to perform a traversal query on the first text on the X axis and the second text on the Y axis to obtain matching point information of the identical characters in the first text and the second text. The text comparison module 803 is configured to compute statistics from the matching point information of the identical characters in the first text and the second text to obtain a text comparison result.
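One way to picture how the three modules are chained is the sketch below. The class and method names are hypothetical, and several details are assumptions rather than statements from the application: single-line conversion is taken to mean removing line breaks, distance is Euclidean, and each search starts strictly beyond the previous matching point.

```python
import math
from typing import List, Tuple

Point = Tuple[int, int]


class TextMappingModule:
    """Module 801: convert each text to a single line; character i (1-based)
    is then treated as lying at coordinate i on its axis."""

    def map(self, first: str, second: str) -> Tuple[str, str]:
        # Assumption: single-line conversion means removing line breaks.
        return "".join(first.splitlines()), "".join(second.splitlines())


class MatchingPointQueryModule:
    """Module 802: collect matching points by repeated nearest-match search."""

    def query(self, x_text: str, y_text: str) -> List[Point]:
        matches: List[Point] = []
        anchor: Point = (0, 0)  # the first search is measured from the origin
        while True:
            # Assumption: candidates lie strictly beyond the previous match.
            candidates = [
                (x, y)
                for x in range(anchor[0] + 1, len(x_text) + 1)
                for y in range(anchor[1] + 1, len(y_text) + 1)
                if x_text[x - 1] == y_text[y - 1]
            ]
            if not candidates:
                return matches
            anchor = min(candidates, key=lambda p: math.dist(p, anchor))
            matches.append(anchor)


class TextComparisonModule:
    """Module 803: turn the matching points into a similarity value."""

    def compare(self, matches: List[Point], x_text: str, y_text: str) -> float:
        shorter = min(len(x_text), len(y_text))
        return len(matches) / shorter if shorter else 0.0


class TextComparisonDevice:
    """Chains 801 -> 802 -> 803 in the order the modules are connected."""

    def __init__(self) -> None:
        self.mapper = TextMappingModule()
        self.matcher = MatchingPointQueryModule()
        self.comparer = TextComparisonModule()

    def run(self, first: str, second: str) -> float:
        x_text, y_text = self.mapper.map(first, second)
        matches = self.matcher.query(x_text, y_text)
        return self.comparer.compare(matches, x_text, y_text)


print(TextComparisonDevice().run("ab\ncde", "abx\nde"))  # 0.8
```

Keeping each module behind its own small interface mirrors the connections described for Fig. 8: each stage consumes only the output of the stage before it.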
An embodiment of the application also discloses a computer device comprising a memory and a processor. The memory stores computer readable instructions which, when executed by one or more processors, cause the one or more processors to execute the steps of the text comparison method in the above embodiments.
An embodiment of the application also discloses a storage medium that can be read and written by a processor. The storage medium stores computer readable instructions which, when executed by one or more processors, cause the one or more processors to execute the steps of the text comparison method in the above embodiments.
Those skilled in the art will appreciate that all or part of the methods in the above embodiments may be implemented by a computer program stored in a computer-readable storage medium, and that the program, when executed, may include the steps of the method embodiments described above. The storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, or a read-only memory (ROM), or it may be a random access memory (RAM).
The technical features of the above embodiments may be combined arbitrarily. For brevity, not all possible combinations of these technical features are described; however, as long as a combination of technical features involves no contradiction, it should be considered to fall within the scope of this description.
The foregoing examples illustrate only a few embodiments of the application. They are described specifically and in detail, but they should not be construed as limiting the scope of the application. It should be noted that those skilled in the art can make several variations and modifications without departing from the spirit of the application, all of which fall within the scope of the application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims (5)

1. A text comparison method, comprising the steps of:
acquiring a first text and a second text, respectively converting the first text and the second text into single-line characters, and respectively mapping the converted first text and second text to an X axis and a Y axis;
performing traversal query on the first text on the X axis and the second text on the Y axis to obtain matching point information of the same characters in the first text and the second text;
counting according to matching point information of the same characters in the first text and the second text, and obtaining a text comparison result;
the step of performing traversal query on the first text on the X axis and the second text on the Y axis to obtain matching point information of the same text in the first text and the second text, including:
performing traversal query on the first text on the X axis and the second text on the Y axis to obtain first matching point information;
acquiring a traversing area according to the first matching point information, and performing traversing inquiry on the first text and the second text on the traversing area to acquire the rest matching point information;
the step of performing traversal query on the first text on the X axis and the second text on the Y axis to obtain first matching point information includes:
traversing and inquiring the first text on the X axis and the second text on the Y axis to obtain coordinate points corresponding to the same characters in the first text and the second text;
inquiring a coordinate point closest to an origin in the coordinate points corresponding to the same characters, and marking the coordinate point closest to the origin as a first matching point;
the step of obtaining a traversing area according to the first matching point information, and performing traversing query on the first text and the second text on the traversing area to obtain the rest matching point information, including:
acquiring coordinate points corresponding to the last text in the first text and the second text, taking a rectangular area between the coordinate points and the coordinate points corresponding to the first matching points as a traversing area, and performing traversing inquiry on the first text and the second text on the traversing area;
when a new matching point is acquired, updating the traversing area, and continuing traversing inquiry on the new traversing area until no next matching point appears;
when a new matching point is acquired, updating the traversal region, and continuing to perform traversal inquiry on the new traversal region until no next matching point appears, including:
when a new matching point is obtained, taking a rectangular area between the coordinate point corresponding to the last word in the first text and the second text and the coordinate point corresponding to the new matching point as a new traversal area;
performing traversal inquiry on the new traversal region except the new matching points until no next matching points appear;
the step of counting according to the matching point information of the same text in the first text and the second text to obtain a text comparison result comprises the following steps:
counting the number of matching points according to the matching point information of the same characters in the first text and the second text;
acquiring the text lengths of the first text and the second text, and acquiring a text comparison result according to the shorter text length in the text lengths and the number of the matching points;
the obtaining the text lengths of the first text and the second text, and obtaining the text comparison result according to the shorter text length in the text lengths and the number of the matching points comprises:
comparing the text length of the first text with the text length of the second text;
if the text lengths of the two texts are different, determining the shorter text length of the two text lengths as a target text length;
if the text lengths of the two texts are the same, determining the text length of any one text as a target text length;
obtaining the similarity of the two texts by dividing the number of all matching points by the target text length.
2. The text comparison method of claim 1, wherein mapping the converted first text and the second text to an X-axis and a Y-axis, respectively, comprises:
mapping the converted first text to any quadrant of an X axis, and mapping the converted second text to the same quadrant of a Y axis as the first text;
and the first character of the converted first text corresponds to any coordinate point on the quadrant to which the X axis belongs, and the first character of the converted second text corresponds to any coordinate point on the quadrant to which the Y axis belongs.
3. A text comparison apparatus that performs the text comparison method of claim 1 or claim 2, the text comparison apparatus comprising:
a text mapping module, configured to acquire a first text and a second text, respectively convert the first text and the second text into single-line characters, and respectively map the converted first text and second text to an X axis and a Y axis;
a matching point query module, configured to perform a traversal query on the first text on the X axis and the second text on the Y axis to obtain matching point information of the same characters in the first text and the second text;
and a text comparison module, configured to perform counting according to the matching point information of the same characters in the first text and the second text to obtain a text comparison result.
4. A computer device comprising a memory and a processor, the memory having stored therein computer readable instructions which, when executed by one or more of the processors, cause the one or more processors to perform the steps of the text comparison method of claim 1 or claim 2.
5. A storage medium readable by a processor, the storage medium storing computer instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of the text comparison method of claim 1 or claim 2.
CN201910297625.1A 2019-04-15 2019-04-15 Text comparison method, apparatus, computer device and storage medium Active CN110147429B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910297625.1A CN110147429B (en) 2019-04-15 2019-04-15 Text comparison method, apparatus, computer device and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910297625.1A CN110147429B (en) 2019-04-15 2019-04-15 Text comparison method, apparatus, computer device and storage medium

Publications (2)

Publication Number Publication Date
CN110147429A (en) 2019-08-20
CN110147429B (en) 2023-08-15

Family

ID=67588900

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910297625.1A Active CN110147429B (en) 2019-04-15 2019-04-15 Text comparison method, apparatus, computer device and storage medium

Country Status (1)

Country Link
CN (1) CN110147429B (en)

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102063510A (en) * 2011-01-17 2011-05-18 珠海全志科技有限公司 Method for searching matched character string
CN106980870A (en) * 2016-12-30 2017-07-25 中国银联股份有限公司 Text matches degree computational methods between short text
CN107085568A (en) * 2017-03-29 2017-08-22 腾讯科技(深圳)有限公司 A kind of text similarity method of discrimination and device
CN107315817A (en) * 2017-06-30 2017-11-03 华自科技股份有限公司 Electronic drawing text matching technique, device, storage medium and computer equipment
CN107679219A (en) * 2017-10-19 2018-02-09 广州视睿电子科技有限公司 Matching process and device, interactive intelligent tablet computer and storage medium
CN108170684A (en) * 2018-01-22 2018-06-15 京东方科技集团股份有限公司 Text similarity computing method and system, data query system and computer product
CN108182222A (en) * 2017-12-26 2018-06-19 东软集团股份有限公司 A kind of text matching technique and device
CN108920580A (en) * 2018-06-25 2018-11-30 腾讯科技(深圳)有限公司 Image matching method, device, storage medium and terminal

Also Published As

Publication number Publication date
CN110147429A (en) 2019-08-20

Similar Documents

Publication Publication Date Title
CN107609098B (en) Searching method and device
KR100903961B1 (en) Indexing And Searching Method For High-Demensional Data Using Signature File And The System Thereof
WO2022033252A1 (en) Video matching method and apparatus, and blockchain-based infringement evidence storage method and apparatus
US10649997B2 (en) Method, system and computer program product for performing numeric searches related to biometric information, for finding a matching biometric identifier in a biometric database
CN107015985B (en) Data storage and acquisition method and device
CN108304409B (en) Carry-based data frequency estimation method of Sketch data structure
KR102316271B1 (en) Method for managing of memory address mapping table for data storage device
CN108460123B (en) High-dimensional data retrieval method, computer device, and storage medium
CN108846016A (en) A kind of searching algorithm towards Chinese word segmentation
CN106933824B (en) Method and device for determining document set similar to target document in multiple documents
CN107807989B (en) Small file processing method and device
CN113065036B (en) Method and device for measuring performance of space supporting point and related components
CN112699195B (en) Geospatial data processing method, device, computer equipment and storage medium
CN110147429B (en) Text comparison method, apparatus, computer device and storage medium
CN111143587B (en) Data retrieval method and device and electronic equipment
CN110020001A (en) Storage, querying method and the corresponding equipment of string data
CN106897315B (en) KV item validity acquisition method and device
TW202004521A (en) LSM tree optimization method and device and computer equipment
US20230168830A1 (en) Method and apparatus for data access of nand flash file, and storage medium
US8805891B2 (en) B-tree ordinal approximation
CN110941730B (en) Retrieval method and device based on human face feature data migration
US20220100385A1 (en) Data reduction in block-based storage systems using content-based block alignment
CN110413716B (en) Data storage and data query method and device and electronic equipment
CN111460325B (en) POI searching method, device and equipment
CN114238334A (en) Heterogeneous data encoding method and device, heterogeneous data decoding method and device, computer equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant